Abstract
The factorial survey is an experimental design consisting of varying situations (vignettes) that have to be judged by respondents. For more complex research questions, it quickly becomes impossible for an individual respondent to judge all vignettes. To overcome this problem, random designs are recommended most of the time, whereas quota designs are not discussed at all. First comparisons of random designs with fractional factorial and D-efficient designs are based on fictitious data; first comparisons with fractional factorial and confounded factorial designs are restricted to theoretical considerations. The aim of this contribution is to compare different designs regarding their reliability and their internal validity. The benchmark for the empirical comparison is established by the estimators from a parsimonious full factorial design, each answered by a sample of 132 students (real instead of fictitious data). Multilevel analyses confirm that, if they exist, balanced confounded factorial designs are ideal. A confounded D-efficient design, as proposed for the first time in this article, is also superior to simple random designs.
Introduction
The factorial survey (vignette analysis) is an experimental design in which the researcher combines varying descriptions of persons or situations (vignettes) which will be judged by respondents from a particular point of view. If each respondent is presented with a sufficiently large number of vignettes, then it becomes possible to estimate the weight assigned by each individual to the different vignette characteristics indirectly via respondent-specific regression analysis. An advantage of such decompositional procedures consists in the fact that the respondent has to judge concrete vignette descriptions as a whole without being forced to indicate the influence of each individual vignette characteristic explicitly. Judging concrete vignettes is also much closer to real judgment in daily life than answering comparably general and most of the time rather abstract questions, as is usual for survey research (cf. also Beck and Opp 2001:304). For this reason, the factorial survey allows a respondent’s opinion to be ascertained with higher reliability and higher validity than it is possible with more general single questions (cf. Alexander and Becker 1978:93). However, as the complexity of the research question increases (more vignette factors, i.e., more vignette dimensions, and/or more levels for the vignette factors), so does the number of possible vignette combinations (vignette universe). As a consequence, the number of vignettes to be judged by an individual respondent has to be restricted to an increasingly lower percentage of the whole vignette universe. If only a vignette sample can be judged by each respondent, then the question of an optimal sample becomes important.
Most introductions to vignette analysis are restricted to random designs (cf. Jasso 2006:343; Rossi 1979:179; Rossi and Anderson 1982:40-41), that is, to designs where the vignette sample (one sample for all respondents) or samples are drawn randomly from the vignette population. 1 If and as long as, for substantive reasons, multiple ratings per vignette are not required for a research question (examples are given, for instance, by Jasso 2006:343, 379-80), it is most common to randomly draw a vignette sample (vignette set) of the same set size for each participant. The vignette sets are assigned randomly to the participants. Each individual vignette set as well as the combined sample of vignette sets filled out by different respondents has, within the limits of sampling error, the same features as the fully crossed vignette universe: The vignette universe is orthogonal, that is, all main effects and interaction effects can be estimated uncorrelated, which implies that all effects can be estimated independently of all other effects (orthogonality per se includes all effects except the intercept), and the vignette universe is balanced, that is, “each level occurs equally often within each vignette factor, which means that the intercept is orthogonal to each effect” (cf. Kuhfeld 1997:2; Kuhfeld, Tobias, and Garratt 1994:545-46). Even if a single vignette set might strongly deviate from the central features (orthogonality, balance) of the fully crossed vignette universe, with an increasing number of respondents the combined sample of vignette sets nonetheless asymptotically approximates these features within the limits of a decreasing sampling error. Hence, the combined sample is a representative sample of the fully crossed vignette universe. Quota designs offer a possible alternative to random designs. 
The basic idea of quota designs is to represent, as far as possible, the central features of the fully crossed vignette universe by constructing only one or comparably few different vignette sets. As with random designs, the vignette samples are assigned randomly to the respondents. Quota designs are used instead of random designs in order to optimize the efficiency (i.e., to maximize precision while preserving unbiasedness) with which the unstandardized regression coefficients of main and included interaction effects can be estimated.
A first systematic comparison between random and quota designs was carried out by Dülmer (2007), who also illustrated the expected differences by analyzing fictitious data. The comparison covered two different kinds of quota designs, namely, fractional factorial designs (cf. for instance, Gunst and Mason 1991) that had already been used by Alexander and Becker (1978) for vignette analysis and D-efficient designs (cf. Kuhfeld et al. 1994) that have been used for years in conjoint analysis. 2 A common property of both quota designs is that all participants have to judge the same vignette sample. Whereas fractional factorial designs are always orthogonal but not necessarily balanced, D-efficient designs try to optimize both of these features simultaneously. By somewhat relaxing the classical requirement of orthogonality, a D-efficient design can be found for every set size (which is not true for fractional factorial designs). A further quota design, first discussed for vignette analysis by Steiner and Atzmüller (2006:132-33), is the confounded factorial design. In this case, a number of different fractional factorial designs are used for data collection. Besides the confounded factorial design, the theoretical comparison by Steiner and Atzmüller (2006, cf. also Atzmüller and Steiner 2010) also includes random designs and fractional factorial designs. Their empirical analyses, which were carried out with a small respondent sample, are confined to the confounded factorial design. Hence, the behavior of the estimates (unstandardized regression coefficients and their t values) under different design conditions could not be compared empirically. A further way to generate a vignette design, up to now not discussed in the literature, consists in constructing a confounded D-efficient design. 
The aim of this procedure is to improve the efficiency for estimating unstandardized regression coefficients in comparison to random designs and simple D-efficient designs, in cases where no suitable confounded factorial design exists. 3
The aim of this contribution is to outline the basic ideas behind the main variants of random and quota designs and to shed some light on the expected differences regarding their reliability and internal validity. The conclusions drawn from these considerations will be illustrated afterward on the basis of real data. The main focus of this article is on research questions to assess the impact of vignette characteristics on respondents’ answer behavior. 4 The analyses will be carried out using a multilevel program that has become standard for analyzing hierarchical data. The use of a simple research question allowed each respondent to be presented with the completely crossed vignette universe. After the data for the completely crossed vignette universe were collected, a fixed number of reduced subsamples (fractions) of the completely crossed vignette universe were generated for each respondent, either randomly or by using a quota design. The answers for vignettes of the reduced subsamples were copied from the fully crossed vignette universe. The intended statistical comparison of random and quota designs is based on these fractions (vignettes plus corresponding respondent-specific answers) of the completely crossed vignette universe. An advantage of having the complete information from the full factorial design is that its estimators establish an empirical yardstick for evaluating the reliability and the internal validity of the designs that are included in the comparison. This is the first time that such a comparison has been carried out.
Basic Ideas, Main Variants, and Applicability of Random and of Quota Designs
Random Designs
In the literature, basically two proposals for generating simple random designs can be distinguished: In their introduction to the factorial survey, Rossi and Anderson (1982:40-41) recommend using a computer routine that generates the needed random samples from a variable list. The computer routine picks out a value (level) at random from the first vignette variable (characteristic), then a value of the second vignette variable, and so on through the last vignette variable. Each such cycle produces an additional vignette. The cycle is repeated until the number of vignettes for the specified set size has been reached. Thereafter, the whole routine has to be repeated until a vignette sample of the same set size has been produced for each respondent (simple random design with replacement). In a more recent introduction to the factorial survey, Jasso (2006:342-43) recommends drawing each vignette set randomly out of the fully crossed vignette universe, which makes it easy to draw simple random designs without replacement. Drawing without replacement might be preferable, especially for small vignette populations, since it guarantees that no respondent has to judge a vignette more than once.
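The two sampling routines described above can be sketched as follows. This is a minimal illustration, not the procedure used by the cited authors: the vignette universe (two dichotomous factors and one three-level factor), the set size of six, the number of respondents, and the function names are all assumptions made for the example.

```python
import itertools
import random

def draw_with_replacement(levels, set_size, rng):
    """Pick one level per vignette variable at random; repeat until the set is full.
    The same vignette may appear more than once in a set."""
    return [tuple(rng.choice(lv) for lv in levels) for _ in range(set_size)]

def draw_without_replacement(levels, set_size, rng):
    """Draw distinct vignettes from the fully crossed vignette universe."""
    universe = list(itertools.product(*levels))
    return rng.sample(universe, set_size)

# Hypothetical universe: 2 x 2 x 3 = 12 vignettes.
levels = [(0, 1), (0, 1), (0, 1, 2)]
rng = random.Random(42)

n_respondents = 5  # one vignette set per respondent
sets_with = [draw_with_replacement(levels, 6, rng) for _ in range(n_respondents)]
sets_without = [draw_without_replacement(levels, 6, rng) for _ in range(n_respondents)]
```

Drawing from the enumerated universe is only practical while the universe is small enough to list; for very large universes the level-by-level routine avoids building it.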
The basic idea behind using simple random designs is to represent the vignette universe as accurately as possible with different vignette samples of the same set size. Each vignette set is, within the limits of sampling error, a reduced representative random sample of the complete vignette universe. Merging such random vignette samples again produces a random sample of the whole vignette universe. The combined vignette sample, however, has a much smaller sampling error than each individual vignette set (cf. Rossi and Anderson 1982:29-30). The higher the number of vignette sets, the closer the combined vignette sample approximates the central features of the fully crossed vignette universe: All main effects and interaction effects within the limits of a decreasing sampling error become asymptotically more orthogonal, and the levels within each vignette factor become asymptotically more balanced.
To what extent the estimators of a factorial survey can benefit from the smaller sampling error of the combined vignette sample, however, depends on the heterogeneity of the respondents’ answer behavior. Since respondents in general have to answer more than one vignette, the collected data are nested hierarchically: The answer behavior is embedded in the personal context of each individual respondent. For such hierarchically structured data, multilevel regression analysis (cf. Hox 2010; Raudenbush and Bryk 2002; Snijders and Bosker 2012) is the recommended choice. The mathematical equation system for a multilevel model with four vignette variables, no interaction effects, and completely heterogeneous answer behavior is given in the following (cf. also Hox 2010:11-13; Raudenbush and Bryk 2002:35; Snijders and Bosker 2012:74-75):
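Vignette level (Level 1):
$$Y_{ij} = \beta_{0i} + \beta_{1i}X_{1ij} + \beta_{2i}X_{2ij} + \beta_{3i}X_{3ij} + \beta_{4i}X_{4ij} + e_{ij}$$
Respondent level (Level 2):
$$\beta_{ki} = \gamma_{k0} + u_{ki}, \qquad k = 0, 1, \ldots, 4 \qquad (1)$$
where
$j$ denotes a vignette,
$i$ denotes a respondent,
$Y_{ij}$ is the answer of the ith respondent about the jth vignette,
$X_{1ij}$ to $X_{4ij}$ are the vignette variables 1 to 4,
$\beta_{0i}$ to $\beta_{4i}$ denote a respondent’s unstandardized regression coefficient for the intercept ($\beta_{0i}$) and for the slopes ($\beta_{1i}$ to $\beta_{4i}$),
$\gamma_{00}$ to $\gamma_{40}$ denote the grand mean (average unstandardized regression coefficient) for the intercept ($\gamma_{00}$) and for the slopes ($\gamma_{10}$ to $\gamma_{40}$),
$u_{0i}$ to $u_{4i}$ denote the residuals (random terms) at the respondent level, that is, respondent-specific deviations from a respective grand mean, and
$e_{ij}$ denotes the residual at the vignette level.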
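To make the two-level structure concrete, the following sketch simulates answers from a model of this form (respondent i, vignette j, four dichotomous vignette variables) and recovers respondent-specific coefficients by separate OLS regressions. All numerical values (grand means, variances, number of respondents) are invented for illustration, and averaging the respondent-specific estimates is only a simple stand-in for a proper multilevel estimator.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Effect-coded (-1/+1) full factorial of four dichotomous vignette variables.
vignettes = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
X = np.column_stack([np.ones(len(vignettes)), vignettes])  # 16 vignettes x 5 columns

# Hypothetical grand means for the intercept and the four slopes.
gamma = np.array([5.0, 1.0, -0.5, 0.8, 0.0])
n_resp = 200

# Level 2: respondent-specific coefficients deviate from the grand means by u.
u = rng.normal(0.0, 0.5, size=(n_resp, 5))
beta = gamma + u

# Level 1: answer of respondent i to vignette j, plus a vignette-level residual.
e = rng.normal(0.0, 1.0, size=(n_resp, len(X)))
Y = beta @ X.T + e

# Respondent-specific OLS: one regression per respondent (columns of Y.T).
beta_hat = np.linalg.lstsq(X, Y.T, rcond=None)[0].T  # n_resp x 5
gamma_hat = beta_hat.mean(axis=0)                    # crude estimate of the grand means
```

Because each simulated respondent judges the complete, balanced, and orthogonal universe, every respondent-specific regression is estimable; with a reduced vignette set this depends on the chosen design.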
From the equation for the respondent level (level 2), it becomes clear that by including the u-terms in the regression equation each respondent can have his or her own regression equation, that is, an individual β for the intercept as well as for each of the four slopes. These respondent-specific βs deviate by a respondent-specific u-term from their respective grand mean (the reported “grand mean”
At the other extreme, that is, in cases where the answer behavior of the respondents is completely heterogeneous as in equation (1), the grand mean
Quota Designs
The basic idea behind quota designs is to represent the whole vignette universe with only one or relatively few different vignette samples which cover the central features of the vignette universe as accurately as possible. For this purpose, a complete knowledge of the statistical properties of the vignette universe is needed. In general, quota designs can be divided into the classical fractional factorial designs and the less well-known D-efficient designs (cf. Dülmer 2007:386). Both approaches are restricted to using the same vignette set for each participant. Fractional factorial designs are constructed by aliasing (cf. Alexander and Becker 1978:96-97; Gunst and Mason 1991:48; Steiner and Atzmüller 2006:126), that is, confounding main effects with higher order interaction effects within one vignette set. Since a perfectly aliased higher order interaction effect can no longer be separated statistically from the main effect (the correlation between the vignette variable and the interaction term is 1 or −1), one has to assume that the aliased interaction effect is negligible. If this is not the case, then the estimator for the main effect will be systematically biased. One of the main differences between fractional factorial designs and D-efficient designs is that, by somewhat relaxing the classical requirement of orthogonality of vignette factors (including interaction terms that are assumed to have a nonnegligible impact), a D-efficient design can be found for every set size. Relaxing the requirement for orthogonality frequently allows the balance of a design to be improved, an aspect that is sometimes sacrificed by fractional factorial designs in order to preserve orthogonality (mutual uncorrelatedness between vignette variables of different vignette factors including nonnegligible interaction terms). Balance and orthogonality, however, are both important for optimizing the precision with which βs can be estimated (cf. Kuhfeld et al. 1994:545).
Further possibilities arise if different quota designs are combined. A confounded factorial design (cf. Atzmüller and Steiner 2010:132; Kirk 1995:587-664; Steiner and Atzmüller 2006:132-33) is a design where different fractional factorial designs are used for a survey. An advantage of using a confounded factorial design over a simple fractional factorial design is that by using more than one fractional factorial design, confounding interaction effects with set effects becomes possible. A higher order interaction term that has no variation within a vignette set of a fractional factorial design is perfectly aliased with a set effect. A set effect is the specific influence which can be caused by the context of the vignette set as a whole. The common effect of the higher order interaction effect and the set effect is captured by the intercept of a regression model estimated for the fractional factorial design. If the higher order interaction effect, for instance, of dichotomous effect-coded vignette variables is a positive constant (i.e., 1) within one fractional factorial design and a negative constant (i.e., −1) within another fractional factorial design, then including a set variable allows the common effect of the higher order interaction effect and the set effect to be captured across the two fractional factorial designs. Disentangling both effects statistically, however, is impossible because in this case the higher order interaction effect is perfectly confounded with the set effect. 
Furthermore, if a main effect of a vignette variable is perfectly positively aliased with an interaction effect between different vignette variables within one fractional factorial design, and perfectly negatively aliased with the same interaction effect within another fractional factorial design, then the common impact of the perfectly confounded interaction effect and the set effect can be captured by estimating a two-way interaction effect between the respective vignette variable on the one hand and the set variable on the other hand. By controlling for the set variable and the two-way interaction term between the vignette variable and the set variable, the estimated intercept, as well as the estimated main effect of the vignette variable are protected against potential systematic bias that otherwise might result from a chosen design. Finally, one might also opt for using a confounded D-efficient design. Choosing a simple D-efficient design over a simple random design is done with the intention to optimize the precision (by preserving the unbiasedness) with which respondent-specific βs of the vignette variables (including nonnegligible interaction effects) can be estimated. Combining the vignette sets of a simple D-efficient design, in contrast to a simple random design however, neither improves the balance of the levels of each vignette variable nor does it reduce mutual correlations between vignette variables (cf. also Dülmer 2007:393). This weakness of simple D-efficient designs can be overcome by using a confounded D-efficient design, which consists of different simple D-efficient designs of the same D-efficiency. Existing imbalances as well as correlations among vignette variables (including nonnegligible interaction terms) are leveled off for the confounded D-efficient design across different vignette sets. 
In this way, the combined confounded D-efficient design covers the central features of the fully crossed vignette universe better than the simple D-efficient design: The D-efficiency of the combined confounded D-efficient design is higher than the D-efficiency of a comparable, simple D-efficient design and it is at least as D-efficient as a combined simple random design. To what extent the estimators of a confounded D-efficient design can be improved by the comparably higher D-efficiency of the combined design, however, depends again on the heterogeneity of the respondents’ answer behavior: The higher the differences between respondent-specific
Fractional factorial designs
Vignette universes can be symmetrical or asymmetrical (cf. Addelman 1962:21): Symmetrical vignette universes involve factors which all occur with the same number of levels (for instance, three factors each having two levels), and asymmetrical vignette universes include factors with different numbers of levels (for instance, two factors with two levels and one factor with three levels). A central property of all vignette universes, independent of whether they are symmetrical or asymmetrical, is that each level of a vignette variable (factor) occurs within the fully crossed vignette universe equally frequently (balanced) and that the variables of different vignette characteristics are mutually uncorrelated (orthogonal). For symmetrical vignette universes, fractional factorial designs can sometimes be found that fulfill both criteria (balanced orthogonal designs; cf. Addelman 1962:23) within the limits of a reasonable set size. In other cases, one might also opt for a design with an unbalanced number of variable levels. A necessary and sufficient condition for orthogonality within a chosen fractional factorial design is that the levels of one vignette variable occur with each level of the other variables with proportional frequency (unbalanced orthogonal designs; cf. Addelman 1962:23). All fractional factorial designs guarantee that at least the main effects of each vignette variable can be estimated mutually uncorrelated for each individual respondent.
Sometimes it is possible to generate fractional factorial designs where at least each main effect is completely aliased (cf. Alexander and Becker 1978:96; Atzmüller and Steiner 2010:131; Gunst and Mason 1991:48; Steiner and Atzmüller 2006:126) with a higher order interaction effect (no aliasing of main effects, for instance, with two-way interaction effects). Among such designs are primarily designs where the number of levels is the same for all vignette variables. The main effects of other fractional factorial designs are frequently only partially aliased with higher order interactions. This means at the same time that they are aliased with several instead of only one higher order interaction. Most frequently affected are designs with a different number of levels per vignette variable (mixed fractional factorial designs, cf. Lawson 2002:228; Ryan 2007:264, 274-75).
Complete aliasing means that, because the affected variables are mutually perfectly correlated, their respective effect can no longer be separated statistically. An estimated main effect therefore also captures the whole influence of the interaction effect with which the main effect is perfectly aliased. A partially aliased main effect, corresponding to the correlation between the vignette variable and a respective interaction term, captures only a part of the influence of the affected interaction effect. This, however, holds for all interaction effects with which a main effect is partially aliased. To retain the interpretability of an estimated main effect, aliasing is only permissible under the condition that the aliased interaction effects are at least negligible relative to the affected main effect (cf. Gunst and Mason 1991:41-42).
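Complete aliasing can be checked directly on a small example. The sketch below assumes three dichotomous, effect-coded vignette variables A, B, and C and takes the half fraction defined by A·B·C = +1; the variable names and the choice of fraction are illustrative only.

```python
import itertools

def corr(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def columns(design):
    """Return the columns A, B, C and the two-way interaction column AB."""
    A = [v[0] for v in design]
    B = [v[1] for v in design]
    C = [v[2] for v in design]
    AB = [a * b for a, b in zip(A, B)]
    return A, B, C, AB

universe = list(itertools.product([-1, 1], repeat=3))          # 8 vignettes
fraction = [v for v in universe if v[0] * v[1] * v[2] == 1]    # 4 vignettes

A_u, B_u, C_u, AB_u = columns(universe)
A_f, B_f, C_f, AB_f = columns(fraction)

r_universe = corr(C_u, AB_u)   # C and AB are uncorrelated in the full universe
r_fraction = corr(C_f, AB_f)   # C and AB coincide within the half fraction
r_main = corr(A_f, B_f)        # main effects stay uncorrelated in the fraction
```

Within the fraction the main effect of C cannot be separated from the A×B interaction, which is why the interaction has to be assumed negligible before such a design is used.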
In order to choose between different fractional factorial designs of the same set size, knowledge of a design’s resolution might be helpful: For Resolution III designs only main effects can be estimated uncorrelated, for Resolution IV designs main effects are also uncorrelated with first-order interaction effects (i.e., two-way interactions between different vignette variables), whereby some two-way interactions are aliased with each other. A Resolution V design allows estimating all main effects as well as all first-order interaction effects mutually uncorrelated (cf. Kuhfeld 2010:58; McLean and Anderson 1984:40; Ryan 2007:169-70). If for a given set size fractional factorial designs of different resolutions are available, then the higher resolution design should be preferred in cases where first-order interaction effects cannot be excluded on the basis of a priori knowledge. Interaction effects of higher order will be found very rarely in social science (Kirk 1995:627; Louviere 1988:40) and in practice are mostly seen as negligible.
Balanced fractional factorial designs for symmetrical vignette universes can be constructed by applying modular arithmetic (division with integer remainder, whereby the remainder determines the assignment to a vignette set, cf. for instance, Kirk 1995:590-94; McLean and Anderson 1984) to equation systems for the vignette variables. How to produce an unbalanced fractional factorial design for asymmetrical vignette universes is described by Addelman (1962, cf. also Backhaus, Erichson, Plinke, and Weiber 2000:575-76). In practice, however, fractional factorial designs are usually generated via computer programs like SPSS (“orthogonal design”) or SAS (PROC FACTEX, Macro %MktOrth or %MktEx), or via ready-made construction plans to be found in the literature (for instance, Gunst and Mason 1991, whose construction plans also inform about the interaction effects that can be estimated by using a particular design). 7
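As a rough illustration of the modular-arithmetic idea, the sketch below splits a symmetrical universe of three three-level factors (27 vignettes) into three sets of nine by the remainder of (x1 + x2 + x3) divided by 3. The defining equation is one common textbook choice and is an assumption here, not a construction taken from the cited sources; it confounds part of the three-way interaction with the set.

```python
import itertools
from collections import Counter

levels = (0, 1, 2)
universe = list(itertools.product(levels, repeat=3))  # 3 x 3 x 3 = 27 vignettes

# The integer remainder of the level sum decides the set membership.
sets = {r: [] for r in levels}
for v in universe:
    sets[sum(v) % 3].append(v)

def is_balanced(vignette_set):
    """Every level of every factor occurs equally often within the set."""
    target = len(vignette_set) // len(levels)
    for f in range(3):
        counts = Counter(v[f] for v in vignette_set)
        if len(counts) != len(levels) or any(c != target for c in counts.values()):
            return False
    return True

def is_orthogonal(vignette_set):
    """Every pair of levels of two different factors occurs equally often."""
    target = len(vignette_set) // (len(levels) ** 2)
    for f, g in itertools.combinations(range(3), 2):
        counts = Counter((v[f], v[g]) for v in vignette_set)
        if len(counts) != len(levels) ** 2 or any(c != target for c in counts.values()):
            return False
    return True

summary = {r: (len(s), is_balanced(s), is_orthogonal(s)) for r, s in sets.items()}
```

Each of the three sets is a balanced orthogonal fraction for the main effects, and together the sets reproduce the complete vignette universe.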
Confounded factorial designs
A confounded factorial design (cf. Atzmüller and Steiner 2010:132; Kirk 1995:587-664; Steiner and Atzmüller 2006:132-33) can be constructed by dividing the whole vignette universe into distinct fractional factorial designs of the same set size (cf. also Gunst and Mason 1991:42-44, 49-50). 8 The resulting fractional factorial designs each have to be replicated by the same factor. Thereafter, the sets have to be assigned randomly to the respondents. Whereas aliasing is concerned with effects within a fractional factorial design, confounding interaction effects with set effects is done across different fractional factorial designs (each distinct fractional factorial design is called a set). A higher order interaction effect which cannot be estimated within a fractional factorial design now becomes perfectly confounded across distinct fractional factorial designs with a set effect. Since perfectly confounding means that both effects cannot be statistically separated, only those higher order interaction effects that are assumed to be negligible should be confounded with a set effect. Including the set variable in the multilevel model, however, allows the common effect to be estimated of the confounded higher order interaction term and the set variable with which the higher order interaction effect is perfectly confounded. Furthermore, if, and as long as, the confounded higher order interaction effect not anticipated by the researcher is at least plausible, and the researcher has no plausible theoretical explanation for a set effect, he or she has good reasons to assume that the set effect is indeed negligible compared to the confounded interaction effect. In this case, the common effect of the confounded interaction effect and the set effect as captured by the set variable gives at least a rough hint about the magnitude of the confounded interaction effect. This information is useful for improving the design for further surveys with the factorial survey. 
If, on the other hand, the confounded interaction effect is not plausible and the researcher finds a good substantive argument for a set effect, the latter interpretation might be chosen as plausible explanation. Finally, if no plausible explanation can be found, the researcher should leave the question open for further research. Another limitation of confounding an interaction effect between different vignette variables with a set effect across distinct fractional factorial designs is that the set variable has no variability within a vignette set. Therefore, even confounding does not allow the affected respondent-specific interaction effects to be estimated (even if the set variable has no, or only a negligible effect). Instead, including such set variables (and interaction terms with set variables) serves the purpose of estimating all theoretically relevant main and interaction effects of interest without any aliasing (cf. Steiner and Atzmüller 2006:126).
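The difference between aliasing within a set and confounding across sets can be illustrated with the smallest possible case. The sketch assumes three dichotomous, effect-coded vignette variables and two half fractions defined by A·B·C = +1 and A·B·C = −1; the set variable is coded +1 and −1 accordingly. This coding is an assumption chosen for the example.

```python
import itertools

universe = list(itertools.product([-1, 1], repeat=3))

# Attach the set variable: it equals the three-way interaction A*B*C.
pooled = [(a, b, c, a * b * c) for a, b, c in universe]
set_plus = [row for row in pooled if row[3] == 1]
set_minus = [row for row in pooled if row[3] == -1]

# Within each set, C is perfectly aliased with A*B (positively or negatively).
aliased_plus = all(c == a * b for a, b, c, s in set_plus)
aliased_minus = all(c == -(a * b) for a, b, c, s in set_minus)

# Across both sets, C and A*B are orthogonal (their cross-product sums to zero).
cross_product = sum(c * a * b for a, b, c, s in pooled)

# The interaction between C and the set variable reproduces the A*B column,
# so controlling for it absorbs the A*B effect (together with any C-by-set effect).
c_by_set_equals_ab = all(c * s == a * b for a, b, c, s in pooled)

# The three-way interaction has no variation within a set.
abc_constant = all(len({a * b * c for a, b, c, s in rows}) == 1
                   for rows in (set_plus, set_minus))
```

The last check shows why a respondent who judges only one set provides no information on the confounded interaction: the corresponding column is constant within that set.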
For asymmetrical vignette universes especially, where usually no ready-made designs exist (cf. Atzmüller and Steiner 2010:134), it quickly becomes extremely difficult or even impossible to find a suitable confounded fractional factorial design where the set size is not too large to be judged by respondents. As a consequence, partial confounding of interaction effects with set effects is sometimes unavoidable (cf. Atzmüller and Steiner 2010:133-34). 9 As with constructing confounded factorial designs, constructing partially confounded factorial designs always requires that theoretically relevant main and interaction effects will be confounded/partially confounded only with those effects that are assumed to be negligible (cf. Steiner and Atzmüller 2006:125, 132-33).
For constructing relatively simple confounded fractional designs, Kirk (1995:587-659) is recommended reading. The designs that can be found there, however, are not always suitable for vignette analysis, because some of them have too few degrees of freedom for estimating all main effects, including the intercept, via respondent-specific regression analysis. For analyzing factorial surveys via multilevel analysis, this is necessary in order to test whether all respondent-specific
D-efficient designs
The use of quota designs that have been discussed up to now is restricted by mathematical rules of divisibility applied to the number of levels per vignette variable. Sometimes no fractional factorial design might exist for a reasonable maximum number of vignettes per respondent. By relaxing the classical requirement of perfect orthogonality, a D-efficient design can be generated for every set size.
The reason for modifying the classical criteria for constructing quota designs is, according to Kuhfeld et al. (1994:545), that orthogonality is only a secondary goal that has to be subordinated to the primary goal of minimizing the standard error of the parameter estimates. Since balanced orthogonal designs allow all effects (intercept, main effects, and nonnegligible interaction effects) to be estimated uncorrelated, they represent the vignette universe most adequately. With such designs as reference, D-efficiency is a standardized measure that takes into account orthogonality and balance. The formula for D-efficiency (cf. also Kuhfeld et al. 1994:547) is given by:
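$$\text{D-efficiency} = 100 \times \frac{1}{N_D \left| (X'X)^{-1} \right|^{1/p}} \qquad (2)$$
where $N_D$ denotes a design’s set size, $p$ denotes the number of parameters to be estimated (including the intercept), and $X$ denotes the coded design matrix.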
D-efficiency measures the goodness of a selected design relative to a balanced orthogonal design. For designs where only qualitative variables are included, a D-efficiency of 100 at least provides a rough reference for the generated design, even if such a design may be far from being possible for a given research question. If quantitative variables that are not coded with standardized orthogonal contrasts have to be included, then D-efficiency is no longer restricted to a maximum of 100. A design comparison, however, is possible as long as the same coding is used (cf. Kuhfeld et al. 1994:548-49). Since increasing the set size does not necessarily result in a higher or at least equal D-efficiency (i.e., D-efficiency is not a monotonically increasing function of set size), it is recommended to compare designs with different set sizes (cf. Dülmer 2007:394).
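A small numerical sketch may help to read the measure. It computes D-efficiency as 100 / (N_D · |(X′X)⁻¹|^(1/p)) for effect-coded dichotomous vignette variables with an intercept column. The function name and the two example designs are assumptions chosen for illustration, and the programs named in this section should be preferred for real design work.

```python
import itertools
import numpy as np

def d_efficiency(design):
    """D-efficiency of an effect-coded (-1/+1) design, intercept included.

    Uses 100 / (N_D * |(X'X)^-1|^(1/p)), which equals
    100 * |X'X|^(1/p) / N_D for a nonsingular information matrix.
    """
    X = np.column_stack([np.ones(len(design)), np.asarray(design, dtype=float)])
    n_d, p = X.shape
    det = np.linalg.det(X.T @ X)
    if det <= 1e-12:          # singular design: not all effects are estimable
        return 0.0
    return 100.0 * det ** (1.0 / p) / n_d

# Fully crossed universe of three dichotomous variables: balanced and orthogonal.
full = list(itertools.product([-1, 1], repeat=3))

# Hypothetical five-vignette set: neither perfectly balanced nor orthogonal.
five = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1), (1, 1, 1)]

eff_full = d_efficiency(full)   # reference value of a balanced orthogonal design
eff_five = d_efficiency(five)   # below 100 because of imbalance and correlation
```

For the five-vignette set, X′X has 5 on the diagonal and 1 everywhere else, so its determinant is 512 and the D-efficiency is 100 · 512^(1/4) / 5, roughly 95.1.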
Generating D-efficient designs requires computer programs like SAS (ADX “Optimal Design,” PROC OPTEX, or the Macro %MktEx) or the conjoint value analysis (CVA) module of Sawtooth Software. 10 Since nonexhaustive search algorithms are used, a computer program may fail to find the optimal design, even if the search algorithm is carried out several times (cf. Kuhfeld et al. 1994:547; Sawtooth Software 1997–2002:7-10). Repeated searches are therefore recommended, especially for complex designs. D-efficiency will be computed automatically by the programs mentioned above.
Confounded D-efficient designs
Constructing confounded D-efficient designs is mainly motivated by the intention to improve the precision (minimizing the standard error) for estimating unstandardized regression coefficients in comparison to simple random as well as simple D-efficient designs in cases where no suitable confounded factorial design exists or at least cannot be found. Generating confounded D-efficient designs is a two-step procedure: In the first step, it is recommended to generate a number of D-efficient designs and to select one to be used for the survey. By using a D-efficient design as a starting point, it is ensured that, compared to simple random designs, the precision with which respondent-specific βs can be estimated is optimized. The purpose of the second step is to further optimize the D-efficiency, this time across different vignette sets, by adding D-efficient designs of the same D-efficiency that are constructed manually on the basis of the selected simple D-efficient design.
Such additional designs can be constructed by permuting the assignment between the levels of the vignette variables and their respective characteristics. Adding such D-efficient designs of the same D-efficiency across different vignette sets not only increases the balance of the vignette variables but at the same time also reduces the correlations between the vignette variables. If a sufficient number of such different simple D-efficient designs of the same D-efficiency are constructed, then the maximal possible D-efficiency will be reached across the combined sample: The levels of each vignette variable appear exactly equally frequently (balanced), and the variables of different vignette characteristics are uncorrelated (if interaction terms were included when the D-efficient design was generated, then the same applies to the interaction terms too). By further increasing the number of different D-efficient designs, the confounded D-efficient design will even show the same statistical characteristics as the fully crossed vignette universe (i.e., vignette variables and all interaction terms are balanced and orthogonal). Since balance and orthogonality are only reached across different D-efficient vignette sets, the combined confounded D-efficient design only reaches exactly the maximal possible D-efficiency under the condition that each simple D-efficient design of which the confounded D-efficient design consists is answered by the same number of respondents (ND in equation (2) denotes in this case the total number of vignettes judged by the respondents). 
How much the estimators of a confounded D-efficient design can benefit from the fact that the combined confounded design possesses a higher D-efficiency than a simple D-efficient design depends again on the heterogeneity of the respondents’ answer behavior: The more homogeneous the answer behavior of different respondents (i.e., the more u-terms can be fixed), the more estimators of a confounded D-efficient design benefit from the fact that the pooled data matrix of the confounded D-efficient design is (at least nearly) balanced and orthogonal.
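The second construction step can be sketched in Python for a small universe of four two-level vignette variables. The start design used here (one half of the universe plus two vignettes from the other half) is a hypothetical example chosen for illustration, and the helper `recode` is not part of any software named in this article.

```python
import itertools
import numpy as np

universe = [tuple(v) for v in itertools.product([-1, 1], repeat=4)]

# Hypothetical start design: the half fraction with X1*X2*X3*X4 = +1
# plus two vignettes from the other half that differ in X1 and X2.
half = [v for v in universe if np.prod(v) == 1]
start = half + [(-1, 1, 1, 1), (1, -1, 1, 1)]

def recode(design, flips):
    """Switch the -1/+1 coding of every variable k with flips[k] == -1."""
    return frozenset(tuple(x * f for x, f in zip(v, flips)) for v in design)

# The 16 possible sign switches yield only 8 distinct vignette sets.
variants = {recode(start, flips) for flips in universe}

# Pool the variants (each answered equally often) and count every vignette.
pooled = [v for design in variants for v in design]
counts = {v: pooled.count(v) for v in universe}
print(len(variants), set(counts.values()))  # 8 {5}
```

Because every vignette of the universe occurs equally often in the pooled sample, the combined design is a replicated full factorial and therefore balanced and orthogonal for all main and interaction effects, provided each variant is answered by the same number of respondents.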
Applicability
After introducing the basic ideas and main variants of random and quota designs, it is important to discuss, at least briefly, the applicability of quota designs in cases where the vignette universe includes logically impossible combinations of vignette characteristics. Logically impossible combinations will, for instance, generally exist when education and occupation are important characteristics of a described fictitious vignette person (cf. Alves and Rossi 1978:545; Jasso and Rossi 1977:642; Nock 1982:104). Since a minimum level of education is required to be qualified for certain occupations, one has to remove vignettes with logically impossible combinations from the fully crossed vignette universe (cf. Jasso 2006:343). For simple random designs, excluding such vignettes in general only increases the correlations between the affected variables. Fractional factorial designs, by contrast, generally lose orthogonality between the affected variables and frequently also between affected and unaffected variables (cf. Dülmer 2007:390). Hence, fractional factorial designs and confounded fractional factorial designs would no longer be applicable. Choosing a D-efficient design might under such conditions be a viable alternative. Although excluding logically impossible combinations will always reduce the D-efficiency of a chosen design, the loss will sometimes be very small for D-efficient designs (cf. Kuhfeld et al. 1994:551). Furthermore, the existence of logically impossible combinations might frequently also prevent a suitable confounded D-efficient design from being found.
Reliability and Internal Validity
Reliability
Designs that allow unstandardized regression coefficients to be estimated with a lower standard error and for that reason with a higher precision produce ceteris paribus more reliable results than other designs. Therefore, the key to understanding the reliability of the results is the formula for estimating the standard error of
where
Besides the reliability of a respondent’s answer behavior (observed error variance), a
The higher the set size, the higher the number of degrees of freedom for estimating the error variance for a given number of vignette variables and the lower the estimated standard error for a respondent-specific
The remaining two components are functions of the two factors in the denominator of equation (3): The higher the variation of the vignette variables and the lower the coefficient of multiple determination among the vignette variables, the lower ceteris paribus the estimated standard error of a respective
If for a given set size no balanced orthogonal design exists, then D-efficiency will be maximized for fractional factorial designs exclusively with respect to orthogonality. Perfect orthogonality reduces the multiple coefficient of determination among the vignette variables to zero, whereby the right-hand factor in the denominator of equation (3) reaches its maximum of 1. Under the same conditions, search algorithms for D-efficient designs try to optimize both factors of the denominator simultaneously, whereby the request for perfect orthogonality is relaxed. Consequently, it becomes clear that for a given set size fractional factorial designs as well as D-efficient designs allow the individual βs to be estimated with higher reliability than an average set of a simple random design.
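For orientation, a textbook expression for the estimated standard error of an OLS slope that matches the verbal description of equation (3) is given below. The notation is an assumption and may differ from the article's own symbols: n is the set size, K the number of vignette variables, e_i the residuals of a respondent's regression, and R_k^2 the coefficient of multiple determination of vignette variable x_k on the remaining vignette variables.

```latex
\widehat{SE}(\hat{\beta}_{k}) =
  \sqrt{\frac{\sum_{i=1}^{n} e_{i}^{2} \,/\, (n - K - 1)}
             {\sum_{i=1}^{n} (x_{ik} - \bar{x}_{k})^{2}\,\bigl(1 - R_{k}^{2}\bigr)}}
```

In this form, the numerator carries the error variance and its degrees of freedom, the left-hand factor of the denominator grows with the variation (balance) of the vignette variable, and the right-hand factor reaches its maximum of 1 under perfect orthogonality.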
The comparisons up to now have focused on individual vignette samples and not on the combined vignette sample. Adding the same fractional factorial designs or the same D-efficient designs changes neither the degree of balance nor the correlations among the vignette variables. Hence, each quota set will show the same D-efficiency as the combined sample of all used vignette sets. The situation changes if different unbalanced fractional factorial designs, different D-efficient designs, or different vignette sets of a simple random design are added. If for the unbalanced variables (factors) of an unbalanced fractional factorial design the assignment between a vignette characteristic and its numerical representation (indicator variables for the nominal scaled variables) is permuted in order to generate further designs, then the combined vignette sample can also reach the maximal D-efficiency of 100 (for designs exclusively including qualitative variables). This requires, however, that each vignette set is judged by the same number of respondents. The same procedure can also be applied to D-efficient designs, although the correlations among the vignette variables make the task in this case somewhat more demanding. Furthermore, adding different vignette sets of a simple random design at least asymptotically increases the balance and uncorrelatedness of the combined sample. Therefore, with an increasing respondent number, the D-efficiency of the combined vignette sets asymptotically approaches 100 within the limits of the sampling error. The only design that for each individual vignette set, as well as for the combined sample of all completely judged vignette sets, possesses the maximal D-efficiency of 100 remains, however, the balanced fractional factorial design (the same, of course, also applies to a confounded factorial design consisting of different balanced fractional factorial designs). 
From this point of view it is an ideal design that produces the most reliable results.
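The asymptotic behavior of simple random designs can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (a universe of four two-level variables, set size 8, 132 respondents, a fixed random seed); it is not a reproduction of the article's analysis.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for illustration only
universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)

def d_efficiency(vignettes):
    """Main effects D-efficiency (intercept plus four effect-coded slopes)."""
    X = np.column_stack([np.ones(len(vignettes)), vignettes])
    n, p = X.shape
    det = np.linalg.det(X.T @ X)
    return 100.0 * det ** (1.0 / p) / n if det > 0.5 else 0.0

def random_set(size=8):
    """Draw one vignette set without replacement from the universe."""
    return universe[rng.choice(len(universe), size=size, replace=False)]

sets = [random_set() for _ in range(132)]          # one set per respondent
single = np.mean([d_efficiency(s) for s in sets])  # average single set
pooled = d_efficiency(np.vstack(sets))             # combined vignette sample
print(round(single, 1), round(pooled, 1))
```

The average single vignette set stays clearly below 100, while the pooled sample comes close to it, which is the pattern described above for an increasing number of respondents.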
How much the estimators of a factorial survey can benefit from the fact that the combined vignette sample (pooled data matrix) of a random or a confounded design has a higher D-efficiency than a single vignette set depends on the heterogeneity of the respondents’ answer behavior. Heterogeneity means that the respondent-specific estimated βs significantly differ across the respondents. Such unexplained context effects are modeled in multilevel analysis by including random terms that capture respondent-specific deviations from the grand mean of the intercept or the slope of a vignette variable. If no significant context effects exist, then multilevel analysis ends up with conventional OLS-regression. Under this condition, the unstandardized regression coefficients are estimated on the basis of the pooled data matrix where the D-efficiency of the factorial survey exclusively corresponds to the D-efficiency of the combined sample of all vignette sets. In this case, simple random designs with an increasing number of vignette sets within the limits of a decreasing sampling error easily produce more reliable results than unbalanced fractional factorial designs and D-efficient designs. This especially applies to factorial surveys where a relatively low set size is used. If on the other hand the answer behavior of different respondents is very heterogeneous, then the estimated γs (posterior means) are much more strongly based on the D-efficiency of the single vignette sets. Since unexplained context effects caused by unmeasured respondent-level variables are very likely to occur in factorial surveys, the estimators of simple random designs can only partly benefit from the fact that the design is becoming asymptotically more orthogonal and more balanced. 
Especially for small set sizes with a high D-efficiency and a relatively heterogeneous answer behavior, both fractional factorial designs and D-efficient designs will allow respondent-specific βs to be estimated with higher reliability than simple random designs (cf. Dülmer 2007:395-96, 405). The same also applies to a comparison between confounded factorial designs or confounded D-efficient designs on the one hand and simple random designs on the other.
Internal Validity
High reliability is necessary but not sufficient for high validity (cf. for instance, Neuman 2006:196-97). This at least applies to the reliability of the measuring instrument (i.e., the chosen design), the focus of the present article, and to internal validity. Accordingly, even a highly reliable estimator might suffer from a high systematic bias and for that reason might have low internal validity. Although each individual vignette set of a quota sample reaches a higher D-efficiency and therefore the respondent-specific βs can be estimated with a higher reliability than an average set of a random vignette sample, both fractional factorial designs and D-efficient designs have in general a somewhat higher susceptibility to systematic biases caused by interaction effects between vignette variables that were not expected in theory and so were not included when the design was generated. If for a fractional factorial design a main effect is perfectly aliased with a higher order interaction effect, then the estimated main effect captures the whole influence of the interaction effect. So, if an unexpected interaction effect should have a nonnegligible effect (due to perfect aliasing this cannot be tested afterward), then the main effect with which the interaction effect is perfectly aliased will suffer from a corresponding bias. The internal validity of the results is endangered. For a D-efficient design, the influence of a not-included but nonnegligible interaction effect will be distributed corresponding to its correlation with the vignette variables across several main effects. Therefore, the bias that results for an individual main effect is less severe for a D-efficient design than for a fractional factorial design. 
12 The fact that at least subsequent testing for unexpected higher order interaction effects is frequently impossible for fractional factorial (perfect aliasing) and D-efficient designs probably explains why such designs are sometimes seen as having a somewhat lower internal validity than random designs. An advantage of simple random designs and confounded designs is that they at least allow for the impact of such interaction effects to be tested across different vignette sets. But such tests—although at least possible—also have certain limitations.
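The different ways in which an omitted interaction effect enters the main effects can be made concrete with noiseless, purely hypothetical responses. In the sketch below, the coefficients in `b`, the interaction effect `g`, and the 10-vignette design are illustrative assumptions; the half fraction stands for a fractional factorial design and the 10-vignette set for a D-efficient design.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
b = np.array([4.0, 1.0, 0.5, -0.5, 0.25])  # hypothetical intercept, main effects
g = 0.6                                     # hypothetical X2*X3*X4 effect

def fit_main_effects(vignettes):
    """OLS main effects fit to responses that contain the omitted interaction."""
    X = np.column_stack([np.ones(len(vignettes)), vignettes])
    y = X @ b + g * vignettes[:, 1] * vignettes[:, 2] * vignettes[:, 3]
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    return est

# Half fraction: X1 is perfectly aliased with X2*X3*X4.
half = universe[universe.prod(axis=1) == 1]
# Hypothetical 10-vignette design: the half fraction plus two further vignettes.
extra = np.array([[-1, 1, 1, 1], [1, -1, 1, 1]], dtype=float)
d_eff_design = np.vstack([half, extra])

bias_half = fit_main_effects(half) - b
bias_deff = fit_main_effects(d_eff_design) - b
print(np.round(bias_half, 3), np.round(bias_deff, 3))
```

In the half fraction the estimate for X1 absorbs the whole interaction effect, whereas in the 10-vignette design the same effect is spread over X1 and X2 in smaller portions.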
For simple random designs, all main and interaction effects of a vignette set are randomly aliased. If a nonnegligible interaction term is not included when the vignette sets are generated, then the respondent-specific β of a vignette variable will, depending on the set-specific correlation with the interaction term, be either under- or overestimated, or almost correctly estimated. Since the aliasing structure of each vignette set of simple random designs is generated randomly, with an increasing number of respondents, the set-specific error across all included vignette sets and within the limits of the sampling error will asymptotically approach zero. Therefore, a simple random design allows an unbiased estimation of the γs of all included vignette variables. So, although the differences in the aliasing structure of the vignette sets of a simple random design contribute to an increase in random errors, they will not cause a systematic bias of the estimated γs. 13
However, if an unanticipated interaction effect is nonnegligible in comparison to a respective main effect, then it becomes impossible to interpret conditional effects (a conditional effect is the main effect of one vignette variable under the condition that the other vignette variable of the interaction term reaches a specific value that the researcher is interested in, cf. Friedrich 1982; Thome 1991). In contrast to fractional factorial designs and D-efficient designs, with simple random designs it is possible, at least in principle, to test afterward for virtually all interaction effects. This, however, strictly speaking only applies to situations where the answer behavior of the respondents empirically justifies fixing the respective random components. The test of whether such a random component is needed or not (cf. Hox 2010:47) is based on the number of individual respondents for which the randomly selected vignette set allows a respective interaction effect to be estimated. For simple random designs, this is generally only possible for a part of all respondents. It follows that, the lower the number of such respondents, the less reliable the significance test will be for all between-respondent variance components included (listwise exclusion). If none of the individual vignette sets allows an interaction effect to be estimated, then the γ can only be estimated without a variance component, that is, across vignette sets, by assuming that the
From the start, the confounded factorial design guarantees that the estimated γs of the vignette variables (including the intercept) are protected against a potential systematic bias that otherwise might be caused by a chosen design. If a suitable confounded factorial design exists which allows all important main and interaction effects to be estimated without any confounding, then respondent-specific
However, if an unanticipated interaction effect captured across vignette sets via a set variable turns out to have a nonnegligible size in comparison to a respective main effect, then the unanticipated interaction effect
14
(as well as the conditional effects) will not be biased (provided that no or only a negligible set effect exists) but the t value again might be misleading. Depending on whether the γ of a main effect is estimated with a random component or not, the t value of the
The gain in internal validity of confounded factorial designs over simple random designs consists in the higher D-efficiency of each individual fractional factorial design of which a confounded factorial design consists (this applies to each individual set comparison except the one where a fractional factorial design or sometimes even a D-efficient design is part of the simple random design) 15; at the same time, including all set variables needed to protect the estimated γs of the vignette variables of a confounded fractional factorial design against design-related potential systematic biases also reduces the variance components of the respondent-specific
If no suitable confounded factorial design can be found, then a confounded D-efficient design may be used instead. The individual vignette sets of such designs show, like D-efficient designs, fractional factorial designs, and confounded factorial designs, a higher D-efficiency than an average vignette set of simple random designs. The gain in internal validity of the confounded D-efficient design over D-efficient designs and fractional factorial designs is that the risk of design-related systematic biases of the reported
Empirical Design Comparison Conducted Using the Example of the Four Inglehart Items
Operationalizations and Data
In order to illustrate the expected differences in the reliability and internal validity between different designs, a parsimonious design is most suitable since each respondent can be presented with the fully crossed vignette population (full factorial design). One of the main advantages of this procedure is that the estimates from the full factorial design can be used as the benchmark for the intended design comparison. Based on the four items that are used to measure Inglehart’s materialist and postmaterialist value orientations, such a parsimonious full factorial design can easily be generated.
According to Inglehart (1994:290-91), materialist and postmaterialist value priorities can only be measured adequately via a ranking procedure. This assumption has been criticized by authors like Bürklin, Klein, and Ruß (1996) as theoretically inadequate. If a political system is not generally assumed to have limited capacities for problem solving, then there is no trade-off relationship between the two items “protecting freedom of speech” and “fighting rising prices” that would justify a ranking procedure. The factorial survey makes it possible to avoid such restrictions by presenting respondents with different vignettes with a list of the four Inglehart items. The task of the respondents could be to indicate how much they would like to be governed by a party for which the listed items are either not so important or very important. The factorial survey also reduces the problem of response sets (in many cases response sets can be easily identified), frequently observed in simple rating procedures, which have been criticized also for this reason by Inglehart (1997:116-17) as an unsuitable alternative for measuring both value orientations. 16 For illustration purposes, an example vignette with an introduction to the task is given in Table 1.
Example Vignette for Measuring Inglehart’s Value Orientations with Introduction.
The vignette universe of four items, each having two levels (“important” and “not so important”), consists of 16 (= 2^4) vignettes. To prevent possible order effects from causing systematic bias to the estimators, the vignette order of all questionnaires was randomly selected (cf. also Jasso 2006:343; Rossi and Anderson 1982:33). The paper-and-pencil interviews with the full factorial survey (completely crossed vignette universe), which was presented to each participant, were carried out on October 16, 2006, during the first session of two identical methodological courses designed for students of the Faculty of Economic and Social Sciences at the University of Cologne. In total, 137 students participated in the interviews. Five questionnaires were not sufficiently filled out and had to be excluded from analysis. Of the remaining 132 participants, 72 were female and 60 male.
The empirical comparison includes, besides the full factorial design, two random and two quota designs. The required random and quota designs were produced by dropping vignettes from the full factorial design. For analyzing random and quota designs, these subsamples/fractions of the vignette universe, including the corresponding fraction of respondent-specific answers from the full factorial design, were used. Therefore, the statistical comparison of random and quota designs is based on these fractions of the complete data set. The complete knowledge about the answers from the full factorial survey has the advantage that it allows an empirically based yardstick to be established for the empirical comparison of the reliability and internal validity of different designs. By using this setting, all possible other sources of disturbances not related to a selected design itself are eliminated.
Since Inglehart’s (1977:28-29) value types are assigned on the basis of the two items that a respondent ranked highest out of the four items, it is assumed that interaction effects are negligible. Therefore, all reduced designs included in the comparison are generated as main effect only designs. The reported D-efficiency in the following will refer to this model. Nonetheless, only those designs that should be especially robust against the influence of not included interaction effects between vignette variables will be compared. In order to reduce the likelihood of comparing outliers and at the same time to get a measurement for the stability of the results, each of the four reduced designs was produced 50 times, whereby the required vignette sets were each time randomly assigned to the respondents. Therefore, the reported results for each reduced design will be based on 50 multilevel regressions.
The first quota design included in the comparison is a confounded factorial design. This design was constructed by dividing the vignette universe along the value of the highest interaction term (X1·X2·X3·X4) into two sets of eight vignettes. Since, with regard to the main effects, each of the resulting half-fractional factorial designs is orthogonal and balanced, both set 1 and set 2 already possess a D-efficiency of 100. Dividing the vignette universe along the product term of the four vignette variables ensures that both vignette sets are of Resolution IV (cf. Ryan 2007:170). Table 2 gives an overview of the confounding pattern of the confounded factorial design.
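The division of the vignette universe along the four-way product term can be sketched as follows. The code is a minimal illustration with effect-coded variables; the variable names are assumptions, and the two halves are labeled by the sign of the product term rather than by the set numbers of Table 2.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)))
product = universe.prod(axis=1)            # value of X1*X2*X3*X4

# Two half fractions of eight vignettes each.
halves = {sign: universe[product == sign] for sign in (1, -1)}

def d_efficiency(vignettes):
    """Main effects D-efficiency (intercept plus four effect-coded slopes)."""
    X = np.column_stack([np.ones(len(vignettes)), vignettes])
    n, p = X.shape
    det = np.linalg.det(X.T @ X)
    return 100.0 * det ** (1.0 / p) / n if det > 0.5 else 0.0

eff, alias_main, alias_two_way = {}, {}, {}
for sign, half in halves.items():
    x1, x2, x3, x4 = half.T
    eff[sign] = d_efficiency(half)
    # Each main effect equals (up to sign) a three-way interaction ...
    alias_main[sign] = bool(np.array_equal(x1, sign * x2 * x3 * x4))
    # ... and two-way interactions are aliased only with each other.
    alias_two_way[sign] = bool(np.array_equal(x1 * x2, sign * x3 * x4))
print(eff, alias_main, alias_two_way)
```

Main effects are thus aliased with three-way interactions only, and two-way interactions only with other two-way interactions, which is the defining property of a Resolution IV design.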
Overview of the Aliasing and Confounding Pattern of the Confounded Factorial Design.
Note: The equals sign in the table indicates which interaction terms are perfectly aliased with an X variable or another interaction term within a respective vignette set. If an interaction term is perfectly positively aliased with an X variable or another interaction term within vignette set 1 (for instance,
a
I = intercept; “−” stands for “−1,” “+” stands for “+1.” In order to construct both half-fractional factorial designs (set 1 and set 2), the vignette universe has to be divided along the highest order interaction term
The first vignette set was assigned randomly to half of the 132 respondents; the second vignette set was assigned to the remaining respondents. 17 To get different assignments, this procedure was repeated 49 times. Since for a half-fractional factorial design only 8 instead of 16 vignettes are needed, the remaining 8 vignettes were excluded from the respective analysis. In this way, the number of vignettes included in a single multilevel analysis was reduced to 1,056 (132 respondents × 8 vignettes).
If for a given set size no suitable confounded factorial designs can be found, then reliability and internal validity can be optimized instead via a confounded D-efficient design. This condition is met for a given set size of 10 vignettes. For this set size, the maximal D-efficiency for a D-efficient design is 97.032 (cf. also Dülmer 2007:394, 398). Two different examples of such designs are presented in Table 3.
Balance and Correlations of Two Possible D-efficient Designs of Set Size 10.
Note: The D-efficiency of both designs is 97.032 (search algorithm: modified Fedorov, which is the most reliable, cf. Kuhfeld et al. 1994:548). Balance refers to the ratio of the levels of the dichotomous vignette variables (i.e., the ratio of “−1” to “1”). Design 1 consists of the vignettes 1, 2, 3, 4, 5, 6, 7, 8 of set 1 and 7, 8 of set 2. Design 2 consists of the vignettes 1, 4, 6, 7, 8 of set 1 and 1, 4, 5, 6, 7 of set 2 (the numbering refers to Table 2, first column). The confounded D-efficient design (D1–D8) used for design comparison consists of the following vignettes (numbering again refers to Table 2, first column):
D1: Set 1: vignettes 1–8, Set 2: vignettes 7 and 8;
D2: Set 1: vignettes 1–8, Set 2: vignettes 3 and 4;
D3: Set 1: vignettes 1–8, Set 2: vignettes 5 and 6;
D4: Set 1: vignettes 1–8, Set 2: vignettes 1 and 2;
D5: Set 1: vignettes 3 and 4, Set 2: vignettes 1–8;
D6: Set 1: vignettes 5 and 6, Set 2: vignettes 1–8;
D7: Set 1: vignettes 7 and 8, Set 2: vignettes 1–8;
D8: Set 1: vignettes 1 and 2, Set 2: vignettes 1–8.
A main effect only model for a confounded D-efficient design consisting of D1–D4 (or alternatively of D5–D8) would already reach a D-efficiency of 100 across the four selected designs. By permuting the assignment between the four Inglehart items and the variables X1 to X4, one could further increase the number of designs used for a confounded D-efficient design.
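For a universe of only 16 vignettes, the D-efficiency values reported here can be checked by brute force. The following sketch enumerates all vignette sets of size 10 and then pools four designs that each combine one half fraction with a different pair of vignettes from the other half. The pairing rule (partners differ in X1 and X2) is an assumption made for illustration, because the vignette numbering of Table 2 is not reproduced here.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)

def d_efficiency(vignettes):
    """Main effects D-efficiency (intercept plus four effect-coded slopes)."""
    X = np.column_stack([np.ones(len(vignettes)), vignettes])
    n, p = X.shape
    det = np.linalg.det(X.T @ X)
    return 100.0 * det ** (1.0 / p) / n if det > 0.5 else 0.0

# Exhaustive search over all 8,008 vignette sets of size 10.
best = max(d_efficiency(universe[list(idx)])
           for idx in itertools.combinations(range(16), 10))

prod = universe.prod(axis=1)
half_plus, half_minus = universe[prod == 1], universe[prod == -1]

# Pair each vignette of the second half with its X1/X2-switched partner.
pairs = {}
for v in half_minus:
    partner = v * np.array([-1.0, -1.0, 1.0, 1.0])
    key = tuple(sorted([tuple(v), tuple(partner)]))
    pairs[key] = np.array(key)

designs = [np.vstack([half_plus, pair]) for pair in pairs.values()]
single = [d_efficiency(d) for d in designs]
pooled = d_efficiency(np.vstack(designs))
print(round(best, 3), np.round(single, 3), round(pooled, 3))
```

Each of the four designs attains the maximum found by the exhaustive search, which agrees with the value of 97.032 given above, and the pooled main effects model reaches 100 when all four designs are used equally often.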
D-efficient design 1 is less balanced than D-efficient design 2 (X1 is unbalanced for both designs, X2 only for design 2) but compensates for higher imbalance by having lower correlations between the vignette variables. In accordance with classical fractional factorial designs, the D-efficient design with the lower correlations was given priority over the more balanced design. Based on D-efficient design 1, seven further D-efficient designs of the same efficiency were constructed by switching the coding for the two levels of one or more vignette variables. Therefore, the combined confounded D-efficient design consists of eight D-efficient designs of the same D-efficiency. For the combined confounded D-efficient design consisting of eight D-efficient designs, not only does the main effect only model have a D-efficiency of 100 but so does the model that includes all possible interaction effects besides the main effects. Therefore, the confounded D-efficient design consisting of eight D-efficient designs is, like the vignette universe, fully balanced and orthogonal. Exactly the same applies to the collected data only as long as each of the eight designs is answered equally often. To distribute the eight D-efficient designs as evenly as possible across the 132 respondents, four of the eight D-efficient designs were used 16 times, and the remaining four 17 times (
The only respondent characteristic that is included in the multilevel analysis is gender. The decision to include a respondent characteristic essentially goes back to an assumption of Steiner and Atzmüller (2006:123-24): Confounding set effects (caused by differences between vignette sets) with respondent effects, as is the case for random designs, would have a negative impact on the estimators for respondent characteristics. Since vignette sets are randomly assigned to respondents, this fear seems to be unwarranted for the reason that, with an increasing number of respondents, both effects will, within the limits of sampling error, be asymptotically uncorrelated. Hence, the estimators for the respondent level should at least not suffer from a systematic bias.
All predictor variables included in the analyses were effect coded. 18 Code −1 was used if a political goal was “not so important” for a fictitious governing party, code 1 if the goal was “very important” for the party. A respondent’s gender was coded −1 for males and 1 for females. The rescaled answer scale from the vignettes ranges from 0 (not at all) to 8 (very strongly). 19 The set variable included in the confounded factorial design was coded −1 for the first half-fractional factorial design and 1 for the second half-fractional factorial design. In our case, systematic set effects can be excluded because all respondents answered the fully crossed vignette universe. Hence, the effect that is captured by a respective set variable exclusively measures the impact of the interaction effects between vignette variables that are perfectly confounded with this set effect. All multilevel analyses were carried out with the multilevel program HLM 6.
Empirical Results of the Example Comparison
Each multilevel regression model is based on equation (4). Except for the predictor variables, the same terminology is used as in equation (1); the set variable (Set) was only included in the multilevel regression model for the confounded factorial design. Since all five estimated variance components turned out to be highly significant, none of them could be fixed (i.e., dropped from the equation for the multilevel model).
Vignette level (Level 1):
Respondent level (Level 2):
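A two-level specification consistent with the description in this section can be written as below. It is a sketch in standard HLM notation inferred from the text (four effect-coded vignette variables, gender as the only respondent characteristic, five variance components, and Set terms that enter only for the confounded factorial design); the indices i (vignette) and j (respondent) and the residual symbol r are assumptions and may differ from the original equation (4).

```latex
% Vignette level (Level 1)
Y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij}
       + \beta_{3j} X_{3ij} + \beta_{4j} X_{4ij} + r_{ij}

% Respondent level (Level 2); Set terms only for the confounded factorial design
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Gender}_{j}
           + \gamma_{02}\,\mathrm{Set}_{j} + u_{0j}
\beta_{kj} = \gamma_{k0} + \gamma_{k1}\,\mathrm{Set}_{j} + u_{kj},
           \qquad k = 1, \dots, 4
```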
The empirical results of the design comparison are presented in Table 4. The first column contains the estimators for the main effect model of the full factorial design. Since 50 separate multilevel analyses were estimated for each of the two random designs and for each of the two quota designs, the reported results for these designs also include the respective standard deviations. Two further columns were needed for the confounded factorial design in order to document the estimators of the main effect of the set variable and its interaction effect with each of the four vignette variables (mean effect: set variable) as well as their respective standard deviations.
Empirical Comparison of the Estimated Coefficients of Different Designs.
Note: Coding of gender: male = −1, female = 1. The effect of the set variable for the intercept is the main effect of the set variable. The effect of the set variable of the vignette variables is the interaction effect between a respective vignette variable and the set variable. The (Pseudo-) R² for each level has been calculated according to the simplified formula of Snijders and Bosker (1994:350-54). All multilevel models were estimated via restricted maximum likelihood.
The estimated γs are documented in the first block (rows 1–6) of the table. Since for the full factorial design all correlations between the vignette variables and their interaction terms are exactly zero and since only respondents who judged all 16 vignettes were included, the reported
The confounded factorial design is the only reduced design for which the magnitude of the interaction effects that are perfectly aliased with the main effects can be estimated via respective set variables across respondents (this applies as long as no or only a negligible set effect exists). Such set variables prevent the estimated main effects from being systematically biased by potentially nonnegligible interaction effects in cases where the response rate for different vignette sets (i.e., different fractional factorial designs) somewhat differs.
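How a set variable separates an aliased three-way interaction from a main effect can be shown with noiseless, purely hypothetical responses pooled across both half fractions. The coefficients in `b`, the interaction effect `g`, and the direction of the set coding are assumptions made for this illustration.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
b = np.array([4.0, 1.0, 0.5, -0.5, 0.25])  # hypothetical intercept, main effects
g = 0.6                                     # hypothetical X2*X3*X4 effect
y = (b[0] + universe @ b[1:]
     + g * universe[:, 1] * universe[:, 2] * universe[:, 3])

# Assumed coding: Set = -1 for the half with X1*X2*X3*X4 = +1, otherwise +1.
set_var = -universe.prod(axis=1)

# Columns: intercept, X1..X4, Set, Set*X1..Set*X4 (pooled over both halves).
X = np.column_stack([np.ones(len(universe)), universe, set_var,
                     universe * set_var[:, None]])
est, *_ = np.linalg.lstsq(X, y, rcond=None)

main_effects = est[1:5]   # estimates for X1..X4
set_by_x1 = est[6]        # estimate for Set*X1
print(np.round(main_effects, 3), round(set_by_x1, 3))
```

Under this coding, Set·X1 equals −X2·X3·X4, so the interaction effect appears with reversed sign in the Set·X1 term while the main effects are recovered without bias.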
According to Table 2, each of the four Inglehart items is perfectly aliased within a vignette set with a three-way interaction term between the remaining three Inglehart items, in vignette set 1 positively (for instance,
The first systematic differences between the four reduced designs can be observed with respect to the standard deviations for the
If the same γ is estimated with a lower standard error, then the t value as the quotient of both estimators will increase. As a consequence, a
In multilevel analysis, the estimated sum of the squared deviations of the
The average coefficients of multiple determination computed according to the simplified formulas proposed by Snijders and Bosker (1994:350-54) can be found immediately below the χ² values. Including the four vignette variables in the multilevel model explains 36.8 percent of the vignette level variance of the full factorial design. All reduced designs show very similar values. Large differences between the designs become visible for the respondent level: According to the full factorial design, gender accounts for 1.3 percent of the level 2 variance. With an average of 2.5 percent explained variance, the confounded factorial design comes closest to this value. This small difference, however, can be traced back to the additional inclusion of the set variables at level 2. The remaining respondent level R² values are on average 3.3 percent for the confounded D-efficient design, 9.7 percent for the simple random design of set size 10, and 13.2 percent for the simple random design of set size 8. The reason for the higher percentage of explained variance is that differences between the means of the vignette variables of different vignette sets lead to variation in the mean of the dependent variable between respondents. Including the vignette variables in regression analysis explains these differences between respondents, so that the level 2 coefficient of multiple determination increases. Since both half-fractional factorial designs of the confounded factorial design are balanced, no differences between the means of the vignette variables exist between the two sets. Although such mean differences also exist for the confounded D-efficient design, optimizing D-efficiency reduces these differences in comparison to simple random designs. Hence, including all vignette variables in multilevel analyses of quota designs either cannot increase the R² values of the respondent level or only to a lower degree than for a simple random design.
A final look at the number of iterations shows that random designs need, on average, more iterations than quota designs until the maximum likelihood estimation of the multilevel regression converges. The reason is their more complex error structure, in which unique set effects are confounded with unique respondent effects.
Conclusion
The purpose of this contribution was to compare random designs and quota designs with respect to their reliability and their internal validity. The reliability of a design depends on its D-efficiency and therefore on the precision with which a respondent-specific β can be estimated. One strength of quota designs is that for a given set size they yield more reliable estimators for the respondent-specific βs than an average set of a simple random design. However, to what extent the estimators of a quota design can benefit from this characteristic, for hierarchically structured data like those of factorial surveys, depends on the heterogeneity of the respondents’ answer behavior. If no significant differences exist between the respondent-specific
However, even a highly reliable estimator might be heavily biased and therefore highly invalid. One strength of simple random designs is their low susceptibility to the systematic biases that the aliasing structure may cause in fractional factorial designs and in D-efficient designs. Since the aliasing structure of simple random designs varies randomly across different vignette sets, a respondent-specific β will be systematically underestimated, systematically overestimated, or almost correctly estimated, depending on the set-specific correlations between the predictor variable and omitted but nonnegligible interaction terms. Across all respondents, however, the estimated βs will show no systematic bias beyond the limits of the sampling error.
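The argument about randomly varying aliasing can be checked numerically in a simplified setting. The following sketch again assumes four effect-coded two-level vignette variables A, B, C, and D, a set size of 8, and a true A×B interaction with weight 1 that is omitted from a main-effects-only regression. All names are hypothetical. For each vignette set, the bias on the main effect of C is the corresponding element of (X'X)⁻¹X'z, where z is the omitted interaction column (the usual omitted-variable result). Instead of sampling, the sketch enumerates all sets of size 8 and skips those in which the main effects are not estimable.

```python
from itertools import combinations, product


def solve(a, b):
    """Solve a x = b by Gaussian elimination; None if a is singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-9:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def bias_on_c(vignette_set):
    """Bias on the main effect of C caused by omitting A*B (true weight 1)."""
    rows = [(1,) + v for v in vignette_set]      # intercept, A, B, C, D
    z = [v[0] * v[1] for v in vignette_set]      # omitted A*B column
    p = 5
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(p)]
    coef = solve(xtx, xtz)
    return None if coef is None else coef[3]     # index 3 is variable C


universe = list(product((-1, 1), repeat=4))
biases = [b for s in combinations(universe, 8)
          if (b := bias_on_c(s)) is not None]
mean_bias = sum(biases) / len(biases)

print(len(biases), "estimable sets of size 8")
print("bias range:", min(biases), "to", max(biases))
print("mean bias: ", round(mean_bias, 12))
```

In this toy setting, individual sets over- or underestimate the effect of C, in the extreme by the full size of the omitted interaction, while the biases cancel when averaged over all estimable sets.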
The confounded factorial design is the only design that provides real protection against systematic biases that otherwise might be caused by the aliasing structure of a design. For a given set size, it also yields a higher D-efficiency than the simple random design. All in all, the balanced confounded factorial design is the most D-efficient design with the highest internal validity and for that reason represents an ideal reduced design. For complex research questions, and/or in cases where vignette variables with different numbers of levels have to be included, a confounded factorial design will rarely be found (cf. also Steiner and Atzmüller 2006:144). For such situations, one may instead opt for a confounded D-efficient design, which can be generated for every set size. Generating a confounded D-efficient design is rather time-consuming, however, for very complex research questions with a high number of vignette variables and/or a high number of levels per vignette variable; in such cases, one might in practice continue using simple random designs. The same applies to situations where impossible vignette combinations exist that prevent a suitable quota design from being found. Finally, using fractional factorial designs or D-efficient designs may only be recommended in cases where aliased interaction effects (as well as set effects that are aliased with these interaction effects within the vignette sets of the chosen design) can be assumed to be negligible relative to the affected βs that will be estimated in the multilevel model. This recommendation, however, is restricted to situations where relatively small vignette sets have to be used and where, besides the intercept, a high number of slopes have to be estimated with their own random component (cf. also Dülmer 2007:406).
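The D-efficiency comparisons underlying these recommendations can be illustrated with a minimal sketch. It assumes the common definition of D-efficiency for effect-coded designs, 100 · |X'X|^(1/p) / N, with p parameters (intercept plus main effects) and N vignettes per set, and again four two-level vignette variables. Both the definition used here and the helper names are assumptions for illustration and may differ from the computations behind the reported results. The balanced half fraction reaches the maximum of 100, while a set in which a single vignette is swapped for one from the other half fraction falls below it.

```python
from itertools import product


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det


def d_efficiency(vignette_set):
    """100 * |X'X|**(1/p) / N for a main-effects model with intercept."""
    rows = [(1,) + tuple(v) for v in vignette_set]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    det = max(determinant(xtx), 0.0)  # guard against tiny negative rounding
    return 100.0 * det ** (1.0 / p) / n


universe = list(product((-1, 1), repeat=4))
half_plus = [v for v in universe if v[0] * v[1] * v[2] * v[3] == 1]
half_minus = [v for v in universe if v[0] * v[1] * v[2] * v[3] == -1]
# Unbalanced set: one vignette of the half fraction replaced by one from the other half.
perturbed = half_plus[:-1] + [half_minus[0]]

print("balanced half fraction:", round(d_efficiency(half_plus), 2))
print("perturbed set:         ", round(d_efficiency(perturbed), 2))
```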
Footnotes
Acknowledgements
I thank the anonymous SMR reviewers for their constructive comments and suggestions.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.