Abstract
Discrete-option multiple-choice (DOMC) items differ from traditional multiple-choice (MC) items in the sequential administration of response options (up to display of the correct option). DOMC can be appealing in computer-based test administrations due to its protection of item security and its potential to reduce testwiseness effects. A psychometric model for DOMC items is proposed that attends to the random positioning of the key across different administrations of the same item, a feature that has been shown to affect DOMC item difficulty. Using two empirical data sets in which items were administered in both DOMC and MC formats, the variability of key location effects across both items and persons is examined. The proposed model exploits the capacity of the DOMC format to isolate both (a) distinct sources of item difficulty (i.e., related to the identification of keyed responses versus the ruling out of distractor options) and (b) distinct person proficiencies related to the same two components. Practical implications of the randomized process used to schedule key location in DOMC test administrations are considered.
The multiple-choice (MC) item has been a mainstay of educational measurement despite its known limitations. The transition to computer-based measurement has made it possible to address some of these limitations through alterations of the MC format. One potentially appealing alternative is the discrete-option multiple-choice (DOMC) item (Foster & Miller, 2009). In traditional MC items, the item stem and response options are presented simultaneously; under the DOMC format, examinees are shown the stem and are then presented with the response options one at a time, responding to each before the next appears. An incorrect response to a DOMC item occurs if the examinee either (a) selects a presented distractor option or (b) fails to select a presented keyed response, regardless of how many overall options the examinee is administered. A correct response to a DOMC item occurs only when the examinee avoids selecting any presented distractor option and ultimately selects the keyed response(s). (Commonly, an additional response option is administered with some specified probability after the outcome has been determined, so that the respondent cannot tell whether the final response given was correct or incorrect.) Of particular importance to the current application, the sequence in which DOMC response options are presented is randomly determined and, for a given item, varies from one administration to the next.
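The scoring rule just described can be sketched as a short function. This is a minimal illustration, assuming each scheduled option is coded as key or distractor and each displayed option receives a yes/no endorsement from the examinee; the function name `score_domc_item` and the `extra_option_prob` parameter are hypothetical, and the probability of showing the extra (non-informative) option is an assumed input, since the text does not give its value.

```python
import random
from typing import Optional, Sequence, Tuple

def score_domc_item(
    schedule: Sequence[bool],
    endorse: Sequence[bool],
    extra_option_prob: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, bool]:
    """Score one DOMC item under the rule described in the text.

    schedule[i]: True if the i-th scheduled option is a key.
    endorse[i]:  the examinee's yes/no response to option i if shown.
    Returns (score, options_shown, extra_option_shown).
    """
    if rng is None:
        rng = random.Random()
    n_keys = sum(schedule)
    keys_selected = 0
    score = 0
    shown = 0
    for is_key, says_yes in zip(schedule, endorse):
        shown += 1
        if is_key != says_yes:
            # Either a presented key was rejected or a presented
            # distractor was selected: the item is scored incorrect.
            break
        if is_key:
            keys_selected += 1
            if keys_selected == n_keys:
                score = 1  # all keys selected, no distractor selected
                break
    # The outcome is fixed; optionally present one more option so the
    # examinee cannot infer whether the last response was correct.
    extra = shown < len(schedule) and rng.random() < extra_option_prob
    return score, shown, extra

# Four options with the key scheduled second: the examinee rejects the
# first distractor, then endorses the key.
print(score_domc_item([False, True, False, False], [False, True, False, False]))
# (1, 2, False)
```

For a correct response, the number of options shown equals the position of the last scheduled key, so that position fixes how many option-level decisions a correct response requires.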
Although the DOMC format is related to other format types (e.g., multiple-correct MC items, true-false items), its differences from the traditional MC format seem worthy of psychometric attention. Empirical results suggest that DOMC items tend to be more difficult than MC items (Eckerly, Smith, & Sowles, 2018). At least two reasons have been given for why such psychometric changes may reflect increases in validity. First, by having examinees evaluate one response option at a time, the DOMC format is believed to reduce the influence of testwiseness on test performance (Kingston, Tiemann, Miller, & Foster, 2012; Papenburg, Willing, & Musch, 2017; Willing, Ostapczuk, & Musch, 2015). Specifically, respondents are less able to arrive at a correct answer only by ruling out distractor options, as they must also recognize the keyed option when it is presented. Second, by varying the presentation of the item across examinees and reducing the exposure of distractor options, the format helps protect item security and thus potentially reduces concerns over test compromise.
The goal of this article is to develop a psychometric model that can be used to better understand measurement differences under the DOMC format. One of the unique psychometric advantages of DOMC is the opportunity to evaluate the distractibility of incorrect response options independently of the correct (keyed) responses. In traditional MC items, only the attractiveness of distractors relative to the correct response can be determined. Thus, it is inherently difficult to determine whether a difficult MC item is difficult because of attractive distractors, a low attractiveness of the correct response, or some combination of the two. This feature is also reflected in the item response models typically applied to MC items (specifically, “divide-by-total” models; see Thissen & Steinberg, 1986). By contrast, under DOMC it becomes theoretically possible to distinguish conditions in which an item is difficult due to attractive distractors from those in which it is difficult due to an inability to identify the correct response.
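The identification point can be made concrete with a small numerical check. The sketch below uses Bock's nominal response model as one representative divide-by-total model; the parameter values are arbitrary and purely illustrative. Adding the same constant to every option's intercept (making every option "more attractive" by the same amount) leaves all option probabilities unchanged, so MC responses identify only differences in attractiveness between options, not their absolute levels.

```python
import math
from typing import List, Sequence

def nominal_probs(theta: float, slopes: Sequence[float],
                  intercepts: Sequence[float]) -> List[float]:
    """Option probabilities under a divide-by-total (nominal) model."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    z_max = max(z)  # subtract the max for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]

# Option 0 is the key; options 1-3 are distractors (arbitrary values).
slopes = [1.2, -0.3, -0.4, -0.5]
intercepts = [0.5, 0.2, -0.1, -0.6]
base = nominal_probs(0.3, slopes, intercepts)

# Raise the intercept of every option by the same amount.
shifted = nominal_probs(0.3, slopes, [c + 2.0 for c in intercepts])
print(all(math.isclose(p, q) for p, q in zip(base, shifted)))  # True
```

Under DOMC, each option instead receives its own yes/no response, so a common change in option attractiveness would alter observable response rates rather than cancel out.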
At the same time, there are aspects of the DOMC format that need further exploration. Eckerly et al. (2018) highlight the role that key location can play in defining the difficulty of DOMC items. Naturally, when the keyed response occurs later in the sequence of scheduled options, more work is required of the examinee to achieve a correct response. Such differences may raise equity concerns to the extent that pure randomization of key location across items fails to balance average key location across administrations. In addition, it is unclear how the nature of the proficiencies measured by MC and DOMC items may be altered by the change in format. In this paper, the proposed model is applied to two empirical data sets, each of which permits examination of the relationship between performances on the same items administered under both DOMC and MC formats. The first data set comes from an information technology (IT) certification test currently administered using the DOMC format, for which data from a separate population are available under the MC format. The second data set was collected from online respondents to a test of Harry Potter trivia in which each examinee was administered some subset of the items under the DOMC format and the remaining items under the MC format.
Psychometric Model for DOMC Items
As noted above, a previously highlighted feature of DOMC items concerns the anticipated effect of key location. For DOMC items with just one keyed response, key location refers to the position in the sequence of presented options where the keyed option is scheduled to appear, irrespective of whether the key is ever actually administered to the candidate. For an item with four response options, this value could be 1, 2, 3, or 4. Because one of the empirical data sets also contains items with multiple keyed responses (where all keys must be selected by an examinee to obtain a correct response for the item), the last key location was more generally defined as the position at which the last of the keyed options is scheduled to appear. Although the difficulty of a DOMC item has been observed to increase generally as the last key location becomes later, it is conceivable that the effect of this feature varies both across items and across examinees. For example, a reduced effect of key location might be expected for items whose distractors are not attractive. Examinees may also show variability with respect to the key location effect: to the extent that the ability to rule out distractor options is distinct from the ability to select a keyed response, the key location effect may differ across examinees.
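How often each last key location occurs under randomization can be worked out by counting, under the assumption (not stated in this detail in the text) that every ordering of an item's options is equally likely. With n options and r keys, the last key sits at position m exactly when the other r − 1 keys occupy r − 1 of the first m − 1 positions, so P(last key location = m) = C(m − 1, r − 1) / C(n, r). The sketch below computes this exactly and checks it by simulation; the function names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb

def last_key_location(schedule):
    """1-based position of the last keyed option in a scheduled order."""
    return max(i for i, is_key in enumerate(schedule, start=1) if is_key)

def lkl_distribution(n_options, n_keys):
    """Exact P(last key location = m) under a uniform random ordering:
    the other n_keys - 1 keys must fall among the first m - 1 positions."""
    total = comb(n_options, n_keys)
    return {m: Fraction(comb(m - 1, n_keys - 1), total)
            for m in range(n_keys, n_options + 1)}

def simulate_lkl(n_options, n_keys, reps=100_000, seed=0):
    """Monte Carlo check: shuffle the options and record the last key location."""
    rng = random.Random(seed)
    base = [True] * n_keys + [False] * (n_options - n_keys)
    counts = Counter()
    for _ in range(reps):
        order = base[:]
        rng.shuffle(order)
        counts[last_key_location(order)] += 1
    return {m: counts[m] / reps for m in sorted(counts)}

# Item types in the IT data: (options, keys).
for n_opt, n_key in [(4, 1), (4, 2), (5, 3)]:
    print(n_opt, n_key, lkl_distribution(n_opt, n_key))
```

Under this assumption, a single-keyed four-option item has each location with probability 1/4, a two-keyed four-option item has locations 2, 3, and 4 with probabilities 1/6, 1/3, and 1/2, and a three-keyed five-option item has locations 3, 4, and 5 with probabilities 1/10, 3/10, and 3/5, so later last key locations are more common for multiple-keyed items.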
To account for each of these potential sources of variability, a psychometric model that is sensitive to both item and examinee variability in key location effects was considered. The model can be viewed as implying that each presented response option in a DOMC item is another “step” in a sequence of steps that must be followed to arrive at a correct response. The number of such steps is determined by the scheduled location of the last keyed response, denoted as
where
where
The model in Equation 1 will be referred to as the key location (KL) model for DOMC items. Special cases of the KL model can be considered where
The use of exponent parameters to account for the effect of key location emphasizes the fact that a correct score on a DOMC item can be viewed as the outcome of a sequence of conjunctively interacting processes, specifically, a sequence of correct responses (i.e., correct rejections of distractor options, and correct selections of keyed options) up through the last keyed response. Prior psychometric models (see, for example, Samejima, 2000) have similarly accounted for conjunctively interacting subprocesses through an exponent parameter. Examining key location effects may also be informative in regard to how students solve MC items. Items with smaller
Interestingly, while both
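The conjunctive reading of an exponent parameter can be checked with a small calculation. Under Samejima's (2000) logistic positive exponent model, the probability of success is a logistic function raised to a positive power ξ; when ξ is an integer k and the steps are identical and independent, this equals the probability of succeeding on k steps in a row. The sketch is illustrative only and is not the KL model of Equation 1, which lets the key location effect vary across items and examinees through its own exponent parameters; treating the number of options presented up to the last key as the exponent is a simplifying assumption made here.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lpe_prob(theta: float, a: float, b: float, xi: float) -> float:
    """Logistic positive exponent response function: Psi(a(theta - b)) ** xi.

    Larger xi lowers the probability of success at every theta, as when
    more conjunctive steps must all succeed.
    """
    if xi <= 0:
        raise ValueError("xi must be positive")
    return logistic(a * (theta - b)) ** xi

theta, a, b = 0.5, 1.0, 0.0
step = logistic(a * (theta - b))  # chance of clearing a single step
for k in (1, 2, 3, 4):  # e.g., options presented up to the last key
    # With integer xi = k and identical, independent steps, the exponent
    # form equals the product of k single-step probabilities.
    assert math.isclose(lpe_prob(theta, a, b, k), step ** k)
    print(k, round(lpe_prob(theta, a, b, k), 3))
```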
Model Estimation
The authors focus here on the most general KL model, as the special cases can also be estimated within the same general framework. The estimation approach is fully Bayesian, with priors specified so as to allow the data to strongly inform both examinee and item parameters. The general model includes two examinee parameters (
The authors implement an MCMC algorithm for the model using WinBUGS 1.4. The software implements a Metropolis-Hastings sampling algorithm in which 4,000 adaptive iterations are used to tune proposal distributions, and a subsequent 6,000 iterations are used to approximate the joint posterior distribution of the model parameters. The authors apply the Gelman–Rubin criterion to five simulated chains to evaluate convergence. In all cases, the posterior means of the model parameters serve as the estimates, and their posterior standard deviations serve as measures of precision.
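For readers less familiar with the Gelman–Rubin criterion, the sketch below computes the potential scale reduction factor for a single parameter by comparing between-chain and within-chain variance; values near 1 suggest convergence. It is the basic textbook version (no chain splitting), which may differ in detail from the diagnostic the authors used, and the five chains of 6,000 draws are synthetic normal draws rather than KL model output. The same pooled draws give the posterior mean and standard deviation used as estimate and precision.

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: m >= 2 equal-length lists of post-adaptation draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean([statistics.variance(c) for c in chains])  # within
    b = n * statistics.variance(means)                              # between
    var_hat = (n - 1) / n * w + b / n  # pooled posterior variance estimate
    return (var_hat / w) ** 0.5

def posterior_summary(chains):
    """Posterior mean (estimate) and SD (precision) from pooled draws."""
    pooled = [x for c in chains for x in c]
    return statistics.fmean(pooled), statistics.stdev(pooled)

# Five synthetic chains of 6,000 draws (illustrative, not model output).
rng = random.Random(42)
chains = [[rng.gauss(1.0, 0.2) for _ in range(6000)] for _ in range(5)]
print(round(gelman_rubin(chains), 3))
print(posterior_summary(chains))
```

Shifting one chain away from the others inflates the between-chain term and pushes R-hat well above 1, which is the pattern the criterion is designed to flag.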
Real Data Illustrations
Two real test data sets involving DOMC items are considered. Importantly, each of these data sets involves the same items administered in both DOMC and MC formats.
IT Certification Exam
The first data set comes from an operational IT certification test. Item response data for two 59-item forms administered under the DOMC format are considered. Because the forms share items, a total of 83 unique DOMC items were analyzed. Most of these (54) are single-keyed items, 24 have two keyed responses, and five have three keyed responses. All items with one or two keyed responses have four response options; the items with three keyed responses each have five response options. As a result, the last key location varies from one to four for single-keyed items, from two to four for two-keyed items, and from three to five for three-keyed items. In estimating the KL model, a concurrent calibration strategy was applied, with a total of 648 examinees providing item response data. Thirty-five of the 83 items had responses from all 648 examinees; the remaining 48 items were administered on just one of the two forms and thus have response data from approximately 324 examinees each.
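The concurrent calibration design can be pictured as a sparse, missing-by-design response layout. The counts below follow from the text: two 59-item forms sharing 35 items leave 24 form-specific items per form, so 35 + 2 × 24 = 83 unique items. The item IDs and the exact 324/324 split of examinees are hypothetical (the text reports only approximately 324 examinees per form).

```python
from collections import Counter

N_ITEMS_PER_FORM = 59
N_COMMON = 35
N_UNIQUE_PER_FORM = N_ITEMS_PER_FORM - N_COMMON  # 24

# Hypothetical item IDs: 0-34 common, 35-58 Form 1 only, 59-82 Form 2 only.
common = list(range(N_COMMON))
form1 = common + list(range(N_COMMON, N_COMMON + N_UNIQUE_PER_FORM))
form2 = common + list(range(N_COMMON + N_UNIQUE_PER_FORM,
                            N_COMMON + 2 * N_UNIQUE_PER_FORM))

# Assumed even split of the 648 examinees across the two forms.
n_examinees = 648
form_of = [1 if i < n_examinees // 2 else 2 for i in range(n_examinees)]

# Each examinee contributes responses only to items on their form;
# all other person-by-item cells are missing by design.
n_per_item = Counter()
for f in form_of:
    for item in (form1 if f == 1 else form2):
        n_per_item[item] += 1

print(len(n_per_item))                                   # 83
print(sum(1 for n in n_per_item.values() if n == 648))   # 35
print(sum(1 for n in n_per_item.values() if n == 324))   # 48
```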
In fitting the KL model in Equation 1, the authors simulated five chains out to 10,000 iterations. The Gelman–Rubin R̂ was effectively 1 for each of the three item parameters (a, b, and s) across all items, suggesting successful convergence. An overall comparison of the general model against the three restricted models—that is, the constant exponent model (
Deviance Information Criterion Results for General Model and Comparison Models, IT Certification DOMC Data.
Note. Value in bold identifies lowest DIC. IT = information technology; DOMC = discrete-option multiple-choice; KL = key location; DIC = Deviance Information Criterion.
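The DIC used for these model comparisons trades off fit against complexity. A minimal sketch of the standard computation follows (the form reported by WinBUGS's DIC tool): D̄ is the posterior mean deviance, p_D = D̄ − D(θ̄) is the effective number of parameters, and DIC = D̄ + p_D. The deviance values are made-up numbers chosen only to make the arithmetic easy to check.

```python
import statistics

def dic(deviance_draws, deviance_at_posterior_mean):
    """Deviance Information Criterion from MCMC output.

    deviance_draws: D = -2 log L evaluated at each retained MCMC draw.
    deviance_at_posterior_mean: D evaluated at the posterior means.
    Returns (D_bar, p_D, DIC); smaller DIC is preferred.
    """
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar, p_d, d_bar + p_d

print(dic([1010.0, 1012.0, 1014.0], 1011.0))  # (1012.0, 1.0, 1013.0)
```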
IT: Interpreting Item Parameter Estimates
Supplemental Appendix A reports the DOMC item parameter estimates for the 83 DOMC items from the IT certification test. Table 2 shows the corresponding descriptive statistics of the item parameter estimates. It is apparent that all three parameters vary substantially and capture unique aspects of the items; the
Descriptive Statistics, DOMC Item Parameter Estimates, IT Certification Test Data (N = 83 Items).
Note. DOMC = discrete-option multiple-choice; IT = information technology; M = Mean; SD = Standard Deviation.
The authors focus here on the

Illustration of change in predicted item probabilities from earliest to latest last key location for a hypothetical respondent (
To further validate the item parameter estimates of the KL model, a regression model in which the logit of the DOMC p value is regressed against all three of the KL item parameter estimates was considered. Table 3 displays the results. As anticipated, both
Regression of Logit of DOMC Item Difficulty (p Value) on DOMC Item Parameter Estimates, 83 DOMC Items, IT Certification Data (
Note. DOMC = discrete-option multiple-choice; IT = information technology.
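The item-level regression above can be sketched as follows, assuming ordinary least squares of logit(p) on an intercept and the three KL item parameter estimates. The sketch is dependency-free; the item parameters and p values are simulated from arbitrary coefficients only to show that the code recovers them, and they are not the IT estimates. Items with p values of exactly 0 or 1 (where the logit is undefined) would need handling that the text does not describe.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y,
    solved by Gaussian elimination with partial pivoting. Each row of X
    should start with 1.0 for the intercept."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - rest) / A[i][i]
    return beta

def logit(p):
    return math.log(p / (1.0 - p))

# Simulated (not real) estimates for 83 items: a, b, s.
rng = random.Random(7)
items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.5, 1.5))
         for _ in range(83)]
true_coef = [0.5, 0.3, -1.0, -0.8]  # arbitrary intercept, a, b, s effects
p_values = [1.0 / (1.0 + math.exp(-(true_coef[0] + true_coef[1] * a
                                    + true_coef[2] * b + true_coef[3] * s)))
            for a, b, s in items]
X = [[1.0, a, b, s] for a, b, s in items]
y = [logit(p) for p in p_values]
print([round(c, 3) for c in ols(X, y)])  # recovers true_coef up to rounding
```

Because the simulated logits are exactly linear in the predictors, the fit reproduces the generating coefficients; with real estimates, the residual variance and standard errors would also be of interest.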
IT: Interpreting Person Parameter Estimates
Similar to the item parameter estimates, the two examinee parameter estimates (
Descriptive Statistics, DOMC Person Parameter Estimates, IT Certification Test Data.
Note. DOMC = discrete-option multiple-choice; IT = information technology; M = Mean; SD = Standard Deviation.
Figure 2 provides an illustration of the key location effect for two examinees of different

Illustration of change in predicted item probabilities from earliest to latest last key location for two hypothetical examinees with
In a similar way to how

Scatterplot of
The relevance of both person parameters to performance on the IT certification test in terms of observed sum score can be examined through a multiple regression analysis in which the person parameter estimates function as predictors. Table 5 displays the results of a multiple regression analysis in which the logit of the examinee proportion correct score on the DOMC test is the studied outcome. As seen in the table, both
Regression of Logit of DOMC Proportion Correct Score on
Note. DOMC = discrete-option multiple-choice; IT = information technology.
The relevance of key location to item performance, as well as the variability of its effects both across items and examinees, highlights a unique feature of DOMC items relative to MC items. The distinguishability of these proficiencies can also be related to concerns regarding testwiseness in MC items. Specifically, testwiseness effects on MC items might be viewed as an examinee using
These results also have some practical implications regarding the DOMC format. From a test equity perspective, the results raise questions about the implications of purely random scheduling of key location. Eckerly et al. (2018) have already demonstrated the potential for such a process to render an overall test that varies in difficulty across examinees; however, the likelihood that key location effects may also vary across persons and items (as reflected by
Harry Potter Trivia Data
A second DOMC data set is based on participants administered Harry Potter (HP) trivia items, for which the underlying latent trait might be viewed as Harry Potter knowledge. Participants were recruited online. They were administered a mixture of MC and DOMC items; a total of 21 items are analyzed, each administered to a given respondent in either the MC or the DOMC format. All 21 items consisted of a stem followed by five response options, and all were single-keyed items. The test administration design entailed two forms: on the first form, 10 of the items were DOMC and the other 11 were MC; the second form administered the same items with the formats reversed, such that the 10 DOMC items were administered as MC and the 11 MC items as DOMC. Form assignment was random. The analysis considers responses provided by 2,488 respondents.
The same models considered for the IT data were applied to the DOMC responses of the HP data. Table 6 displays the model comparison criteria. Although the KL model (including variability in both
Deviance Information Criterion Results for General Model and Three Comparison Models, Harry Potter Trivia DOMC Data.
Note. Value in bold identifies lowest DIC. DOMC = discrete-option multiple-choice; DIC = Deviance Information Criterion.
HP: Item Parameter Estimates
Table 7 displays descriptive statistics for the KL item parameter estimates. Consistent with the DIC results, the authors notice not only a much lower mean
Descriptive Statistics, DOMC Item Parameter Estimates, Harry Potter Trivia Data (N = 21 Items).
Note. The
Results for the HP data were similar to those for the IT data when regressing the logit of the DOMC item p values on the corresponding item parameter estimates, as shown in Table 8. Importantly, as with the IT data, the authors see that DOMC difficulty is statistically affected by both the
Regression of Logit of DOMC Item Difficulty (p Value) on DOMC Item Parameter Estimates, 21 DOMC Items, Harry Potter Trivia Data (
Note. DOMC = discrete-option multiple-choice.
To further illustrate interpretation of item parameter estimates and the type of heterogeneity captured, two example items were considered: Items 15 (
HP: Person Parameter Estimates
Table 9 displays descriptive statistics for the person parameter estimates from the HP analysis. Consistent with the reduced variability seen in the
Descriptive Statistics, DOMC Person Parameter Estimates, Harry Potter Trivia Data (N = 2,488 Persons).
Note. DOMC = discrete-option multiple-choice.
Consistent with this interpretation, the scatterplot of

Scatterplot of
As noted above, the design of the HP data permits a more formal comparison of ability estimates under the DOMC and MC formats. To further explore this issue, the authors fit the traditional two-parameter logistic (2PL) model to the binary item scores from each of the MC and DOMC tests. The resulting correlation between
Regression of Latent Proficiency in DOMC (
Note. DOMC = discrete-option multiple-choice; MC = multiple-choice.
Potential Implications for Administration of DOMC Items
As implied earlier, one implication of randomizing key location across items is the potential for variability in overall test difficulty across administrations. Clearly, examinees administered a DOMC test with later key locations will be at a disadvantage relative to those administered the test with earlier key locations. Under the KL model in Equation 1, such effects are further complicated by the presence of both person- and item-related factors, as reflected by
One advantage of the KL model is that the authors can explore the expected test score implications of the randomized key schedule as a function of both the item and person parameter estimates. Specifically, based on the item parameter estimates of each analysis, the implications of randomized key location can be studied for a hypothetical respondent at a given level of
The true scores vary across $\Omega_k$ for a fixed respondent and test. Consequently, a true score variance can be quantified for a hypothetical respondent at
where variance is calculated across $\Omega_k$ and can be approximated by a sufficiently large number of simulations $k = 1, \ldots, K$. A large
One of the advantages of this approach is that it can also be used to quantify the implications of alternative key scheduling approaches. A constrained randomization schedule that seeks to balance the distribution of key locations across hypothetical administrations was considered here. As noted, each form contains 38 single-key, 17 double-key, and four triple-key items, for which the possible last key locations are 1 through 4, 2 through 4, and 3 through 5, respectively. The constraint considered is that, for the single-keyed items, 9, 10, 10, and 9 items have key locations 1, 2, 3, and 4, respectively; for the double-keyed items, 5, 6, and 6 items have last key locations of 2, 3, and 4, respectively; and for the triple-keyed items, 1, 2, and 1 items have last key locations of 3, 4, and 5, respectively. Even subject to these constraints, a large number of permutations of items to key locations is possible. Equations 2 and 3 were evaluated under these constraints and compared against a completely randomized schedule.
Based on 1,000 simulated administrations, a true score, a true score variance, and a 95% interval of true scores can be defined at a level of

Illustration of 95% true score intervals under randomized and constrained randomization schedules, Forms 1 and 2, information technology certification test, kappa = 1.5.
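The simulation behind this comparison can be sketched as follows. It follows the procedure in the text (K simulated key schedules Ω_k, a true score per schedule for one fixed respondent, then the variance and a 95% interval across schedules) and uses the form composition and constrained counts reported above. Two pieces are assumptions: under the fully randomized schedule, each item's options are taken to be uniformly shuffled, and, because the KL item response function is not reproduced here, `toy_prob` is a hypothetical placeholder that would need to be replaced by Equation 1 evaluated at the respondent's θ and κ and each item's a, b, and s.

```python
import random
import statistics

# One IT form (from the text): (n_options, n_keys) -> number of items.
FORM = {(4, 1): 38, (4, 2): 17, (5, 3): 4}

# Constrained schedule (from the text): last key location -> number of items.
CONSTRAINED = {(4, 1): {1: 9, 2: 10, 3: 10, 4: 9},
               (4, 2): {2: 5, 3: 6, 4: 6},
               (5, 3): {3: 1, 4: 2, 5: 1}}

def random_lkl(n_options, n_keys, rng):
    """Last key location after a uniform shuffle of the options (assumed)."""
    order = [True] * n_keys + [False] * (n_options - n_keys)
    rng.shuffle(order)
    return max(pos for pos, is_key in enumerate(order, start=1) if is_key)

def draw_schedule(items, constrained, rng):
    """One simulated key schedule: a last key location for each item."""
    if not constrained:
        return [random_lkl(n_opt, n_key, rng) for n_opt, n_key, _ in items]
    pools = {}
    for item_type, counts in CONSTRAINED.items():
        pool = [loc for loc, c in counts.items() for _ in range(c)]
        rng.shuffle(pool)  # random assignment of items to fixed locations
        pools[item_type] = pool
    return [pools[(n_opt, n_key)].pop() for n_opt, n_key, _ in items]

def true_score_distribution(items, prob_correct, constrained, K=1000, seed=0):
    """Mean, variance, and 95% interval of one respondent's true score
    across K simulated schedules."""
    rng = random.Random(seed)
    scores = []
    for _ in range(K):
        lkls = draw_schedule(items, constrained, rng)
        scores.append(sum(prob_correct(param, lkl)
                          for (_, _, param), lkl in zip(items, lkls)))
    q = statistics.quantiles(scores, n=40)  # cut points 2.5%, 5%, ..., 97.5%
    return statistics.fmean(scores), statistics.variance(scores), (q[0], q[-1])

# HYPOTHETICAL stand-in for Equation 1 at a fixed (theta, kappa): each option
# up to the last key is one step cleared with probability `param`.
def toy_prob(param, lkl):
    return param ** lkl

gen = random.Random(1)
items = [(n_opt, n_key, gen.uniform(0.8, 0.99))
         for (n_opt, n_key), count in FORM.items() for _ in range(count)]
for constrained in (False, True):
    mean, var, (lo, hi) = true_score_distribution(items, toy_prob, constrained)
    print("constrained" if constrained else "randomized",
          round(mean, 2), round(var, 3), (round(lo, 2), round(hi, 2)))
```

By construction, the constrained schedule removes variability in how many items of each type receive each key location, leaving only variability in which items receive which location; how much that narrows the true score interval depends on the item and person parameters plugged into the response function.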
Discussion
The DOMC item is a promising item type for computer-based measurement. The article sought to better understand psychometric characteristics of the DOMC versus MC formats as observed in two real DOMC administrations. One advantage of the DOMC format is that by administering response options individually, it becomes possible to more effectively evaluate two distinguishable sources of item difficulty that are otherwise inseparable under the MC format, specifically, difficulty related to identification of keyed options as opposed to difficulty in not selecting distractor options. This separation can also be studied on the person side, where proficiencies related to the identification of correct responses (
To the extent that incorrect responses on DOMC items can occur in either of two ways (i.e., not selecting a keyed response OR selecting a distractor), the correctness of a DOMC item score can be viewed as the outcome of conjunctively interacting processes, namely, successes in ruling out distractors AND successes in selecting keyed responses. This feature of DOMC items is emphasized in the proposed model. By contrast, under the MC format these processes are often believed to interact disjunctively, including the possibility that a correct response on an MC item may be arrived at solely by ruling out distractors (and selecting the one remaining response). This feature of MC items is frequently viewed as undesirable and a contributing factor to “testwiseness.” From another perspective, it becomes possible to perform well on an MC test even with a low
The emphasis on the conjunctive interactions of processes related to the sequential administration of response options motivated the consideration of key location effects in the form of exponent parameters. However, it would also be possible to consider key location as a covariate, possibly with varying effects by item and examinee, in predicting parameters in traditional item response theory (IRT) models, such as the 2PL and 3PL models. Attempts to fit such models to the IT data are reported in the supplemental appendix, but the general KL model described in the article still emerged as statistically superior.
Some additional issues regarding the KL model seem important to consider. Alternative estimation methods beyond MCMC might be considered. The authors chose an MCMC approach following in part from work on related models involving the estimation of exponent parameters (e.g., Bolfarine & Bazán, 2010). Within the MCMC approach, the sensitivity of results to alternative priors, including priors of differing strength, could be studied further. Some simulation analyses related to the current estimation approach are reported in the online supplemental appendix.
Other issues relate more to applications and interpretation of KL parameter estimates. The person proficiency
A second application of the proposed model demonstrated in this article concerns its use in understanding the implications of a purely randomized key location schedule. Although randomization of key location should yield comparability in the distribution of key locations across a large number of administered items, the analyses in this paper suggest that pure randomization has the potential to yield inequities for tests of moderate length, inequities that become more consequential for respondents of high
A third application of the model relates to insight into differences between the DOMC and MC formats. The authors find that effects related to key location significantly explain both variability in difficulty changes (when the same item is modified from MC to DOMC format) as well as variability in examinee proficiency changes (when the same examinee is administered both MC and DOMC tests). Specifically, DOMC items with a large s tend to be disproportionately more difficult under DOMC, while examinees with a large
While this article has focused on key location as a factor to attend to in protecting test equity, other factors could also be examined. For example, while the analyses in this paper assumed that the distractors of an item are randomly equivalent in difficulty, the distractor options for a fixed item will generally vary to some extent in their attractiveness. This may well contribute an additional source of administration-to-administration variability independent of the key location effect. Further study (and alternative models) may be needed to account for such variation where present. For example, as noted in the article, the distractor difficulty of Harry Potter trivia Item 19 appears largely driven by one particular distractor (as opposed to all of them), a characteristic that may well be important to address. Finally, the authors recognize that other perspectives on measurement may disagree with the notion of equity as discussed under the proposed model. It might be contended that, to the extent that the same completely randomized key location schedule is applied for all examinees, differences in the actual scheduled key locations should not be viewed as a source of inequity. Clearly, further study of the DOMC format would be useful.
Supplemental Material
Supplemental material, supplementary for A Psychometric Model for Discrete-Option Multiple-Choice Items by Daniel M. Bolt, Nana Kim, James Wollack, Yiqin Pan, Carol Eckerly and John Sowles in Applied Psychological Measurement
Footnotes
Acknowledgements
The authors gratefully acknowledge David Foster and Sarah Toton for access to the Harry Potter trivia data and comments on an earlier draft of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
