Abstract
We present a novel application of a generalized item response tree model to investigate test takers’ answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated with large-scale mathematics test items. We also describe how the estimated results can be used to study the benefits of answer change and to further detect potential academic cheating.
1. Introduction
Multiple-choice assessment is a practical device to evaluate test takers’ performance and progress in educational measurement. A common multiple-choice format requires test takers to select the best answer from a number of alternatives. After having made an initial decision, test takers may subsequently identify an alternative option as the best answer. Hence, test takers may either retain their original answer or switch to an alternative answer (Milia, 2007).
A test taker’s answer change behavior has intrigued educational researchers and practitioners for quite some time. Originally, researchers were motivated by the question of whether changing initial answers would increase students’ test scores (van der Linden & Jeon, 2012). According to conventional belief, changing original answers is likely to lower students’ overall test scores. However, a number of empirical studies have contradicted this popular belief, showing that most answer changes can be categorized as being from wrong to right (WR) and that the majority of examinees improve their test scores via answer change behavior (e.g., Benjamin, Cavell, & Shallenberger, 1984; Crocker & Benson, 1980; Foote & Belinky, 1972; Mathews, 1929; McMorris et al., 1991; Papanastasiou & Reckase, 2008; Vispoel, 1998).
Numerous studies have also been conducted to develop a more in-depth understanding of test takers’ answer change behavior. For instance, researchers have shown that answer change gains depend on the characteristics of individual test takers: test takers with moderate to high ability levels earned minor to moderate gains from answer change, while low-ability test takers earned only nominal gains (McMorris, DeMers, & Schwarz, 1987). Other researchers discussed how answer change behavior is also related to test takers’ differing confidence levels (e.g., McMorris et al., 1987), attitudes (e.g., Friedman & Cook, 1995), or cognitive styles (e.g., Bjorklund, 1989; Millman, Bishop, & Ebel, 1975). For example, field-independent examinees, who are better able to isolate parts from the whole in order to analyze them, are more effective in changing answers than field-dependent examinees, who tend to perceive the parts only as embedded in the whole (Friedman & Cook, 1995). Reflective examinees might spend more time answering items initially but be better at correctly changing answers than impulsive examinees, who minimize the time they spend answering items (Friedman & Cook, 1995).
It is noteworthy that many of these earlier studies on answer change are based on classical test theory. That is, simple number-correct scores and marginal counts (or proportions) of wrong to right (WR) or right to wrong (RW) changes have commonly been utilized. This classical approach, however, overlooks the fact that examinees can have different ability levels and test items can have different item properties (van der Linden, Jeon, & Ferrara, 2011). Hence, an improved approach should take into account the examinees’ ability levels and individual item properties.
Recently, van der Linden, Jeon, and Ferrara (2011) and van der Linden and Jeon (2012) proposed a sophisticated item response theory (IRT) approach to model test takers’ item review and answer change process. Their model was formulated based on the assumption that examinees first produce initial answers on all items; after completing a first pass, examinees review the initial answers to either confirm or change them. To model the two-stage item review process, van der Linden et al. utilized a regular IRT model for the initial answers and employed a fixed-ability logistic regression model for the final answers, given the initial responses.
In the current study, we propose a new modeling approach to investigate examinees’ answer change behavior. The main idea is to model a sequential answer change process based on a tree structure; the process of making an initial answer is represented as a top node and the process of making a change (or no change) is represented with two intermediate nodes. The outcomes of the answer change process are modeled as the leaves of the tree (or as terminal nodes). Note that our proposed approach is based on a generalized item response tree model (e.g., Jeon & De Boeck, 2015). Item response tree models have received increased attention in educational measurement and have been utilized in a variety of situations, for example, for modeling response styles (Plieninger & Meiser, 2014), missing/omitted item responses (Debeer, Janssen, & De Boeck, 2014), sensitive survey questions (Boeckenholt, 2013), and slow/fast intelligence (DiTrapani, Jeon, De Boeck, & Partchev, 2016; Partchev & De Boeck, 2012). Our contribution is to provide a unique application of the tree approach to explicate the answer change process.
Our approach is similar to van der Linden et al.’s (2011, 2012) method in that the examinees’ answer change behavior is modeled as a sequence of test-taking activities: (1) giving initial answers and (2) changing or keeping these initial answers. Unlike van der Linden et al.’s method, which is based on a single fixed ability, we allow the answer change process to involve latent traits that differ somewhat from the one underlying the initial responses. In addition, we posit that the two activities (making initial answers and changing or keeping them) are intrinsically related to each other. Hence, the two activities and their latent traits should be modeled and estimated simultaneously, in a single step.
For empirical illustration purposes, we will apply the proposed model to analyze a large-scale assessment data set that includes answer change. We will demonstrate that the proposed tree model is a reasonable representation of test takers’ answer change behavior. We will further show that the data analysis with the proposed approach provides new information that can improve our understanding of the answer change process. Finally, we will illustrate how the proposed approach can be utilized (1) to investigate whether or not changing an answer is beneficial to the examinees’ score and (2) to detect irregular patterns of WR answer changes that may indicate academic cheating.
2. Modeling Item Response-and-Change Behavior
Test takers may change their initial answers during an exam. For paper tests, optical scanners can be set to detect answers to multiple-choice items that have been erased and replaced by new answers. For computerized tests, such information is readily accessible by checking the log-files. The proposed methodology can therefore be applied to both paper and computerized testing.
To model the item response-and-change behavior in multiple-choice assessments, we first defined initial responses as the first responses a subject provides and final responses as the responses a subject submits. If more than one change is made on an item, multiple intermediate responses exist between the initial and final answers. Here we concentrate on modeling only the final answer to illustrate the proposed approach more clearly within a simplified setting. An extension of the methodology that allows for multiple answer changes is discussed in Section 4.
If there is a change, the final responses are different from the initial responses. If there is no change, the final responses are the same as the initial responses. Response changes necessarily imply that the initial responses are reevaluated. Item reevaluation may or may not have occurred if there is no change. The possible scenarios of no change are not identifiable from the data and thus collapsed into one category in the proposed approach. We focus on making inferences based on what is observable (initial and final responses) without making assumptions about what may have happened in between. We can speculate and propose interpretations, but the model formulation itself relies only on the observed responses being correct or incorrect.
2.1 An Extended Item Response Tree Model
Suppose all initial and final responses are coded as correct or incorrect. We then observe four possible combinations of the initial and final responses: initial wrong response and final wrong response (WW), initial wrong response and final right response (WR), initial right response and final wrong response (RW), and initial right response and final right response (RR).
Note that the WW outcome can imply three potentially different scenarios: (a) The earlier response is not reviewed, (b) the earlier response is reviewed and confirmed, and (c) the earlier response is reviewed and changed to a different wrong answer. Also, the RR outcome can have two possible cases: (a) The earlier response is not reviewed and (b) the earlier response is reviewed and confirmed. Since these potential cases are either unidentifiable (e.g., [a] and [b]) or have no impact on correctness (e.g., [c]), we collapse them into the WW and RR categories, respectively.
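The recoding from raw answers to these four categories can be sketched as follows. This is a minimal illustration, assuming that the initial answer, the final answer, and the answer key are available as option labels; the function name `classify_change` and the option labels are hypothetical. A change from one wrong option to another wrong option is coded WW, mirroring the collapsing described above.

```python
def classify_change(initial: str, final: str, key: str) -> str:
    """Return 'WW', 'WR', 'RW', or 'RR' for one item (hypothetical helper).

    initial, final: option labels chosen first and submitted last.
    key: the correct option label.
    """
    first = "R" if initial == key else "W"
    last = "R" if final == key else "W"
    # A wrong-to-different-wrong change also yields 'WW', so it falls into
    # the same category as an unchanged wrong answer.
    return first + last


if __name__ == "__main__":
    print(classify_change("B", "C", key="C"))  # initial wrong, final right
    print(classify_change("A", "B", key="C"))  # wrong -> different wrong
    print(classify_change("C", "C", key="C"))  # right, unchanged
```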
We view these four observed patterns as the outcomes of the item response-and-change behavior and model them utilizing a tree structure. This is illustrated in Figure 1.

A response tree for initial and final answers. There are four possible outcomes: (1) represents an initial wrong answer that remains wrong (WW), (2) represents an initial wrong answer being changed to a right answer (WR), (3) represents an initial right answer being changed to a wrong answer (RW), and (4) represents a right answer without change (RR).
The tree consists of three nodes (circles), each with two branches (one-directional arrows). The tree branches out from the top node until it reaches the leaves (the four observed outcomes). The tree can be expanded if multiple response options are used and if more than one change is incorporated. Extensions to multiple response options and multiple changes are discussed in Section 4.
In the most general model formulation, the probability of the left or right branch for a node depends on a latent trait and item parameters. The latent trait for Node 1 can be seen as the ability to get the item right when the item is first encountered. The latent trait for Node 2 can be seen as the ability to make a correct change when the initial response is wrong. The latent trait for Node 3 can be seen as the ability to make no change when the initial response is right.
In this most general case, all three latent variables are assumed to be differentiable. For example, information processing can be automated for the initial responses but controlled for the final responses. It is important to note that respondents do not need to be aware of whether their earlier responses are correct or incorrect to further differentiate Nodes 2 and 3. Latent variables for Nodes 2 and 3 can still be differentiable because different states of mind or abilities can play a role. For example, in rectifying an incorrect response (Node 2), respondents’ ability in a reflective state (which may be different from an initial state) and/or self-criticism ability may be involved, whereas in keeping or changing a correct response (Node 3), respondents’ nervousness about the responses being final and/or uncertainty about their earlier answers may be involved. These are of course speculative interpretations and other interpretations are certainly possible. These speculations serve only to explain why the three latent variables do not need to be the same. Whether the latent variables can be differentiated is an empirical question.
In contrast to the most general case, the three latent variables may not all be distinguishable from each other (i.e., they may be perfectly correlated). It is also possible that the latent variable for Node 1 is distinguishable from the latent variables for Nodes 2 and 3, while Nodes 2 and 3 are not differentiable because they are based on the same final response ability. These two simpler cases can be formulated as constrained versions of the most general model with three node-specific latent traits. As with the latent traits, the item parameters can also be constrained to be equal across nodes. An optimal model structure can be empirically determined based on model fit comparisons (which will be illustrated in Section 3.2). Hence, from now on, we will concentrate on describing the most general case with a latent trait and a set of item parameters per node.
2.2 Mathematical Formulation
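The outcome of Node 1, denoted by $Y^{(1)}_{pi}$ for person $p$ ($p = 1, \ldots, N$) and item $i$ ($i = 1, \ldots, I$), equals 1 if the initial response is correct and 0 otherwise. It is modeled with a two-parameter logistic model in slope-intercept form,
$$P(Y^{(1)}_{pi} = 1 \mid \theta^{(1)}_p) = \frac{\exp(\alpha^{(1)}_i \theta^{(1)}_p + \beta^{(1)}_i)}{1 + \exp(\alpha^{(1)}_i \theta^{(1)}_p + \beta^{(1)}_i)}, \qquad (1)$$
where superscript (1) stands for the first node, $\theta^{(1)}_p$ is the Node 1 latent trait of person $p$, and $\alpha^{(1)}_i$ and $\beta^{(1)}_i$ are the slope and intercept parameters of item $i$ at Node 1.
For Node 2, the outcome variable $Y^{(2)}_{pi}$, and for Node 3, the outcome variable $Y^{(3)}_{pi}$, are modeled in the same way:
$$P(Y^{(2)}_{pi} = 1 \mid \theta^{(2)}_p) = \frac{\exp(\alpha^{(2)}_i \theta^{(2)}_p + \beta^{(2)}_i)}{1 + \exp(\alpha^{(2)}_i \theta^{(2)}_p + \beta^{(2)}_i)}, \qquad (2)$$
$$P(Y^{(3)}_{pi} = 1 \mid \theta^{(3)}_p) = \frac{\exp(\alpha^{(3)}_i \theta^{(3)}_p + \beta^{(3)}_i)}{1 + \exp(\alpha^{(3)}_i \theta^{(3)}_p + \beta^{(3)}_i)}, \qquad (3)$$
where superscripts (2) and (3) stand for the second and third nodes conditional on the initial wrong and right responses, respectively. Specifically, $Y^{(2)}_{pi} = 1$ if an initially wrong response is changed to a right response (WR) and $Y^{(2)}_{pi} = 0$ if the final response is still wrong (WW); $Y^{(3)}_{pi} = 1$ if an initially right response is kept (RR) and $Y^{(3)}_{pi} = 0$ if it is changed to a wrong response (RW).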
We note that the issue of guessing can be somewhat different depending on the node. Node 1 outcomes represent correct and incorrect answers to multiple-choice items; hence, it may be reasonable to consider guessing. For the other nodes, however, one may assume that subjects have good reasons for changing or not changing an answer, so that guessing may be unlikely to happen. The possibility of including guessing parameters in all or some of the nodes may also be empirically tested. Obviously, the sample size requirement should also be taken into account when the inclusion of guessing parameters is considered.
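To model the probability of the tree outcomes (WW, WR, RW, and RR), denoted by $Z_{pi}$ hereafter, we use the fact that $Z_{pi}$ can be recoded with a set of node-specific outcomes ($Y^{(1)}_{pi}$, $Y^{(2)}_{pi}$, $Y^{(3)}_{pi}$), where NA marks a node that is not reached:
$$\begin{array}{c|ccc} Z_{pi} & Y^{(1)}_{pi} & Y^{(2)}_{pi} & Y^{(3)}_{pi} \\ \hline \text{WW} & 0 & 0 & \text{NA} \\ \text{WR} & 0 & 1 & \text{NA} \\ \text{RW} & 1 & \text{NA} & 0 \\ \text{RR} & 1 & \text{NA} & 1 \end{array} \qquad (4)$$
Accordingly, with $\boldsymbol{\theta}_p = (\theta^{(1)}_p, \theta^{(2)}_p, \theta^{(3)}_p)'$, the probabilities of the four outcomes are
$$P(Z_{pi} = \text{WW} \mid \boldsymbol{\theta}_p) = P(Y^{(1)}_{pi} = 0 \mid \theta^{(1)}_p)\, P(Y^{(2)}_{pi} = 0 \mid \theta^{(2)}_p), \qquad (5)$$
$$P(Z_{pi} = \text{WR} \mid \boldsymbol{\theta}_p) = P(Y^{(1)}_{pi} = 0 \mid \theta^{(1)}_p)\, P(Y^{(2)}_{pi} = 1 \mid \theta^{(2)}_p), \qquad (6)$$
$$P(Z_{pi} = \text{RW} \mid \boldsymbol{\theta}_p) = P(Y^{(1)}_{pi} = 1 \mid \theta^{(1)}_p)\, P(Y^{(3)}_{pi} = 0 \mid \theta^{(3)}_p), \qquad (7)$$
$$P(Z_{pi} = \text{RR} \mid \boldsymbol{\theta}_p) = P(Y^{(1)}_{pi} = 1 \mid \theta^{(1)}_p)\, P(Y^{(3)}_{pi} = 1 \mid \theta^{(3)}_p). \qquad (8)$$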
Observe that the node-specific probability with a missing observation (NA) does not contribute to the probability of observing Zpi.
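The recoding of $Z_{pi}$ into node-specific outcomes and the outcome probabilities in Equations 5 through 8 can be sketched in Python as below. This is an illustrative implementation under the assumptions of this section (a two-parameter logistic model in slope-intercept form at every node, no guessing parameters); the names `T`, `node_prob`, and `outcome_probs` and the parameter values in the usage line are hypothetical. Nodes marked NA are skipped, which is why they do not contribute to the outcome probability.

```python
import math

# Node responses for each tree outcome (Nodes 1, 2, 3);
# None marks a node that is not reached (NA).
T = {
    "WW": (0, 0, None),
    "WR": (0, 1, None),
    "RW": (1, None, 0),
    "RR": (1, None, 1),
}


def node_prob(theta: float, alpha: float, beta: float) -> float:
    """Two-parameter logistic probability of a node response of 1."""
    return 1.0 / (1.0 + math.exp(-(alpha * theta + beta)))


def outcome_probs(thetas, alphas, betas):
    """Return P(Z = m) for m in WW, WR, RW, RR for one person-item pair.

    thetas, alphas, betas: length-3 sequences, one entry per node.
    """
    p_node = [node_prob(t, a, b) for t, a, b in zip(thetas, alphas, betas)]
    probs = {}
    for outcome, row in T.items():
        prob = 1.0
        for k, y in enumerate(row):
            if y is None:  # unreached node: contributes nothing
                continue
            prob *= p_node[k] if y == 1 else 1.0 - p_node[k]
        probs[outcome] = prob
    return probs


probs = outcome_probs(thetas=(0.0, 0.5, -0.2), alphas=(1.0, 1.5, 0.8),
                      betas=(0.2, -1.0, 2.0))
print(probs)
```

The four probabilities always sum to 1: Node 1 splits the probability mass into two parts, and Nodes 2 and 3 each split one of those parts.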
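Based on Equations 5 through 8, the likelihood function for the general model can be written as
$$L = \prod_{p=1}^{N} \int \prod_{i=1}^{I} \prod_{m=1}^{4} P(Z_{pi} = m \mid \boldsymbol{\theta}_p)^{\mathbb{1}(Z_{pi} = m)}\, g(\boldsymbol{\theta}_p)\, d\boldsymbol{\theta}_p, \qquad (9)$$
where $m$ indexes the four outcomes (WW, WR, RW, RR), $\mathbb{1}(\cdot)$ is the indicator function, $g(\boldsymbol{\theta}_p)$ is the latent distribution, and
$$P(Z_{pi} = m \mid \boldsymbol{\theta}_p) = \prod_{k=1}^{3} \left[ P(Y^{(k)}_{pi} = 1 \mid \theta^{(k)}_p)^{T_{mk}} \left\{ 1 - P(Y^{(k)}_{pi} = 1 \mid \theta^{(k)}_p) \right\}^{1 - T_{mk}} \right]^{\delta_{mk}}, \qquad (10)$$
where $T_{mk}$ is the $k$th node response that is related to outcome $m$ (the entry in row $m$ and column $k$ of the matrix in Equation 4), and $\delta_{mk} = 1$ if $T_{mk}$ is observed and $\delta_{mk} = 0$ if $T_{mk}$ is NA, so that unreached nodes drop out of the product.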
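To specify the latent distribution $g(\boldsymbol{\theta}_p)$, we assume that the three node-specific latent traits follow a multivariate normal distribution,
$$\boldsymbol{\theta}_p = (\theta^{(1)}_p, \theta^{(2)}_p, \theta^{(3)}_p)' \sim \text{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \qquad (11)$$
Simpler models that we considered earlier can be formulated by considering relevant constraints on the node-specific model
$$\text{logit}\, P(Y^{(k)}_{pi} = 1 \mid \theta^{(k)}_p) = \eta^{(k)}_{pi}, \qquad (12)$$
where the linear predictor is
$$\eta^{(k)}_{pi} = \alpha^{(k)}_i \theta^{(k)}_p + \beta^{(k)}_i. \qquad (13)$$
For example, setting $\theta^{(2)}_p = \theta^{(3)}_p$ yields the model in which Nodes 2 and 3 share one latent trait, setting $\theta^{(1)}_p = \theta^{(2)}_p = \theta^{(3)}_p$ yields the model with a single latent trait, and setting $\alpha^{(1)}_i = \alpha^{(2)}_i = \alpha^{(3)}_i$ and $\beta^{(1)}_i = \beta^{(2)}_i = \beta^{(3)}_i$ yields the model with equal item parameters across nodes.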
2.3 Identification and Estimation of the General Model
Since the proposed model can be parameterized as a simple-structure multidimensional model, the model is identified by complying with conventional identification constraints for simple-structure multidimensional models. Specifically, we fix the means and variances of the latent distribution to 0 and 1, respectively, for each of the three latent traits, and we freely estimate the correlations among them.
The node-specific response vector
As indicated earlier, the proposed model described in Section 2.2 is formulated based on generalized item response tree modeling. For a detailed discussion and other applications of the item response tree approach, we refer readers to Jeon and De Boeck (2015).
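In practice, a tree model of this kind can be fitted by expanding each observed outcome into three node-specific pseudo-items and treating NA entries as missing, so that item i at node k loads only on the kth latent trait; this gives the simple-structure multidimensional model mentioned above. The sketch below shows one way to perform that data-expansion step. It is an assumption about data preparation, not the software pipeline used in this study, and `expand_to_nodes` is a hypothetical helper.

```python
# Node responses per outcome (Nodes 1, 2, 3); None = NA (node not reached).
T = {"WW": (0, 0, None), "WR": (0, 1, None), "RW": (1, None, 0), "RR": (1, None, 1)}


def expand_to_nodes(Z):
    """Convert an N x I matrix of outcome codes into an N x 3I matrix.

    Columns 0..I-1 hold the Node 1 pseudo-items, I..2I-1 the Node 2
    pseudo-items, and 2I..3I-1 the Node 3 pseudo-items. None entries
    are meant to be treated as missing by the estimation software.
    """
    n_items = len(Z[0])
    wide = []
    for row in Z:
        out = [None] * (3 * n_items)
        for i, z in enumerate(row):
            for k, y in enumerate(T[z]):
                out[k * n_items + i] = y
        wide.append(out)
    return wide


if __name__ == "__main__":
    print(expand_to_nodes([["WR", "RR"], ["WW", "RW"]]))
```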
3. Empirical Application
Here we provide an empirical example to show that the proposed model is a reasonable representation of the item response-and-change behavior. We will illustrate two potential uses of the proposed model, for (1) computing answer change gains and (2) detecting irregular WR changes that may indicate academic cheating.
The described model was applied to a large-scale, paper-and-pencil mathematics assessment that was created and administered by a large testing company. The test consisted of 43 items and was administered to 3,900 eleventh-grade students. Students’ original answers and changed answers were detected by using a high-precision optical scanner. Only the final changes (if any) on individual items were utilized to apply the proposed modeling approach (as discussed in Section 2). We begin by providing descriptive results on answer change patterns in the data.
3.1 Descriptive Results
Among the total of 3,900 examinees, 2,528 students changed their initial answers for at least 1 item, while 1,318 students kept their original answers for all items. For those who changed their initial answers, the average sum score of the original answers (with standard deviation in parentheses) was 17.92 (7.33), while the average sum score of the final answers was 18.63 (7.66). The average gain in the sum scores was 0.70 (1.63). For those who kept their initial answers, the average sum score of their answers was 15.74 (7.13).
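Summaries of this kind can be recomputed from a matrix of outcome codes, as in the sketch below; the helper `change_features` and the toy data are hypothetical. One caveat, stated as an assumption: when only the codes WW, WR, RW, and RR are available, a test taker counts as an answer changer only if at least one WR or RW outcome occurs, because a change from one wrong option to another is coded WW. Counts of changers based on the raw scanner records, as reported above, can therefore be somewhat larger.

```python
import numpy as np


def change_features(Z):
    """Descriptive answer change summaries from an N x I array of codes.

    Returns (mean, SD) pairs for answer changers (initial score, final
    score, gain) and for answer keepers (sum score).
    """
    Z = np.asarray(Z)
    initial = np.isin(Z, ["RW", "RR"]).sum(axis=1)  # initially right items
    final = np.isin(Z, ["WR", "RR"]).sum(axis=1)    # finally right items
    # Changer = at least one WR or RW outcome (W-to-other-W is coded WW).
    changer = np.isin(Z, ["WR", "RW"]).any(axis=1)

    def mean_sd(x):
        return float(np.mean(x)), float(np.std(x, ddof=1))

    return {
        "changers_initial": mean_sd(initial[changer]),
        "changers_final": mean_sd(final[changer]),
        "changers_gain": mean_sd(final[changer] - initial[changer]),
        "keepers_score": mean_sd(initial[~changer]),  # initial == final here
    }


if __name__ == "__main__":
    toy = [["WR", "RR"], ["WW", "RR"], ["RW", "WR"], ["RR", "RR"]]
    print(change_features(toy))
```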
In addition, the observed counts and proportions of original and changed responses in the data can be found in Table 1. The table shows that only a small proportion of responses (about 1.7%) were changed in this assessment. There were more WR changes (about 1.2%) than RW changes (about 0.5%). Figure 2 depicts the average marginal proportions of WR and RW changes for each of the 43 items and for 50 randomly selected students.
Observed Counts and Proportions (in Parentheses) of Original and Changed Responses
Note. W = incorrect; R = correct responses.

Average proportions of wrong to right and right to wrong changes across items (left) and persons (right).
The results suggest that the proportions of WR and RW changes were quite small and more WR changes were made than RW changes across items and across persons. The small proportion of answer change seems to suggest that applying a two-parameter model to Nodes 2 and 3 outcomes may be challenging. However, as shown in the simulation study (Section 3.3), obtaining relatively accurate and reliable discrimination parameter estimates was nonproblematic.
3.2 Estimated Results
Now we fit the proposed model (the most general case presented in Equation 10) to the data; this model assumes that each of the three nodes involves a latent trait and a set of item parameters. The proposed model converged and yielded a log likelihood of −111,861.7; Akaike information criterion (AIC) of 224,245.5; and Bayesian information criterion (BIC) of 225,881.6 with 261 parameters. This model fit was clearly better than that of a model with two latent variables (Nodes 2 and 3 are combined; log likelihood of −112,164, AIC of 224,846.1, and BIC of 226,469.7 with 259 parameters) and that of a model with a unidimensional latent structure that assumes only a single latent variable for the three nodes (log likelihood of −112,612.7, AIC of 225,741.5, and BIC of 227,358.8 with 258 parameters). This result suggests that the three latent traits are indeed needed to represent the item response-and-change behavior in the data.
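The reported information criteria can be checked with a few lines of arithmetic, using AIC = −2 log L + 2k and BIC = −2 log L + k ln n. The sketch below assumes n = 3,900 (the number of test takers) in the BIC penalty and decomposes the parameter counts as two item parameters per item and node plus the freely estimated latent correlations. Under these assumptions, the formulas reproduce the reported values up to rounding of the log likelihoods.

```python
import math

N_PERSONS = 3900
N_ITEMS = 43


def aic_bic(loglik: float, n_params: int, n_obs: int = N_PERSONS):
    """AIC = -2 logL + 2k; BIC = -2 logL + k ln(n), n = number of persons."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


# (log likelihood, number of parameters) for the compared models.
models = {
    "three traits": (-111861.7, 6 * N_ITEMS + 3),          # 261
    "two traits": (-112164.0, 6 * N_ITEMS + 1),            # 259
    "one trait": (-112612.7, 6 * N_ITEMS),                 # 258
    "equal item parameters": (-118512.9, 2 * N_ITEMS + 3),  # 89
}

for name, (ll, k) in models.items():
    aic, bic = aic_bic(ll, k)
    print(f"{name:>22}: k={k}, AIC={aic:.1f}, BIC={bic:.1f}")
```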
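Based on the general model, the estimated correlations between the three latent variables are as follows:
$$\widehat{\text{Corr}}(\theta^{(1)}_p, \theta^{(2)}_p, \theta^{(3)}_p) = \begin{pmatrix} 1 & & \\ .57 & 1 & \\ .35 & .49 & 1 \end{pmatrix}$$
The estimated correlation matrix shows that the latent trait to initially solve items correctly ($\theta^{(1)}_p$) is only moderately correlated with the latent trait to correct initially wrong answers ($\theta^{(2)}_p$; .57) and with the latent trait to maintain initially right answers ($\theta^{(3)}_p$; .35), and that $\theta^{(2)}_p$ and $\theta^{(3)}_p$ are also moderately correlated with each other (.49).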
In addition, we found that allowing for a different set of item parameters per node led to a better fit than assuming equal item parameters across nodes (log likelihood of −118,512.9, AIC of 237,203.8, and BIC of 237,761.7 with 89 parameters). These results support our assumption that each tree node is associated with its own latent trait and its own set of item parameters. Figure 3 depicts the item parameter estimates for the three nodes. The estimated item slope parameters for Node 2 tend to be greater than those for Nodes 3 and 1, while the relationship between the Node 1 and Node 3 slope parameters appears unstructured. The estimated item intercept parameters for Node 3 tend to be greater than those for Nodes 1 and 2, in that order. These results tell us that the perceived item properties can be different when test takers revisit their earlier correct or incorrect answers. Specifically, items tend to be harder when initially wrong responses were corrected (Node 2) and easier when initially correct responses were maintained (Node 3) than when the items were initially solved (Node 1).

Estimates of the item slope (α) parameters (left) and the item intercept (β) parameters (right) for Nodes 1, 2, and 3, from the proposed model.
The standard errors for the item parameter estimates tend to be smaller for the initial responses (Node 1) than for the final responses (Nodes 2 and 3). Specifically, for the item slope estimates, the standard errors ranged from .03 to .07 for Node 1, from .13 to .28 for Node 2, and from .12 to .27 for Node 3. For the item intercept parameter estimates, the standard errors ranged from .03 to .06 for Node 1, from .14 to .45 for Node 2, and from .22 to .89 for Node 3. The smaller standard errors for Node 1 can be explained by its (by definition) larger number of observations (see the matrix in Equation 4): every response provides a Node 1 observation, whereas Node 2 and Node 3 observations come only from initially wrong and initially right responses, respectively.
In summary, for Node 2, the items seem to be more difficult (based on the intercept estimates) and also more informative for the underlying Node 2 latent trait than they are for the underlying Node 3 latent trait (based on the discrimination estimates). This result suggests that answer changes from incorrect to correct (i.e., Node 2 outcomes) may provide relatively more interesting information.
3.3 Simulated Results
A simulation study was conducted to validate the use of our proposed model for the empirical analysis of answer change data. We first generated 100 data sets based on the proposed model with the parameter estimates reported in Section 3.2 as data-generating values (with N = 3,900, I = 43). Figure 4 summarizes the estimated bias of the item parameters for each of the three nodes. Each point in the figure represents the bias for the corresponding parameter.
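A data-generating step like the one used in such a simulation can be sketched as follows. This is an illustration under stated assumptions (multivariate normal latent traits with means 0, variances 1, and a given correlation matrix; two-parameter logistic nodes in slope-intercept form); `simulate_tree` is a hypothetical helper, and the toy parameter values in the usage block are placeholders rather than the Section 3.2 estimates. Drawing all three node responses and keeping only the reached branch is valid because the node responses are conditionally independent given the latent traits. Estimation and the bias computation (typically the average difference between estimates and generating values over replications) would follow with IRT software and are not shown.

```python
import numpy as np


def simulate_tree(alpha, beta, corr, n_persons, rng):
    """Simulate an N x I array of outcome codes from the three-node tree.

    alpha, beta: arrays of shape (3, I) with node-specific slopes/intercepts.
    corr: 3 x 3 latent correlation matrix (means 0, variances 1 assumed).
    """
    alpha = np.asarray(alpha, float)
    beta = np.asarray(beta, float)
    theta = rng.multivariate_normal(np.zeros(3), corr, size=n_persons)  # (N, 3)
    # Node success probabilities, shape (3, N, I).
    eta = alpha[:, None, :] * theta.T[:, :, None] + beta[:, None, :]
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.random(prob.shape) < prob  # draws for all nodes
    initial_right = y[0]
    # Node 3 decides for initially right items, Node 2 for initially wrong.
    final_right = np.where(initial_right, y[2], y[1])
    codes = np.where(initial_right,
                     np.where(final_right, "RR", "RW"),
                     np.where(final_right, "WR", "WW"))
    return codes, theta


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    corr = np.array([[1.0, 0.57, 0.35], [0.57, 1.0, 0.49], [0.35, 0.49, 1.0]])
    codes, _ = simulate_tree(np.ones((3, 5)), np.zeros((3, 5)), corr, 1000, rng)
    print({c: float(np.mean(codes == c)) for c in ("WW", "WR", "RW", "RR")})
```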

Estimated bias for the slope (α) parameters (left) and the intercept (β) parameters (right) for the proposed model.
The result suggests that the proposed model shows generally good parameter recovery. For the α parameters, the bias ranged from −.010 to .010 for Node 1, from −.044 to .026 for Node 2, and from −.072 to .056 for Node 3. For the β parameters, the bias ranged from −.008 to .009 for Node 1, from −.012 to .084 for Node 2, and from −.189 (one of the two outliers) to .005 for Node 3. For the covariance parameters (three nonredundant off-diagonal elements of the covariance matrix), the bias was estimated as .009, .004, and −.036 for the (2, 1)th, (3, 1)th, and (3, 2)th (row-column) elements, respectively. Although there seems to be a systematic bias for the β parameters, the size of that bias is minimal.
To further validate the use of our proposed model for the empirical data, we evaluated whether the simulated data, generated based on the estimated parameter values and latent trait estimates from the proposed model, can accurately recover some key answer change characteristics of the original data. Specifically, we evaluated the characteristics of the simulated data in terms of (1) the mean (standard deviation) of the sum scores of the initial and final answers for answer changers, (2) the mean (standard deviation) of the sum scores of the initial answers for answer keepers, and (3) the mean (standard deviation) of the gain scores for answer changers. The original data’s answer change features were reported in Section 3.1.
Table 2 confirms that the simulated data satisfactorily mimic the main answer change features of the original data, suggesting that the proposed model is a reasonable representation of the analyzed answer change data. Note that the standard deviations (but not the means) of the sum scores are slightly underestimated, which is to be expected given a slight shrinkage of the latent trait estimates.
Comparison of Answer Change Features Between the Original Data and the Simulated Data (Generated Based on the Proposed Model)
Note. For the simulated data, the reported values are the means over 100 simulated data sets.
3.4 Directional Test Information
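Next we evaluate the proposed model’s measurement precision for the three latent traits using item and test information. Information can be defined as an indicator of the degree to which the reported score from a test (or item) distinguishes real differences in the latent trait of interest (Lord, 1980; Reckase, 2009). For a unidimensional two-parameter model, the item information can be expressed as a function of θ as follows:
$$I_i(\theta) = \alpha_i^2\, P_i(\theta)\,[1 - P_i(\theta)], \qquad (14)$$
where $P_i(\theta) = \exp(\alpha_i \theta + \beta_i) / [1 + \exp(\alpha_i \theta + \beta_i)]$ is the probability of a correct response to item $i$, and $\alpha_i$ and $\beta_i$ are its slope and intercept parameters.
For multidimensional models, we can define the item information matrix, following Segall (1996), with kth diagonal elements
$$[\mathbf{I}_i(\boldsymbol{\theta})]_{kk} = \alpha_{ik}^2\, P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})] \qquad (15)$$
and (kth, lth) off-diagonal elements
$$[\mathbf{I}_i(\boldsymbol{\theta})]_{kl} = \alpha_{ik}\,\alpha_{il}\, P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})]. \qquad (16)$$
Here we focus on the diagonal elements in Equation 15 and express them using directional derivatives, following Reckase (2009, p. 121):
$$I_{i\nu}(\boldsymbol{\theta}) = \frac{[\nabla_\nu P_i(\boldsymbol{\theta})]^2}{P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})]}, \qquad (17)$$
where ∇ν denotes the directional derivative in the direction of the vector of angles $(\omega_1, \ldots, \omega_K)$ that the direction ν makes with the $K$ coordinate axes,
$$\nabla_\nu P_i(\boldsymbol{\theta}) = \sum_{k=1}^{K} \frac{\partial P_i(\boldsymbol{\theta})}{\partial \theta_k} \cos\omega_k = P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})] \sum_{k=1}^{K} \alpha_{ik} \cos\omega_k, \qquad (18)$$
where αik is the slope parameter for item i in dimension k. The partial derivative with respect to θk is equivalent to the derivative in the direction of the kth axis ($\omega_k = 0$ and $\omega_l = \pi/2$ for $l \neq k$), in which case we have
$$I_{ik}(\boldsymbol{\theta}) = \alpha_{ik}^2\, P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})], \qquad (19)$$
and the directional test-level information can be defined as a sum over item-specific directional information as follows:
$$I_k(\boldsymbol{\theta}) = \sum_{i=1}^{I} \alpha_{ik}^2\, P_i(\boldsymbol{\theta})\,[1 - P_i(\boldsymbol{\theta})]. \qquad (20)$$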
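In effect, Equation 20 can be seen as the kth diagonal element of the test-level Fisher information matrix, whose item-level diagonal elements are defined in Equation 15. Because each node-specific item loads on a single latent trait, the off-diagonal elements of this matrix vanish, so the reciprocal of Equation 20 is the asymptotic variance of the maximum likelihood estimate of θk.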
We plot a unidimensional slice of the test information function (20) for each

Unidimensional slice curves of the information surface as a function of
The figure suggests that each latent trait shows maximum information at somewhat different locations of the respective latent score continuum. For
3.5 Two Potential Uses of the Proposed Model
We apply the proposed model to compute the answer change benefits and to detect irregular answer change patterns. We compare the results from the general model (with three node-specific latent traits) with the simpler model that assumes a single latent trait (but a different set of item parameters) across nodes.
3.5.1 Computing answer change benefits
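Based on the general model, the expected benefit of answer change can be defined by using expected scores for the initial responses and the final responses. Writing $P^{(k)}_i$ for $P(Y^{(k)}_{pi} = 1 \mid \theta^{(k)}_p)$, the expected initial score on item $i$ is $P^{(1)}_i$, and, from Equations 6 and 8, the expected final score is $P(Z_{pi} = \text{WR} \mid \boldsymbol{\theta}_p) + P(Z_{pi} = \text{RR} \mid \boldsymbol{\theta}_p) = (1 - P^{(1)}_i) P^{(2)}_i + P^{(1)}_i P^{(3)}_i$.
Then the expected benefit of answer change $B_i$ for item $i$ is
$$B_i(\boldsymbol{\theta}_p) = (1 - P^{(1)}_i) P^{(2)}_i + P^{(1)}_i P^{(3)}_i - P^{(1)}_i = (1 - P^{(1)}_i) P^{(2)}_i - P^{(1)}_i (1 - P^{(3)}_i),$$
that is, the probability of a WR change minus the probability of an RW change.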
This item-level expected benefit can be plotted as a surface function in a three-dimensional

Item-level expected benefits as a function of
In the first subplot for
When the local independence assumption holds (a common assumption for IRT models), the test-level expected benefit is the sum of the item-level expected benefits over the I items.
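The item- and test-level expected benefits can be computed directly from the node probabilities, as in the sketch below. It assumes the two-parameter logistic node models in slope-intercept form, defines the item-level benefit as the expected final score, (1 − P1)P2 + P1P3, minus the expected initial score, P1, where Pk is the probability of a node response of 1 at Node k, and sums over items for the test-level benefit. The function name and inputs are hypothetical.

```python
import numpy as np


def expected_benefit(theta, alpha, beta):
    """Item-level expected benefits and their sum (test-level benefit).

    theta: length-3 vector (theta1, theta2, theta3) for one test taker.
    alpha, beta: arrays of shape (3, I) with node-specific item parameters.
    """
    alpha = np.asarray(alpha, float)
    beta = np.asarray(beta, float)
    eta = alpha * np.asarray(theta, float)[:, None] + beta  # (3, I)
    p1, p2, p3 = 1.0 / (1.0 + np.exp(-eta))
    initial = p1                     # expected initial score per item
    final = (1 - p1) * p2 + p1 * p3  # P(WR) + P(RR) per item
    item_benefit = final - initial   # = P(WR) - P(RW)
    return item_benefit, float(item_benefit.sum())


if __name__ == "__main__":
    beta = np.array([[0.0, 0.5], [-1.0, -1.5], [2.0, 1.0]])
    print(expected_benefit((0.2, -0.3, 0.4), np.ones((3, 2)), beta))
```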

Test-level expected benefits as a function of
The test-level expected benefits show a similar pattern to that of the item-level expected benefits. Observe that a number of students do derive a negative benefit of answer change across large ranges of
We compute the estimated answer change gains by plugging the estimated latent scores

Test-level estimated benefits from the proposed model (proposed) and the simpler model with a single latent trait (simple).
The estimated benefits
3.5.2 Detecting irregular answer change patterns
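van der Linden and Jeon (2012) outlined a statistical procedure for detecting potential academic cheating. Specifically, an unusually large number of WR answer changes can be identified by comparing a theoretical distribution of the number of WR changes with the actual number of WR changes. The theoretical distribution of the number of WR changes $X_p$ for test taker $p$ can be constructed from independent Bernoulli trials across items with probabilities $\pi_{pi} = P(Z_{pi} = \text{WR} \mid \boldsymbol{\theta}_p)$, and test taker $p$ is flagged when the observed number of WR changes reaches the critical value
$$c_p = \min\{c : P(X_p \ge c \mid \boldsymbol{\theta}_p) \le \alpha\},$$
where α is the predefined significance level (e.g., α = .001, .01, .05). The critical value $c_p$ is obtained from the distribution of $X_p$, which is a sum of independent but not identically distributed Bernoulli variables.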
We applied this statistical test described above to detect irregular WR change patterns. Based on the general model, no one was flagged as possibly cheating at three different significance levels (α = .001, .01, .05). However, based on the simpler model assuming a single latent variable, we found that several students were flagged (77, 46, and 23 students at α = .05, .01, .001, respectively). This result is not unexpected, given that a WR-specific latent variable
From a cheating detection perspective,
This analysis shows that the identification of cheaters depends on the model that is used and also on the interpretation of a model component (in this case, the Node 2 latent variable). The former issue (model dependence) underlines the importance of model validity, and the latter issue (model interpretation) illustrates that one cannot automatically jump from a statistical result to a conclusion regarding the intention of a test taker.
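The flagging rule described above can be sketched as follows. The number of WR changes for a test taker is a sum of independent Bernoulli variables with item-specific probabilities, so its exact distribution (a Poisson-binomial distribution) can be obtained by repeated convolution. The sketch assumes that the item-level WR probabilities have already been computed from a fitted model; the helper names are hypothetical, and this is one possible computation rather than necessarily the one used by van der Linden and Jeon (2012).

```python
import numpy as np


def poisson_binomial_pmf(probs):
    """PMF of the number of successes in independent Bernoulli trials."""
    pmf = np.array([1.0])
    for p in probs:
        # Convolve with the two-point distribution (1 - p, p).
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf


def critical_value(probs, alpha):
    """Smallest c with P(X >= c) <= alpha; len(probs) + 1 if none exists."""
    pmf = poisson_binomial_pmf(probs)
    upper_tail = np.cumsum(pmf[::-1])[::-1]  # upper_tail[c] = P(X >= c)
    hits = np.nonzero(upper_tail <= alpha)[0]
    return int(hits[0]) if hits.size else len(probs) + 1


if __name__ == "__main__":
    wr_probs = [0.02] * 43  # hypothetical item-level WR probabilities
    for a in (0.05, 0.01, 0.001):
        print(a, critical_value(wr_probs, a))
```

A test taker would then be flagged when the observed number of WR changes is at least the returned critical value.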
4. Discussion
Examinees’ answer change behavior has intrigued educational researchers for quite some time. In recent years, the increased utilization of computerized adaptive testing, which provides easier access to answer change data, has motivated researchers to undertake more rigorous analyses of answer change.
In this article, we proposed a novel statistical approach for modeling the examinees’ answer change process. The proposed approach was based on a generalized item response tree model, utilizing a tree structure to represent a sequence of activities that occur during a test taker’s answer change process. Specifically, the process of the initial answer is modeled as the top node in the tree, the process of changing (or not changing) one’s answer is represented with the intermediate nodes, and the outcomes of the sequential process are the leaves (or terminal nodes) of the tree. Each node is allowed to have its own set of item parameters and its own latent trait.
The results of an empirical application showed that each tree node involved a different set of item parameters and a different latent trait, confirming that the decision made at each node was governed by a distinct decision-making process. For instance, the difficulty level of items was generally higher when initially wrong answers were revised to a correct answer (Node 2) than when initially correct answers were maintained (Node 3). The fact that the item parameter estimates differ depending on the node suggests that the item properties can be perceived differently when the initial responses and the final responses (based on the initial responses) are produced. This has an important implication for the practice of item analysis; that is, the standard assumption of measurement invariance may not hold (within subjects) when answer change is possible and actually occurs during the test.
In addition, we found that the latent trait to correct initially wrong answers (Node 2) and the latent trait to correctly maintain initially right answers (Node 3) were only moderately correlated with the latent trait underlying the initial answers (with correlation of .57 and .35, respectively). The Nodes 2 and 3 latent traits were also moderately correlated with each other (correlation of .49). The Nodes 2 and 3 latent traits are the “change” traits that are involved in making changes to initial answers, while the Node 1 latent variable is the regular ability trait that underlies the initial answers.
It is noteworthy that existing methods that have been utilized for studying answer change have neglected the possibility that multiple latent traits can be involved if the possibility of answer change exists. By utilizing a tree approach that can model a sequential decision-making process, we were able to differentiate the change traits from the initial answer latent trait. Commonly, answer changes are ignored and the final responses are used to estimate the ability. Our results show that in the presence of answer change the final responses reflect multiple latent traits, and without the possibility of answer change a somewhat different ability is being measured. We also found that the ability underlying the initial answers is positively correlated with the ability to correct one’s incorrect responses and the ability to stay with one’s correct answers, while the latter two abilities are different from the former and codetermine the final response. This differentiation is a unique and highly relevant finding that has not been discussed previously in the literature. The correction ability and the ability of not being distracted by the possibility to change one’s initial answer are potentially broader abilities that may play a role in other situations as well. For example, being reflective about one’s initial understanding and a willingness to give a second thought to that earlier understanding may tell us about some aspects of students’ learning styles. This is of course only a hypothesis that can be further investigated.
Note that test takers’ answer change behavior has typically been ignored in analyzing item response data, although answer change is a natural phenomenon that commonly occurs in testing situations. Our study showed that focusing on the ability estimated from item response data (without considering answer change) may prevent us from finding interesting other sources of ability-related information contained in the answer change latent traits.
Another contribution of the present study to educational measurement is to provide a unique application of the item response tree approach (e.g., Boeckenholt, 2012, 2013; De Boeck & Partchev, 2012; Jeon & De Boeck, 2015). By taking advantage of the flexibility of the generalized item response tree modeling framework, we can extend the proposed model to deal with more complex answer change scenarios. For instance, it is possible to take into account multiple answer changes by including extra nodes for such items. In this case, depending on how many changes are made to each item, the size (or the number of nodes) of a tree will vary across items. We can also extend the model for polytomous item responses by allowing for the top node to have as many branches as the number of response options. The node outcomes can then be represented with a polytomous item response model, such as a partial credit model, a graded response model, or a nominal response model. Furthermore, person-level covariates can be incorporated in the tree model in order to test whether latent trait levels can be explained with relevant person characteristics (e.g., gender, family background, cognitive styles, confidence). For instance, the ability to correctly revise initial wrong answers (latent trait for Node 2) may depend on students’ field dependency, while the ability to keep initially correct answers (latent trait for Node 3) can be explained by their confidence level.
Finally, although the proposed approach was illustrated within the context of educational testing, the model can also be applied to explain answer change behavior in psychological and behavioral assessments more generally. For instance, in answering sensitive survey items, some people try to revise their earlier answers to present themselves more positively. The proposed model can be applied to capture subjects’ answer change behavior in relation to their social desirability tendency.
