Abstract
In this article, the authors describe how multiple indicators multiple cause (MIMIC) models for studying uniform and nonuniform differential item functioning (DIF) can be conceptualized as mediation and moderated mediation models. Conceptualizing DIF within the context of a moderated mediation model helps to understand DIF as the effect of some variable on measurements that is not accounted for by the latent variable of interest. In addition, useful concepts and ideas from the mediation and moderation literature can be applied to DIF analysis: (a) improving the understanding of uniform and nonuniform DIF as direct effects and interactions, (b) understanding the implication of indirect effects in DIF analysis, (c) clarifying the interpretation of the “uniform DIF parameter” in the presence of nonuniform DIF, and (d) probing interactions and using the concept of “conditional effects” to better understand the patterns of DIF across the range of the latent variable.
Introduction
One of the primary aims of measurement research is to develop and identify valid sets of items that measure a specific latent variable. Much research in measurement, particularly within item response theory (IRT), focuses on differential item functioning (DIF) as it can be a major threat to the validity of a scale or set of items. DIF occurs when the probability of a response on a specific item is dependent on some external factor even after conditioning on the latent trait. For instance, it is crucial in educational assessment that a mathematics test is equally valid for all students taking the test. This means that, controlling for a student’s math ability, no other factor should increase or decrease the probability of answering an item correctly.
Zumbo (2007) deemed the “Third Generation” of DIF research to be an era in which researchers investigate and aim to understand why DIF is occurring rather than just detecting or correcting for it. One of the primary aims of this article is to provide a more intuitive understanding of uniform and nonuniform DIF using a multiple indicators multiple cause (MIMIC) model and the concepts of indirect effects, direct effects, and interactions commonly used in mediation and moderation. The authors believe that this type of framework may help researchers to understand DIF in a way that would facilitate thinking about why or how DIF occurs, by focusing on DIF as a process rather than a nuisance.
Uniform and Nonuniform DIF
DIF can occur in two primary ways: uniform and nonuniform DIF (Mellenbergh, 1982). Figure 1 gives four examples of DIF. Early definitions of uniform and nonuniform DIF emphasized the mutually exclusive nature of these two types of DIF (Ackerman, 1992; Mellenbergh, 1982; Millsap & Everson, 1993; Narayanan & Swaminathan, 1996). For example, Ackerman (1992) defined uniform DIF as when the “ICCs for the different groups differ by only a horizontal translation (i.e., they are parallel but not coincident).” Alternatively, nonuniform DIF is when the item characteristic curves (ICCs) are nonparallel. Using these definitions, Panel a in Figure 1 is the only item that fits the definition of uniform DIF; all other panels show nonuniform DIF because the ICCs are not parallel. The authors rely on these mutually exclusive definitions of uniform and nonuniform DIF. In addition, the authors describe how to interpret, in the presence of nonuniform DIF, the parameter that describes uniform DIF in models without nonuniform DIF. This parameter is what differentiates Panels b, c, and d in Figure 1.
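The distinction between parallel and nonparallel ICCs can be illustrated with a minimal sketch. It assumes a logistic item model with a binary group covariate; the function names, parameter names (`slope`, `intercept`, `direct`, `interaction`), and numeric values are hypothetical and are not taken from the article's equations or data. On the log-odds scale, a constant gap between the two groups corresponds to curves that differ only by a horizontal shift, whereas a gap that changes with the latent trait corresponds to nonparallel curves.

```python
import math

def icc(theta, slope, intercept, group, direct=0.0, interaction=0.0):
    """P(correct) under a logistic item model with a group covariate (0 or 1).

    All parameter names and values are illustrative assumptions.
    """
    logit = intercept + slope * theta + direct * group + interaction * group * theta
    return 1.0 / (1.0 + math.exp(-logit))

def logit_gap(theta, **item):
    """Difference between the two groups' log-odds at a given latent trait value."""
    p0 = icc(theta, group=0, **item)
    p1 = icc(theta, group=1, **item)
    return math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical item with a group effect that does not depend on theta.
uniform = dict(slope=1.2, intercept=0.0, direct=-0.5, interaction=0.0)
# Hypothetical item whose group effect changes with theta.
nonuniform = dict(slope=1.2, intercept=0.0, direct=-0.5, interaction=0.6)

uniform_gaps = [logit_gap(t, **uniform) for t in thetas]
nonuniform_gaps = [logit_gap(t, **nonuniform) for t in thetas]

print("uniform gaps:   ", [round(g, 3) for g in uniform_gaps])
print("nonuniform gaps:", [round(g, 3) for g in nonuniform_gaps])
```

In this sketch the first item has the same log-odds gap at every value of the latent trait, while the second item's gap changes sign across the range, which is the pattern of crossing curves.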

Item characteristic curves when (a)
MIMIC Models for DIF Testing
A number of methods have been developed for detecting and investigating DIF including Mantel–Haenszel tests (Holland & Thayer, 1988), multidimensional approaches (e.g., SIBTEST; Shealy & Stout, 1993), area methods (Raju, 1988, 1990), logistic regression (Swaminathan & Rogers, 1990), multiple group IRT and structural equation models (Jöreskog, 1971), and MIMIC models (B. O. Muthén, 1985, 1989; B. O. Muthén, Kao, & Burstein, 1991). For reviews of these methods, see Magis, Béland, Tuerlinckx, and De Boeck (2010) and Zumbo (2007). In this article, the focus will be on the method of using MIMIC models for studying DIF. MIMIC models are characterized by the inclusion of independent variables (or causes,

An example of MIMIC model with three observed indicators (
MIMIC models were first used to study DIF by B. O. Muthén (1985), who showed that a typical two-parameter IRT model assumes no direct path between an independent variable (cause) and an individual item response (indicator). A path between an independent variable and an item can be included in a MIMIC model. If this direct path is found to be statistically significant, this suggests that, for two people matched on the latent variable, the item is easier for one of those people. This is an indication of uniform DIF.
The accuracy of MIMIC models for detecting DIF has been investigated by Woods (2009), Woods and Grimm (2011), and others. Researchers have also compared MIMIC methods to other DIF detection methods such as the Mantel–Haenszel test, SIBTEST, and likelihood-ratio tests based on multiple group IRT models (Finch, 2005; Woods, 2009; Woods & Grimm, 2011). Simulation studies have found that the MIMIC method works similarly to other tests, but requires smaller sample sizes than other methods to detect uniform DIF (Woods, 2009).
The MIMIC approach is not immune to many of the issues common to most DIF testing techniques. Scale purification is a procedure for detecting which items among a set of items have DIF (Lord, 1980). This process is important because, in scale development, researchers often have a set of items and want to detect which items within the set have DIF, rather than focusing on a single item. Wang, Shih, and Yang (2009) studied the performance of an iterative purification method with MIMIC models and found that this method worked well for identifying items with DIF. However, like many other scale purification methods, when there are too many items with DIF, the method has Type I error rates that are too high to be acceptable (Shih & Wang, 2009; Wang et al., 2009). Shih and Wang (2009) proposed a method for identifying a short anchor (a small set of DIF-free items). This method seems to perform better than iterative purification methods when the proportion of items with DIF is large. Indeed, only one DIF-free item was needed to control Type I error; however, longer anchors increased power to detect DIF. In addition, Shih and Wang (2009) showed how a slight alteration of the iterative methods proposed by Wang et al. (2009) could be used to identify anchor items (rather than identifying DIF items). In comparison with other DIF detection methods, MIMIC methods are less susceptible to DIF in anchor items (Finch, 2005).
Both MIMIC models and multidimensional IRT models can be used to relax some of the more stringent assumptions of traditional IRT models, such as unidimensionality (Lee, Bulut, & Suh, 2016; Wang et al., 2009; Zumbo, 2007). In a MIMIC model, allowing individual items to load on multiple latent variables permits multidimensionality while testing for DIF. There are many instances where researchers attribute DIF to multidimensionality. Cheng, Shao, and Lathrop (2016) proposed a method for understanding DIF using multidimensional MIMIC models. If there are indicators of a latent construct that the researchers believe to be causing DIF, a MIMIC model can be estimated that includes that latent construct and tests whether there is DIF above and beyond this additional latent construct. Alternatively, if there is no remaining DIF, this would support the claim that the additional latent construct explains all (or most of) the DIF (Cheng et al., 2016). The Mantel–Haenszel test and SIBTEST are not appropriate for multidimensional modeling because they are designed for only one latent trait (Bulut & Suh, 2017); however, a multidimensional SIBTEST has been developed (Stout, Li, Nandakumar, & Bolt, 1997).
MIMIC models can be used for continuous, categorical, or mixes of continuous and categorical outcomes. Throughout this article, the authors provide examples and equations for dichotomous outcomes only. Because MIMIC models are estimated in a structural equation modeling framework, model fit indices are available for these models. However, many model fit statistics are only valid for models with continuous outcomes (e.g., root mean square error of approximation [RMSEA], standardized root mean square residual [SRMR], comparative fit index [CFI], Tucker–Lewis index [TLI]; Yun, 2002). Other measures such as information criteria can be used to compare models with dichotomous outcomes (Kang & Cohen, 2007).
Aims and Structure
In DIF studies, researchers are often concerned about what characteristics of an individual (e.g., gender, race, socioeconomic status) might lead to uniform or nonuniform DIF. However, applied researchers seem to pay little attention to models of how independent variables affect measurement and do not include DIF as part of their model. By viewing MIMIC models as mediation models, it becomes clear that there are multiple ways an independent variable can affect an item response, not all of which are undesirable.
The purpose of this article is to show that MIMIC models for uniform and nonuniform DIF analysis can be conceptualized as mediation and moderated mediation models. Mediation and moderation models provide a framework for understanding the process through which some effect occurs and for modeling contingencies in those processes. An advantage of understanding DIF in the context of mediation and moderation analysis is that it provides an opportunity to investigate the mechanisms through which independent variables influence measurements. What the authors find is that DIF is one of these processes. In addition, conceptualizing DIF within a mediation and moderation framework makes it possible to apply useful concepts and ideas from the mediation and moderation literature to improve the understanding of DIF. This includes (a) appreciating uniform and nonuniform DIF as direct effects and interactions, (b) understanding the implication of indirect effects in DIF analysis, (c) revising a conventional interpretation of the uniform DIF parameter in the presence of nonuniform DIF, and (d) using the concept of probing interactions to better understand the patterns of DIF.
The remainder of this article is structured as follows. The “MIMIC DIF Models as Mediation and Moderated Mediation Models” section shows how MIMIC models for uniform and nonuniform DIF analysis can be understood within a mediation and moderation framework. The subsequent section (“Applying Ideas of Moderation and Mediation to DIF”) discusses in detail how some important concepts and ideas from the mediation and moderation literature can be applied to improve the understanding of DIF. Throughout both sections, a data set is analyzed to show how MIMIC DIF analysis can be done and to provide a concrete example of the advantages of using mediation and moderation ideas to interpret DIF. The article ends with some concluding remarks in the “Discussion” section.
MIMIC DIF Models as Mediation and Moderated Mediation Models
To illustrate how MIMIC DIF models can be understood within a mediation and moderation framework, the authors use a data set that explored cohort differences in intelligence testing in samples of Estonian children aged 12 to 14 years (Must & Must, 2014).
The first cohort was collected during 1933 to 1936 (
Estonian National Intelligence Test: Arithmetic.
Original items are from Haggerty, Terman, Thorndike, Whipple, and Yerkes (1921) Scale A, Form 2, Edition 2. Items were translated to English for ease of understanding in this article. Items were administered in Estonian.
For 10 of the 16 items, the 2006 cohort performed better than the 1933/1936 cohort. The exceptions, where the 1933/1936 cohort outperformed the 2006 cohort, are Items 5, 9, 10, 11, 12, and 13. These data are used here to explore cohort differences in latent arithmetic ability as well as the potential for DIF. There are a variety of counterexplanations for why the later cohort outperformed the earlier cohort, and these are discussed in Must and Must (2013). These analyses are not to be taken as novel theoretical findings, but rather as an example showing how to conduct DIF analysis. Two items were selected to demonstrate how to test for DIF with MIMIC models: Items 5 and 10. These items were selected because they show interesting patterns of DIF, as will be seen in later sections. The focus of this article is on the estimation of DIF within a single item; however, when these analyses are done with empirical data, the authors recommend that researchers use the methods proposed by Shih and Wang (2009) or Wang et al. (2009), as described in the “MIMIC Models for DIF Testing” section. Alternatively, researchers can use substantive knowledge to inform which items to explore for DIF. The authors did not conduct any scale purification for these analyses. Data analysis was done using Mplus Version 8.1 (L. K. Muthén & Muthén, 1998-2011). For this analysis, maximum likelihood estimation with robust standard error estimates was used. The model was unidimensional, assuming that all items loaded onto a single latent variable, which the authors call arithmetic ability. Select input files are included in the appendices to aid implementation.
Mediation Model for Uniform DIF
In the example, the authors are concerned with whether there are cohort differences in the probability of correctly answering an item. One reason that there may be differences in the probability of a correct response is differences in “true” latent ability. However, there may be concern that for a specific item there are cohort differences in the probability of a correct response that are not attributable to “true” differences in latent ability. For this reason, the authors add a direct path between the cohort variable and the item response, as represented in Figure 3a.

(a) A MIMIC model for uniform DIF as a mediation model and (b) a MIMIC model for nonuniform DIF as a moderated mediation model.
For notation, the authors use
First, the model for Path a is specified with the latent trait
where the regression coefficient
Estimating Equation 1 with the arithmetic data and allowing Item 5 to have uniform DIF provides an estimate of
Second, a model for Paths b and c,
where
Estimating Equation 2 for Item 5,
The influence of
A similar analysis can be conducted on Item 10, which is another item where the 1933/1936 cohort performed better than the 2006 cohort. The results showed that
By plugging in Equation 1 as
Equation 3 shows how the effect of
Uniform DIF examples for Item 5 and Item 10 can be used to generate estimates of indirect effects. The estimated indirect effect of cohort on Item 5 through latent arithmetic ability is
Moderated Mediation Model for Nonuniform DIF
It may be possible that some items on the questionnaire provide more information about the latent abilities of the individuals in one of the cohorts compared with the other. This means that the items’ ability to discriminate among people of different latent abilities may depend on which cohort those people come from. For instance, as a latent ability increases, a child’s probability of correctly answering the item might increase faster if that child is in the 1933/1936 cohort compared with the 2006 cohort. This type of effect is described as nonuniform DIF. To test this hypothesis, the authors allow the path between
This revised MIMIC model is also a moderated mediation model. Modifying the regression model for Paths b and c by including the interaction between
Note that Equation 4 includes the interaction between
Estimating Equations 1 and 4 for Item 5, an estimate of
As will be discussed more in depth in “Probing Conditional Effects” section, the coefficient for
The authors conducted the same analysis, allowing for nonuniform DIF for Item 10 only. The estimated cohort difference on the latent trait is 0.09 (
Just as the authors did with the models for uniform DIF, Equation 5 can be combined with Equation 1 to get information about indirect and direct effects,
Equation 7 is equivalent to a moderated mediation model that shows that the indirect effect of
For instance, in the arithmetic example with Item 5, the indirect effect is
Applying Ideas of Moderation and Mediation to DIF
In “MIMIC DIF Models as Mediation and Moderated Mediation Models” section, the authors discussed that MIMIC models for studying uniform or nonuniform DIF can be viewed as mediation or moderated mediation models, respectively. Specifically, the uniform DIF parameter
This section discusses how some additional ideas from the mediation and moderation literature can improve the understanding of DIF.
Null Direct Effects
In mediation analysis, a null direct effect implies that the cumulative effects of the other ways (not through the mediator) in which an independent variable (
Similarly, a zero uniform DIF parameter value (
Indirect Effects
An indirect effect in Figure 3a (Paths a and b) represents the effect of the independent variable (
It is discussed in “Mediation Model for Uniform DIF” section that the effect of
Both in the presence and absence of a direct effect (or uniform DIF), it may be worthwhile to examine and understand the indirect effect. Consider the uniform DIF MIMIC model. For the sake of simplification,
Suppose
Consider now if
If
Finally, consider the case where
Consider the examples of DIF analysis for Items 5 and 10. Each item had marginal group differences in item response probabilities that deviated from the rest of the items. However, Item 5 showed uniform DIF, whereas Item 10 did not. These items seemed similar when examining marginal group differences in item response probabilities, but the mechanisms by which those marginal group differences arose are quite different.
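The point that similar marginal differences can arise from different mechanisms can be sketched numerically. The example below is a hypothetical illustration, not a reanalysis of Items 5 and 10: it assumes a logistic item model, a standard-normal residual for the latent trait, and made-up path values labeled `a` (group to latent trait), `b` (latent trait to item), and `c` (group to item). The two scenarios are chosen so that the product `a * b` in the first equals `c` in the second.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_pdf(x, mean):
    """Density of a normal distribution with the given mean and unit variance."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def marginal_prob(group, a, b, c, intercept=0.0, n=2001, width=8.0):
    """P(correct | group), integrating the latent trait out on a grid.

    Assumed model (hypothetical values throughout):
      latent trait ~ Normal(a * group, 1)
      logit of a correct response = intercept + b * theta + c * group
    """
    mean = a * group
    step = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        theta = mean - width + i * step
        total += logistic(intercept + b * theta + c * group) * normal_pdf(theta, mean) * step
    return total

# Scenario 1: groups differ only through the latent trait (indirect path, no DIF).
gap_latent = marginal_prob(1, a=-0.5, b=1.0, c=0.0) - marginal_prob(0, a=-0.5, b=1.0, c=0.0)

# Scenario 2: identical latent distributions, difference only through the direct path.
gap_direct = marginal_prob(1, a=0.0, b=1.0, c=-0.5) - marginal_prob(0, a=0.0, b=1.0, c=-0.5)

print(round(gap_latent, 4), round(gap_direct, 4))
```

Under these assumptions the two marginal gaps coincide, so the marginal response probabilities alone cannot tell the two mechanisms apart; only a model that separates the latent-trait path from the direct path can.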
The above exercise clearly explains why it is important to take into account potential group differences in the latent trait
Symmetry of Moderation
The model expressed in Equation 6 allows the effect of
Note that Equation 8 shows that the effect of
Equation 9 can be useful for conceptualizing nonuniform DIF. The coefficient for
An important implication is that in the presence of nonuniform DIF (or with a significant
Probing Conditional Effects
The authors have discussed that nonuniform DIF can be understood as an interaction between the latent variable of interest and an external grouping factor in predicting the probability of response on a given item indicated by the coefficient
In moderation analysis, once a significant moderation effect is found, researchers often apply probing methods (e.g., Hayes & Matthes, 2009; Spiller, Fitzsimons, Lynch, & McClelland, 2013). These methods can be used to understand where the group-specific item characteristic curves cross. Specifically, it can be solved for the point along
It is useful to know the point of
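As a minimal sketch of this kind of probing, suppose the group difference in the item's linear predictor at a given latent trait value is a main-effect term plus an interaction term multiplied by the latent trait. The names `direct` and `interaction` below are hypothetical labels for those two coefficients, and the numbers are made up. Because the link function is monotone, the group-specific curves cross where this conditional difference equals zero.

```python
def conditional_group_effect(theta, direct, interaction):
    """Group difference in the linear predictor at a given latent trait value."""
    return direct + interaction * theta

def crossing_point(direct, interaction):
    """Latent trait value where the two groups' curves cross.

    Returns None when there is no interaction, because parallel curves on the
    linear-predictor scale never cross (or coincide everywhere when direct == 0).
    """
    if interaction == 0:
        return None
    return -direct / interaction

# Hypothetical coefficients, not estimates from the article.
theta_cross = crossing_point(direct=-0.5, interaction=0.25)
print(theta_cross)
print(conditional_group_effect(theta_cross, direct=-0.5, interaction=0.25))
```

Whether the crossing point falls inside the range of latent trait values actually observed in the sample is a separate question that the sketch does not address.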
Discussion
In this article, the authors have discussed how the MIMIC models for studying uniform and nonuniform DIF can be conceptualized within the mediation and moderation framework. Specifically, it is shown how estimating and testing a direct effect in a mediation model aligns with a test of uniform DIF and how a test of an interaction effect in a moderated mediation model aligns with a test of nonuniform DIF. A benefit of conceptualizing DIF within the mediation and moderation framework is that useful ideas and methods from mediation and moderation literature can be applied to improve the understanding of DIF. Typical DIF studies are often exploratory and researchers who wish to study DIF tend to act as if they have no prior information about what type of DIF they might expect. However, understanding DIF in the mediation and moderation context may help researchers apply their substantive knowledge in such a way as to develop more directed hypotheses about DIF for particular items. By developing specific hypotheses about DIF, researchers can transition into a more confirmatory study of DIF.
An additional benefit of conceptualizing DIF within the mediation and moderation framework is that it is clarified that the coefficient
Throughout this article, the authors have discussed MIMIC models for exploring DIF; however, the ideas in this article generalize to other models as well. The MIMIC model is equivalent to the two-parameter logistic model (MacIntosh & Hashim, 2003; B. O. Muthén et al., 1991). When a one-parameter logistic model is desired, the MIMIC model can be adjusted so the relationship between
Throughout this article, the analyses were described using a general link function, but for the data example a logistic link function was used. When a probit link is used, a normal-ogive MIMIC model can be constructed, which is equivalent to a normal-ogive IRT model. B. O. Muthén (1985) showed how to derive the parameters for the normal-ogive IRT model from a normal-ogive MIMIC model. MacIntosh and Hashim (2003) followed up on this work by deriving the standard errors for the parameters of the normal-ogive IRT model from the standard errors in the normal-ogive MIMIC model. The MIMIC model can therefore be used with either a logistic or a probit (normal-ogive) link function.
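To give a sense of how close the two link functions are, the sketch below compares the normal ogive with a logistic curve rescaled by the commonly used constant 1.702. This concerns only the shapes of the link functions; it is not the parameter or standard error conversion between MIMIC and IRT models described by B. O. Muthén (1985) and MacIntosh and Hashim (2003), which is not reproduced here.

```python
import math

def normal_ogive(z):
    """Standard normal cumulative distribution function (probit link inverse)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    """Logistic function (logit link inverse)."""
    return 1.0 / (1.0 + math.exp(-z))

D = 1.702  # conventional scaling constant for matching the two curves

# Evaluate both curves on a grid from -6 to 6 and record the largest gap.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_ogive(z) - logistic(D * z)) for z in grid)

print(round(max_gap, 4))
```

With this scaling, the two curves differ by less than 0.01 in probability everywhere on the grid, which is why coefficients from the two links are usually read as differing mainly in scale.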
In the “MIMIC Models for DIF Testing” section, the authors discussed previous research on selecting anchor items and scale purification using the MIMIC model. The methods described by Wang et al. (2009) and Shih and Wang (2009) have been investigated only for detecting uniform DIF. The nonuniform DIF MIMIC model is slightly more recent (Woods & Grimm, 2011), and no research has yet explored how best to select anchor items or conduct scale purification using MIMIC models for nonuniform DIF. Future research should examine how best to do this type of analysis, particularly when some items have uniform DIF and others have nonuniform DIF.
The authors are not the first to discuss some of the connections between mediation analysis and DIF. Cheng et al. (2016) proposed applying mediation analysis to understand how uniform DIF occurs. Specifically, Cheng et al. (2016) proposed introducing additional mediators in a MIMIC model to explain the process of uniform DIF; however, they did not acknowledge that a MIMIC model could already be seen as a mediation model, as discussed in this article. Adding extra mediators to a MIMIC model for uniform DIF, as suggested by Cheng et al. (2016), would extend a single mediator model to a multiple mediator model within the same framework. The additional mediators can be used to test whether the uniform DIF (or a direct effect of
An additional benefit of MIMIC models is that multiple independent variables can be included in the model, as was briefly discussed in the “Null Direct Effects” section. Including multiple independent variables would allow the researcher to estimate indirect and direct effects for each independent variable. However, it is important to remember that indirect and direct effects are scaled by the independent variable (i.e., they are interpreted with respect to a one-unit change in the covariate). This means that direct effects of different independent variables may not be directly comparable in terms of magnitude.
Finally, although conceptualizing uniform and nonuniform DIF MIMIC models within a mediation and moderation framework can be very useful, the authors caution against using this framework to make causal inferences without thoroughly investigating the assumptions needed to do so. There is a growing literature on how to make valid causal inferences, which may be additionally complicated by the use of a latent variable. Researchers interested in making causal inferences should consult the literature on causal mediation analysis (e.g., Coffman, MacKinnon, Zhu, & Ghosh, 2016; Imai, Keele, & Tingley, 2010; Robins & Greenland, 1992).
Footnotes
Appendix A
Appendix B
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant DGE-1343012.
