Abstract
Dual item response theory (IRT) models in which items and individuals have different amounts of measurement error have been proposed in the literature. Any developments in these models, however, are feasible only for continuous responses. This article discusses a comprehensive dual modeling approach, based on underlying latent response variables, from which specific models for continuous, graded, and binary responses are obtained. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are discussed for all the resulting models. Simulation results suggest that the proposal is quite feasible. A practical illustration is given with an empirical example in the personality domain.
In the psychometric models commonly used in typical-response (personality and attitude) measurement, such as linear factor analysis (FA), the graded-response model (GRM), and the two-parameter model (2PM), items are characterized by two types of parameter: location and discrimination. Individuals, in contrast, are characterized by only one location parameter (position on the trait continuum). Theory and evidence, however, suggest that this modeling is insufficient (Tellegen, 1988). Just as items generally differ in their sensitivity in differentiating between individuals with different trait levels, individuals also generally differ in the sensitivity of their responses to the different item locations. Some respondents are largely insensitive, and their response patterns are almost random. At the opposite extreme, some individuals respond with high consistency, leading to response patterns that approach Guttman patterns (Ferrando, 2004, 2013; Fiske, 1968). If this scenario is accepted, then a “dual” modeling approach (see Fiske, 1968) in which both items and persons differ in terms of discriminating power seems to be the most plausible way of fitting typical responses.
Dual models of this type have been considered in the literature since the 1940s (Mosier, 1942), although the purposes of these proposals and the terminology used are often quite different (see, for example, Ferrando, 2004). A review, however, suggests that these models can be divided into two main families. The models in the first family are Thurstonian models (TMs), which model individual discrimination (or individual error) as random fluctuation around a central trait level (Ferrando, 2004, 2007, 2009; Levine & Rubin, 1979; Lumsden, 1980). Models in the second family are multiplicative models (MMs), which model individual discrimination as a person slope that functions multiplicatively with the item slope (Ferrando, 2014, 2016; Lubbe & Schuster, 2016, 2017; Strandmark & Linn, 1987).
Both TMs and MMs were initially considered only for binary responses, and, in this format, both lead to very similar outcomes and interpretations. Extension to graded and continuous formats, however, is more complex. Although the person discrimination parameter in TMs has the same interpretation in any format (as discussed below), the person slope in the MMs can also be thought to model individual differences in response scale usage (Ferrando, 2014) or proneness to extreme responding (Lubbe & Schuster, 2016, 2017) in the case of continuous or graded formats. In this sense, MMs are less specific and more difficult to interpret than TMs (e.g., van der Maas, Molenaar, Maris, Kievit, & Borsboom, 2011).
From an applied point of view, there are feasible procedures for fitting dual MMs for binary (Ferrando, 2016), graded (Lubbe & Schuster, 2017), and continuous (Ferrando, 2014; Lubbe & Schuster, 2016) responses. However, to date this is not the case for dual TMs, for which a full feasible procedure has only been proposed for continuous responses (Ferrando, 2013). Dual TMs for binary and graded responses have been considered intractable in practice (Lumsden, 1980; Torgerson, 1958), and only restricted versions in which item discrimination is constant appear to exist at present (Ferrando, 2004, 2007, 2009).
The main purpose of this article is to propose a comprehensive, item response theory (IRT)-based, dual TM approach that can be used with binary, graded, and continuous typical-response items. The resulting specific models are denoted as DTCRM (continuous responses; the already existing model), DTGRM (graded responses), and DTBRM (binary responses). For the DTGRM and DTBRM, the practical limitations mentioned above are overcome by using an underlying variables approach (UVA, Muthén, 1984), which makes the processes of fitting and scoring these models quite feasible in practice. So, the present proposal is mainly applied, and practical procedures are proposed for (a) calibrating the items and assessing model-data fit and appropriateness, (b) estimating the person parameters (scoring), and (c) assessing the precision with which the individual parameters are estimated. To the best of the author’s knowledge, the UVA-based developments that lead to the DTGRM and DTBRM are new contributions, and so are the specific procedures that are proposed for calibrating the items and scoring the individuals (although they are indeed specific applications of more general, well-known procedures). Finally, the analytical results concerning the precision of the individual parameter estimates also seem to be new.
The DTCRM: A Review
Consider a test made up of
where Ti is the momentary trait (or perceived trait) value of respondent i when answering item j, and bj is the momentary (perceived) location of item j on the trait continuum.
For a given respondent i, consider first the distribution of Ti over the test items. This distribution is assumed to be normal with mean θi and variance
From the conditions discussed so far, it follows that the conditional distribution of Xj for fixed θi and
The expressions in Equation 3 can be interpreted, respectively, as the expected mean and variance of Xj across all respondents with person location θi and PDD
The conditional expected score in Equation 3 is a direct function of the weighted person-item distance λj(θi − βj). When θi > βj, the expected score is above the response scale midpoint (i.e., 0.5), and when the person location matches the item location, the expected item score is the midpoint. So, as proposed above, βj can be interpreted as a standard IRT difficulty index: It is the point on the trait continuum that marks the transition from the tendency to disagree/not endorse the item to the tendency to agree/endorse it.
At this point, it might be of interest to compare the expectations in Equation 3 to the expectation derived from the linear MM for continuous responses (Ferrando, 2014; Lubbe & Schuster, 2016).
As in the DTCRM, the expected response in the MM in Equation 4 is also a direct function of the person-item distance. However, the person parameter ξi in Equation 4 (assumed to be positive) acts as a moderator that amplifies or reduces the impact of this distance on the expected item response. So, for large ξi, a small positive distance leads to an expected response that moves toward the upper end of the item response scale. This functioning was initially intended to model individual discrimination (in terms of sensitivity to the person-item distance). However, it might equally well reflect idiosyncratic responding (proneness to extreme responding) or lack of cognitive effort. In contrast, in the DTCRM modeling, the PDD does not affect the extremeness of the expected response but does affect its consistency. Thus, when both PDD and IDD are small, so is the conditional variance in Equation 3, which means that the observed score is close to the expected score.
By recalling now that the marginal mean and variance of θ are assumed to be 0 and 1, respectively, it follows that the marginal mean and variance of Xj over the entire population of respondents are
And the product-moment correlation between the scores on items j and k is
where αj is the correlation between the scores on item j and θ (the standardized loading in FA terminology).
The simple linear model reviewed in this section assumes that Xij is bounded whereas Ti and bj are not. So, the model cannot be strictly correct and must necessarily be considered as an approximation. In more detail, the assumptions above imply that (a) the item response function is nonlinear rather than linear, and (b) the conditional distributions become more asymmetrical and less variable toward the ends of the scale. In most practical applications, however, especially in personality measurement, the linear model is expected to work reasonably well as an approximation (see Ferrando, 2002, for a discussion).
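The marginal results reviewed in this section can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the linear form Xij = 0.5 + λj(Ti − bj), with Ti normal around θi with variance equal to the PDD and bj normal around βj with variance equal to the IDD, which is our reading of the surrounding text rather than a reproduction of the article's equations. The function name, the parameter values, and the uniform distribution used for the PDDs are hypothetical choices. Under these assumptions the correlation between item j and θ would be 1/sqrt(1 + E(PDD) + IDDj).

```python
import numpy as np

def simulate_dtcrm(n_persons, lam, beta, idd, rng):
    """Draw continuous item scores under the assumed linear dual Thurstonian form."""
    lam, beta, idd = (np.asarray(a, dtype=float) for a in (lam, beta, idd))
    n_items = lam.size
    theta = rng.standard_normal(n_persons)      # central trait levels, N(0, 1)
    pdd = rng.uniform(0.2, 1.8, n_persons)      # person variances (arbitrary choice, mean 1)
    # Momentary trait values fluctuate around theta with a person-specific variance.
    t = theta[:, None] + np.sqrt(pdd)[:, None] * rng.standard_normal((n_persons, n_items))
    # Momentary item locations fluctuate around beta with an item-specific variance.
    b = beta + np.sqrt(idd) * rng.standard_normal((n_persons, n_items))
    x = 0.5 + lam * (t - b)                     # scores are not truncated to [0, 1] here
    return theta, pdd, x

rng = np.random.default_rng(0)
lam = np.array([0.10, 0.12, 0.08])
beta = np.array([-0.5, 0.0, 0.7])
idd = np.array([0.0, 0.5, 1.0])
theta, pdd, x = simulate_dtcrm(200_000, lam, beta, idd, rng)

# Implied standardized loadings, using E(PDD) = 1 for the uniform distribution above.
alpha_implied = 1.0 / np.sqrt(1.0 + 1.0 + idd)
alpha_sample = np.array([np.corrcoef(x[:, j], theta)[0, 1] for j in range(lam.size)])
print(np.round(alpha_implied, 3), np.round(alpha_sample, 3))
```

Because the simulated scores are left unbounded, the sketch reproduces the linear approximation itself and not the ceiling and floor effects described in the preceding paragraph.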
The DTGRM and DTBRM
Consider now that the observed item score Xj is a categorical variable, and assume that (a) there is a latent response variable Yj that underlies Xj, and (b) the following model holds for Yj
where Ti and bj behave as in Equation 2. Equation 8 is the same model as Equation 1 without the midpoint intercept term and with the variance of Yj fixed to 1, which means that the scale parameter λj is now a standardized loading αj (see Equation 7). This variance restriction is needed because, in contrast to Equation 1, the origin and scale for Yj are now undetermined. In the standard UVA modeling (e.g., Muthén, 1984), this indeterminacy is usually solved by assuming that the marginal distribution of Yj is normal with zero mean and unit variance. In the present modeling, the unit variance assumption has already been adopted. As for the normality assumption, the marginal distribution of Equation 8 for a fixed item is that of the sum of three independent variables (θ, ε, and ω; see Equation 2), of which θ and ε are normal, and ω follows a Pearson type-VII distribution (see Ferrando, 2007). For practical purposes, the resulting distribution is close enough to normal for this assumption also to be used. The mean of Yj, however, cannot be assumed to be generally zero. In effect, the marginal mean and variance are given by
And the product-moment correlation between the latent scores on items j and k is
where αj is the product-moment correlation between the latent scores on item j and the central trait level θ.
The relation between Yj and the observed score Xj is now assumed to be a step function governed by a threshold mechanism. The most usual scoring schemas for categorical variables are considered here: 0 and 1 for the binary case, and integer values 1, 2, . . . for the graded-response case. With this schema, the mechanism is
for the binary case, and
for the graded-response case with c response categories. From this modeling, it follows that the product-moment correlation between Yj and Yk is the tetrachoric correlation between Xj and Xk in the binary case, and the polychoric correlation between Xj and Xk in the graded-response case.
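The threshold mechanism just described can be sketched in a few lines. This is a minimal illustration, not the article's equations: the function name and the threshold values are hypothetical, and assigning a latent value that falls exactly on a threshold to the upper category is an assumption made here.

```python
import numpy as np

def categorize(y, thresholds, first_category=1):
    """Map latent responses to ordered categories through a step function."""
    thresholds = np.asarray(thresholds, dtype=float)
    # searchsorted counts how many thresholds lie at or below each latent value.
    return np.searchsorted(thresholds, y, side="right") + first_category

y = np.array([-1.3, -0.2, 0.0, 0.4, 2.1])
graded = categorize(y, thresholds=[-1.0, 0.0, 1.0])           # four categories scored 1..4
binary = categorize(y, thresholds=[0.0], first_category=0)    # scored 0/1
print(graded, binary)
```

With c response categories the function expects c − 1 increasing thresholds, so the binary case is simply the graded case with a single threshold and scores starting at 0.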
We turn now to the IRT modeling implied by the UVA described so far. In the DTBRM, the probability of endorsing item j for fixed θi and
where Φ is the cumulative distribution function (CDF) of the standard normal distribution. To see the role of the person parameters in Equation 13, note that θi determines which score (0 or 1) is the most probable for this respondent. As for the PDD, when
In the DTGRM, the probability of scoring k in item j for fixed θi and
In Equation 14, the person location θi determines the response category that has the greatest probability of being endorsed by respondent i. As for the role of the PDD, consider a respondent whose person location is between δj,k−1 and δjk. As the PDD approaches zero, the probability of endorsing category k increases, whereas the probability of endorsing the remaining categories decreases. So, the process of responding becomes more deterministic. At the opposite extreme, as the PDD increases, the probabilities of responding in the different categories become progressively more undifferentiated. This way of working contrasts with that of the MM for graded responses (Lubbe & Schuster, 2017). As in the linear case, the person slope parameter in the MM modifies the expected response, so that low slope values imply that the response is more likely to lie in the middle categories whereas with large values it is more likely to be in the outer categories (see Lubbe & Schuster, 2017, for details). Again, this might reflect either person discrimination or idiosyncratic responding.
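The role of the PDD described above can be illustrated numerically. The sketch below assumes a normal-ogive form in which the probability of responding above threshold δ is Φ((θi − δ)/sqrt(PDDi + IDDj)); this form is our assumption, inferred from the surrounding description, and the function names and numerical values are hypothetical.

```python
import math

def phi_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(theta, pdd, idd, deltas):
    """Category probabilities for one person and one item (pdd + idd must be positive)."""
    s = math.sqrt(pdd + idd)
    # Probabilities of responding above each threshold, padded with 1 and 0 at the ends.
    above = [1.0] + [phi_cdf((theta - d) / s) for d in deltas] + [0.0]
    return [above[k] - above[k + 1] for k in range(len(deltas) + 1)]

deltas = [-1.5, -0.5, 0.5, 1.5]        # increasing thresholds, five categories
consistent = category_probs(theta=0.0, pdd=0.05, idd=0.10, deltas=deltas)
erratic = category_probs(theta=0.0, pdd=3.00, idd=0.10, deltas=deltas)
print([round(p, 3) for p in consistent])
print([round(p, 3) for p in erratic])
```

For the respondent located between the second and third thresholds, the small PDD concentrates the probability on the middle category, whereas the large PDD spreads it almost evenly over the five categories. Passing a single threshold gives the binary case.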
In the literature, the DTBRM in Equation 13 is Lumsden’s (1980) “Two-parameter 3 model,” which he considered to be the most general model intended for binary items. If the PDDs are equal for all respondents (i.e.,
Fitting the Dual Thurstonian Models (DTMs)
The general approach proposed for all the models considered is a conventional two-stage conditioned procedure (McDonald, 1982) with a first calibration stage in which the item parameters are estimated, and a second scoring stage in which estimates of the person locations and the PDDs are obtained for all the individuals. In addition, a multifaceted approach is proposed for assessing the appropriateness of the fitted model.
Item Calibration
The three models can be fitted by using a limited-information FA approach with additional identification restrictions. A unified approach is proposed in which items are calibrated by fitting the unidimensional FA model to the appropriate inter-item correlation matrix: Product-moment (DTCRM), tetrachoric (DTBRM), and polychoric (DTGRM). The basic approach is standard, so specific estimation procedures and discrepancy functions will not be discussed here, although some discussion is provided in the example.
The main estimates obtained by fitting the FA model are the standardized loadings α and the corresponding standardized residual variances. Now, for all three models, the following result is obtained (see Equations 7 and 9).
Equation 15 means that the inter-item correlation matrix does not contain sufficient information to separately identify the average PDD and the IDDs. In the dual MMs, this problem is settled by fixing the mean person slope to 1 (Ferrando, 2014). This constraint, however, cannot be used here, because all the IDDs must be greater than 0. So, the identification approach proposed in this case is based on the use of a marker variable. The “best” item (i.e., the item with the largest standardized loading) is chosen as a marker and is therefore treated as if its IDD were zero. Then, relative to this scaling, the average PDD is estimated as
where
We turn now to the item location parameters. In the case of continuous scores, they can be estimated directly from the marginal means (see Equation 5). In the case of binary scores, conventional fitting of the 2PM using the UVA approach will provide estimates of the transformed location parameters δj in Equation 13. However, βj cannot be identified separately from δj because the origin of Yj is undetermined. Now, in principle, βj does not need to be identified separately to obtain individual PDD estimates at the scoring stage. However, it can be identified by assuming that items are categorized at a common threshold of 0 in Equation 11. This fixes the origin of Yj and provides a plausible interpretation (see Lubbe & Schuster, 2017): Negative values of Yj lead to denial, whereas positive values lead to item endorsement.
In the graded-response case, the βj values can be identified by extending the above rationale in the way proposed by Lubbe and Schuster (2017). If the number of categories is even, the middle threshold is fixed to 0 for all the items. If it is odd, the sum of the two central thresholds is fixed to 0. Again, βj does not need to be identified to obtain PDD estimates in the DTGRM. However, identification is useful for interpretative purposes because, as occurs with the DTBRM and the DTCRM, it also provides a single item location in the graded-response case.
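The marker-variable identification described in this subsection can be sketched as follows. The sketch assumes that the standardized loading has the form αj = 1/sqrt(1 + E(PDD) + IDDj), so that 1/αj² − 1 gives the sum of the average PDD and the item's IDD; this expression is our reading of the text, not a formula reproduced from the article. The function name and the loadings are hypothetical.

```python
import numpy as np

def identify_dispersions(alpha):
    """Split 1/alpha^2 - 1 into the average PDD and the item IDDs using a marker item."""
    alpha = np.asarray(alpha, dtype=float)
    total = 1.0 / alpha**2 - 1.0       # assumed to equal E(PDD) + IDD_j for each item
    marker = int(np.argmax(alpha))     # the item with the largest standardized loading
    mean_pdd = total[marker]           # the marker's IDD is treated as zero
    idd = total - mean_pdd             # zero for the marker, non-negative elsewhere
    return marker, mean_pdd, idd

marker, mean_pdd, idd = identify_dispersions([0.62, 0.70, 0.74, 0.66])
print(marker, round(mean_pdd, 4), np.round(idd, 4))
```

Choosing the item with the largest loading as the marker guarantees that no IDD estimate is negative under this scaling, which is consistent with the constraint mentioned above.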
Scoring
In the original linear model, Ferrando (2013) proposed using maximum likelihood (ML) to estimate the person parameters. Experience suggests that ML estimation is generally feasible but also prone to giving some very large PDD estimates if the test is short or the item locations are not evenly distributed. This problem becomes worse in the DTGRM, and more so in the DTBRM.
To overcome the problem above, the approach proposed here is to use Bayes expected a posteriori (EAP, Bock & Mislevy, 1982) estimation for all the models considered. This procedure has two main advantages. First, it uses the mean PDD estimate obtained in the calibration stage to center the prior distribution. So, the “ensemble biases” (Mislevy, 1986) phenomenon of shrinkage toward an inappropriate central value is avoided. Second, it ensures that the person estimates (especially
From a modeling point of view, the most important issue in the EAP estimation process is the choice of the prior distributions. In our proposal, the prior for θ is set as standard normal by default, but estimated distributions via quadrature can also be used as input (Mislevy, 1986). As for the PDDs, they are variances, so their most appropriate prior is the scaled inverse χ2 distribution (Novick & Jackson, 1974). Because only the prior mean is estimated at the calibration stage, the prior variance for
For each individual, the output of the EAP procedure consists of the θi and
Finally, an empirical marginal reliability estimate can be obtained by averaging the squared PSDs in the sample of N individuals (Brown & Croudace, 2015):
Provided that the PSDs remain relatively uniform, the marginal reliabilities in Equation 18 are representative of the overall precision of the estimates in the population of respondents.
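The scoring stage can be illustrated with a small grid-based EAP computation for a single binary response pattern. This is a hedged sketch, not the procedure implemented by the author: the response function Φ((θ − δj)/sqrt(PDD + IDDj)), the inverse chi-square kernel (parameterized so that the prior mean is scale/(df − 2)), the grid limits, and all names and values are assumptions made for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF for NumPy arrays."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def eap_scores(x, delta, idd, prior_df=5.0, prior_scale=3.0):
    """Grid-based EAP estimates and PSDs of (theta, PDD) for one binary pattern."""
    x, delta, idd = (np.asarray(a, dtype=float) for a in (x, delta, idd))
    theta = np.linspace(-4.0, 4.0, 81)       # quadrature nodes for the person location
    pdd = np.linspace(0.05, 8.0, 160)        # quadrature nodes for the person variance
    th, v = np.meshgrid(theta, pdd, indexing="ij")
    # Unnormalized log prior: N(0, 1) for theta, inverse chi-square-type kernel for the PDD.
    log_prior = -0.5 * th**2 - (prior_df / 2.0 + 1.0) * np.log(v) - prior_scale / (2.0 * v)
    # Assumed binary response function evaluated at every grid point and item.
    z = (th[..., None] - delta) / np.sqrt(v[..., None] + idd)
    p = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
    log_lik = (x * np.log(p) + (1.0 - x) * np.log(1.0 - p)).sum(axis=-1)
    log_post = log_prior + log_lik
    w = np.exp(log_post - log_post.max())
    w /= w.sum()                              # normalized posterior weights on the grid
    theta_hat, pdd_hat = (w * th).sum(), (w * v).sum()
    theta_psd = math.sqrt((w * (th - theta_hat) ** 2).sum())
    pdd_psd = math.sqrt((w * (v - pdd_hat) ** 2).sum())
    return theta_hat, theta_psd, pdd_hat, pdd_psd

delta = np.linspace(-2.0, 2.0, 10)            # hypothetical item locations
idd = np.full(10, 0.3)                        # hypothetical item dispersions
consistent = eap_scores([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], delta, idd)
erratic = eap_scores([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], delta, idd)
print(np.round(consistent, 3), np.round(erratic, 3))
```

The Guttman-like pattern should yield a smaller PDD estimate than the erratic pattern, even though both contain five endorsements. The posterior standard deviations returned for each respondent are the quantities that the marginal reliability summary averages over the sample.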
Assessing Model Appropriateness
In all the models proposed here, calibration consists of fitting a unidimensional FA model. So, model-data fit and appropriateness at this level can be assessed by using standard procedures. Appropriate fit of the FA model, however, is necessary but not sufficient, because the DTMs cannot be distinguished from the corresponding normative (i.e., constant PDD) models in terms of their implied inter-item correlation matrices. So, further procedures are needed to decide whether the more flexible but also more parameterized dual TM provides a non-negligibly better fit than the corresponding model with constant PDD.
The approach proposed has been systematically used in previous related models, and is based on a likelihood ratio (LR) statistic. For a single respondent i, let
Statistic Λi is a descriptive normed index with values in the range 0 to 1. Values close to 0 indicate that the dual TM provides a substantially better fit than the corresponding standard model. As for si, under very restrictive conditions, it could be considered to be a value randomly drawn from a χ2 distribution with one degree of freedom. And, by further assuming experimental independence between respondents, the sum Q = Σsi asymptotically approaches a χ2 distribution with N degrees of freedom (see Ferrando, 2013). For this to hold, however, the likelihoods must be evaluated at their ML estimates, whereas here they are evaluated at their EAP estimates. To sum up, Q is proposed as the overall index for assessing whether the DTM fits better than its standard counterpart, but it is acknowledged that it cannot be used as a strict inferential measure and that the reference distribution is at best only an approximate guide. The behavior of the statistic is assessed below via simulation.
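The quantities discussed in this subsection can be computed from per-respondent log-likelihoods as sketched below. The definitions used here, Λi as the ratio of the normative to the dual likelihood and si = −2 ln Λi, are the usual likelihood ratio conventions and are assumed rather than copied from the article. Capping the ratio at 1 and summarizing Q with a normal approximation to the χ2 distribution with N degrees of freedom (mean N, variance 2N) are further choices made for this illustration, and the input values are hypothetical.

```python
import numpy as np

def lr_summary(loglik_normative, loglik_dual):
    """Per-respondent ratios, their -2 log transforms, the sum Q, and an approximate z."""
    l0 = np.asarray(loglik_normative, dtype=float)
    l1 = np.asarray(loglik_dual, dtype=float)
    d = np.minimum(l0 - l1, 0.0)     # cap so that the ratio never exceeds 1
    lam = np.exp(d)                  # normed index in (0, 1]
    s = -2.0 * d                     # individual contributions
    q = s.sum()
    n = s.size
    z = (q - n) / np.sqrt(2.0 * n)   # rough guide only: chi-square(N) has mean N, variance 2N
    return lam, s, q, z

lam, s, q, z = lr_summary([-20.0, -18.5, -25.0], [-19.0, -18.5, -21.0])
print(np.round(lam, 3), s, q, round(z, 3))
```

Given the caveats stated above, the z value is best read descriptively, as an indication of how far Q lies from its nominal expectation.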
Substantive and Practical Considerations
The DTMs are not only more flexible than their normative counterparts but also more complex and potentially prone to producing unstable parameter estimates. Therefore, the conditions in which the proposed models are appropriate and expected to work well in practice need to be discussed.
Analytical expressions for the θi and
In comparative terms, the θi estimates are expected to be substantially more reliable than the
We turn now to the potential advantages of using the DTMs. To start with, they provide additional information about the consistency of the respondent’s answering behavior via the PDD estimate. This information, in turn, can be of use in individual assessment or in exploratory person-fit research (see Conijn et al., 2016). Furthermore, it has been hypothesized that the PDD is related to the relevance and degree of clarity and strength with which the trait is internally organized in the individual (Traitedness; for example, Markus, 1977; Reise & Waller, 1993; Taylor, 1977; Tellegen, 1988). Evidence based on previous TM-based applications or related indices suggests that the PDD estimates can be effectively used to reflect traitedness (LaHuis, Barnes, Hakoyama, Blackmore, & Hartman, 2017; Reise & Waller, 1993).
As for the role of PDDs in individual assessment, the accuracy with which the person locations are estimated is better assessed with the DTMs (provided they are correct). Indeed, the analytical expressions for the θ i PSDs provided in the appendix are
where φ is the density of the standard normal distribution, Pijk = P(Xjk|θi,
Finally, for psychometric and conceptual reasons, the PDDs are expected to have a moderating role in validity assessment. First, as discussed above, the person location estimates of the less discriminating individuals are less reliable and, from basic attenuation theory, the unreliability of the score estimates attenuates the validity coefficient (e.g., Lord & Novick, 1968). Second, those individuals for whom the trait is relevant are expected to be more likely to display a stronger correspondence between trait self-description and external trait-relevant variables (Markus, 1977; Paunonen, 1988). For both reasons, those individuals with smaller PDDs would tend to be the most predictable although, in practice, the differential validity effects are expected to be modest at best (Ferrando, 2004, 2013).
The relatively low degree of reliability of the
Simulation Studies
Experience with the DTCRM suggests that the limited-information procedure proposed in this article works quite well in the simple linear case. For this reason, two simulation studies were undertaken that focused on the new approaches proposed here as well as on the main differences with the original DTCRM proposal. More specifically, the studies considered only the DTGRM, which is the more general of the two UVA-based models. Due to space limitations, the complete studies as well as the tables of results (Tables A1(a), A1(b), and A2) are presented in the appendix, and only a summary is provided here.
The first study assessed the extent to which the approaches proposed here provide appropriate item calibration results and, above all, acceptable individual estimates of the two types of person parameter. Thus, results are presented at two levels: calibration and scoring. In the first calibration stage, the main aim was to check that data generated by the DTGRM did in fact behave like an FA model at the correlational level and that items could be well calibrated by fitting Spearman’s model to the inter-item polychoric correlation matrix.
The scoring part of the study is of more interest because now Bayes EAP estimates are used instead of the ML estimates originally proposed. The focus here was on the appropriate recovery of the “true” individual parameters and on the accuracy of the individual estimates.
The calibration results suggested that the FA model provided a good fit in all cases and appropriate recovery of the item parameters. The scoring results were also positive. For both θi and
The second study aimed to assess the behavior of the LR Q statistic proposed above when the likelihoods are evaluated at their EAP estimates. Two situations were considered: H0, in which the correct model was the standard GRM with constant PDD, and H1 in which the correct model was the DTGRM. The results suggested that (a) the statistic allowed the correct model to be distinguished in all conditions and (b) power increased with test length, as expected. However, the statistic was conservative, and, under H0 it provided values systematically smaller than the chi-square expectations. This point is discussed further below.
Illustrative Example
Ferrando (2013) illustrated the functioning of the DTCRM with an instrument known as CTAC, a Spanish acronym for “Anxiety Questionnaire for Blind People.” The CTAC (Pallero, Ferrando, & Lorenzo-Seva, 1998) is a 35-item test that measures anxiety in situations related to visual deficit and which is intended to be used in the general adult population with severe visual impairment. The response format is 5-point Likert-type and, in the population for which the test is intended, the distributions of the item scores are generally unimodal and not extreme. This result suggests that, “a priori,” both the DTCRM and the DTGRM may be appropriate (see Culpepper, 2013). So, the results they provide can be compared for illustrative purposes. In Ferrando’s (2013) example, the CTAC was fitted in a sample of 352 respondents. Here a far larger sample of 760 adults collected from various centers belonging to the Spanish National Organization of the Blind (ONCE) is used.
The unidimensional FA model was fitted to the product-moment (DTCRM) and polychoric (DTGRM) inter-item correlation matrices by using robust unweighted least squares (ULS) estimation as implemented in the FACTOR program (Lorenzo-Seva & Ferrando, 2013). Appropriateness and goodness of fit were assessed at this stage by using a multifaceted approach that includes (a) conventional goodness-of-fit assessment, (b) equivalence testing as proposed by Yuan, Chan, Marcoulides, and Bentler (2016) (only available at present for the continuous model), and (c) measures of strength and replicability of the solution as well as closeness to unidimensionality. For both models, the results are in Table 1. They are clear and can be summarized as follows: The fit is quite acceptable by all standards, the solutions are strong and replicable, and the data are essentially unidimensional. As expected, the results for the continuous and the graded models at this stage are very similar.
Table 1. Calibration Results.
Note. RMSEA = root mean square error of approximation; Ts-RMSEA = T-size root mean square error of approximation; CFI = comparative fit index; Ts-CFI = T-size comparative fit index; GFI = goodness of fit index; z-RMSR = root mean square of residuals; ECV = explained common variance (ECV measures closeness to unidimensionality); G-H = generalized H index (G-H measures strength and replicability of the solution); LRT = likelihood ratio test.
The results for the LR test are at the bottom of Table 1. Even with the limitations of the Q statistic discussed above, they are quite clear, and more so given the conservative behavior of the test. In both the continuous and graded case, they suggest that the DTM is more appropriate than the corresponding normative model.
The calibration results for both models are now summarized. The standardized weights (α) ranged from 0.58 to 0.70 (DTCRM) and 0.62 to 0.74 (DTGRM), which are quite acceptable for personality items. The product-moment correlation between the α estimates produced by both models was .99. So, as expected, (a) the two sets of weights were in close agreement, and (b) the αs based on the polychoric correlations were slightly larger than those based on the product-moment correlations. The most accurate item was the same in both cases (Item 15) with an estimated α of .70 (DTCRM) and .74 (DTGRM). By using this item as a marker, E(
As for the locations, the range of βj values was (–0.87, 1.40) in the DTCRM and (–0.72, 1.32) in the DTGRM. In both cases, they were evenly distributed around 0, with means of 0.14 (DTCRM) and 0.17 (DTGRM). The correlation between both sets of estimates was 0.995.
EAP person estimates for the DTCRM and the DTGRM were obtained next. In both cases, the prior for θ was standard normal and the prior for σ2 was inverse χ2 with a scaling parameter of 3 and five degrees of freedom (see the appendix). Table 2 shows a summary of the accuracy of the estimates based on the marginal reliabilities in Equation 18 as well as on empirical split-half estimates.
Table 2. Reliability of the Person Estimates.
Note. ρ-PSD = PSD-based marginal reliability; ρ-S-H = split-half reliability; EAP = expected a posteriori.
To sum up, there is a high degree of agreement between (a) the results obtained from the DTCRM and the DTGRM, and (b) the PSD-based and the empirical split-half reliability estimates. In both models, the reliabilities of the person locations are those expected in a good personality test, whereas those of the individual σ2 estimates are lower but would be acceptable for many purposes. Finally, the product-moment correlations between the estimates produced by both models were 0.98 for the central locations and 0.86 for the PDDs.
The results of two respondents based on the DTGRM are now compared to illustrate the role of the PDD in individual assessment. The person location estimate of respondent no. 704 was
Finally, the role of the PDD as a moderator in prediction is illustrated. Because no external variable was available, the “internal” split-half schema mentioned above was used: The raw scores in the first half were taken as the “predictor” and the raw scores in the second half as the “criterion.” Moderated multiple regression (e.g., Baron & Kenny, 1986) results showed that the moderating effects of the PDDs were significant: The F statistic value used to judge the increment of R2 was 26.98 with df values of 1 and 757. The post hoc analysis was as follows: Two subgroups were formed using Cureton’s (1957) 27% rule. The upper group contained the 205 respondents with the lowest PDDs (i.e., the most discriminating respondents), and the lower group contained the 205 with the highest (i.e., the least discriminating respondents). For the upper group, the split-half correlation was r = .94. For the lower group it was r = .75. Overall, the results suggest that the PDD estimates are useful as moderators in prediction and that the effects are in the expected direction: The validity relations are stronger in the subgroups with the most discriminating respondents.
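The moderation analysis described above can be sketched with ordinary least squares. The code below is an illustration on simulated data, not a reanalysis of the CTAC sample: the data-generating choices, the function names, and the use of known (rather than estimated) PDDs are all assumptions. The degrees-of-freedom convention is the textbook one (N minus the number of coefficients in the full model) and may not match the exact specification behind the values reported in the example.

```python
import numpy as np

def r_squared(design, y):
    """R-squared of an OLS fit; the design matrix must include the intercept column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def moderation_f(predictor, moderator, criterion):
    """F test for the R-squared increment due to the predictor-by-moderator product."""
    n = criterion.size
    reduced = np.column_stack([np.ones(n), predictor, moderator])
    full = np.column_stack([reduced, predictor * moderator])
    r2_reduced, r2_full = r_squared(reduced, criterion), r_squared(full, criterion)
    df2 = n - full.shape[1]
    f = (r2_full - r2_reduced) / ((1.0 - r2_full) / df2)
    return f, (1, df2), r2_reduced, r2_full

def extreme_groups(moderator, fraction=0.27):
    """Indices of the lowest and highest 27% of respondents on the moderator."""
    k = int(fraction * moderator.size)
    order = np.argsort(moderator)
    return order[:k], order[-k:]

rng = np.random.default_rng(1)
n = 760
theta = rng.standard_normal(n)
pdd = rng.uniform(0.05, 3.0, n)                        # simulated person variances
half1 = theta + np.sqrt(pdd) * rng.standard_normal(n)  # "predictor" half score
half2 = theta + np.sqrt(pdd) * rng.standard_normal(n)  # "criterion" half score

f, df, r2_reduced, r2_full = moderation_f(half1, pdd, half2)
low, high = extreme_groups(pdd)
r_low = np.corrcoef(half1[low], half2[low])[0, 1]      # most discriminating respondents
r_high = np.corrcoef(half1[high], half2[high])[0, 1]   # least discriminating respondents
print(round(f, 2), df, round(r_low, 2), round(r_high, 2))
```

With 760 respondents, the 27% rule gives groups of 205, as in the example. In the simulated data the split-half correlation should be clearly higher in the low-PDD group, which is the pattern of differential predictability that the post hoc comparison is meant to reveal.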
Discussion
Conventional psychometric modeling of typical-response measures considers lack of item discrimination as the sole source of measurement error. An alternative view stated by Lumsden (1978) was that items “are perfectly reliable” and that within-person variability (i.e., PDD) is the sole source of error. The view taken here is that both items and persons are sources of error and that the amount of error generally varies over persons and over items. This flexible scenario is thought to be the most plausible one for measurement in this domain. The problem, however, is that an analytically tractable modeling of this type does not exist at present for the most common item formats.
This article proposes a comprehensive approach to fitting dual models to continuous, graded, and binary item scores. The UVA on which it is based allows for a unified treatment of the models, so in all cases item calibration is performed via an FA of the inter-item correlation matrix with some additional restrictions. Simplicity and feasibility are possibly the main advantages of the proposal, as calibration is only slightly more complex than in standard models, whereas EAP scoring is quite similar to scoring a bidimensional model with independent factors.
The present proposal is a new, wide-scope development, and, as such, there are many points that require further research. At the methodological level, further simulation studies are needed mainly to establish the minimal conditions under which the dual models are expected to work well and provide reasonably accurate estimates for most of the individuals. Furthermore, future improvements in the approach proposed could be envisaged. For example, standard errors for the parameter estimates obtained under constraints, mainly
Experience suggests that proposals such as the present one can be used in practice only if they are implemented in widely available (and preferably free) programs. At present, work is in progress on an R program that will implement all the procedures proposed here and that, hopefully, will soon be available to interested readers.
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project has been possible with the support of Ministerio de Economía, Industria y Competitividad, the Agencia Estatal de Investigación (AEI) and the European Regional Development Fund (ERDF) (PSI2017-82307-P).
