Abstract
We present a monotonic polynomial graded response (GRMP) model that subsumes the unidimensional graded response model for ordered categorical responses and results in flexible category response functions. We suggest improvements in the parameterization of the polynomial underlying similar models, expand upon an underlying response variable derivation of the model, and in lieu of an overall discrimination parameter we propose an index to aid in interpreting the strength of relationship between the latent variable and underlying item responses. In applications, the GRMP is compared to two approaches: (a) a previously developed monotonic polynomial generalized partial credit (GPCMP) model; and (b) logistic and probit variants of the heteroscedastic graded response (HGR) model that we estimate using maximum marginal likelihood with the expectation–maximization algorithm. Results suggest that the GRMP can fit real data better than the GPCMP and the probit variant of the HGR, but is slightly outperformed by the logistic HGR. Two simulation studies compared the ability of the GRMP and logistic HGR to recover category response functions. While the GRMP showed some ability to recover HGR response functions and those based on kernel smoothing, the HGR was more specific in the types of response functions it could recover. In general, the GRMP and HGR make different assumptions regarding the underlying response variables, and can result in different category response function shapes.
Keywords
A recent flexible approach to response function estimation involves replacing the linear predictor of standard item response models—the two- and three-parameter logistic (2PL and 3PL; Birnbaum, 1968) and generalized partial credit (GPC) models (Muraki, 1992)—with a monotonic polynomial (MP; Falk & Cai, 2016a, 2016b; Liang, 2007; Liang & Browne, 2015). Although there are other flexible models (Duncan & MacEachern, 2013; Miyazaki & Hoshino, 2009; Ramsay & Wiberg, 2017), the MP approach is promising for several reasons. Variants of the MP approach that use the expectation–maximization (EM) algorithm to maximize the marginal likelihood (EM-MML; Bock & Aitkin, 1981) can be used with multiple groups, facilitating their use in scaling, linking, or tests of differential item functioning (Falk & Cai, 2016a; Feuerstahler, 2016). Under conditions of high information, the approach has been shown in simulations (Falk & Cai, 2016a, 2016b; Feuerstahler, 2016) to improve recovery of response functions and item fit on par with kernel smoothing (Ramsay, 1991) and smoothed isotonic regression (Y.-S. Lee, 2007). With EM-MML estimation, the MP approach can be used with missing data and can be used on the same test as standard items rather than requiring all items to be estimated nonparametrically.
The main purpose of this manuscript is to define the monotonic polynomial graded response (GRMP) model, formed by replacing the linear predictor of the graded response model (GRM; Samejima, 1969, 1972) with an MP. This development is important, given the popularity of the GRM for psychological assessments. Indeed, some advocate use of a nonparametric model for personality or psychopathology data to reveal how certain items behave differently than what is expected (Meijer & Baneke, 2004).
In addition, previous MP-based models considered logistic building blocks to be monotonically increasing. Here, we propose a modification to the MP parameterization that retains monotonicity, but allows for boundary response functions (BRFs) of the GRMP to be either all increasing or all decreasing for any given item. Such a case is useful when some items are reverse coded and may be useful to multidimensional extensions of the model.
We also seek to enhance interpretation of MP-based models by discussing features of the GRMP as they relate to other variants of graded models such as the heteroscedastic graded response model (HGR; Molenaar et al., 2012), expanding upon a derivation of the model using the underlying variable approach (Bolt, 2005; Takane & de Leeuw, 1987) and providing conversion from logistic to probit parameterizations of the model. Although MP-based models lack a discrimination parameter, this work results in a quantity that may signify how closely related the item is to the latent trait.
We compare the GRMP with the monotonic polynomial generalized partial credit (GPCMP) model (Falk & Cai, 2016a) using data from the Patient Reported Outcomes Measurement Information System (PROMIS; Hansen et al., 2014). Next, the GRMP, GPCMP, and HGR are compared on data from the Synthetic Aperature Personality Assessment (SAPA) project (Condon, 2018). Although the original HGR model considered only a probit link function and directly maximized the marginal likelihood (Bock & Lieberman, 1970) using Mx (Neale et al., 2002), we implement both logistic and probit variants using EM-MML. A test of HGR with EM-MML is important as otherwise the HGR is not feasible for a long test. The two best fitting models from this latter example (logistic HGR and GRMP) are then compared in a small simulation study.
The remainder of this manuscript is organized as follows. In the Proposed Item Model section, we present the GRMP, its alternative parameterizations, and briefly discuss the relationship between the GRMP and other models. In the Empirical Examples section we present both empirical examples. In the Simulations section, we present two small simulation studies. We end in the Discussion and Conclusion with remaining challenges on the use of MP-based item models, and promising avenues for future research.
The Proposed Item Model
GRMP Model
Consider
where

Example boundary response functions and category response functions.
To obtain BRFs for the GRMP, we replace the term
where
Use of Equation (3) results in more flexible BRFs (bottom-left of Figure 1), and Equation (2) is again used to construct CRFs in a completely analogous manner (bottom-right of Figure 1).
The GRMP is a heterogeneous model (Samejima, 2008, 2010), as BRFs for a single item do not follow the exact same shape. Bends occur in the same locations along
MP Parameterization
The coefficients,
where
Item Parameters From Example Item Response Functions.
Alternative Representation and Relationship to Other Approaches
In a vein analogous to the relationship between categorical factor analysis and item response theory (Takane & de Leeuw, 1987), additional insight into the GRMP can be obtained by considering other related ways of deriving or representing the model. For instance, Falk and Cai (2016b) considers that underlying observed responses to item
where
In this article, we suppose that
Then, let
where
Falk and Cai (2016b) provide a similar expression to Equation (9), except with a lower asymptote and for dichotomous items. These authors were concerned with priors to prevent
Further conversion to the parameterization in Equation (9) is also possible (Table 2). If we constrain the variance of
where
Example Monotonic Polynomial Coefficients From Different Parameterizations.
Equation (7) resembles a polynomial regression, with a zero intercept and the additional constraint that the polynomial is monotonic. Description of the model may also rely on this analogy: Had none of the quantities in Equation (7) been latent and the polynomial had no special constraints, we could use standard software for linear regression to obtain the coefficients. The GRMP is modeling a nonlinear but monotonic relationship between the latent variable,

Relationship between latent trait and underlying response for GRMP and HGR, with density for three conditional distributions.
Differences between the GRMP and other models are also easier to see. For example, Molenaar et al. (2012) present BRFs for the HGR equivalent to
where
We assume that
Finally, in the standard normal ogive model, there is an item parameter (
Empirical Examples
Example 1: PROMIS
The purpose of the first application was to compare the GRMP and GPCMP. We were particularly interested in (a) whether use of an MP (greater than
Data
We used responses to sixteen 5-category Likert-type items from 3,605 daily smokers for measuring hedonic benefits of tobacco smoking from PROMIS (Hansen et al., 2014). These data overlap with that analyzed by Falk and Cai (2016a), who found better fit (according to Akaike information criterion [AIC]) with the GPCMP using at least
Estimation and fitted models
Ten final models were estimated: five for the GRMP and five for the GPCMP. We first describe the GRMP. Three models were estimated in which the MP order was the same for all items (
An additional two models employed a stepwise approach using either AIC or Bayesian information criterion (BIC), which we refer to as AIC-step and BIC-step, respectively. In particular, these models started with all items at the lowest order polynomial
The GPCMP model was based on Falk and Cai (2016a). We used starting values of −.5 for
We used Bayes modal estimation coupled with the EM algorithm (Mislevy, 1986), with integrals evaluated using rectangular quadrature from −5 to 5 in .1 increments. A standard normal
Results
Whether MP-based models improved fit depended slightly on the information criterion (Table 3). The AIC-stepwise model always fit best according to AIC, followed by either the
Obtained AIC and BIC for PROMIS Data.
Note. Two best fitting models for each row are in bold. AIC = Akaike information criterion; BIC = Bayesian information criterion; AIC-step = AIC-stepwise model; BIC-step = BIC-stepwise model; PROMIS = Patient Reported Outcomes Measurement Information System; GRMP = monotonic polynomial graded response; GPCMP = monotonic polynomial generalized partial credit.

Category response functions for PROMIS data, GRMP (thick, solid, and light) and GPCMP (thin, dashed, and dark) AIC-stepwise models.
Finally, we compared estimates of
Example 2: SAPA
In Example 2, we continue to compare the GRMP and GPCMP and add the HGR model. Thus, we were interested in whether the probit or logistic variants of the HGR would yield better fit than the GRMP, and whether similar CRF shapes would be observed.
Data
We examined 16,127 responses to ten 6-category anxiety items from the HEXACO Personality Inventory-Revised (HEXACO-PI-R; K. Lee & Ashton, 2018). The data were collected as part of the SAPA project (Condon, 2018; Condon & Revelle, 2015). We chose anxiety rather than the entire emotionality dimension to ensure that MP-based models are less likely to pick up on any local dependence that may arise from a multidimensional measurement instrument. The SAPA project employs a planned missing data design and responses to items are sparse within this large sample: only 2,666 participants on average completed each of the items we examined here (83% missing data). Finally, the anxiety scale included five reverse-keyed items, allowing us to test the GRMP with negatively loading items. In what follows, such items were reverse coded only when comparing the older MP parameterization of the GPCMP against the GRMP.
Fitted models
The same GRMP and GPCMP models were fit as in the first example. For the HGR (Molenaar et al., 2012), a probit or logistic link function can be used in Equation (13), which we refer to as HGR-P and HGR-L, respectively, with optionally scaling the entire contents of the HGR-L by 1.702 to ensure that item parameters are approximately on the same metric as the HGR-P. We programmed CRFs for the HGR-P and HGR-L and used the createItem() function of the mirt package (Chalmers, 2012) for EM-MML estimation with numerical derivatives.
4
Starting values following supplementary code from Molenaar et al. (2012) were used, with
Although the HGR-P could be checked against Mx (Neale et al., 2002) with code by Molenaar et al. (2012), we encountered some discrepancies across programs and some estimation difficulty with Mx. Since Mx was slow to estimate relative to our implementation, we instead decided to conduct the simulation study that follows this empirical example as an additional check on our code for HGR models and include example code in Supplementary Materials.
Results
Results for the GRMP and GPCMP were similar to that in the first empirical example: The GRMP tended to fit better, some items were modeled with at least
Obtained AIC and BIC for SAPA Data.
Note: Two best fitting models for each row are in bold. AIC = Akaike information criterion; BIC = Bayesian information criterion; AIC-step = AIC-stepwise model; BIC-step = BIC-stepwise model; SAPA = synthetic aperture personality assessment data; GRMP = monotonic polynomial graded response; GPCMP = monotonic polynomial generalized partial credit; HGR-p = heteroscedastic graded response model with probit link function; HGR-L = heteroscedastic graded response model with logistic link function.
The GRMP correctly estimated CRFs for the reverse-keyed items (1–2, and 7–9) as

Category response functions for SAPA data, GRMP (top two rows), and HGR-L (bottom two rows) AIC-stepwise models.
Simulations
Simulation 1
To check recovery of the GRMP and HGR-L models, and also probe whether either approach can recover CRFs from the opposing model, a simulation study was conducted.
Method
Data were generated for 10 items using item parameters from one of two models: The GRMP or HGR-L AIC-stepwise models from the second empirical example (see Supplementary Materials for item parameters). A standard normal latent trait was assumed with 50 replications and
To each data set, five models were fit to the data: (1) GRM as natively available in mirt (GRM); (2) GRM custom programmed using the parameterization in Equation (13) and estimated with mirt (GRMcust;
Results
Recovery of CRFs was calculated for each item and replication using root integrated mean square error (RIMSE; e.g., Falk & Cai, 2016a), which is calculated in the same manner as RIMSD, but computed using expected scores based on the true
RIMSE for Simulation Study 1.
Note. RIMSE = root integrated mean square error; GRM = graded response model; GRMP = monotonic polynomial graded response; HGR-L = heteroscedastic graded response model with logistic link function.
Under missing data, GRMP and HGR-L with AIC-stepwise selection improved recovery over the GRM, but only if the model corresponded to the data generating mechanism. For example, fitting the GRMP to HGR-L data did not tend to improve recovery and vice versa. Under complete data, a similar pattern held with one exception. Fitting the closest model to the true data generating model tended to have the best CRF recovery. However, the GRMP showed some gains in CRF recovery
Simulation 2
The second study had two goals. First, based on reviewer comments, we sought to examine CRF recovery from a data generating model that was neither derived from the GRMP nor logistic HGR. Second, MP models typically do not perform well unless very large sample sizes are used. Here, we experimented with a relatively smaller sample size, but with stronger priors for the GRMP in an attempt to better stabilize estimation.
Method
To construct a data generating model, a nonparametric approach that also works with incomplete item responses as in our empirical examples is not immediately apparent. We therefore used all respondents with complete data
All models described in Simulation 1 were fit to each data set, with one change to the GRMP: Either weak,
Results
The GRMP and strong priors resulted in the best CRF recovery at the smaller sample (
RIMSE for Simulation Study 2.
Note. RIMSE = root integrated mean square error; GRM = graded response model; GRMP = monotonic polynomial graded response; HGR-L = heteroscedastic graded response model with logistic link function.
Discussion and Conclusion
We have presented the GRMP, an extension of the GRM in which the linear predictor is replaced with an MP. The GRMP can model a nonlinear monotonic relationship between the latent variable and underlying response variables, and can fit data better than the GPCMP, a previously developed MP-based item model (Falk & Cai, 2016a). That the GRMP fits better than the GPCMP is consistent with Maydeu-Olivares (2005), who found that the GRM tended to fit data better than the GPC across several personality data sets. Still, Maydeu-Olivares (2005) suggested that simpler parametric models may be more likely to cross-validate. One reason that some complex item models may improve fit over parametric models stems from possible unmodeled multidimensionality. This is part of the impetus for changes in the MP parameterization, which takes a step toward the development of multidimensional MP models. However, other parameterizations of MPs may also be worthwhile to investigate (e.g., Murray et al., 2013).
In another example, the GRMP fit better than the HGR, but only if using a probit link function as originally presented by Molenaar et al. (2012). “Fit” here is quantified using AIC and BIC, which have their own limitations. For example, BIC did not always suggest that higher order polynomials improved fit, yet BIC has a tendency toward parsimony. In some cases, BIC prefers item response models that are simpler than the true model (Waller & Feuerstahler, 2017). Falk and Cai (2016b) found a similar result when using sum score–based item fit statistics (Orlando & Thissen, 2000) for determining polynomial order: BIC suggested that the use of MPs might not improve fit. Results of simulations, however, suggest that AIC selection and the procedure of Falk and Cai (2016b) or the use of other search algorithms (Falk, 2019) can improve recovery of response functions when using MP-based item models.
Simulations suggested that the GRMP is flexible and can recover a variety of CRF shapes. In the second simulation study, recovery was improved at small sample sizes
That GRMP and HGR tended to suggest different CRF shapes highlights the need for better theory in choosing an appropriate modeling strategy as mere “curve-fitting” (Samejima, 2008) may not help us arrive at a useful model. Strong theory coupled with improved fit may provide a good argument for one model and can result in better recovery of CRFs. This, in turn, we would expect to result in better recovery of latent traits (e.g., Falk & Cai, 2016a). There are a number of reasons why heteroscedasticity in the response variable may occur in practice (Molenaar et al., 2012). The GRMP is apparently more flexible than the HGR and may help uncover other oddities in response functions but at the cost of more estimated parameters. The HGR has one extra parameter per item, whereas the GRMP will have two or more depending on the order of the polynomial. This might suggest that the GRMP is a more complex and less parsimonious model than the HGR. We have therefore chosen to contrast the underlying assumptions regarding the GRMP and HGR. Regardless, this work takes steps at making the HGR more feasible as simulations supported EM-MML estimation for both the GRMP and logistic HGR. Evidence that overall model fit is improved due to either item models may warrant further inspection of item content and an investigation of the possible cause.
Supplemental Material
supplement_APM_R2 – Supplemental material for The Monotonic Polynomial Graded Response Model: Implementation and a Comparative Study
Supplemental material, supplement_APM_R2 for The Monotonic Polynomial Graded Response Model: Implementation and a Comparative Study by Carl F. Falk in Applied Psychological Measurement
Footnotes
Acknowledgements
The author thanks the editors and reviewers for helpful comments during the review process.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received support from the Fonds de recherche du Quebec—Nature et technologies (grant no. 2019-NC-255344).
Supplemental Material
Supplementary material is available for this article online.
Notes
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
