Abstract
Some previous studies suggest that response times (RTs) on rating scale items can be informative about the content trait, but a more recent study suggests they may also be reflective of response styles. The latter result raises questions about the possible consideration of RTs for content trait estimation, as response styles are generally viewed as nuisance dimensions in the measurement of noncognitive constructs. In this article, we extend previous work exploring the simultaneous relevance of content and response style traits on RTs in self-report rating scale measurement by examining psychometric differences related to fast versus slow item responses. Following a parallel methodology applied with cognitive measures, we provide empirical illustrations of how RTs appear to be simultaneously reflective of both content and response style traits. Our results demonstrate that respondents may exhibit different response behaviors for fast versus slow responses and that both the content trait and response styles are relevant to such heterogeneity. These findings suggest that using RTs as a basis for improving the estimation of noncognitive constructs likely requires simultaneously attending to the effects of response styles.
Recent advances in technology have fostered the use of computer-based assessments in education and psychology. Unlike traditional paper-and-pencil tests, computer-based assessments enable the collection of process data in addition to item responses, potentially providing construct-relevant information about how individuals interact with items. Examples of process data include log data, such as mouse clicks, timestamps, keystrokes, and action sequences (e.g., navigation between pages), as well as other types of information, such as eye movement and brain imaging data. Because process data are sometimes readily available, they provide opportunities for researchers to understand the underlying cognitive, behavioral, and psychological response processes contributing to the final item responses.
The amount of time a respondent spends responding to an item is one of the most common forms of process data. Response time (RT) has been studied extensively, especially in relation to cognitive measures, often with a goal of understanding the cognitive processes underlying the item response (see De Boeck & Jeon, 2019, for an overview), but also as a basis for possible improved estimation of respondent proficiency (Bolsinova & Tijmstra, 2018). De Boeck and colleagues (Bolsinova et al., 2017; De Boeck & Partchev, 2012; DiTrapani et al., 2016; Molenaar & De Boeck, 2018) have shown that different RTs (e.g., fast vs. slow responses) possibly reflect different response mechanisms/processes (automatic vs. controlled processing). Many other researchers have also illustrated the use of RTs for detecting inattentive or aberrant response behaviors (such as rapid guessing, speeding, cheating, and omission behaviors; e.g., Meyer, 2010; Ulitzsch et al., 2020a, 2020b; van der Linden & Guo, 2008; Wang & Xu, 2015; Wang et al., 2018; Wise & DeMars, 2006; Wise & Kong, 2005), which are major threats to measurement validity in cognitive assessments. Despite the similar potential of RTs for noncognitive constructs, there has not been as much research examining the usefulness of RTs in this context (Henninger & Plieninger, 2021; Ranger, 2013).
Noncognitive assessments often rely on self-report items using Likert-type rating scales. One important distinction from cognitive items is that the rating scale items do not have correct item responses. Of those studies that have explored RTs in relation to noncognitive measures, results suggest that individuals at the extremes of a latent trait continuum tend to have shorter average RTs, exhibiting an inverted-U relationship (Akrami et al., 2007; Kuiper, 1981). Such a relationship can possibly be modeled based on the speed–distance hypothesis assuming that faster item responses are associated with larger item-person distances (Ferrando & Lorenzo-Seva, 2007). That is, a larger item-person distance can imply higher confidence in responding and thus yield a quicker item response. Alternatively, Ranger and Ortner (2011) suggested an approach attending to the relationship between item RTs and the probability of a given response. They assumed that responses are generally faster if the given item response is more probable considering the content trait and psychometric characteristics of the item. For instance, on a rating scale from extremely disagree to extremely agree, a respondent with a high content trait level is expected to select a category reflecting a high level of the trait (e.g., extremely agree) faster than a category that is unlikely (e.g., extremely disagree; see Ranger, 2013, for a comparison of the two approaches).
Both of these hypotheses are largely in line with the findings in other studies demonstrating that fast responses are likely to occur when respondents have a strong self-schema related to the item content (Markus, 1977), which facilitates automated response processing (Akrami et al., 2007; Fazio et al., 1986; Holden et al., 1991). If a respondent has an accessible and strong trait relevant to item content, clear self-knowledge about the trait can lead to high confidence and certainty about the response, and thus, the response can be given more easily and quickly (Arndt et al., 2018; Germeroth et al., 2015; Grant et al., 1994; Hanley, 1965; McIntyre, 2011). At the item level, fast RTs suggest the item was clearly understood and evaluated by the respondent, potentially indicating the item to be a more useful (i.e., discriminating) indicator of the content trait level than a slower item response, which may indicate greater difficulty in item interpretation or evaluation (and thus lower discrimination). As greater item discrimination supports a stronger weighting of the item in scoring, one implication could be that faster responses be upweighted (and slower responses downweighted) in the scoring of individual respondents.
At the same time, one concerning feature of the self-report rating scale format is the potential for response style heterogeneity across respondents. Response styles refer to systematic tendencies to over- or underselect specific response categories irrespective of item content (Paulhus, 1991). Here, we define response style in a more limited way as in Henninger and Plieninger (2021), distinguishing it from other response behaviors related to careless or insufficient effort responding (C/IER) that do not reflect the content trait, such as straight-lining or random responding (Huang et al., 2012; Ulitzsch et al., 2022; see Alarcon & Lee, 2022, for the relationship of response styles and C/IER) that are sometimes also characterized as response styles (see Baumgartner & Steenkamp, 2001; Van Vaerenbergh & Thomas, 2012). Two of the most frequently observed response styles according to the former definition are extreme response style (ERS; the tendency to prefer extreme categories) and midpoint response style (MRS; the tendency to prefer a midpoint category). Henninger and Plieninger (2021) have recently illustrated that response styles are strongly associated with RTs, in that responses are faster when a respondent provides a response style-driven response. Such results suggest the availability of the selected response as a likely factor in RT (see also Lyu & Bolt, 2022). For example, respondents with a high ERS level generally respond faster when selecting extreme response categories and slower for their nonextreme responses. Because response style behavior is often viewed as a manifestation of less attentive responding on the part of the respondent, such effects may counteract the potential benefits of using RT to improve content trait estimation.
While there recently have been approaches simultaneously attending to C/IER and content trait in the use of item-level (Ulitzsch et al., 2022) or screen-level RTs (Ulitzsch et al., 2023), there is limited research that attends to simultaneous influences of content and response style traits (Lyu & Bolt, in press). It is conceivable that previously found associations between content trait and RTs might only be a manifestation of response style effects or alternatively that the Henninger and Plieninger (2021) findings are due to unmodeled content trait effects. In this article, we explore the simultaneous relevance of content and response styles to better understand the reality of both effects and potentially to gain insight into their relative strength. Given that a primary purpose in modeling RTs in relation to the content trait may be to understand the potential usefulness of RTs toward improved trait estimation, it is important to understand how, if at all, response styles may interfere with such applications. Furthermore, attending to both the content trait and response styles, we attempt to examine within-individual heterogeneity in the cognitive psychological processes underlying the responses given at different speeds. We believe such an exploration can potentially provide a meaningful basis for future developments of RT models in the context of noncognitive assessments.
Our methodological approach parallels the approaches adopted in recent studies with cognitive assessments that have modeled local dependencies between item responses (i.e., response accuracy) and RTs by allowing heterogeneity in model parameters in relation to RTs (Bolsinova et al., 2017; DiTrapani et al., 2016; Molenaar & De Boeck, 2018; Partchev & De Boeck, 2012). These studies have consistently found systematically varying item characteristics across slower and faster responses, which are interpreted as suggesting that different cognitive processes are involved depending on RT. In a similar way, we examine how the psychometric characteristics of rating scale items measuring noncognitive constructs (attending to both content trait and response styles) differ across RTs by comparing the same item’s performance under fast versus slow responses. One advantage of our approach is that it allows us to examine heterogeneity in the response process within individuals across fast versus slow responses, which is useful for understanding how RTs inform the measurement of traits. It is expected that such an investigation might help reveal the potential of RTs for ultimately improving the estimation of the content traits.
Based on our theory above and the findings of earlier investigations that separately examined the influence of content and response style traits, we anticipate that faster responses will show greater discrimination than slower responses, although as noted, it is unclear whether that higher discrimination will be seen only for the content trait, only for response style traits, or for both sets of trait parameters. Also of interest would be the relative changes in discrimination seen across fast versus slow responses for both types of traits.
We acknowledge that Lyu and Bolt (in press) recently explored the simultaneous effect of content and response style traits by regressing RTs on the estimated traits and response styles. Specifically, Lyu and Bolt (in press) looked at the prediction of RT in a regression model attending to how RTs are related to the likelihood of response category selection as determined by both response styles and content trait influences, but they did not consider measurement implications. Thus, their results do not speak to the potential value of using response styles in improving measurement, nor do they speak to the interpretations regarding potential differences in psychological response process (i.e., that fast responses can either be more response style driven or content trait driven).
The remainder of this article is organized as follows. First, we describe an item response theory (IRT) model, specifically a multidimensional nominal response model (MNRM) that attends to both content and response style traits in rating scale assessments. We then demonstrate the way in which we classify each item response into a fast or slow response class and how we fitted the model so as to allow the same item to have different parameters across RT classes. We next examine systematic differences in item parameter estimates across fast and slow responses, particularly focusing on item discrimination estimates on the content trait and response styles. We further demonstrate how the relative differences in discrimination estimates between fast versus slow responses vary across the content trait and response styles. Lastly, we discuss the results of this study in regard to potential attempts toward using RTs to improve content trait estimation.
MNRM for Response Styles
One approach to modeling item responses on self-report rating scales considers item responses as reflective of both the intended-to-be-measured content trait and unintended-to-be-measured response styles, such as ERS and MRS. A flexible IRT model that can attend to both types of latent traits is an MNRM (Bolt & Johnson, 2009; Bolt & Newton, 2011; Falk & Cai, 2016; Falk & Ju, 2020; Thissen & Cai, 2016), among others (see Henninger & Meiser, 2020, for an overview of other approaches). The MNRM is a multidimensional extension of Bock’s (1972) nominal response model for polytomous items, introducing additional factor(s) to account for response style(s). While the original MNRM models response styles with either freely estimated item slopes or equally constrained slopes across items (Bolt & Johnson, 2009; Bolt & Newton, 2011; Wetzel & Carstensen, 2017), the reparameterization of MNRM proposed by Thissen and Cai (2016) and Thissen et al. (2010) separates the overall item discrimination and scoring function (that defines the relationship between item response categories and latent trait), making it easier to model multiple response styles simultaneously while allowing us to examine varying effects of latent traits (including response styles) across items (see Falk & Cai, 2016). This feature is particularly useful for our study as the heterogeneity in item slopes for content trait and response styles across fast and slow responses is of interest.
The general form of MNRM (Thissen & Cai, 2016) is specified as:
where
The MNRM is a compensatory multidimensional IRT model that assumes item responses are influenced by both content and response style traits in an additive way. In this model, the content and response style traits are defined by assigning specific values to
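As a rough illustration of how a compensatory MNRM combines content and response style traits additively, the sketch below computes category probabilities for a single 5-point item. The scoring-function values (equally spaced content scores, an ERS indicator on the two end categories, an MRS indicator on the midpoint), the function and parameter names, and all numeric values are assumptions chosen for illustration; they are not taken from the model specification in this article.

```python
import math

# Hypothetical scoring functions for a 5-point item (illustrative assumptions).
SCORING = {
    "content": [0, 1, 2, 3, 4],  # ordered category scores for the content trait
    "ERS":     [1, 0, 0, 0, 1],  # end categories load on extreme response style
    "MRS":     [0, 0, 1, 0, 0],  # midpoint loads on midpoint response style
}

def mnrm_probs(theta, slopes, intercepts, scoring=SCORING):
    """Category probabilities under a compensatory multidimensional nominal model.

    theta, slopes: dicts keyed by trait name; intercepts: one value per category.
    Logit of category k = sum_d slopes[d] * scoring[d][k] * theta[d] + intercepts[k].
    """
    n_cat = len(intercepts)
    logits = [
        sum(slopes[d] * scoring[d][k] * theta[d] for d in scoring) + intercepts[k]
        for k in range(n_cat)
    ]
    m = max(logits)  # subtract the maximum for numerical stability
    expv = [math.exp(z - m) for z in logits]
    total = sum(expv)
    return [e / total for e in expv]

# Example: an average content trait level combined with a high ERS level.
example = mnrm_probs(
    theta={"content": 0.0, "ERS": 1.5, "MRS": 0.0},
    slopes={"content": 1.0, "ERS": 1.0, "MRS": 1.0},
    intercepts=[0.0] * 5,
)
```

Under these assumed values, the two end categories receive more probability than the inner categories, which is the kind of response style effect the additive structure is meant to capture.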
Item Response Mixture Analysis
We examine psychometric differences across fast and slow responses by comparing separately estimated item parameters for the two response types. For this purpose, as summarized in Figure 1, we apply a form of response mixture analysis involving two steps: (1) classifying each item response into a fast or slow class and (2) estimating class-specific item parameters for each item. The classes here are classes of item-by-person responses (not classes of items or persons) and are manifest classes (not latent classes). For the classification of item-by-person responses, we use double-centered log-transformed item RTs obtained after accounting for item and person effects using cross-classified random effects modeling. In the second step, we estimate class-specific item parameters for each item by allowing a separate estimation of item parameters under the MNRM that incorporates both the content and response style traits. We describe each step in the following two subsections.

A diagram illustrating the two-step procedure of response mixture analysis.
Classification: Distinguishing Fast Versus Slow Responses
To define item responses as fast or slow, we base classification on the double-centered log-transformed item RT, in which the item and person effects are removed. This allows each respondent and item to have relatively fast and slow responses irrespective of the respondents’ baseline response speed (e.g., reading speed, cognitive processing speed) and item attributes (e.g., item length, item complexity). We can thus examine what RTs within individuals imply, which can potentially help us learn how individuals behave differently across items dependent on RTs. Admittedly, there are arbitrary aspects to this classification (e.g., respondent variability in speed might imply larger percentages of items should be fast or slow for particular examinees), but the resulting approach also has the advantage of creating a balance of both items and examinees across classes that supports parameter linking (under assumptions of respondent parameter invariance), making it easier to interpret differences in item parameters across groups.
To estimate the double-centered item RTs, we treat each log-transformed item RT as a value nested within items and respondents (having multiple memberships) and fit a cross-classified random effects model. Specifically, the cross-classified model characterizes RTs as
where
Based on
where
Example Demonstration of Data Showing Item Responses (
Note.

Example illustration of classification of slow and fast item responses within an individual. Note. In this illustration, item responses are on a 5-point Likert-type scale. Black points represent the expected (fitted) log-transformed response time of the individual for each item based on the respondent and item speed parameters. Item responses above and below the points are defined as slow (red) and fast (blue), respectively.
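The classification step can be sketched in simplified form as follows. Note the substitution: instead of fitting a cross-classified random effects model, this sketch removes person and item effects by plain two-way mean centering of the log RTs, which only approximates the model-based residuals and assumes complete data. The function names and the handling of exact ties (counted as slow) are assumptions made for illustration.

```python
import math

def double_center_log_rt(rt):
    """rt: list of rows (persons) of raw RTs (items); complete data, all values > 0.

    Returns log RTs with person and item mean effects removed
    (two-way mean centering, a stand-in for model-based residuals).
    """
    log_rt = [[math.log(x) for x in row] for row in rt]
    n_p, n_i = len(log_rt), len(log_rt[0])
    grand = sum(sum(row) for row in log_rt) / (n_p * n_i)
    person_mean = [sum(row) / n_i for row in log_rt]
    item_mean = [sum(log_rt[p][i] for p in range(n_p)) / n_p for i in range(n_i)]
    return [
        [log_rt[p][i] - person_mean[p] - item_mean[i] + grand for i in range(n_i)]
        for p in range(n_p)
    ]

def classify(residuals):
    """Label each item-by-person response as 'fast' (below expectation) or 'slow'.

    Assumption: a residual of exactly zero is labeled 'slow'.
    """
    return [["fast" if r < 0 else "slow" for r in row] for row in residuals]
```

Because every person and every item has residuals that average to zero under this centering, each respondent and each item contributes responses to both classes, which mirrors the balance property described above.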
Estimation: Obtaining Class-Specific Item Parameter Estimates
Once item responses are classified as fast versus slow, we separately estimate item parameters for the two classes of responses. We formulate a model that can estimate class-specific item parameters under the MNRM by specifying the item response probability as:
where
In this model, we can effectively estimate different item parameters for the slow and fast item responses according to the class of item responses (
and
where all parameters are defined as before. Importantly, as already implied in Equation 4, latent trait vector
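To make the idea of class-specific item parameters concrete, the following sketch evaluates response probabilities for the same respondent and item under a fast and a slow parameter set. It is an illustration only: the dictionary layout, the function names, the 4-point scoring values, and the numeric slopes are hypothetical, and the code is not the estimation routine used in this study. The key feature it mirrors is that the trait values are shared across classes while the item parameters are selected by the manifest class of the response.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]

def response_probs(theta, scoring, item_params, resp_class):
    """Category probabilities with item parameters chosen by response class.

    theta: list of trait values, shared across classes (person invariance).
    scoring: list (one entry per trait) of category score lists.
    item_params: {"fast": {"a": [...], "c": [...]}, "slow": {...}} (hypothetical layout).
    resp_class: "fast" or "slow", the manifest class of this item-by-person response.
    """
    par = item_params[resp_class]
    n_cat = len(par["c"])
    logits = [
        sum(a * s[k] * t for a, s, t in zip(par["a"], scoring, theta)) + par["c"][k]
        for k in range(n_cat)
    ]
    return softmax(logits)

# Hypothetical 4-point item with a content trait and ERS.
scoring = [[0, 1, 2, 3], [1, 0, 0, 1]]
item_params = {
    "fast": {"a": [1.2, 0.8], "c": [0.0, 0.0, 0.0, 0.0]},
    "slow": {"a": [0.6, 0.4], "c": [0.0, 0.0, 0.0, 0.0]},
}
theta = [1.0, 0.0]
p_fast = response_probs(theta, scoring, item_params, "fast")
p_slow = response_probs(theta, scoring, item_params, "slow")
```

With these assumed values, the respondent with an above-average content trait level is more likely to choose the highest category when the response is fast, because the fast parameter set has the larger content slope.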
Empirical Datasets
We used three empirical datasets, each of which contained both item responses and RTs. The datasets correspond to three different online personality tests: the Fisher Temperament Inventory (FTI), the Big-Five Personality Test, and the Multidimensional Introversion-Extraversion Scales (MIES). While there is limited information on how the RTs were collected under each test, it is noted in a codebook for the Big-Five Personality Test that an item RT was calculated as the time difference between the button click for the item and the previous click. The original full datasets and their relevant documentation can be accessed from www.openpsychometrics.org.
Fisher Temperament Inventory (FTI)
The FTI consists of 56 items measuring four broad temperament scales: Curious/Energetic, Cautious/Social Norm Compliant, Analytical/Tough-Minded, and Prosocial/Empathetic. Fourteen items are used to measure each scale. All items are measured on a 4-point Likert-type rating scale (1 = strongly disagree, 2 = disagree, 3 = agree, and 4 = strongly agree) and were presented to each respondent in random order. The accessed data contain item response and RT information (in milliseconds) from 4,967 respondents who had responded to the FTI as an optional survey at the end of the assessment.
Big-Five Personality Test
The Big-Five Personality Test measures five personality factors (Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness) based on the Big-Five Factor Markers (Goldberg, 1992) from the International Personality Item Pool. The test has 50 items in total (10 items for each dimension) and includes a number of reverse-worded items. Item responses are 5-point Likert-type ratings (1 = disagree, 3 = neutral, and 5 = agree), and the items are presented in the same order to all respondents. The accessed data contain item responses and RTs (in milliseconds) from 1,015,341 respondents.
Multidimensional Introversion-Extraversion Scales (MIES)
The MIES test consists of 91 items measuring traits expected to show substantial differences between introverts and extroverts. The test includes reverse-worded items. Item responses are 5-point Likert-type ratings (1 = disagree, 2 = slightly disagree, 3 = neutral, 4 = slightly agree, and 5 = agree), and the items are administered in a random order to each respondent. The accessed data contain item responses and corresponding RTs (in milliseconds) from 7,188 respondents.
Data Preparation
For each dataset, we initially carried out a two-step procedure for data cleaning. First, we removed the cases from the same IP address indicating a duplicate submission or submissions from a shared network for FTI and Big-Five Personality Test datasets, which had such information available. Second, we eliminated item responses with extremely long or short RTs because such responses likely indicate invalid or aberrant responses that may distort the results (Henninger & Plieninger, 2021; Mayerl, 2013; Zhang & Conrad, 2014). We treated item responses with log-transformed item RTs that deviated more than
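A screening rule of the kind described in this subsection can be sketched as below. The cutoff `k` is a caller-supplied, hypothetical parameter, and measuring deviation in standard deviation units from the mean log RT is an assumption of this sketch; the threshold actually used in the study is not reproduced here.

```python
import math
import statistics

def flag_extreme_rts(rts, k):
    """Flag RTs whose log value lies more than k SDs from the mean log RT.

    rts: raw response times (all > 0) for one item.
    k: hypothetical cutoff supplied by the caller (not the article's value).
    Returns a list of booleans, True where the response would be dropped.
    """
    logs = [math.log(x) for x in rts]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [abs(v - mu) > k * sd for v in logs]

# Example with one implausibly long response time (values in milliseconds).
flags = flag_extreme_rts([900, 1000, 1100, 950, 1050, 1000, 980, 1020, 600000], k=2)
```

Working on the log scale keeps a few very long RTs from dominating the mean and standard deviation as strongly as they would on the raw scale.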
Empirical Data Analyses
We analyzed the three datasets based on the response mixture analysis described above. Specifically, we first estimated the double-centered log-transformed item RTs and classified each item-by-person response into fast or slow classes. Then, we fitted the model in Equation 4 to separately estimate item parameters for fast and slow classes under MNRM. For model evaluation purposes, we also fitted the original MNRM, estimating a single set of item parameters for an item using all item responses, as a baseline model. By comparing the two approaches, we can evaluate whether estimating class-specific item parameters for fast versus slow responses improves model fit.
As the three datasets had different numbers of content traits and response categories, the specifications of the MNRM fitted to each dataset were slightly different. For FTI data, we incorporated four different content traits (corresponding to each of the four constructs measured) and ERS in the model as there were four response categories. The scoring functions were thus specified as
For model estimation, we adopted a fully Bayesian estimation method. We implemented the No-U-Turn Sampler (Hoffman & Gelman, 2014), an extension of the Hamiltonian Monte Carlo algorithm, using Stan (Stan Development Team, 2023b) with the RStan package (Stan Development Team, 2023a) in R (R Core Team, 2022). The priors used for the analyses were
For each dataset, we first compared the model fit for the response mixture model to that of the baseline model, which estimates the same item parameters for fast and slow responses. The expected log pointwise predictive density (ELPD) calculated based on the leave-one-out cross-validation (LOO) is used for the comparison (Vehtari et al., 2017). A larger ELPD (lower LOO information criterion) indicates a better model fit. After confirming that a separate estimation of item parameters improved the model fit (i.e., the response mixture model being the preferred model), we compared the estimated item parameters across response classes for each dataset to detect systematic differences in the estimates. Then, we further investigated how such differences might vary across latent traits.
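The relationship between ELPD and LOOIC used in this comparison can be illustrated with a small sketch. Note the substitution: the code below uses plain importance-sampling LOO computed from pointwise log-likelihood draws, without the Pareto smoothing described by Vehtari et al. (2017), so it is a simplified stand-in rather than the procedure used in this study. The function names and the toy log-likelihood values are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log of the sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def elpd_loo_is(loglik):
    """Unsmoothed importance-sampling LOO estimate of the summed ELPD.

    loglik[s][i]: log-likelihood of observation i under posterior draw s.
    For each observation: elpd_i = log(S) - logsumexp(-loglik[:, i]).
    """
    n_draws, n_obs = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        neg = [-loglik[s][i] for s in range(n_draws)]
        total += math.log(n_draws) - logsumexp(neg)
    return total

def looic(elpd):
    """LOO information criterion on the deviance scale."""
    return -2.0 * elpd

# Toy comparison: two draws, three observations per model (made-up values).
baseline = [[-1.2, -0.9, -1.5], [-1.1, -1.0, -1.4]]
mixture = [[-1.0, -0.8, -1.2], [-0.9, -0.9, -1.1]]
prefer_mixture = looic(elpd_loo_is(mixture)) < looic(elpd_loo_is(baseline))
```

Because LOOIC is simply the ELPD multiplied by −2, a larger ELPD and a lower LOOIC always point to the same preferred model.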
Empirical Analysis Results
The double-centered log-transformed RTs obtained from the three datasets generally seemed to be distributed around 0 (see Figures A1–A3 in the Appendix), indicating that persons and items are likely to be evenly distributed across two classes. As noted earlier, such balance helps us ensure that item parameter estimates are comparable across fast and slow classes and that we have an adequate number of respondents in each class to obtain sufficiently precise estimates of item parameters for each item. The findings from the response mixture analysis are presented in the following subsections showing the model fit comparison results followed by systematic differences found in item parameter estimates across fast and slow response classes. We consistently denote the content trait-related factors as
Model Fit
We first consider the estimated correlations among the content (
Table 2 presents the model fit of the response mixture analysis approach in comparison to the baseline model. As can be seen from the table, the response mixture model produced lower LOO information criterion (LOOIC) values compared to the baseline model for all three datasets, indicating a better model fit. Such results suggest that fast and slow responses are associated with different psychometric characteristics, a result also seen previously with cognitive test instruments.
Model Fit Comparisons Between the Response Mixture Analysis and the Baseline Model for Fisher Temperament Inventory (FTI), Big-Five Personality Test (BIG5), and Multidimensional Introversion-Extraversion Scales (MIES) Datasets
Note. LOOIC = Leave-one-out cross-validation information criterion;
Differences in Item Parameter Estimates Across Fast and Slow Responses
Inspecting the differences in item parameter estimates across fast and slow response classes, we observed systematic differences in item discrimination estimates on both the content and response style traits. Figure 3 presents the scatterplots of discrimination estimates from fast versus slow responses for each latent trait (i.e., content traits and response styles). Note that item discrimination estimates on each trait can be directly compared across fast and slow responses because the same respondents are present across the response classes, and respondent trait parameters are assumed to be invariant. We can see from the plots that the points consistently lie above the diagonal, indicating that discrimination estimates on both the content and response style traits (ERS and MRS) are consistently higher for faster responses compared to slower responses. Such a systematic difference in discrimination estimates across slow and fast responses for all traits implies that faster responses are simultaneously more reflective of both the content trait and response styles than slower responses.

Comparisons of the estimated item discrimination parameters on content trait (
To examine this in more detail, we looked at RT distributions of responses for different subgroups of respondents with different levels of content traits and response styles. Figures 4 and 5 provide an illustration of distributions of double-centered log-transformed RTs for each response category for groups of individuals with varying levels of content trait and response styles. The respondents were defined as “high” for each latent trait if their trait estimates were greater than or equal to 1, “medium” if the estimates were between −0.5 and 0.5, and “low” if below −1. As expected, we observe that the responses that conform to the levels of the content trait and response styles were generally given faster than responses that were not. For example, respondents with high or low content trait levels tended to give faster responses on response categories in the direction corresponding to their trait level (i.e., high and low content trait groups tended to show shorter RTs for higher and lower extreme ends, respectively). This pattern can particularly be seen in the middle boxplots (for “medium” levels of response styles) of (a) and (c) of Figures 4 and 5. Similarly, as to the response styles, respondents with high ERS levels generally selected extreme categories faster compared to nonextreme categories (see plots in the third column of Figure 4) and those with high and low MRS levels, respectively, tended to select the midpoint category relatively faster (see plots in the third column of Figure 5) and slower (see plots in the first column of Figure 5) than other categories. One important observation is that the plots exhibit simultaneous effects of both the content trait and response styles on RTs. For instance, respondents with high levels of both content trait and ERS tended to provide the fastest responses for the high extreme end but somewhat slower responses for the low extreme end, a manifestation of the mixed effects of high content trait and high ERS.
Such observations in Figures 4 and 5 consistently demonstrate that faster responses (under the dotted horizontal line at 0) better reflect both the respondents’ content trait and response style levels, as was already suggested by the higher item discrimination estimates for faster responses.
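The subgroup definitions used for Figures 4 and 5 can be expressed compactly, as in the sketch below. The cutoffs follow the text (high: at least 1; medium: between −0.5 and 0.5; low: below −1), while treating the medium bounds as inclusive, leaving all other estimates ungrouped, and the function and record layout are assumptions made for illustration.

```python
from collections import defaultdict

def trait_group(estimate):
    """Assign a trait estimate to 'high', 'medium', 'low', or None (ungrouped)."""
    if estimate >= 1:
        return "high"
    if estimate < -1:
        return "low"
    if -0.5 <= estimate <= 0.5:  # assumption: bounds treated as inclusive
        return "medium"
    return None

def mean_rt_by_category(records, group):
    """Mean double-centered log RT per response category within one trait group.

    records: iterable of (trait_estimate, category, centered_log_rt) tuples
    (a hypothetical layout for item-by-person responses).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for est, cat, rt in records:
        if trait_group(est) == group:
            sums[cat][0] += rt
            sums[cat][1] += 1
    return {cat: s / n for cat, (s, n) in sums.items()}
```

Estimates falling between the bands (for example, 0.7) are left out of all three groups, which keeps the contrasts between groups clear at the cost of discarding some respondents.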

Distributions of double-centered log-transformed response time for each response category selection for different groups of respondents with different levels of content trait and extreme response style for the Big-Five Personality Test dataset. (a) High content trait (θ). (b) Medium content trait (θ). (c) Low content trait (θ).

Distributions of double-centered log-transformed response time for each response category selection for different groups of respondents with different levels of content trait and midpoint response style for the Big-Five Personality Test dataset. (a) High content trait (θ). (b) Medium content trait (θ). (c) Low content trait (θ).
Also of interest in Figure 3 is the much stronger linear relationship between discrimination estimates on the content trait across classes, as opposed to the weaker linear relationship between discrimination estimates for the response style traits. This pattern also has a plausible explanation. Regardless of whether the response is fast or slow, the item is the same. Some items will naturally be more aligned with the construct (i.e., have higher discriminations) than others, and we expect these effects to be present regardless of whether the item response is slow or fast. As a result, this strong relationship between discriminations can be expected. For response styles, by contrast, we do not necessarily have such expectations, as there may well be nothing about the item specifically that makes it more or less susceptible to response styles.
Overall, the results illustrate that respondents tend to select responses faster when the chosen category is more consistent with their content and/or response style trait levels. In other words, responses are generally faster if the given response is highly likely given a respondent’s content trait and response styles, and slower if it is less likely. This is also evidenced by Figure 6, which illustrates that faster responses, on average, produce higher likelihoods for respondents than slower responses.

Boxplots of averaged log-likelihoods for fast versus slow item responses for all respondents for (a) Fisher Temperament Inventory, (b) Big-Five Personality Test, and (c) Multidimensional Introversion-Extraversion Scales datasets.
In addition to the discrimination parameters, we examined differences in category intercept estimates across fast versus slow responses. For this purpose, we compared the mean and variance of item scores for slow and fast responses, which are easier to interpret and ultimately provide similar information to that of the category intercepts. Fast responses tended to show lower mean scores when items were more difficult (i.e., items with more “disagree” responses) and higher mean scores as items became easier. As to the variance of item responses, the variance of fast responses relative to that of slow responses was generally larger for items with overall mean scores near the center of the scale (implying that fast responses may tend to be in the two extreme ends) but reduced as the mean score either increased or decreased. These results are consistent with the tendency for faster responses to more likely be extreme responses than slower responses, a result also consistent with previous studies illustrating the association between fast RTs and extreme ratings (Casey & Tryon, 2001; Grant et al., 1994; Kuiper, 1981).
Relative Differences in Discrimination Across Fast and Slow Responses
We further compared the relative differences in discrimination estimates for fast versus slow responses across the content and response style traits. Specifically, the log of the ratio of discrimination estimates from fast to slow responses (

Comparisons of boxplots of relative size of discrimination estimates for fast versus slow responses across content trait (

Comparisons of the relative discrimination estimates for fast versus slow responses (with 95% credible intervals) in relation to content trait (
Among the two response style traits considered, MRS tended to show a larger difference in fast and slow discrimination estimates than ERS (see Figure 7), suggesting that fast responses to midpoint responses may be particularly indicative of MRS. Such an effect appeared to be significant for the Big-Five dataset (p value
Conclusion and Discussion
Due to the increased use of computer-based assessments, RTs are now a common and accessible form of data that can provide useful information for understanding underlying response processes or behaviors. This article explored the association between item responses and RTs in noncognitive rating scale assessments by attending to the simultaneous influence of two types of traits that have been found relevant in previous studies: the intended-to-be-measured content trait and unintended response style traits. In this respect, we extend prior work that has mainly attended to separate influences of the content trait and response style sources. By considering both sources simultaneously, we not only confirm the effects of each on RT but also examine their relative effects. The current datasets studied seem to support such evaluations due to the low correlations seen between content and response style traits.
Following parallel work with cognitive assessments, we considered these effects by examining psychometric differences between fast and slow responses and showed how items display different psychometric properties depending on whether the item response is faster or slower than expected. Systematically higher item discriminations are seen for faster responses on both the content and response style traits, indicating that faster item responses are generally more reflective of both types of traits. The distinction appears to be even greater for response styles, as evidenced by the greater difference in discriminations between fast and slow responses. Such findings suggest the presence of heterogeneity in response behaviors within individuals, which is dependent on item RTs and its simultaneous relevance to the content trait and response styles. This is in line with the recent study results by Lyu and Bolt (in press), demonstrating the simultaneous influence of content trait and response styles on RTs on rating scale items.
These effects, seen in this study for both content and response style trait types, are consistent with previous studies, where differences in item parameter estimates across classes have been taken to suggest differences in psychological response process for fast versus slow responses. According to Callegaro et al. (2009) and Henninger and Plieninger (2021), fast responses on rating scale items may be due to a respondent’s (1) high confidence in the item response and/or (2) a shallower cognitive process. Many studies have reported that respondents tend to give fast and “automated” responses when they have a strong self-schema related to the item (Akrami et al., 2007; Fazio et al., 1986; Grant et al., 1994; Hanley, 1965; Holden et al., 1991), whereas slow responses are given when respondents are uncertain about the responses or provide more highly “controlled” responses (e.g., editing or fake responses; Holden, 1995; Holden & Hibbs, 1995; Holden et al., 1992; Holtgraves, 2004; Monaro et al., 2021; Roma et al., 2018). At the same time, fast and spontaneous responses are also known to occur when respondents go through a shallow and superficial cognitive process (Callegaro et al., 2009), which may arise as a response style-driven response process. Along these lines, Henninger and Plieninger (2021) illustrated that RTs are shorter if a respondent makes a response-style-driven category selection, and we think this can be viewed as a type of “automated” response process. Fast responses may thus be associated with an “automated” response process that reflects either higher confidence and stability (Arndt et al., 2018; Casey & Tryon, 2001; Germeroth et al., 2015; Grant et al., 1994; Kuiper, 1981) or response-style-driven responses.
In contrast, slow responses can be considered to be related to a “controlled” response process that involves editing/faking of responses or selection of anti-response style responses, both leading to responses inconsistent with the content or response styles traits.
In sum, this study lends insight into understanding response processes/behaviors across RTs with regard to the content trait and response styles. Respondents tend to exhibit different response behaviors (e.g., “automated” or “controlled”) across items depending on the response speed, and both the content trait and response styles are associated with such heterogeneity. One practical implication is that response styles will likely need to be attended to if one wants to use RTs to improve the estimation of the content trait. Unlike cognitive tests, where item correctness information is available and faster responses can often be regarded as more reflective of a latent proficiency, in noncognitive assessments, the contributions of both intended and nuisance traits likely preclude such applications, unless the response style dimensions are explicitly accounted for in the analysis.
Our results highlight another place in which the effects of the content trait and response styles in item response processes may be confounded (Adams et al., 2019). Item response tree models for response styles have received much attention for modeling response styles in rating scale items. The models typically assume that the effects of the content trait and response styles can be separately modeled by attaching each trait to a different subprocess underlying the item. From another perspective, however, the influence of content and response style traits may be inextricably confounded, possibly due to the presence of a mixture of respondent types (Kim & Bolt, 2021). In this study, we see the same form of confounding present in RTs. Specifically, it is difficult to distinguish whether fast extreme responses are being given due to an extreme level of the content trait or ERS. This is particularly because noncognitive items do not have “correctness” of responses, unlike cognitive tests, where we can possibly separate the sources of fast responses based on item correctness. As a consequence, it does not seem that attending to RTs necessarily provides a panacea for the confounding of these traits in the selection of the item response.
There remain additional directions for research. We primarily focused on examining psychometric differences between fast and slow responses by comparing item properties. Given that the ultimate goal of modeling RT is to improve the estimation of content traits, future work might explicitly quantify the degree to which RTs can actually improve trait estimation. In this study, though not presented in this article, the estimated latent traits and their posterior standard deviations (PSDs) obtained from the response mixture analysis turned out to be very similar to those estimated from the baseline model that does not account for response speeds (correlations > .98 for the estimates and > .80 for the PSDs). This is possibly due to the large number of items and multiple constructs being measured in the tests, conditions where item responses themselves already provide sufficient information for estimating the content and response style traits. It is plausible that the response mixture analysis improves the estimation of latent traits in conditions that provide less information for content and response style trait estimation. Thus, the approach may be particularly useful and interesting to examine with a shorter test, where the estimation of person traits based solely on item responses can be relatively less accurate. In addition, while this study mainly focused on two types of response styles (i.e., ERS and MRS), researchers may consider other types of response styles, such as acquiescent response style (the tendency to agree irrespective of item content) or socially desirable responding (the tendency to respond in a way to look good).
Our study was naturally subject to some limitations. We classified item responses into a fast or slow response class based on double-centered log-transformed item RTs, whereby the main effects associated with items and persons on RTs were removed. This way of characterizing RTs made it possible to examine how RTs are connected to the content and response style traits within individuals. The double-centering approach taken in this study, similar to previous approaches taken with cognitive item response data, can be viewed as somewhat arbitrary in defining fast versus slow response classes. While we expect some robustness of our findings to imperfections in classification, it is of course reasonable to speculate that some responses classified as fast might actually be slow and vice versa. Future studies might consider alternative ways of defining RTs for classification, such as by accounting for the effects of item features (e.g., item position, item length complexity), or else distinguish fast and slow classes in different ways to facilitate a better understanding of item-person interactions. For instance, we could use median values of RTs to distinguish fast and slow responses or even quartiles to split RTs into four classes instead of two as in this study. A stochastic way of finding a cutoff value that best explains the data can also be considered, as was done in Molenaar and De Boeck (2018). Rather than dichotomizing RTs, still another possibility might be to use RTs in a continuous way and examine the psychometric heterogeneity across RTs by explicitly modeling the effects of RTs on item parameters (Bolsinova et al., 2017).
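For readers who wish to experiment with the classification step, the sketch below shows one way double-centering of log RTs and the alternative quantile-based splits mentioned above might be implemented. It is an illustration under stated assumptions rather than the code used in this study: the function names are hypothetical, the data are simulated, the RT matrix is assumed to be complete (persons by items, no missing values), and the use of zero as the fast/slow cutoff on the double-centered residual is an assumption.

```python
import numpy as np

def double_center(log_rt):
    """Remove person (row) and item (column) main effects from log RTs."""
    row_means = log_rt.mean(axis=1, keepdims=True)   # person speed
    col_means = log_rt.mean(axis=0, keepdims=True)   # item time intensity
    return log_rt - row_means - col_means + log_rt.mean()

def classify_fast_slow(rt):
    """True where a response is faster than expected given person and item.

    Assumption: a negative double-centered residual defines "fast".
    """
    return double_center(np.log(rt)) < 0

def classify_quantiles(rt, n_classes=4):
    """Split double-centered log RTs into n_classes equal-sized speed classes.

    Class 0 holds the fastest responses; n_classes=2 gives a median split.
    """
    residual = double_center(np.log(rt))
    cuts = np.quantile(residual, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(residual, cuts)

# Simulated RTs in seconds (hypothetical): 200 respondents by 20 items.
rng = np.random.default_rng(0)
rt = np.exp(rng.normal(loc=1.0, scale=0.5, size=(200, 20)))

residual = double_center(np.log(rt))
is_fast = classify_fast_slow(rt)
speed_class = classify_quantiles(rt, n_classes=4)
```

Because the residuals have zero mean within every person and every item, each respondent and each item contributes both fast and slow responses under the zero cutoff, which is what permits the within-person comparison of response behaviors.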
Another limitation concerns the interpretation of class differences in item parameters seen in this study. Although it is common to take such differences as a reflection of processing differences, another perspective can view these differences more as artifacts associated with the presence of person-by-item interactions that simultaneously influence both item response and RT. Lyu et al. (2023) note the potential for item parameter heterogeneity across response classes that has no psychological meaning but is instead a consequence of omitted underlying causes of both the item response and RT. Along these same lines, although our study examined measurement variability conditional upon RT, our study does not address effects concerning the directionality of effects between item scores and RT, nor the possibility that there are no directional effects, only associations. As with other studies, we are limited in our ability to speak to these possible effects given our ability to only observe the item scores and associated RTs and no other aspects of response process.
We are naturally cautious in generalizing the results of this study as different findings can be observed with different populations or tests. For instance, the three datasets used in this study were all collected from online respondent samples that had no incentive for participation and may not always have been highly motivated. It is plausible that using data from other types of tests collected in different ways (particularly in a way that can draw highly motivated respondents) might lead to different findings. Lastly, it would be interesting and useful to use other types of process data, such as eye tracking or brain imaging data, which are becoming more available, to validate the findings from RTs or to further improve the understanding of item response process in rating scale measurement.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
