Abstract
Careless responding measures are important for several purposes, whether for screening out careless respondents or for research that treats careless responding as a substantive variable. One approach for assessing carelessness in surveys is the use of an instructional manipulation check. Despite its apparent popularity, little is known about the construct validity of instructional manipulation checks as measures of careless responding. Initial results are inconclusive, and no study has thoroughly evaluated the validity of the instructional manipulation check as a measure of careless responding. Across two samples (N = 762), we evaluated the construct validity of the instructional manipulation check using a nomological network. We found that the instructional manipulation check converged poorly with other measures of careless responding, weakly predicted participant inability to recognize study content, and did not display incremental validity over existing measures of careless responding. Additional analyses revealed that instructional manipulation checks performed poorly compared to single scores of alternative careless responding measures and that screening data with alternative measures of careless responding produced gains in data quality that were greater than or similar to those produced by instructional manipulation checks. Based on the results of our studies, we do not recommend using instructional manipulation checks to assess or screen for careless responding to surveys.
Introduction
Recent research has shown that participants often engage in careless responding, a behavior in which respondents provide responses without regard for item content (Meade & Craig, 2012). Careless responding can inflate (Huang, Liu, et al., 2015) or deflate relationships between variables and bias factor analysis results (Huang et al., 2012). To address these undesirable effects, screening survey data for careless responding is important for researchers and practitioners, which necessitates accurate measurement of careless responding (see Kim et al., 2018). In this study, we use a nomological network to examine the construct validity of one popular index of careless responding in the context of surveys—the instructional manipulation check (IMC; Oppenheimer et al., 2009). Below, we describe the IMC approach and then discuss the nomological network we used to evaluate the construct validity of the IMC as a measure of careless responding to surveys.
Instructional Manipulation Checks
IMC Items Adapted From Oppenheimer et al. (2009) Administered in Sample 1 and Sample 2.
Note. Correct answer in bold. The instructional statement is posed near the end of each item. Item for Sample 2 was taken from Oppenheimer et al. (2009), whereas item from Sample 1 was adapted from the original Oppenheimer et al. (2009) item.
Validity
Despite this popularity, very little is known about the validity of an IMC as a measure of careless responding to surveys. Some studies have suggested moderate convergent validity evidence with other survey careless responding indices (Maniaci & Rogge, 2014), whereas others have indicated poor convergent validity evidence (Conway et al., 2019). No study, however, has examined the convergent validity of an IMC using best practice in careless responding measurement. Best practice measurement is important for examining convergent validity because the external measures need to have strong construct validity evidence to rule out the possibility that poor convergence between the IMC and external measures was observed because of deficiency in the external measures. Prior studies have relied on measures that have questionable validity evidence, such as raw response time (see Bowling et al., 2023; e.g., Conway et al., 2019) or post hoc analyses (Maniaci & Rogge, 2014).
Furthermore, there are logical grounds to question the validity of IMCs as measures of careless responding to surveys. First, IMCs may require a level of effort to pass that is far higher than the level of effort necessary for responding attentively to survey items. Whereas survey items typically require participants to attend only to single-sentence statements (e.g., IPIP-300; see Goldberg et al., 2006), IMCs require participants to comprehend a lengthy paragraph. Using an IMC to screen survey data may therefore needlessly omit participants who have, in fact, displayed a sufficient level of effort for survey items. Put simply, IMCs may have poor specificity for assessing careless responding to surveys. There is an empirical basis for this argument: Several studies have found that IMCs flag a considerable proportion of participants (14%–46%; Conway et al., 2019; Klein et al., 2014; Maniaci & Rogge, 2014; Oppenheimer et al., 2009). These failure rates exceed prevalence estimates of careless responding obtained using other measures (e.g., 3%–12%; see Meade & Craig, 2012).
Additionally, because IMCs have a distinctive appearance (see Table 1), they may have poor sensitivity in crowd-sourced samples, which are often composed of experienced participants (see Hauser & Schwarz, 2016). In crowd-sourced research platforms (e.g., Amazon’s MTurk), where respondents have often participated in several previous studies, careless participants may learn to identify and respond correctly to IMCs. This could be facilitated by IMCs’ distinctive appearance (see Hauser & Schwarz, 2016). This limitation of IMCs may be exacerbated when researchers follow prescribed best practices of limiting participation in their study to experienced respondents—the very people who are likely to have experience with IMCs and thus know how to circumvent them.
Lastly, IMCs are less practical to administer than are other measures of careless responding. Whereas IMCs require their own page and cannot be included seamlessly into a battery of traditional survey items (see Table 1), other methods either do not require the addition of special items (e.g., page time) or pre-planning (e.g., psychometric synonyms), or can be included seamlessly into a traditional survey (e.g., infrequency items). The relative impracticality of the IMC further justifies the need for an examination of its construct validity.
Item-Level Data for Infrequency Scale Items Compared to Instructional Manipulation Check.
Note. N = 478 for Sample 1. N = 284 for Sample 2. *p < .05. ∧ denotes that Steiger’s z test indicated the correlation between the criterion and the page time score or infrequency item is significantly larger than the correlation between the IMC and the criterion. 1 indicates the item is from Meade and Craig (2012). 2 indicates the item is from an unpublished work. (p) indicates that the item is positively scored and 6 (Agree) and 7 (Strongly Agree) are scored as careful (0). On items without a (p), 1 (Strongly Disagree) and 2 (Disagree) are scored as careful (0).
3 Given that Sample 1 was composed of MTurk workers, we dropped the item “I am enrolled in a psychology course currently” because it cannot be reasonably assumed that MTurk participants are enrolled in a psychology course.
Nomological Network
Convergence With Other Measures of Careless Responding
Measures of a given construct should correlate strongly with other measures of that same construct (e.g., see Bowling et al., 2023). Given that research has suggested that careless responding is consistent within and between assessments (see Bowling et al., 2016), this prediction should hold true in the context of careless responding measurement (i.e., careless responding indices at one point in a survey should correlate with careless responding indices at another point). Therefore, we examine whether the IMC correlates strongly with other measures of careless responding that have displayed promising validity across a multitude of studies, namely, (a) infrequency items, (b) page time, and (c) a standardized careless responding index. Below, we discuss each index.
Infrequency Scale
Infrequency scales contain items with “clear correct answer[s]” (Meade & Craig, 2012, p. 441). Given that the majority of attentive participants should provide the obvious correct answer, these items should reflect individual differences in effort. An example item is “I have never brushed my teeth” (Meade & Craig, 2012). Virtually all attentive participants should disagree with this item, whereas a careless participant may endorse it.
Studies have found that infrequency scales (a) display strong convergent validity evidence (Bowling et al., 2021; Gibson & Bowling, 2020), (b) predict participant inability to recognize study content (Bowling et al., 2021), and (c) do not evoke negative reactions from participants (Huang, Bowling, et al., 2015). Given that research has found that infrequency items function well, we investigated whether an IMC converged with an infrequency scale.
Page Time
Page time measures careless responding through dichotomized page-level response time (Huang et al., 2012). The measure assumes that items require a minimum amount of time for an attentive response (Bowling et al., 2023). Researchers typically score page-level response time using a 2-s-per-item cutoff, a cutoff which recently received empirical support (see Bowling et al., 2023). Page time has (a) displayed strong convergent validity with other measures of careless responding (Huang et al., 2012; Ward & Meade, 2023), (b) strongly predicted participant ability to recognize study content, (c) displayed strong incremental validity over raw response time, and (d) displayed discriminant validity with verbal ability (Bowling et al., 2023). Given the validity evidence, we tested whether an IMC converged with page time.
Standardized Careless Responding Index
Bowling et al. (2016) introduced the standardized careless responding index (SCRI), which combines several indices of careless responding into an unweighted composite. The reasoning is that different post hoc careless responding indices (e.g., psychometric synonyms and antonyms, intra-individual response variability; see Dunn et al., 2018; Meade & Craig, 2012) each have unique limitations that can be controlled for by combining them into a composite (see Bowling et al., 2016). This practice leverages the principle of aggregation (see Rushton et al., 1983), which holds that an unweighted composite of several measurements is more reliable and valid than a single measurement. We examined whether an IMC converged with an SCRI containing three post hoc careless responding indices: (a) psychometric synonyms, (b) psychometric antonyms, and (c) intra-individual response variability (see Dunn et al., 2018; Meade & Craig, 2012).
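The aggregation logic can be sketched in a few lines. The snippet below standardizes each index across participants and averages the resulting z scores into an unweighted composite. It is a minimal illustration, not the authors' code: the function name, the use of the sample standard deviation, and the example values are all assumptions, and each index is assumed to be keyed already so that higher values indicate greater carelessness.

```python
from statistics import mean, stdev


def standardized_composite(indices):
    """Average z scores across indices for each participant.

    `indices` maps an index name to a list of scores (one per participant).
    Assumption: every index is keyed so that higher = more careless.
    """
    z_by_index = []
    for scores in indices.values():
        m, s = mean(scores), stdev(scores)
        z_by_index.append([(x - m) / s for x in scores])
    # Unweighted composite: simple mean of the z scores for each participant.
    return [mean(zs) for zs in zip(*z_by_index)]


# Hypothetical scores for four participants (illustration only).
example = {
    "synonyms_reversed": [-0.9, -0.8, -0.2, 0.1],
    "antonyms": [-0.7, -0.6, -0.1, 0.2],
    "response_variability": [0.4, 0.5, 1.1, 1.6],
}
composite = standardized_composite(example)
```

Because each index is standardized before averaging, no single index dominates the composite merely because of its scale.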
Prediction of Participant Ability to Recognize Study Content
Bowling et al. (2023) introduced a measure of participant inability to recognize study content (IRSC), which embeds within a questionnaire 11 items with memorable content (e.g., “If my friends dared me to eat a live goldfish, I would probably do it”) and then later assesses each participant’s ability to recognize that content. The idea is that participants who engage in careless responding fail to encode item information and are therefore unlikely to recognize such information. Recent research has found that careless responding indices strongly predict IRSC scores (Bowling et al., 2023). Thus, we investigated whether an IMC predicted IRSC.
Incremental Validity Over Other Measures of Careless Responding
Novel or unestablished measures of a construct should demonstrate incremental validity over established measures. This fact is especially true for the IMC approach because it is often less practical than are other approaches. We therefore examined whether an IMC demonstrated incremental validity in predicting inability to recognize study content.
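To make the idea of incremental validity concrete, the following sketch computes the gain in R² when a new measure is added to a regression that already contains one established measure, working directly from the three correlations involved. It is a simplified two-predictor illustration under assumed values, not the analysis reported in this article; the function names and the correlations in the example are hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a criterion regressed on two predictors, from correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def delta_r_squared(r_y_established, r_y_new, r_between):
    """Gain in R^2 when the new measure is entered after the established one."""
    full = r_squared_two_predictors(r_y_established, r_y_new, r_between)
    return full - r_y_established**2


# Hypothetical correlations, chosen only for illustration.
gain = delta_r_squared(r_y_established=0.60, r_y_new=0.20, r_between=0.25)
```

With these assumed inputs the gain is well under .01, which shows how a measure with a nonzero zero-order correlation can still add almost nothing once an established measure is in the model.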
Exploratory Analyses
We conducted two additional sets of exploratory analyses to further examine the construct validity of IMCs for assessing careless responding to surveys. First, we examined whether an IMC outperformed single infrequency items or single page-time scores. Given that the IMC is a single-item measure of carelessness, it may be of interest to know whether the IMC is comparable to or better than alternative single-score indices of carelessness.
Second, we examined whether screening data using an IMC or other careless responding indices produced increases in data quality. Given that careless responding is known to harm data quality, screening data should produce gains in survey data quality (see Maniaci & Rogge, 2014; Oppenheimer et al., 2009). Specifically, we examined the extent to which screening data using an IMC produces improvements in coefficient alpha of five personality inventories.
Present Research
Using a nomological network (see Table 2), we examined the construct validity of an IMC as a measure of careless responding within two samples: an MTurk sample and an undergraduate sample. In each sample, we investigated whether an IMC converged with established measures of careless responding, predicted IRSC, displayed incremental validity over existing careless responding indices, outperformed other single-score indices of careless responding, and produced improvements in data quality when used to screen data.
Method
Participants
Sample 1 was composed of 478 MTurk workers (59% male; mean age = 38.05). We compensated each MTurk worker $2.00. To ensure our results were generalizable to typical research using MTurk, we followed best practice recommendations (see Aguinis et al., 2021) when sampling MTurkers. Specifically, we required MTurk workers to have an accepted HIT rate >95%, to have completed more than 100 HITs, and to be located within the United States. For Sample 2, participants were 284 undergraduate students (75.45% female; mean age = 20.21 years) who participated in the study in exchange for course credit.
Measures
In Sample 1, the main questionnaire consisted of 340 items that we randomly distributed across 17 pages: 288 of these items were IPIP items (Goldberg et al., 2006), 10 were infrequency items from Meade and Craig (2012), 11 were target items required for IRSC, and the remaining items were pilot survey items for another project (example item: “I enjoy talking to people that I like”). In Sample 2, the main questionnaire consisted of 380 items randomly distributed across 19 pages. The majority of these items (300) were from the IPIP NEO-PI (Goldberg et al., 2006), 10 were infrequency items from two sources (i.e., Meade & Craig, 2012 and an unpublished work), 11 were target items required for IRSC, and the remaining items were filler items that assessed a variety of other constructs. In both samples, we (a) randomly administered pages to participants, (b) administered each item on a 7-point Likert scale, and (c) administered an 11-item IRSC assessment after the main questionnaire (see Bowling et al., 2023).
Computing Careless Response Indices
We calculated scores for IRSC and four careless responding indices—(a) an IMC (see Table 1), (b) an infrequency scale, (c) page time (see Bowling et al., 2023), and (d) an SCRI containing three careless responding indices (see Dunn et al., 2018; Meade & Craig, 2012): psychometric synonyms, psychometric antonyms, and intra-individual response variability. We scored each careless responding index such that higher scores indicated greater careless responding.
IMC
In both samples, we included an IMC item adapted from Oppenheimer et al. (2009). We scored participants who gave the instructed response as attentive (scored as “0”) and participants who gave an incorrect response as careless (scored as “1”). In Sample 1, the IMC was displayed after the survey. In Sample 2, the presentation of the IMC was counterbalanced. These presentations are consistent with previous works (e.g., Paas et al., 2018).
Infrequency Scale
In Sample 1, we used 10 infrequency items from Meade and Craig (2012) to assess careless responding. In Sample 2, we used 3 infrequency items from Meade and Craig (2012) and 7 items from an unpublished work (see Table 2). To score items, we used the method from Meade and Craig (2012), which scores two responses as attentive (see Table 2).
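The scoring rule can be sketched as follows, using the keying described in the note to Table 2: on a 7-point scale, responses of 1 or 2 count as careful for standard items, and responses of 6 or 7 count as careful for positively scored items. The function names and the example responses are hypothetical and serve only to illustrate the rule.

```python
def score_infrequency_item(response, positively_scored=False):
    """Return 0 (careful) or 1 (careless) for one 7-point response."""
    careful = {6, 7} if positively_scored else {1, 2}
    return 0 if response in careful else 1


def infrequency_scale_score(responses, positive_flags):
    """Sum the item flags; higher totals indicate more careless responding."""
    return sum(
        score_infrequency_item(r, p) for r, p in zip(responses, positive_flags)
    )


# Hypothetical participant: four items, the last one positively scored.
total = infrequency_scale_score([1, 2, 5, 7], [False, False, False, True])
```

Here only the third response (a 5 on a standard item) falls outside the two careful options, so the participant is flagged once.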
Page Time
We computed a sum score for page time by rescoring page-level response time for each participant. To do this, we applied the 2-s-per-item cutoff that has been supported by past research (Bowling et al., 2023). Specifically, we scored page-level response times that exceeded the 2-s-per-item cutoff as attentive (scored as “0”) and page-level response times that were quicker than the 2-s-per-item cutoff as careless (scored as “1”). We then summed these rescored page-level response time values to create the page time index.
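A minimal sketch of this rescoring step is shown below. The function name and the example timings are hypothetical, and treating a time exactly equal to the cutoff as attentive is an assumption, since the text specifies only times above and below the cutoff.

```python
def page_time_index(page_seconds, items_per_page, seconds_per_item=2.0):
    """Count pages answered faster than the per-item cutoff.

    `page_seconds` and `items_per_page` are parallel lists, one entry per page.
    Assumption: a time exactly at the cutoff is scored as attentive (0).
    """
    flags = []
    for seconds, n_items in zip(page_seconds, items_per_page):
        cutoff = seconds_per_item * n_items
        flags.append(1 if seconds < cutoff else 0)
    return sum(flags)


# Hypothetical participant: three pages of 20 items each (40-s cutoff per page).
score = page_time_index([95.0, 38.5, 41.0], [20, 20, 20])
```

In this example only the second page (38.5 s for 20 items) falls below the cutoff, so the participant receives a page time score of 1.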
Standardized Careless Responding Index
Means, Standard Deviations, and Correlations for Careless Responding Variables.
Note. N = 478 for Sample 1. N = 284 for Sample 2. * indicates p < .05. Coefficient α provided in parentheses along diagonal when possible. CR refers to careless responding. Note that the standardized CR index is an unweighted composite containing intra-individual response variability, psychometric synonyms, and psychometric antonyms. Psychometric synonym scores were multiplied by −1 so that greater scores reflected greater carelessness. In Sample 1, we were unable to compute psychometric antonyms because there were no antonym pairs (i.e., negative item correlations stronger than r = −.60), so the Sample 1 SCRI only contained psychometric synonyms and intra-individual response variability.
Inability to Recognize Study Content
We assessed participants’ inability to recognize study content using 11 target items and 11 quiz items from Bowling et al. (2023). This measure assesses item content recognition through embedding target items with salient content in the substantive questionnaire (e.g., “I would like to go skydiving”) and then later quizzing participants on that content at the end of the assessment (e.g., “Which of the following ‘extreme’ sports were you asked about earlier in the questionnaire?” with “Sky diving,” “Hang gliding,” “Scuba diving,” and “Bungee jumping” as response options). We scored correct quiz responses as attentive (scored as “0”) and incorrect quiz responses as careless (scored as “1”). We summed the participants’ rescored quiz responses to create an IRSC score.
Analytical Approach
Prior to the substantive analyses, we examined the scatterplots for the relationships between each pair of study variables. We did not find any evidence that our results were meaningfully affected by the presence of outliers. We examined correlations between the IMC and other measures of careless responding and participant inability to recognize study content to evaluate convergent and criterion-related validity, respectively. We used hierarchical regression to conduct incremental validity analyses. We used Steiger’s z tests to compare the correlations obtained for single infrequency items or single page-time scores with those obtained for the IMC. To examine the efficacy of the IMC and other careless responding indices for screening survey data, we examined the improvement in scale coefficient alpha before and after data screening.
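For readers unfamiliar with the test, the sketch below implements one common form of Steiger's (1980) z for comparing two dependent correlations that share a variable (here, the criterion). Whether this exact variant matches the one used in the present analyses is an assumption, and the function name and input values are illustrative only.

```python
import math


def steiger_z(r_jk, r_jh, r_kh, n):
    """Compare r_jk with r_jh, two correlations that share variable j.

    r_kh is the correlation between the two non-shared variables and n is
    the sample size. Returns (z, two-tailed p).
    """
    z_jk = math.atanh(r_jk)  # Fisher r-to-z transformation
    z_jh = math.atanh(r_jh)
    r_bar = (r_jk + r_jh) / 2
    # Covariance term for the two transformed correlations.
    c = (
        r_kh * (1 - 2 * r_bar**2)
        - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    ) / (1 - r_bar**2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p value
    return z, p


# Illustrative inputs: criterion with measure A (.60), criterion with
# measure B (.18), A with B (.25), n = 478. The .60 value is hypothetical.
z, p = steiger_z(0.60, 0.18, 0.25, 478)
```

A positive z with p below .05 would indicate that the first measure correlates more strongly with the shared criterion than the second does.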
Results
Convergence With Established Measures
Within both samples, we found that the IMC converged poorly with the other careless responding indices (see Table 3). Specifically, the IMC displayed poor convergence with page time (Sample 1 r = .14; Sample 2 r = .12, n.s.), the infrequency scale (Sample 1 r = .25; Sample 2 r = .14), and SCRI (Sample 1 r = .28; Sample 2 r = .22). Follow-up Steiger’s z tests indicated that the effect size estimates for the IMC were significantly (p < .05) smaller than for other careless responding indices (r = .66 to .80). Given that page time, the infrequency scale, and SCRI all converged highly with each other (r = .57 to .73 in Sample 1 and r = .63 to .82 in Sample 2), it is unlikely that the poor convergence we observed for the IMC was the result of poor performance of the non-IMC measures.
Prediction of IRSC
We examined whether IMC scores predicted participants’ inability to recognize study content (see Table 3). Across both samples, the IMC poorly predicted IRSC scores (Sample 1 r = .18, p < .05; Sample 2 r = .16, p < .05). Follow-up Steiger’s z tests indicated that the effect size estimates for the IMC were significantly (p < .05) smaller than were those of the other careless responding indices (r = .59 to .78). Taken together, we found poor criterion-related validity evidence for the IMC but excellent criterion-related validity evidence for the other careless responding indices.
Incremental Validity
Incremental Validity Analyses for Samples 1 and 2.
Note. Sample 1 N = 478. Sample 2 N = 284. *p < .05. Relative weights that have superscripts of a different letter are significantly different from each other. We used bootstrapping (with 10,000 resamples) to test for the statistical significance of relative weights. CR refers to careless responding.
For the relative weights analysis, we examined the predictive capability of the IMC in predicting IRSC relative to established measures (see Table 4). Specifically, we compared the IMC against page time, the infrequency scale, and SCRI. IRSC served as the criterion variable. We then tested (a) the statistical significance of the relative weights and (b) whether weights for the IMC were significantly different from the weights of other measures. To address the latter, we used a bootstrapping procedure (10,000 replications). Each of the six relative weights analyses across Samples 1 and 2 indicated (a) that existing measures of careless responding significantly (p < .05) outperformed the IMC in predicting IRSC and (b) that the relative weight of the IMC was weak (.01–.02) in the presence of other measures of careless responding. These findings suggest that IMCs provide virtually no incremental value beyond that of other indices.
IMC Performance Compared to Single-Score Measures of Carelessness
It is informative to compare the IMC to other single-item measures of careless responding, since the IMC’s status as a single-item measure may put it at a disadvantage. We thus correlated each infrequency item (see Table 2) with both overall page time and the SCRI. The infrequency items generally displayed relatively strong correlations with overall page time (Sample 1 mean r = .52; Sample 2 mean r = .48) and SCRI (Sample 1 mean r = .64; Sample 2 mean r = .42). The IMC, in contrast, displayed relatively weak correlations with overall page time and SCRI scores (r = .12–.28). We observed a similar pattern when we correlated individual page time scores with the overall infrequency scale (Sample 1 mean r = .49; Sample 2 mean r = .60) and the SCRI (Sample 1 mean r = .53; Sample 2 mean r = .51). These results suggested that the IMC generally performed worse than did single-score versions of non-IMC indices.
Effect of Screening on Data Quality
Data Screening Results for Sample 1 and Sample 2.
Note. N = 478 for Sample 1. N = 284 for Sample 2. SCRI refers to standardized careless responding index. CR refers to careless responding. Discrepancies in sample sizes for the individual CR indices reflect differences in missing data for each index. Each sample included 10 infrequency items.
For Sample 1, personality was assessed using 24 neuroticism items, 27 openness to experience items, 21 extraversion items, 25 conscientiousness items, and 25 agreeableness items from the IPIP (Goldberg et al., 2006). For Sample 2, personality was assessed using the IPIP-300 (Goldberg et al., 2006), which consists of 60 items per Five Factor Model trait.
To identify cut scores for the page-time index, we adopted the method used by Soland et al. (2019): We classified any participant who was flagged by more than half of the page time scores as careless. This resulted in cut scores of 9 and 10 failed page time checks in Samples 1 and 2, respectively. Finally, because there is no published precedent for screening data using SCRI, we computed cut scores at every .2 interval. We then retained the cut score that displayed the strongest correlation with IRSC. Our rationale was that an optimal cut score value should display the strongest prediction of IRSC relative to less optimal cut score values. This method resulted in cut scores of −.6 and 4.2 for Samples 1 and 2, respectively.
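Both cut-score rules can be sketched briefly. The majority rule reproduces the reported page-time cut scores for 17 and 19 pages; the grid search is a hypothetical reading of the SCRI procedure in which participants at or above a candidate cut are flagged and the flag is correlated with IRSC. Function names, the flagging direction, the tie-breaking rule (first candidate wins), and the example data are assumptions.

```python
import math


def majority_page_cut(n_pages):
    """Smallest number of failed page-time checks exceeding half the pages."""
    return n_pages // 2 + 1


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def best_cut_score(scri, irsc, candidates):
    """Return the candidate cut whose careless flag best predicts IRSC."""
    best_cut, best_r = None, -math.inf
    for cut in candidates:
        flags = [1 if s >= cut else 0 for s in scri]  # assumed direction
        r = pearson_r(flags, irsc)
        if r is not None and r > best_r:
            best_cut, best_r = cut, r
    return best_cut


# Hypothetical data for six participants; candidates every .2 from -1.0 to 2.0.
scri = [-1.1, -0.7, -0.4, 0.3, 0.9, 1.6]
irsc = [0, 1, 0, 5, 7, 9]
candidates = [round(-1.0 + 0.2 * k, 1) for k in range(16)]
cut = best_cut_score(scri, irsc, candidates)
```

Because the cut is tuned against IRSC in the same sample, the resulting screening results for the SCRI may be somewhat optimistic, which is one reason to treat such a cut score as sample specific.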
Across the two samples, the non-IMC measures of careless responding produced similar or larger post-cleaning gains in the internal-consistency reliability of the substantive measures relative to the IMC. In Sample 1, the IMC produced average post-cleaning gains in internal-consistency reliability of only .01; conversely, the non-IMC indices had average post-cleaning gains of .146 (infrequency), .096 (page time), and .116 (SCRI). In Sample 2, the IMC and non-IMC indices produced similar post-cleaning gains in internal-consistency reliability; however, the IMC screened out a far larger proportion of participants (65.49%) than infrequency (5.84%), page time (3.54%), or SCRI (5.73%). Taken together, the Sample 1 results indicated that the IMC was outperformed by other indices, whereas the Sample 2 results indicated the IMC produced similar gains while incurring a larger loss of participants.
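The before-and-after comparison rests on coefficient alpha computed on the full sample and again on the screened sample. The sketch below shows that computation under simple assumptions: complete data, one row of item responses per participant, and a 0/1 careless flag per participant. The function names and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of participants' item-response rows."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_gain(rows, flagged):
    """Change in alpha after dropping participants flagged as careless."""
    kept = [row for row, flag in zip(rows, flagged) if not flag]
    return cronbach_alpha(kept) - cronbach_alpha(rows)


# Toy data: five participants, three items; the last row is inconsistent.
rows = [[1, 1, 2], [2, 2, 2], [4, 4, 5], [5, 5, 4], [7, 1, 4]]
flagged = [0, 0, 0, 0, 1]
gain = alpha_gain(rows, flagged)
```

In this toy example, removing the inconsistent respondent raises alpha, which is the pattern a valid screening index should produce while removing as few participants as possible.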
Discussion
The IMC has emerged as an accessible and popular method for assessing and screening for careless responding (Oppenheimer et al., 2009). The purpose of the current research was to examine the construct validity of the IMC using a more extensive nomological network (see Table 2). We also gave particular attention to the effective measurement of the external variables we included in both studies (e.g., we employed best practices when computing non-IMC careless responding scores). Across our two samples, we found (a) that the IMC converged poorly with other established measures of careless responding, (b) that it weakly predicted participants’ inability to recognize study content, and (c) that it failed to consistently display incremental validity over established measures of careless responding. Further exploratory analyses indicated that the IMC failed to outperform single-item versions of the infrequency and page time indices. Furthermore, screening data using non-IMC indices produced gains in internal-consistency reliability that were greater than or similar to those produced by screening data with an IMC. These results suggested the IMC performed poorly as a measure of careless responding to surveys. Based on these findings, we recommend that researchers discontinue using IMCs to screen survey data for carelessness.
We should note that the current results differ from those reported in prior experimental research, which had found support for the validity of the IMC (Hauser & Schwarz, 2016; Oppenheimer et al., 2009). This previous research, however, has suggested that IMCs may be useful for screening participants from studies that require significant amounts of reading (e.g., Caleo, 2016; Chernyak-Hai et al., 2023; Oppenheimer et al., 2009). In contrast, we found that IMCs may be inappropriate for screening participants from survey studies. Surveys generally consist of brief, single-sentence items. The tendency to avoid reading lengthy text, which is the behavior assessed by IMCs, may therefore have little relevance to the quality of participants’ responses to survey items. The current findings support this possibility: IMCs performed poorly in both samples as measures of careless responding. In short, IMCs may be useful for identifying participants who are likely to respond carelessly to materials that require a significant amount of reading; however, they are inappropriate for identifying participants who are likely to respond carelessly to surveys.
We also found that the IMC displayed distinct patterns in failure rates across our two samples. Whereas the IMC suggested that careless responding was relatively lower in Sample 1 (i.e., the MTurker sample) than in Sample 2 (i.e., the undergraduate sample), the non-IMC indices suggested that careless responding was relatively higher in Sample 1 than in Sample 2. The pattern we observed for the non-IMC indices is similar to the pattern reported within a recent meta-analysis (see Moore et al., 2023). Whereas previous research has used high IMC pass rates as evidence for the quality of MTurk samples (e.g., Hauser & Schwarz, 2016), our findings raise doubts about such conclusions. Given that (a) the failure rates within our MTurk sample were high when we used less conspicuous measures of careless responding (e.g., page time) and (b) the IMC demonstrated relatively poor construct validity evidence, we offer an alternative interpretation of the high IMC pass rate of MTurkers: Simply put, MTurkers may have learned to recognize and circumvent IMCs. Again, this may be due to IMCs’ distinctive appearance (i.e., they consist of a lengthy block of text, which makes them conspicuous when placed among brief questionnaire items). An experienced MTurker can easily identify and thwart an IMC. Future research could further examine this possibility and whether this threat to validity is present in other conspicuous measures of careless responding (e.g., instructed-response items).
The pattern of failure rates for the undergraduate sample is also noteworthy. The majority of undergraduate participants (65.49%) in Sample 2 failed the IMC. This result is consistent with past research that has found high IMC failure rates for undergraduate samples (Conway et al., 2019; Oppenheimer et al., 2009). One explanation for the high failure rate is that IMCs confuse otherwise attentive undergraduate students, and therefore they display a high failure rate because of increased false positive cases. Another possible explanation for the high IMC failure rate within student samples is that the IMC requires more effort to pass than do other survey careless responding indices. As we discussed earlier, passing an IMC likely requires a greater amount of effort than does passing a page-time index or an infrequency item (i.e., because survey items are relatively brief and thus may require less effort to respond to carefully). The high failure rate on the IMC for the undergraduate sample, therefore, may have indicated that students generally responded to our study questionnaire with a level of effort that was sufficient for passing survey careless responding indices but was insufficient for passing an IMC. If this interpretation is correct, then using an IMC to screen survey data would lead researchers to omit too many participants (i.e., participants who have displayed sufficient effort to respond carefully to survey items but not to the IMC would be unnecessarily removed).
Limitations and Future Research
We acknowledge a few limitations of our work that could be addressed by future studies. First, our research only examined the validity of the IMC as it was introduced and is currently used: as a single-item measure of careless responding. Future research could examine whether multiple IMCs are more effective at assessing carelessness. We believe, however, that significant improvements are unlikely given that (a) single infrequency items and single page-time scores outperformed IMCs within both studies, and (b) due to the distinctive appearance of IMCs, participants who pass one IMC may pass any subsequent IMCs that appear within the same questionnaire (i.e., because carefully reading one IMC tips the participant off about subsequent IMCs). To address the latter, future research could examine the validity of IMCs that consist of fewer words than those used in the current research. If an IMC becomes sufficiently short, then it may function as an instructed-response item (e.g., “Please select disagree for this item”), an approach that has been found to effectively assess careless responding (see Kam & Chan, 2018).
Second, to minimize the complexity of the results, our study did not examine screening efficacy at different cutoffs for page time, infrequency, and the SCRI. Exploring multiple cut scores may be necessary when using screening effects as a basis to examine the efficacy of measures because such a comparison confounds measures with cut scores. This issue is further complicated by the apparent lack of research on how to properly identify accurate cut scores on careless responding indices (see Ward & Meade, 2023). We attempted to address this issue by using methods identified in past research or by empirically justifying the cut score.
Finally, research is needed to further examine why MTurk participants (Sample 1) are more likely than student participants (Sample 2) to correctly respond to the IMCs. We do not interpret these findings as suggesting that MTurk participants are particularly attentive; instead, they may suggest that MTurk participants, who are often savvy and experienced at completing surveys, are simply able to identify and thwart IMCs. Consistent with this interpretation, we found that the MTurk participants were more likely to be flagged by other, more surreptitious careless responding indices than were the student participants. Further research should extend this finding by directly examining the relationship between research participation experience (e.g., number of studies completed) and the ability to identify the purpose of and thwart IMCs.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
