Abstract
This manuscript presents a novel behavioral impulsivity test, ImGo, which is suitable for impulsivity assessment in the general population. A series of three studies was conducted to validate its psychometric qualities. In Study 1 we describe the principles of ImGo and verify its test-retest and split-half reliability and its convergent validity with an impulsivity self-report scale and a Stop Signal test. In Study 2 we re-analyze the convergent validity of ImGo with a Stop Signal test and examine the potential relationship between ImGo and oculomotor inhibition measured by an Anti-Saccades test. In Study 3 we present a robust study with a large sample size and investigate the discriminant validity of ImGo with tests of other related cognitive and executive processes. Backed by our findings from these studies, we can claim that ImGo is a powerful tool with a good level of reliability (both test-retest and split-half) and validity (convergent and discriminant). Its potential lies in its use in the diagnostic and research practice of experts from various countries, as the test has already been translated into nine languages. The open-source Hypothesis platform, on which the ImGo test runs, provides the option of both individual and group testing in laboratory conditions as well as remotely through an internet browser.
Keywords
General introduction
Impulsivity is defined as a tendency to act prematurely without foresight (Dalley et al., 2011) and is considered a multidimensional construct. For example, Patton et al. (1995) proposed a three factor model consisting of attentional impulsivity, motor impulsivity and non-planning impulsivity; Nigg (2000) distinguished between four processes which generate impulsivity: interference control, cognitive inhibition, behavioral inhibition and oculomotor inhibition; Lynam et al. (2006) described five dimensions of impulsivity: negative urgency, positive urgency, lack of premeditation, lack of perseverance and sensation seeking; and Dalley et al. (2011) differentiated between stopping and waiting behavioral impulsivity.
The multitude of conceptual definitions for impulsivity is naturally reflected in a large number of methods used to measure it. Authors commonly use standardized self-report scales (e.g., Lynam et al., 2006; Patton et al., 1995) or performance tests based on variations of the Go/NoGo test, anti-saccades test and Stop Signal test. This diversity of measurement has naturally led to a general lack of intercorrelations between methods, which suggests that impulsivity manifests at the behavioral level in different ways (Dalley et al., 2011).
Consequently, impulsivity is associated with many other psychological and psychiatric phenomena, for example with attention disorders (Metin et al., 2012), drug addictions (Smith et al., 2014), personality disorders (Coffey et al., 2011), mental disorders (Enticott et al., 2008), interference control (Aichert et al., 2012), and intelligence (Lynam et al., 1993). Research into impulsivity also has considerable potential in clinical psychology and psychiatry, as impulsivity is an important predictor of other disorders within clinical populations (Moeller et al., 2001). The construct of impulsivity is also a core symptom domain in many clinical diagnoses in the Diagnostic and Statistical Manual of Mental Disorders V (cf. Berlin & Hollander, 2014). Therefore, a large part of the research on impulsivity is performed on clinical and subclinical populations.
The present paper consists of three studies and aims to introduce a novel impulsivity test called ImGo, which is based on the Go/NoGo test and offers objective testing of behavioral impulsivity. For several reasons we believe that ImGo can be perceived as an improvement to the existing battery of similar objective methods for impulsivity assessment. First of all, our method is standardized on quite a large population, whereas the lack of such standardization is an issue with most of the available methods, as reported by Matusiewicz et al. (2011). Even though there are other similar methods with relatively good psychometric properties, such as Conners' Continuous Performance Test (CCPT II; Conners, 2000; Conners et al., 2003; Shaked et al., 2020), ImGo has several properties that make it especially useful for large-scale assessment. The test was implemented on the online testing platform Hypothesis (Šašinka, Morong, et al., 2017), which requires only an internet connection, a mouse, a keyboard and a standard web browser; no special equipment or software is needed. The Hypothesis software solution thus allows quick online administration to multiple (up to several dozen) participants in parallel. Additionally, its module-based architecture allows for quick adaptation of the test (e.g., into other languages). Therefore, we believe that ImGo is suitable for testing both within a general population (research, traffic psychology, organizational psychology, etc.) and for clinical purposes (diagnosis of ADHD, differential diagnosis of dyslexia, pathological gambling, etc.).
In the first study, we describe the ImGo instrument and verify its test-retest and split-half reliability and its convergent validity with an impulsivity self-report scale and a Stop Signal test. In the second study, we re-analyze the convergent validity of ImGo with a Stop Signal test and examine the potential relationship between ImGo and oculomotor inhibition measured by an Anti-Saccades test. The third study presents a robust study with a large sample size and investigates the discriminant validity of ImGo with other related cognitive and executive processes, such as intelligence, attention and interference control.
Study 1
Introduction
The aim of Study 1 is to describe the newly developed ImGo test for measuring behavioral waiting impulsivity and to gather evidence of its test-retest reliability, split-half reliability and convergent validity with other behavioral and self-report measures of impulsivity. Even though reliability estimation is relatively common in self-report questionnaire assessment, thorough psychometric evaluation is no less crucial for performance measures of cognitive processing, as high reliability reduces measurement error (Parsons et al., 2019). Nevertheless, even evidence of high reliability is not sufficient in psychological research if the method lacks validity, which, unlike reliability, usually cannot be expressed by a single number. We therefore also analyzed the convergent validity (i.e., the relationship between two measures of constructs that should be theoretically related) of ImGo with behavioral stopping impulsivity measured by a Stop-signal test and with a self-report Impulsive Behavior Scale. Since we did not test concurrent validity (i.e., the agreement of two instruments which measure the same construct), we did not expect strong correlations between methods. However, since all three tests were expected to measure different aspects of impulsivity, they should have been at least weakly associated (e.g., Cyders & Coskunpinar, 2011; Reynolds et al., 2006).
Methods
Tests
ImGo test. The ImGo test was designed by Řádová et al. (2018) as an instrument for behavioral measurement and analysis of waiting impulsivity (Dalley et al., 2011) on go/no-go trials. Stimuli are geometric shapes, more specifically a square, cross, triangle, line, star and hexagon. Participants are instructed to press the spacebar as quickly as possible whenever they see a cross, triangle, line, star or hexagon (i.e., go trials), and to inhibit pressing the key when presented with a square (i.e., no-go trials; see Figure 1(a)). ImGo has the following general procedure: 1) fixation target presentation (500 ms), 2) stimulus presentation (200 ms or key press), and 3) blank screen (900 ms or key press; see Figure 1(b)).

Figure 1. (a) NoGo and Go stimuli. (b) Procedure of one ImGo trial.
ImGo contains 28 training trials with feedback and 12 training trials without, plus 300 test trials (48 no-go and 252 go items; see Figure 2). The test is administered in a pseudo-random order, with a 30 second break in the middle of the test part. The whole test takes about 15 minutes.

Figure 2. Overall ImGo procedure.
Three metrics are usually reported in the family of go/no-go methods (Meule, 2017, p. 1): 1) commission errors (i.e., incorrectly pressing the button in no-go trials; also called false alarms), 2) reaction times (RTs) in go trials, and 3) omission errors (i.e., incorrectly not pressing the button in go trials; also called misses). However, due to the asymmetric distribution of RTs, which is usually right-skewed (Van Zandt, 2002), scholars sometimes also work with the coefficient of variation (i.e., relative standard deviation; calculated as the standard deviation of RTs divided by their mean, CV = SD/M).
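The go/no-go metrics listed above can be illustrated with a minimal sketch. It is not the scoring code used in the study; the trial format (dictionaries with the hypothetical keys `is_go`, `responded` and `rt`) is an assumption made only for this example, and the coefficient of variation is taken in its standard sense as the standard deviation of go RTs divided by their mean.

```python
from statistics import mean, stdev


def go_nogo_metrics(trials):
    """Compute basic go/no-go metrics.

    `trials` is a list of dicts with hypothetical keys:
      is_go (bool), responded (bool), rt (ms, or None if no response).
    """
    go = [t for t in trials if t["is_go"]]
    nogo = [t for t in trials if not t["is_go"]]

    # Commission error: a response on a no-go trial (false alarm).
    commissions = sum(t["responded"] for t in nogo)
    # Omission error: no response on a go trial (miss).
    omissions = sum(not t["responded"] for t in go)

    # RT metrics use only go trials with a response.
    go_rts = [t["rt"] for t in go if t["responded"]]
    m = mean(go_rts)
    sd = stdev(go_rts)  # sample standard deviation

    return {
        "commission_rate": commissions / len(nogo),
        "omission_rate": omissions / len(go),
        "mean_rt": m,
        "cv": sd / m,  # coefficient of variation = SD / mean
    }
```

Because the CV is scaled by the mean, it allows response variability to be compared between participants whose overall speed differs.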
Stop-signal test. The first method used to evaluate the convergent validity of ImGo is the stop-signal test (SST). SST is an instrument for stopping impulsivity assessment (see Dalley et al., 2011). The test was originally created by Fabianová (2019) with respect to the general recommendations for SST tasks (cf. Verbruggen et al., 2019). As with the Go/NoGo paradigm, the SST contains go and stop (no-go) trials. In the go trials, the participant must press the left or right button as quickly as possible depending on the direction of the arrow (Figure 3). In the stop trials, a stop symbol (violet rectangle around the arrow) appears to signal the participant not to press any button (Figure 3). The stop signal trials vary in interval between stimulus onset and the stop symbol appearance, a so-called stop-signal delay (SSD). In other words, the stop symbol in the stop signal trials appears with delays of various duration. Seven different SSDs are used (150 ms, 200 ms, 250 ms, 300 ms, 350 ms, 400 ms and 450 ms). One trial cycle is as follows: 1) fixation target presentation (800 ms) and 2) stimulus presentation (800 ms or key press; Figure 3).

Figure 3. Procedure of an SST trial with go/stop symbol.

Figure 4. Overall SST procedure.
In total, the test contains 720 trials (180 stop signal trials and 540 go trials) divided into six blocks (Figure 4). Each of the six blocks contains 30 stop signal and 90 go trials; the order of the trials in each block is pseudorandom. The whole test takes about 25 minutes.
Several metrics are commonly reported in the SST family of tests: a) go omission errors (i.e., no response) on go trials, b) choice errors on go trials, c) choice errors on unsuccessful stop trials, d) go RTs, and e) stop-signal RTs (SSRT; i.e., the time it takes to complete the inhibitory process after the appearance of the stop-signal; Verbruggen et al., 2019). However, because SSRT cannot be observed directly (unlike go RT), it is typically estimated within the independent horse-race model (Logan & Cowan, 1984), which offers various estimation procedures and possible behavioral metrics (for review, see Band et al., 2003; Verbruggen et al., 2019; Verbruggen & Logan, 2009). Since all SSDs were set a priori, we used the fixed-SSDs procedure (alternatively, a tracking procedure could have been used in the case of adaptive adjustments of SSDs based on the performance of the participants; cf. Verbruggen & Logan, 2009). The SSRT was estimated for each participant and each SSD through the integration method, which in many ways outperforms the most commonly used mean method of estimation (Verbruggen et al., 2013), and the estimates were subsequently averaged into a single score (Band et al., 2003).
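A minimal sketch of the integration method with fixed SSDs may help to make the estimation concrete. It is an illustration under stated assumptions, not the analysis code of the study: the input format, the function name `ssrt_integration`, the rule for picking the nth go RT (rounding up) and the skipping of SSDs with a response probability of exactly 0 or 1 are all choices made for this example.

```python
import math


def ssrt_integration(go_rts, stop_trials):
    """Estimate SSRT by the integration method, averaged over fixed SSDs.

    go_rts: go reaction times in ms.
    stop_trials: list of (ssd_ms, responded) tuples, where `responded`
        is True when the participant failed to stop.
    Returns the mean SSRT over SSDs, or None if no SSD is usable.
    """
    sorted_rts = sorted(go_rts)

    # Group stop trials by their stop-signal delay.
    by_ssd = {}
    for ssd, responded in stop_trials:
        by_ssd.setdefault(ssd, []).append(responded)

    estimates = []
    for ssd, responses in by_ssd.items():
        p_respond = sum(responses) / len(responses)
        # Assumption: an SSD with p(respond) of 0 or 1 gives no usable
        # point on the go RT distribution, so it is skipped.
        if p_respond in (0.0, 1.0):
            continue
        # The nth fastest go RT, with n = p(respond | signal) * number of go RTs,
        # approximates the finishing time of the stop process.
        idx = max(math.ceil(p_respond * len(sorted_rts)) - 1, 0)
        estimates.append(sorted_rts[idx] - ssd)

    return sum(estimates) / len(estimates) if estimates else None
```

The logic follows the horse-race model: if a participant fails to stop on a proportion p of trials at a given SSD, the stop process is assumed to finish at the point that cuts off the fastest p of the go RT distribution, and subtracting the SSD from that point gives the SSRT.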
UPPS-P & S-UPPS-P
The second method for ImGo convergent validity assessment is a Czech version of the self-report Impulsive Behavior Scale (UPPS-P), translated and validated by Linhartová et al. (2017). UPPS-P is a 59-item self-report questionnaire (sample item: "When I am upset, I often act without thinking"; Lynam et al., 2006), which is composed of five subscales, namely negative urgency, positive urgency, lack of premeditation, lack of perseverance and sensation seeking. Participants answer the items on a four-point ordinal scale (anchors: 1 – agree strongly to 4 – disagree strongly). A briefer version of the scale, S-UPPS-P (see Cyders et al., 2014), contains 20 items (four items per subscale).
Power analysis
An a priori power analysis was performed in G*Power (v3.1.9.7; Faul et al., 2009). Since previous research has reported rather small effect sizes for relationships between behavioral impulsivity and self-report questionnaires (e.g., Cyders & Coskunpinar, 2011) and relationships between two behavioral impulsivity instruments (e.g., Reynolds et al., 2006), we set the effect size to 0.30, significance level to .05 and power to .80 for two-tailed bivariate correlations. The power analysis results with these settings suggested that the total sample size should be at least 84 participants in order to achieve sufficient power.
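The order of magnitude of this sample size can be checked with the common Fisher z approximation, sketched below. This is only an approximation and an assumption of this example: G*Power uses an exact procedure for the bivariate normal model, so its result can differ by a few participants from the value returned here (the approximation gives roughly 85 for the settings above). The function name `n_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size needed to detect a correlation r.

    Uses the Fisher z transformation:
        n = ((z_alpha + z_beta) / atanh(r))**2 + 3
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n = ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)


# Settings from the text: r = .30, alpha = .05, power = .80, two-tailed.
print(n_for_correlation(0.30))
```

A smaller expected effect size raises the required sample size quickly, which is why the choice of r = .30 largely determines the result.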
Sample and procedure
We gathered data from 81 participants aged 19–45 years (M = 23.05, SD = 5.30). Most were women (n = 51, 63%) and had high school education (n = 60, 74%). Participants were tested twice in separate sessions. The time interval between the sessions was between 1 and 28 days (M = 8.09, SD = 5.21). Participants were contacted through student groups on social networking services. Participants received small rewards (e.g., USB flash drives) for their participation. Participants were asked not to consume alcohol and caffeine before testing. They were also examined for hand dominance and asked to use their dominant hand to press the button. Informed consent was obtained from participants before administration of the test battery.
The SST and ImGo tests were administered on SW Hypothesis (Šašinka, Morong, et al., 2017); the UPPS-P was administered via Google Forms. Data were collected under controlled conditions in a computer room at Masaryk University (common PC mouse, keyboard and 22″ LCD screen with a resolution of 1920 × 1080). All statistical analyses were performed in R (v3.6.2; R Core Team, 2020), especially with the packages lavaan (v0.6–6; Rosseel, 2012), semTools (v0.5–2; Jorgensen et al., 2018) and irr (v0.84.1; Gamer et al., 2019), and in JASP (v0.11.1.0).
Data inspection and cleaning
We set several criteria for data exclusion before data inspection and statistical analysis. Because the three methods used were of different character, we describe the exclusion criteria for each separately. In the ImGo test, anticipatory reactions (i.e., responses made before the participant could have seen the stimulus because it had not yet appeared; RTs less than 500 ms in the case of ImGo) were not included in the computation of mean RTs. Mean RTs were calculated only from correctly answered go trials (i.e., RTs were less than 1600 ms). Two participants were excluded because of a high number of ImGo omission errors (greater than 10%; based on Congdon et al., 2012).
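The ImGo exclusion rules can be expressed as a short filter. The sketch below is illustrative only; it assumes that RTs are recorded relative to trial onset (which is how the 500 ms anticipation threshold reads, given the 500 ms fixation period) and that a missing response is stored as `None`. The constant and function names are hypothetical.

```python
from statistics import mean

ANTICIPATION_MS = 500      # responses faster than this precede stimulus onset
MAX_RT_MS = 1600           # upper RT limit for a valid go response
MAX_OMISSION_RATE = 0.10   # participant-level exclusion threshold


def clean_imgo(go_trials):
    """Apply the ImGo cleaning rules to one participant's go trials.

    go_trials: list of RTs in ms, with None marking an omission.
    """
    omission_rate = sum(rt is None for rt in go_trials) / len(go_trials)

    # Keep only responses that are neither anticipations nor too late.
    valid = [
        rt for rt in go_trials
        if rt is not None and ANTICIPATION_MS <= rt < MAX_RT_MS
    ]

    return {
        "exclude_participant": omission_rate > MAX_OMISSION_RATE,
        "omission_rate": omission_rate,
        "mean_rt": mean(valid) if valid else None,
    }
```

Note that the two rules act at different levels: the RT limits drop single trials from the mean, whereas the omission threshold removes a whole participant.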
In the SST test, individuals with an absolute response rate (i.e., probability of correctly responding on a stop trial was equal to 0% or 100%) were omitted from the SSRT estimation (this exclusion criterion is based on Congdon et al., 2012, who recommend excluding extreme values close to 100% or 0% of correctness in the fixed-SSDs procedure). RTs (both go and stop-signal) were calculated only from trials without errors. Four participants were excluded because of a high number of SST go errors (i.e., go commission errors + go choice errors were greater than 15%; on the basis of a rule proposed by Congdon et al., 2012). As recommended, for example, by Ratcliff (1993), we avoided any other type of rule-of-thumb removal of RT outliers (e.g., ±2SD) in both methods because of the already mentioned skewed data distribution typical for RTs.
In addition, five participants did not participate in the second testing and were excluded from the test-retest analysis for both the ImGo and SST.
Regarding the UPPS-P and S-UPPS-P scale, one item forming part of both versions of the questionnaire was discarded for technical problems which occurred during administration (item not recorded for most participants).
To choose a suitable statistical procedure, the assumptions of the correlation analyses were tested. We inspected data distributions, performed Shapiro–Wilk tests and analyzed the presence of outliers. We applied Spearman’s rank correlation (rs) and Pearson’s product moment correlation (rp) coefficients depending on the acquired results (de Winter et al., 2016). All p-values obtained from the correlation analyses were corrected with the Holm–Bonferroni method for multiple comparisons to reduce any potential type I error rate (Curtin & Schulz, 1998). Regarding the test-retest reliability of ImGo, intraclass correlation coefficients (ICC) with two-way random effects model were calculated (Weir, 2005).
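For readers unfamiliar with the Holm–Bonferroni procedure, the following minimal sketch shows how adjusted p-values are obtained. It is a generic illustration, not the code used in the study, and the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: a larger raw p never gets a smaller adjusted p.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Three raw p-values from hypothetical correlation tests.
print(holm_adjust([0.01, 0.04, 0.03]))
```

In this example all three raw p-values are below .05, but only the smallest one remains below .05 after adjustment, which mirrors how a correlation can be significant only without the correction.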
Results
The aim of Study 1 was to verify the psychometric properties of behavioral instruments for measuring impulsivity. In the first step, we examined the test-retest reliability of ImGo and SST, which was satisfactory for both the ImGo indicators (r = .418–.717; ICC = .355–.653) and the SST indicators (r = .424–.732), suggesting that both instruments were stable over time (Table 1). In the next step, we performed split-half reliability estimation on two identical halves from the first wave of testing. The results also suggested satisfactory reliability for both ImGo (r = .504–.733) and SST (r = .696–.836; Table 1). All pairwise relationships showed medium to high correlation coefficients and were statistically significant. Their confidence intervals did not contain the value of zero effect (Table 1). Hence, both methods could be considered sufficiently reliable.
Table 1. Descriptive statistics and reliability estimation of the ImGo and the SST.
*p < .05 (two-tailed); **p < .01 (two-tailed); ***p < .001 (two-tailed); [ ]: 95% confidence intervals; M: mean; SD: standard deviation; Me: median; IQR: interquartile range; S-W: Shapiro–Wilk test; SST: stop-signal test; RT: reaction time; CV: coefficient of variation; SSRT: stop-signal reaction time; ICC: intraclass correlation coefficient.
The reliability and validity of the UPPS-P questionnaire were subsequently verified. In the first step, we used confirmatory factor analysis to verify the five-factor UPPS-P configural model. Because of the small sample size, an MLR estimator was used instead of WLSMV, since WLSMV yields moderate overestimation of the interfactor correlations at small sample sizes (Li, 2016). However, the 59-item version had unsatisfactory fit indices (χ2(1585) = 2674.425, p < .001, RMSEA [90% CI] = .091 [.085, .097], SRMR = .107, CFI = .611, TLI = .595). Consequently, the shortened version S-UPPS-P was also verified, with much better fit indices (χ2(142) = 178.831, p = .020, RMSEA [90% CI] = .056 [.024, .080], SRMR = .092, CFI = .902, TLI = .882). Since the majority of fit indices are within commonly used rules of thumb (except the values of SRMR and TLI), the configural model fit can thus be considered satisfactory for our purposes. Regarding internal consistency, the McDonald’s ω coefficients of the Czech S-UPPS-P varied between .623 and .789 (total .841, M = .704), which is considered satisfactory given the low number of items per scale (Table 2).
Table 2. Descriptive statistics and reliability estimation for S-UPPS-P.
[ ]: 95% confidence intervals; M: mean; SD: standard deviation; S–W: Shapiro–Wilk test; NU: negative urgency; PU: positive urgency; LPR: lack of premeditation; LPE: lack of perseverance; SS: sensation seeking.
Finally, the convergent validity of the ImGo test with the SST and the S-UPPS-P was verified. Only variables relevant for impulsivity measurement were included in the correlation analysis. Therefore, ImGo omissions, SST go omission errors, SST go choice errors and SST go RT were omitted. Results show that the SST stop choice errors correlate with ImGo commissions (rp = .509). Correlations between the SST stop choice errors and ImGo RT (rp = −.284) and ImGo CV (rs = .294) were statistically significant only without correction for multiple comparisons (p < .05). Regarding convergent validity with the self-report questionnaire, the only significant correlation was observed between the ImGo coefficient of variation and the S-UPPS-P negative urgency subscale (rs = −.394), meaning that participants with a higher level of negative urgency also showed lower variability in their responses. Nevertheless, we did not find any significant correlation between the coefficient of variation of the ImGo test and the stop-signal reaction time of the SST test, which can be considered the most important indicators of behavioral impulsivity (Table 3).
Table 3. Convergent validity of the ImGo with SST and S-UPPS-P instruments.
*pHolm < .05 (two-tailed); **pHolm < .01 (two-tailed); *** pHolm < .001 (two-tailed); [ ]: 95% confidence intervals; SST: stop-signal test; RT: reaction time; CV: coefficient of variation; SSRT: stop-signal reaction time; NU: negative urgency; PU: positive urgency; LPR: lack of premeditation; LPE: lack of perseverance; SS: sensation seeking.
Discussion
The aim of Study 1 was to verify the reliability and convergent validity of the ImGo instrument. Both test-retest and split-half reliability were considered satisfactory. Even though the findings suggested a lower ICC for ImGo omission errors, this metric is not a necessary criterion for reliability since it primarily serves a control function. Moreover, similar test-retest results were observed by Hedge et al. (2018), who found that the ICC for omission errors was also lower than those for go RT and commission errors.
Regarding the convergent validity with the self-report questionnaire, our results agree with previous studies. A meta-analysis of 28 studies of associations between self-report and behavioral measures of impulsivity suggests that those associations are statistically significant yet very weak (r coefficients are lower than .30; Cyders & Coskunpinar, 2011). Another systematic review has even proposed that self-report and behavioral measures of impulsivity might reflect rather distinct theoretical constructs (Newman & Meyer, 2014). More specifically, the vast majority of research has yielded a statistically significant correlation between the go/no-go method and only one subscale of the UPPS-P or other self-report impulsivity questionnaires (the particular subscales which demonstrate this association are inconsistent across studies, however) and shown that such correlations tend to be very weak (e.g., Aichert et al., 2012; Malesza & Ostaszewski, 2016; Perales et al., 2009; Reynolds et al., 2006; Spinella, 2004) or even non-significant (Hasegawa et al., 2019). In light of previous findings, the correlation found between ImGo and negative urgency (rs = −.394) appears not only strong but also crucial from a theoretical point of view, since this subscale explains the inter-individual variability in behavioral impulsivity instruments better than other subscales (see Wilbertz et al., 2014) and is thus an important criterion of convergent validity. However, this finding is also interesting because of its direction, which is opposite to what was expected. Our result suggests that participants with a high level of negative urgency showed lower variability in their reaction times, which means that they responded more consistently. This result is probably caused by the fact that self-report methods measure a different facet of impulsivity than cognitive tests.
Similar results can be observed in previous studies regarding the association between two behavioral measures of impulsivity, go/nogo and SST. For example, Aichert et al. (2012) and Hasegawa et al. (2019) did not find a statistically significant association, Reynolds et al. (2006) found only a weak correlation, and Hedge et al. (2018) found a medium correlation. Previous findings suggest that the go/nogo tasks and stop-signal tasks might measure related yet slightly different cognitive processes (e.g., waiting and stopping impulsivity) and simultaneously saturate motoric impulsivity. These processes seem to recruit widely different neural dynamics (Dalley et al., 2011; Raud et al., 2020), which might manifest especially in cognitive deficits. For example, people with ADHD have demonstrated high levels of both types of impulsivity (e.g., Rubia et al., 2001; Schachar et al., 2007), whereas people with addictions (for review, see Smith et al., 2014) and people with borderline personality disorder (e.g., Barker et al., 2015; Cackowski et al., 2014; Wright et al., 2014) have exhibited differences in their levels of waiting and stopping impulsivity.
In summary, the obtained results (medium correlations between ImGo and other applied methods) correspond to the above-mentioned dual view on behavioral impulsivity as related yet slightly different cognitive processes. However, we did not identify any significant association between the coefficient of variation of the ImGo test and the stop-signal reaction time of the SST test, which is also crucial for ImGo validity assessment. The convergent validity of ImGo can therefore be considered only partially established.
Study 2
Introduction
The primary aim of Study 2 was to explore the potential relationships between behavioral impulsivity and oculomotor inhibition measured by an anti-saccade test. Previous research suggests that a lack of oculomotor inhibitory control is commonly viewed as a key characteristic of impulsivity, and simultaneously, as a key cause of lack of inhibitory control in oculomotor behavior (Nigg, 2000). In Study 1, we also investigated convergent validity between ImGo and SST, two methods used in the assessment of waiting and stopping impulsivity, respectively. Since convergent validity was only partially established, we decided to replicate the study using both methods to obtain more compelling evidence.
Methods
Tests
Anti-saccade task. The Anti-saccade task (AST) is an eye-tracking instrument for the measurement of oculomotor inhibition (Fabianová, 2019). It is based on a standardized protocol described by Antoniades et al. (2013). The AST contains two subtests: 1) pro-saccade and 2) anti-saccade. In the pro-saccade subtest, participants follow the presented stimuli with their eyes. In the anti-saccade subtest, participants move their eyes in the opposite direction to the presented stimuli (Figure 5). The procedure of trials is as follows: 1) target fixation at the center of the screen (1000–3500 ms, M = 1500 ms), and 2) stimulus presentation (1000 ms; randomly, half at 10° VA from the right of the target, the other half at 10° VA from the left of the target; Figure 5(a) shows a sequence of two pro-saccade trials, one left from the center, one right. Figure 5(b) shows a sequence of two anti-saccade trials).

Figure 5. Procedure of the anti-saccade task: (a) Pro-saccade trials. (b) Anti-saccade trials.
The AST contains 22 training trials (11 pro-saccade, 11 anti-saccade) and 260 test trials (80 pro-saccade, 180 anti-saccade) divided into five subtests: 1) 40 pro-saccade trials, 2) 60 anti-saccade trials, 3) 60 anti-saccade trials, 4) 60 anti-saccade trials, 5) 40 pro-saccade trials. A 10 second break follows each subtest (see Figure 6 for the test procedure). The whole test takes about 15 minutes.

Figure 6. Overall procedure of the anti-saccade task.
Since the pro-saccade subtest adopts only a control role, the main metrics are computed from the anti-saccade subtest, namely the number of anti-saccade errors, anti-saccade velocity and saccade latency (Antoniades et al., 2013). Even though Antoniades et al. (2013) proposed the absolute amount of false anti-saccades as a main metric, we believe that a ratio of anti-saccade errors/corrects is more suitable due to the very high number of excluded responses. Hence, we incorporated this indicator into the following analyses.
Power analysis
Power analysis was calculated using G*Power (v3.1.9.7; Faul et al., 2009) with the following settings: effect size: 0.37, significance level: .05, power: .80, analysis: one-tailed bivariate correlations. Effect size was set on the basis of results of Study 1, the strength of association between ImGo and SST, where effect sizes of significant correlations varied between 0.284 and 0.509 (M = 0.370). According to the power analysis results, assessment of 41 participants should be enough to achieve sufficient power.
Since previous research into the associations between eye-movements and go/no-go scores has shown ambiguous findings (see Taylor, 2016), we decided to use a rather exploratory approach in this case. These relationships were therefore not estimated in the power analysis. A one-tailed estimation setting was also used because we assumed the same directions of expected correlation as reported in Study 1.
Sample and procedure
In total, we gathered data from 37 participants. Six participants were removed from the analysis during the process of data cleaning due to problems with their eye-tracking data (missing data, faulty detection of fixations), which resulted in 31 participants included in the data analysis. Participants were aged 18 years or older (M = 24.29, SD = 5.73); most of them were women (n = 20, 64.5%), mainly with a high school education (n = 20, 64.5%). The participants were contacted through student groups on social networking services. Participants received small rewards (e.g., USB flash drives) for their participation and signed an informed consent form.
The SST and ImGo tests were administered on SW Hypothesis (Šašinka, Morong, et al., 2017) under controlled conditions in HUME Lab at Masaryk University (common PC mouse, keyboard and 22″ LCD screen with a resolution of 1920 × 1080); the AST was administered via SMI Experiment Center (v3.7). Eye-movement data were collected with an SMI RED eye-tracker with a 500 Hz sampling rate and an integrated 22″ monitor (Dell P2213) with a resolution of 1680 × 1050 px. The data were subsequently processed in SMI BeGaze (v3.7) software. All other statistical analyses were performed in JASP (v0.11.1.0).
Data inspection and cleaning
The same data cleaning procedure as reported in Study 1 was used for ImGo and SST. Two participants were removed from ImGo due to high (i.e., >10%) omission errors, and six participants were removed from SST due to high (i.e., >15%) go commission and go choice errors.
Regarding the AST, three criteria had to be fulfilled to keep a trial in the dataset: 1) the first saccade had to begin less than 40 px (equivalent to one degree of visual angle) from the starting point (fixation target); 2) the saccade had to begin between 60 ms and 500 ms (shorter latencies indicate a reaction quicker than physiologically possible, longer latencies a very slow reaction); and 3) no blink could appear before the start of the saccade. A saccade was considered valid only if all three conditions were fulfilled at the same time. In total, 50.7% (n = 4087) of the maximum number of saccades (n = 8060; computed as 31 participants × 260 saccades) fulfilled all three criteria. Furthermore, only those saccades which were correct in the sense of their direction were included in the statistical analysis.
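The three validity criteria can be written as a simple predicate. The sketch below is illustrative and does not reproduce the BeGaze processing; the `Saccade` record and its field names are hypothetical, and treating the 60–500 ms latency window as inclusive is an assumption.

```python
from dataclasses import dataclass

MAX_START_OFFSET_PX = 40   # about one degree of visual angle in this setup
MIN_LATENCY_MS = 60        # faster responses are physiologically implausible
MAX_LATENCY_MS = 500       # slower responses are treated as too late


@dataclass
class Saccade:
    start_offset_px: float     # distance of saccade start from fixation target
    latency_ms: float          # time at which the saccade begins
    blink_before_onset: bool   # True if a blink preceded the saccade


def is_valid(s: Saccade) -> bool:
    """A saccade is valid only if all three criteria hold at once."""
    return (
        s.start_offset_px < MAX_START_OFFSET_PX
        and MIN_LATENCY_MS <= s.latency_ms <= MAX_LATENCY_MS
        and not s.blink_before_onset
    )


def valid_share(saccades) -> float:
    """Proportion of saccades that pass all three criteria."""
    return sum(is_valid(s) for s in saccades) / len(saccades)
```

With the counts reported above, the share of valid saccades is 4087 / (31 × 260) = 4087 / 8060, which is approximately 0.507.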
Since the data contained some extreme outliers (cf. min and max values in Table 4) and were considerably skewed, non-parametric correlation estimates were used. In addition, because of the limited sample size, we calculated Kendall’s Tau-b rank correlation coefficient (τb), which is (compared to Spearman correlations) considered less biased and does not overestimate significant associations in small samples (e.g., Arndt et al., 1999). Because of the known directionality of significant associations from Study 1 (ImGo – SST relationships) that we wanted to replicate, we applied a one-tailed (OT) method of null hypothesis significance testing. Furthermore, statistical corrections for multiple comparisons are used only in the confirmatory analyses of ImGo – SST relationships. For exploratory analyses of ImGo – AST relationships, we used a two-tailed (TT) method without correction for multiple comparisons.
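Kendall's τb can be computed directly from concordant and discordant pairs, with a correction for ties. The following minimal sketch illustrates the coefficient itself (without the significance test); it is not the JASP implementation, and the function name `kendall_tau_b` is hypothetical.

```python
import math
from collections import Counter


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equally long sequences.

    tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2)), where C and D are the
    numbers of concordant and discordant pairs, n0 = n(n - 1)/2, and
    n1, n2 count the pairs tied in x and in y, respectively.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # s == 0 means a tie in x or y; such pairs count for neither.

    n0 = n * (n - 1) / 2
    ties_x = sum(t * (t - 1) / 2 for t in Counter(x).values())
    ties_y = sum(t * (t - 1) / 2 for t in Counter(y).values())

    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom
```

Because the coefficient depends only on the ordering of pairs, a single extreme RT or error count changes at most the pairs it takes part in, which is one reason it suits small, skewed samples.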
Descriptive statistics of the ImGo, SST and AST instruments.
M: mean; SD: standard deviation; Me: median; IQR: interquartile range; SST: stop-signal test; RT: reaction time; CV: coefficient of variation; SSRT: stop-signal reaction time; AST: anti-saccade task; C/F: correct/false.
Results
As mentioned above, the aims of Study 2 were: 1) to explore potential relationships between impulsivity and eye-movements, more specifically, between motor and oculomotor inhibition; and 2) to re-analyze the convergent validity of ImGo with the SST. The descriptive statistics of the basic metrics of all three methods used are summarized in Table 4.
In the correlation analysis between ImGo and AST metrics, we found that the number of AST false anti-saccades showed a statistically significant weak association with ImGo commission errors (τb = .282) and ImGo reaction time (τb = −.291). ImGo RT also showed a weak significant association with the AST ratio of correct and false anti-saccades (τb = .282).
The replication of Study 1, an analysis of associations between ImGo and SST to ascertain the convergent validity of both methods, indicated two moderate correlations, more specifically, between the ImGo coefficient of variation and SST stop choice errors (τb = .333) and SSRT (τb = .431). However, only the latter was statistically significant after a Holm–Bonferroni correction for multiple comparisons (Table 5).
Convergent validity of the ImGo with SST and AST instruments.
*p/pHolm < .05; [ ]: 95% confidence intervals; (OT): one-tailed; (TT): two-tailed; SST: stop-signal test; RT: reaction time; CV: coefficient of variation; SSRT: stop-signal reaction time; AST: anti-saccade task; C/F: correct/false.
Discussion
The first aim of Study 2 was to examine the relationship between motoric impulsivity measured by ImGo and oculomotor inhibition measured by the AST. Even though we found only a few weak, statistically significant relationships (despite the rigorous statistical estimation), especially between the ImGo RT and the AST false anti-saccades and the AST ratio of correct/false anti-saccades, our results corresponded to previous studies which used similar methods. For example, Aichert et al. (2012) found a weak association between go/no-go tasks and the AST (r = .134), and Spinella (2002) found a medium association (r = .470). Taylor (2016) did not find any significant relationship between the AST and a self-report impulsivity questionnaire. Therefore, it is possible that oculomotor inhibition shares some general underlying process with motor impulsivity, yet the two are in fact distinct processes. This assumption is also supported by neurobiological findings (Nigg, 2000).
The secondary aim of this study was to re-analyze the convergent validity of the ImGo with the SST. Study 2 revealed a medium association between the main indicators of motoric impulsivity, i.e., between the SSRT and the ImGo CV, which is crucial for establishing convergent validity (for detailed discussion, see Study 1). Hence, we can conclude that the ImGo instrument not only has satisfactory reliability (Study 1) but also satisfactory convergent validity with the SST (Study 1 & Study 2) and is suitable for further use.
In summary, Study 2 provided evidence for convergent validity of the ImGo test, which can be used as a reliable and valid tool for motoric impulsivity assessment, and also revealed the relationship between motoric impulsivity and oculomotor inhibition.
Study 3
Introduction
Study 3 aimed to verify the discriminant validity of the ImGo test with other cognitive and executive processes. Building on the evidence of reliability and convergent validity reported in Study 1 and Study 2, further information about the psychometric properties of the instrument could result in comprehensive evidence of the construct validity of the ImGo instrument. As already mentioned, impulsivity is a multidimensional construct stemming from multiple specific cognitive and executive processes and is therefore often related to a number of other constructs. Previous research has identified potential relationships of impulsivity, for example with personality disorders (Coffey et al., 2011), mental disorders (Enticott et al., 2008), attention disorders (Metin et al., 2012), interference control (Aichert et al., 2012) or intelligence (Lynam et al., 1993). If we want to be certain that ImGo measures impulsivity and not the above-mentioned constructs (evidence of discriminant validity), we would expect very weak (if any) relationships between ImGo and measurements of these constructs. In Study 3, we explore the relationship of ImGo impulsivity indicators with tests of other cognitive functions and with self-report inventories, namely: Positional Interference Test (selective attention), Flagtest (selectivity, concentration and stability of attention), Test of Verbal Reasoning (verbal reasoning and visuospatial skills), Viennese Matrices Test (non-verbal intelligence), The Personality Styles and Disorder Inventory (subclinical personality disorders) and Inventory of Neurotic Symptoms (neurotic symptoms).
Methods
Tests
Positional interference test
The Positional Interference Test (PIT) is an instrument for measuring selective attention based on the Stroop test (Stroop, 1935), more specifically, on its modification called the Spatial Stroop Test (e.g., Hilbert et al., 2014). The family of Stroop tests distinguishes between automatic and voluntary cognitive processes and is commonly used for the psychological assessment of cognitive inhibition (Swerdlow et al., 1995), which is closely related to the concept of behavioral impulsivity (Nigg, 2000). The PIT (Šašinka, Čeněk, Morong, Malatincová, et al., 2017) consists of three subtests with 40 trials per subtest (120 trials in total). The three subtests are (Figure 7): 1) reaction to position in congruent trials (i.e., psychomotor speed; PIT1); 2) reaction to semantic meaning in congruent trials (i.e., automatic process; PIT2); 3) reaction to semantic meaning in incongruent trials (i.e., voluntary process; PIT3).

PIT subtests.
The main scores usually reported in similar tests are error rates, reaction times and the Stroop effect, computed as RTincongruent − RTcongruent (e.g., Bugg et al., 2008; a higher score means a higher Stroop effect). However, it is also possible to report, for example, words per second (Hilbert et al., 2014). Regarding psychometric properties, the computerized Stroop tests show in general high validity and reliability (for review, see Din & Tat Meng, 2019). The PIT metrics explored in previous studies have shown satisfactory test-retest reliability (rp varied between .743 and .813 for PIT RTs and rp = .624 for the Stroop effect; Helísková, 2016). The whole test takes about 10 minutes.
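As a worked illustration of the Stroop effect score, the sketch below subtracts the mean congruent RT from the mean incongruent RT. The reaction times are made up, and mapping the congruent and incongruent conditions onto PIT2 and PIT3 is an assumption, not a documented scoring rule of the PIT.

```python
from statistics import mean


def stroop_effect(rt_incongruent_ms, rt_congruent_ms):
    """Mean incongruent RT minus mean congruent RT, in milliseconds."""
    return mean(rt_incongruent_ms) - mean(rt_congruent_ms)


# Made-up reaction times, for illustration only.
congruent = [520, 540, 510, 530]     # mean 525 ms (assumed PIT2-like trials)
incongruent = [610, 640, 600, 630]   # mean 620 ms (assumed PIT3-like trials)

effect = stroop_effect(incongruent, congruent)  # 620 - 525 = 95 ms
```

A positive value means slower responses under interference; a value near zero means the incongruent condition cost the participant little extra time.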
Flagtest
The Flagtest is an instrument for measuring the selectivity, concentration and stability of attention (Šašinka, Čeněk, Morong, Urbánek, et al., 2017). The Flagtest is a modified PC-administered adaptation of the traditional D2 test of attention (Brickenkamp & Zillmer, 1998). The main difference stems from the replacement of letters with flags in order to avoid the potential effect of reading skills on test performance (Figure 8). Participants identify the target flag (i.e., a hoisted flag with exactly two points, irrespective of their positions). The Flagtest consists of 20 sets of flags, each set being composed of 36 flags. Each set is presented for 14 seconds and contains 8 or 9 target flags. The whole test takes about 10 minutes.

Flagtest stimuli. (a) Examples. (b) Examples of correct answers and distractors.
The main score of the Flagtest is the average of correctly marked target flags across all subtests. The Flagtest has a high split-half reliability (rsb between .908 and .950; Šašinka, Čeněk, Morong, Urbánek, et al., 2017) and concurrent validity in error rate with a D2 test of attention (rs = .369; Tichá, 2017). Similar methods, for example the already mentioned D2 test of attention, show satisfactory reliability and validity in general (e.g., Bates & Lemay, 2004).
Test of verbal reasoning
The test of verbal reasoning (TVR) was created by Čeněk et al. (2017) on the basis of Baddeley’s (1968) grammatical transformation method, with special emphasis on sentence verification tasks (Clark & Chase, 1972). The test allows measurement of verbal reasoning and spatial skills (Kirschner et al., 2015). Moreover, a lack of verbal reasoning might also point to related deficits of selective attention (e.g., Nielsen et al., 2014) and working memory (e.g., Kane et al., 2005; Yuan et al., 2006).
Items consist of a sentence and a combination of geometric shapes. Each item combines two main dimensions: 1) negative or positive wording of the sentence, i.e., the argument (e.g., “The circle is not in a square.” or “The circle is in a square.”); and 2) congruence of sentence and shape, i.e., congruent (e.g., “The circle is in a square,” when the circle is indeed in a square) or incongruent (e.g., “The circle is not in a square,” when the circle is indeed in a square). Furthermore, two types of arguments and shapes are defined (in/out and above/below; see Figure 9). The TVR is composed of 40 pseudo-randomly presented items and takes around 7 minutes.

Example of the TVR’s stimuli. (a) Inside/outside condition. (b) Above/below condition.
The main scores of TVR are average correctness and the sum of RTs (Čeněk et al., 2017). The TVR demonstrated satisfactory predictive validity with other instruments for measuring spatial skills and verbal reasoning (Potyková, 2017).
Viennese matrices test
The Viennese Matrices Test (VMT; Formann et al., 2011) is a nonverbal intelligence test which measures a one-dimensional general “g” factor of intelligence (Spearman, 1904). The VMT is basically an adapted PC-administered version of the Raven progressive matrices (Raven, 1958) and conforms to the Rasch model (Rasch, 1960). The VMT consists of 24 matrices (each matrix has 1 correct answer and 7 distractors), and the matrices are arranged with increasing difficulty. In this study, we use a Czech version of VMT validated by Klose et al. (2002).
The most important score of the VMT is the error rate (or simply the number of correct answers; Klose et al., 2002). The entire test has a 20-minute time limit. The original validation study reported satisfactory internal consistency (α = .76–.81), split-half reliability (r = .83) and concurrent validity with Raven progressive matrices (r = .74–.92; Formann & Piswanger, 1979; Formann et al., 2011). The Czech version also showed satisfactory test-retest reliability (r = .71) and concurrent validity with the Raven progressive matrices (r = .92) and the IST – Intelligence Structure Test (r = .82; Klose et al., 2002).
The personality styles and disorder inventory
The Personality Styles and Disorder Inventory (PSDI/PSSI; Kuhl & Kazén, 2009) is a self-report inventory for the assessment of personality styles, which are defined as non-extreme and non-pathological variants of personality disorders (Švancara, 2002). It contains 14 subscales with 10 four-point Likert-type items each (1 – does not apply at all, 4 – fully applies; 140 items in total). The test takes around 30 minutes. The scale includes the following personality styles: paranoid, schizoid, schizotypal, borderline, histrionic, narcissistic, avoidant, dependent, obsessive-compulsive, negativistic, depressive, altruistic, rhapsodic and antisocial. The original PSDI has satisfactory internal consistency (α = .73–.85), test-retest reliability (r = .68–.83) and concurrent validity with the Big Five questionnaire (r = −.57–.73; Kuhl & Kazén, 2009). The Czech adaptation of the scale (Švancara, 2002) was used in this study. The number of items can be reduced to 36, which results in a shorter version (with six subscales: schizotypal, borderline, narcissistic, avoidant, obsessive-compulsive and antisocial) called PSDI-6. Previous research suggested that this shorter version also yields satisfactory confirmatory factor analysis results and reliability estimates (Hain et al., 2016).
For the purposes of our research, only the borderline subscale (e.g., my feelings often change abruptly and impulsively), which measures impulsive manifestation in behavior and feeling, and the obsessive-compulsive subscale (e.g., even under time pressure, I cannot stop being thorough), which measures thoroughness, diligence and accuracy, are the relevant scales. Only they were therefore included in further correlation analyses. Both scales are represented in full PSDI and the shorter PSDI-6.
Inventory of neurotic symptoms
The Inventory of Neurotic Symptoms (N-70) is a Czech questionnaire created to detect individuals who may be too sensitive for military service (Vacíř, 1973). Even though its main purpose is psychological diagnostics in military hospitals, it is also used for research purposes (e.g., Flegr et al., 2012; see Appendix for an English translation of N-70). It consists of 7 subscales with 10 four-point Likert-type items per subscale (1 – never, 4 – often; 70 items in total). The subscales are anxiety, depression, obsession-phobia, hysteria, hypochondria, psychosomatic symptoms and psychasthenia. The N-70 has satisfactory internal consistency in most of the subscales (α = .628–.867; Flegr et al., 2018). The test takes around 15 minutes.
For the purposes of our research, we used the subscales that measure symptoms potentially related to impulsive behavior: hysteria (e.g., do you feel like fainting when you are strongly keyed up?), obsession-phobia (e.g., do you get severely out of balance when your daily habits are disturbed?), and psychosomatic symptoms (e.g., does your heart flutter or start to race easily in demanding situations?; Vacíř, 1973).
Power analysis
Since a correlation analysis performed on a relatively small sample with low statistical power might yield both underestimated and spuriously large correlation coefficients compared to the real effect (Button et al., 2013), we also performed a robust study with a large sample size; correlation estimates tend to stabilize at approximately 250 observations (see Schönbrodt & Perugini, 2013). Hence, no a priori power analysis was required in this case. Post-hoc power analysis of sensitivity (α = .05, 1−β = .80) revealed that the critical r value for statistical significance was only ± .037 (without correction for multiple testing), which suggests very high power.
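The size of the critical r can be checked with a short calculation. The sketch below assumes the full sample of 2860 participants and a two-tailed test, and it substitutes the standard normal quantile for the t quantile (a close approximation at this sample size); it is not the procedure of the software used in the study.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| reaching two-tailed significance (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    df = n - 2
    # Inverts t = r * sqrt(df) / sqrt(1 - r^2), with z standing in for t.
    return z / sqrt(df + z * z)


r_crit = round(critical_r(2860), 3)  # 0.037 under the stated assumptions
```

With thresholds this low, almost any non-zero association becomes statistically significant, which is why the later analyses lean on equivalence bounds rather than p-values alone.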
Sample and procedure
Besides the methods described above, the entire battery also contained the ImGo instrument. We collected data from 2860 participants in total (2426 participants completed all methods). Participants were aged 18 years or older (M = 29.42, SD = 9.43) and were either candidates to join the Czech Armed Forces or current members of the Czech Armed Forces; the majority were men (n = 2508, 87.7%). Participants agreed with the testing procedure as well as with the collection of personal data. However, for secondary data analysis only a fully anonymized dataset was used. Since only 12.96% of soldiers in the Czech Armed Forces are women (Ministry of Defense and Armed Forces of Czech Republic, 2020), the gender proportion of the sample corresponds to the gender distribution in the Czech Armed Forces. In order to identify any potential selection and sampling bias caused by the unbalanced gender proportion, we compared arithmetic and weighted means and medians in all variables, which showed only negligible changes. Furthermore, we performed an analysis of gender differences in all variables. These results were mostly non-significant (those that were statistically significant were practically insignificant because of their small effect sizes). Hence, we can conclude that the gender disproportion should not bias the results (see Appendix).
Data collection took place from October 2018 to March 2020 at the Military Hospital in Brno. The whole testing procedure took approximately two hours. All methods were administered through the SW Hypothesis (Šašinka, Morong, et al., 2017). Participants were tested in groups (max. 20 per group), and the hardware (common PC mouse, keyboard and 22″ LCD screen with a resolution of 1920 × 1080 px) was identical for all tests and all participants. All statistical analyses were performed in R (v4.0.0; R Core Team, 2020), using the packages lavaan (v0.6–6; Rosseel, 2012), semTools (v0.5–2; Jorgensen et al., 2018), TOSTER (v0.3.4; Lakens, 2018) and robust (v0.5–0; Wang et al., 2020), and in JASP (v0.12.2.0).
Data inspection and cleaning
As with Study 1 and Study 2, all exclusion criteria were set a priori, before data analysis. When a participant fulfilled the exclusion criteria in a specific test, they were excluded, but only from that particular test. The scores from other tests remained in the data matrix. The ImGo data cleaning procedure is described in the corresponding section of Study 1. Based on the exclusion criteria, we discarded 140 participants (mainly due to a high rate of omission errors).
Regarding the PIT, all RTs less than 200 ms and greater than 3000 ms should be excluded from the analysis before computation of the mean RTs (Bugg et al., 2008). However, no participant fulfilled this criterion for exclusion. Four participants with error rates greater than 50% (Lupiáñez & Funes, 2005) were removed from further analysis.
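The two PIT cleaning rules (RT trimming and the error-rate cut-off) can be sketched as follows. The function names are hypothetical, and treating the 200 ms and 3000 ms limits as inclusive is an assumption.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 200, 3000   # bounds from the text (Bugg et al., 2008)
MAX_ERROR_RATE = 0.50              # participants above this rate are removed


def trimmed_mean_rt(rts_ms):
    """Mean RT after dropping trials outside the accepted RT window."""
    kept = [rt for rt in rts_ms if RT_MIN_MS <= rt <= RT_MAX_MS]
    return mean(kept) if kept else None


def exceeds_error_rate(n_errors, n_trials):
    """True if the participant's error rate is above the cut-off."""
    return n_errors / n_trials > MAX_ERROR_RATE
```

For example, `trimmed_mean_rt([150, 400, 600, 3500])` keeps only the 400 ms and 600 ms trials, and a participant with 21 errors in 40 trials would be flagged by `exceeds_error_rate`.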
The exclusion criteria for the error rate were set to 43% for the Flagtest and 50% for the TVR, based on the Czech validation studies and their population norms (Čeněk et al., 2017; Šašinka, Čeněk, Morong, Urbánek, et al., 2017), resulting in 48 participants excluded from the Flagtest and 31 excluded from the TVR. Regarding the VMT, participants with fewer than 8 correct answers (i.e., less than 33.3% of the VMT score) were excluded since this score could indicate misunderstanding of the test, temporary cognitive impairment or intellectual disability. This led to the exclusion of 141 participants.
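Test-wise exclusion, in which a participant is dropped only from the test whose criterion they fail, can be sketched as below. The thresholds come from the text; the record layout, the field names and the reading of each threshold as "strictly above" are assumptions made for illustration.

```python
# Each rule returns True when the participant fails that test's criterion.
RULES = {
    "flagtest": lambda p: p["flagtest_error_rate"] > 0.43,
    "tvr": lambda p: p["tvr_error_rate"] > 0.50,
    "vmt": lambda p: p["vmt_correct"] < 8,   # fewer than 8 of 24 matrices
}


def apply_exclusions(participant):
    """Blank the score of each failed test; keep all other scores intact."""
    cleaned = dict(participant["scores"])
    for test, fails in RULES.items():
        if fails(participant):
            cleaned[test] = None
    return cleaned


# Hypothetical participant who fails only the VMT criterion.
example = {
    "flagtest_error_rate": 0.10,
    "tvr_error_rate": 0.20,
    "vmt_correct": 6,
    "scores": {"flagtest": 7.9, "tvr": 0.85, "vmt": 6, "imgo_rt": 412.0},
}
cleaned_example = apply_exclusions(example)
```

In `cleaned_example` only the VMT score is blanked, so the participant still contributes to every correlation that does not involve the VMT.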
The assumptions of parametric correlation and regression analyses were not met due to skewed non-normal data distributions, statistically significant Shapiro–Wilk tests (Table 6) and the presence of outliers (de Winter et al., 2016; Yu & Yao, 2017). Hence, we used non-parametric Spearman’s rank correlation coefficients (rs) and robust multiple multivariate linear regressions (MM-estimation) with standardized robust beta coefficients (βr) in all cases. As with Study 1 and Study 2, all p-values obtained from the correlation analyses were corrected for multiple comparisons with the Holm–Bonferroni method.
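The Holm–Bonferroni step-down adjustment is short enough to write out. The sketch below is a generic implementation of the standard procedure, not the code used in the study, and the p-values in the usage note are invented.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the ordered sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

For the invented input `[0.01, 0.04, 0.03, 0.005]` the adjusted values are approximately `[0.03, 0.06, 0.06, 0.02]`, so two of the four tests would remain significant at α = .05.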
Descriptive statistics and reliability estimation of the ImGo, the PIT, the Flagtest, the TVR and the VMT and PSDI and the N-70 methods.
[ ]: 95% confidence intervals; M: mean; SD: standard deviation; Me: median; IQR: interquartile range; S–W: Shapiro–Wilk test; RT: reaction time; CV: coefficient of variation; sum: sum; PIT: positional interference test; TVR: test of verbal reasoning; VMT: Viennese matrices test.
Conditional equivalence testing (Campbell & Gustafson, 2018) can be performed if traditional null hypothesis significance testing fails to find statistically significant effects, because non-significant results do not always indicate the absence of an effect, which is, however, important for evidence of discriminant validity. This procedure relies on additional equivalence tests, namely two one-sided tests (TOST; Lakens, 2017), which examine practical significance rather than statistical significance and can provide evidence in support of the null hypothesis. We set the lower (ΔL) and upper (ΔU) equivalence bounds, based on the smallest effect size of interest (SESOI; Lakens et al., 2018), to −.25 and .25, which corresponds to a weak effect size. According to TOST, values within the equivalence range lack practical significance and are therefore equivalent to the null hypothesis (i.e., no associations). If the correlation analyses supplemented by the TOST procedure resulted in small or even non-significant coefficients within this equivalence range, we would consider them sufficient evidence of a lack of relationship, especially with respect to the large sample size and consequent high statistical power (Furr, 2017). Therefore, no other method of discriminant validity estimation, such as multi-trait multi-method (MTMM), is required.
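The logic of an equivalence test for a correlation can be illustrated with two one-sided z-tests on Fisher-transformed values. This is a sketch under stated assumptions: the Fisher z approximation is one common way to run such a test and is not claimed to reproduce the TOSTER output, it is applied here to any correlation coefficient without adjustment for rank-based estimates, and the example values of r and n are illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def tost_correlation(r, n, low=-0.25, high=0.25, alpha=0.05):
    """Return (p_tost, equivalent) for bounds [low, high] on a correlation."""
    se = 1 / sqrt(n - 3)            # standard error of Fisher's z
    norm = NormalDist()
    # H0: rho <= low; rejected when r lies clearly above the lower bound.
    p_lower = 1 - norm.cdf((atanh(r) - atanh(low)) / se)
    # H0: rho >= high; rejected when r lies clearly below the upper bound.
    p_upper = norm.cdf((atanh(r) - atanh(high)) / se)
    p_tost = max(p_lower, p_upper)  # both one-sided tests must reject
    return p_tost, p_tost < alpha


# Illustrative values: a small coefficient in a large sample.
p_value, is_equivalent = tost_correlation(0.114, 2426)
```

Equivalence is declared only when both one-sided tests reject, so a coefficient can be statistically different from zero and still fall inside the equivalence range, which is the pattern this section anticipates for a large sample.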
Results
The aim of Study 3 was to verify the discriminant validity of the ImGo with other cognitive processes (e.g., intelligence, attention, verbal reasoning) and personality traits (e.g., obsession, borderline, hysteria). The following results section is divided into two subchapters: Discriminant validity of the ImGo and self-report questionnaires and Discriminant validity of the ImGo and performance methods. Again, only relevant variables were included in the following correlation and regression analyses (i.e., variables which would indicate some level of impulsivity or related psychological constructs). ImGo omissions, PIT score and PIT1 RT were therefore omitted. The descriptive statistics of all methods are summarized in Tables 7 and 8.
Descriptive statistics and reliability estimation of the PSDI and the N-70 methods.
[ ]: 95% confidence intervals; M: mean; SD: standard deviation; Me: median; IQR: interquartile range; S–W: Shapiro–Wilk test.
Relationships between the ImGo and related questionnaires.
*pHolm < .05 (two-tailed); ***pHolm < .001 (two-tailed); ΔU: upper equivalence bound significant at p < .05; ΔL: lower equivalence bound significant at p < .05; [ ]: 95% confidence intervals; RT: reaction time; CV: coefficient of variation.
Discriminant validity of the ImGo and self-report questionnaires
In the first step, we verified the factor structures of the questionnaires used in this study using CFA. The WLSMV estimator, which is suitable for non-Gaussian categorical data from Likert-type scales (Finney & DiStefano, 2006), was used in all cases. Since the standardized 14-dimensional version of the PSDI yielded unsatisfactory fit indices (χ2(9360) = 93844.088, p < .001, RMSEA [90% CI] = .056 [.055, .056], SRMR = .093, CFI = .670, TLI = .662), the shorter 6-dimensional version PSDI-6 was also verified. From the analysis of the modification indices, two more items were excluded from the measurement model (item 23 and item 107). This resulted in satisfactory fit indices and improvement of the model (χ2(512) = 5525.081, p < .001, RMSEA [90% CI] = .058 [.056, .059], SRMR = .060, CFI = .910, TLI = .901). PSDI-6 is therefore used in further statistical analyses. Regarding the N-70 inventory, the confirmatory factor analysis revealed excellent fit indices (χ2(2324) = 6345.151, p < .001, RMSEA [90% CI] = .023 [.023, .024], SRMR = .063, CFI = .956, TLI = .954), suggesting satisfactory construct validity. Hence, the N-70 can be used in further analyses without any changes. Regarding the internal consistency of relevant subscales of PSDI-6 and N-70, the McDonald’s ω coefficients varied in the range .682−.814, which we considered satisfactory (Table 6).
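For orientation, the RMSEA point estimate can be approximated from the χ² statistic, its degrees of freedom and the sample size. The sketch below uses the conventional formula with N − 1 in the denominator and assumes N = 2860. The N actually used in each model is not stated here, and robust WLSMV output applies scaling corrections, so the values reported above need not match this simple formula exactly.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Conventional RMSEA point estimate from chi-square, df and sample size."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# 14-dimensional PSDI model, with N = 2860 as an assumption.
psdi_full = round(rmsea(93844.088, 9360, 2860), 3)  # 0.056
```

Because the formula divides the excess χ² by both df and N − 1, a very large χ² can still give an acceptable RMSEA in a big sample, which is why CFI and TLI are reported alongside it.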
The correlation analysis of ImGo scores and scores of the relevant PSDI-6 and N-70 subscales suggested very weak or no relationships. Even though some associations were flagged as statistically significant, the correlation coefficients were very small (rs varied between −.088 and .114; see Table 8). TOST analysis also revealed that all correlation coefficients between ImGo and the questionnaire subscales lacked any practical significance since their values were within the equivalence range (Table 8). Given the overpowered tests mentioned above, the results of the TOST procedure and the 95% confidence intervals (which are close to zero), the results can be interpreted as evidence of no relationship between the ImGo and self-report impulsivity questionnaires.
We also performed robust multiple multivariate linear regressions to reveal the potential predictive power of the self-report questionnaires in the ImGo. The main indicators of impulsivity measured by ImGo were used as outcomes (i.e., ImGo commissions, ImGo RT and ImGo CV), and the relevant questionnaire subscales were used as predictors (Table 9). Even though some regression coefficients were statistically significant, their size was very close to zero. Moreover, the data from the questionnaires explained only 0.6–1.4% of the variability in the response data for indicators of impulsivity provided by the ImGo. We can therefore conclude that questionnaires do not have any notable predictive power on the ImGo under scrutiny.
Predictive power of self-report questionnaires in the ImGo for related subscales.
*p < .05 (two-tailed); **p < .01 (two-tailed); ***p < .001 (two-tailed); βr: standardized robust regression coefficient; RT: reaction time; CV: coefficient of variation.
Discriminant validity of the ImGo and performance methods
The correlation analysis of ImGo, PIT, Flagtest, TVR and VMT showed relatively weak or no associations between the ImGo and related performance methods (rs = −.156–.220; see Table 10). For reasons described above, we performed equivalence tests with the same SESOI. TOST suggests that all relationships (and their confidence intervals) were included in the equivalence ranges, which means that they were practically insignificant and equivalent to no relationship (Table 10). In conclusion, our results represent strong evidence for null effects and therefore also no relationship between the ImGo and methods for assessment of attention, verbal reasoning and visuospatial skills, or non-verbal intelligence.
Relationships between the ImGo and related performance methods.
*pHolm < .05 (two-tailed); **pHolm < .01 (two-tailed); ***pHolm < .001 (two-tailed); ΔU: upper equivalence bound significant at p < .05; ΔL: lower equivalence bound significant at p < .05; [ ]: 95% confidence intervals; RT: reaction time; CV: coefficient of variation; M: mean; sum: sum; PIT: Positional Interference Test; TVR: Test of Verbal Reasoning; VMT: Viennese Matrices Test.
To verify the predictive power of performance methods on the ImGo, robust multiple multivariate linear regressions were performed. The main indicators of impulsivity measured by ImGo were used as outcomes (i.e., ImGo commissions, ImGo RT and ImGo CV) and relevant indicators of performance methods were used as predictors in the regression model (Table 11). PIT2 RT and PIT3 RT were not included in the regression model because of multicollinearity with the PIT Stroop effect. The results of the regression analyses suggested that the Flagtest score is weakly associated with ImGo RT (βr = .299). The remaining regression coefficients are very small and sometimes non-significant (Table 10). The performance methods explained only 2.3–6.2% of the variability of indicators of impulsivity provided by the ImGo. The proportion of shared variance indicated by multiple R2 is probably caused by psychomotor speed or attention qualities, which are naturally involved in performance methods in general. Since all the multiple R2 and regression coefficients are very weak, we can conclude that the performance methods had no or only negligible predictive power on the ImGo under scrutiny.
Predictive power of performance methods on the ImGo.
*p < .05 (two-tailed); **p < .01 (two-tailed); ***p < .001 (two-tailed); βr: standardized robust regression coefficient; RT: reaction time; CV: coefficient of variation; PIT: Positional Interference Test; TVR: Test of Verbal Reasoning; VMT: Viennese Matrices Test.
Discussion
The aim of Study 3 was to verify the discriminant validity of ImGo against methods used to measure other potentially related psychological constructs (personality styles, neuroticism, visual and selective attention, verbal reasoning and visuospatial skills, and non-verbal intelligence) on a large sample of respondents. This approach decreases the probability of type II error because of its high statistical power, and it can provide deeper insights into the potential correlates of motoric impulsivity.
In the first step, we analyzed the relationships between the ImGo and the self-report questionnaires of personality styles and neuroticism. Correlation and regression analyses suggested that no practically significant relationships existed among these constructs. We concluded that the motoric impulsivity measured by ImGo is not associated with either borderline or obsessive-compulsive personality styles (measured by PSDI) or neurotic characteristics such as hysteria, obsession-phobia and psychosomatic symptoms (measured by N-70), despite some studies reporting their relationship (especially in the case of borderline personality disorder) to various aspects of impulsivity (e.g., Barker et al., 2015; Cackowski et al., 2014; Coffey et al., 2011; Onur et al., 2016; Rentrop et al., 2008; Ruchsow et al., 2008; Wright et al., 2014). Our findings can be explained by differences in research design. First, we used a much larger sample size, and second, we gathered data from the general population rather than the clinical population with diagnosed personality disorders. Moreover, these findings correspond to the finding obtained in Study 1 on the lack of relationship between behavioral measures and self-report questionnaires (see discussion of Study 1).
In the second step, we examined the relationships between the ImGo and potentially related performance instruments. Although those associations were generally higher compared to the associations with the questionnaire methods, the TOST procedure suggested that no practically significant effects existed between behavioral impulsivity and other examined cognitive processes, such as intelligence, attention or verbal reasoning. Even though these processes share some common elementary cognitive aspects, for instance psychomotor speed, it is evident that ImGo measures a different construct. Since behavioral impulsivity should differ from other cognitive and executive processes, we consider these findings crucial evidence for the discriminant validity of ImGo.
Regarding the association between impulsivity and the interference control measured by the Stroop test, it is sometimes hypothesized that greater impulsivity should be associated with weaker interference control (Nigg, 2000). However, our results agree with other studies which have suggested that the association between impulsivity and selective attention measured by the Stroop family of tests is only small or non-significant (Aichert et al., 2012; Enticott et al., 2008; Morooka et al., 2012; Strasser et al., 2016).
As for the relationship between selective attention and impulsivity, the vast majority of research was conducted on samples with diagnosed attention-deficit/hyperactivity disorder (ADHD). Generally, it is assumed that lower attention should lead to higher error rates and slower reaction times in go/no-go trials (Metin et al., 2012). This type of research has indeed found that impulsivity is associated with attention (Bezdjian et al., 2009; Kulacaoglu et al., 2017) and distinguishes between clinical and general populations or between various types of attention deficit disorders (Lopez et al., 2015; Miller et al., 2009; Trommer et al., 1988). According to a recent meta-analysis, those results had stable effects (Metin et al., 2012) and were also observed within the general population (Chamorro et al., 2012). However, none of the studies mentioned applied behavioral measures of attention, for example a D2 test. Although the observed correlations between both constructs were highest in our analysis, they were not practically significant and therefore indicated a small or no relationship.
The last relationships we examined were between behavioral impulsivity and general and verbal intelligence measured by the Viennese Matrices Test and Test of Verbal Reasoning. We argue that impulsivity and intelligence are unrelated constructs and therefore the observed lack of relationship is essential for the discriminant validity of the ImGo instrument. Even though some studies suggest that impulsivity is related to both general and verbal intelligence (e.g., Buchmann et al., 2011; Koolhof et al., 2007; Lozano et al., 2014; Lynam et al., 1993; Russo et al., 2008; Schweizer, 2002), other research has not discovered any direct relationship between these constructs or has even labeled these associations as spurious (e.g., Lozano, 2015; Vigil-Colet & Morales-Vives, 2005). Our research therefore supports the results of the second group of studies. It should be mentioned that the direct comparability of our results to the above-mentioned studies on the relationship of intelligence and behavioral impulsivity is limited since each study used a unique research design, was conducted on a specific population (e.g., delinquent population), and used different behavioral and self-report measures.
Conclusion
The present paper provided a set of three separate studies with evidence of the validity and reliability of the ImGo instrument for studying the general population. In Study 1, moderate to high test-retest and split-half reliability of the ImGo were found. The results suggested that the indicators of behavioral impulsivity obtained by the ImGo are stable over time and show satisfactory internal consistency between the two halves of the instrument, which is necessary for minimizing measurement error and thus producing reliable results. Studies 1 and 2 together also provided sufficient evidence of convergent validity; the ImGo showed medium correlations with some related indicators of impulsivity measured by the self-report Impulsive Behavior Scale (i.e., the negative urgency subscale) and by the Stop Signal test. Besides providing evidence of validity, Study 2 also used an exploratory eye-tracking method to identify whether associations existed between oculomotor inhibition and impulsivity. These associations, however, were rather weak or non-significant. The aim of Study 3 was to examine the discriminant validity of ImGo against personality disorders/styles (namely borderline personality style, hysteria, obsessive-compulsive behavior and psychosomatic symptoms) measured by self-report scales, and against objective methods which measure general and verbal intelligence, attention and interference control. All of these analyses showed that the associations tended to be very weak and lacked any practical significance. The discriminant validity is therefore considered highly satisfactory.
Our findings from Study 1 and Study 2 also showed that behavioral impulsivity as measured by ImGo differs to a certain extent from what other impulsivity instruments measure, suggesting that impulsivity manifests differently at the behavioral level, although the impulsivity instruments also share a general impulsive trait. Regarding the impulsivity correlates within the general population, which we examined in Study 3, our results showed that higher impulsivity is not associated (from the practical significance point of view) with lower verbal or general intelligence, higher cognitive and interference inhibition, or lower attention, despite the general assumptions of previous clinical research.
In conclusion, all three studies combined offer a comprehensive view of behavioral impulsivity and its correlates within the general population. ImGo showed satisfactory reliability and construct validity, supported by the results of convergent and discriminant validity. ImGo is also easy to administer since it can be run online and has a free license for non-commercial use. ImGo has already been translated into several languages (English, German, Spanish, Portuguese, Slovak, Traditional Chinese, Simplified Chinese, Turkish), but further verification of its psychometric properties within other cultures and languages is still desirable, for instance in cross-cultural studies. The Hypothesis software, which runs the ImGo instrument, is open-source (Apache License 2.0; https://github.com/poweredonhypothesis/hypothesis). Hence, the ImGo method is considered suitable for further scientific purposes and for practical use within the general population.
Supplemental Material
Supplemental material, sj-pdf-1-prx-10.1177_00332941211040431 for ImGo: A Novel Tool for Behavioral Impulsivity Assessment Based on Go/NoGo Tasks by Č. Šašinka, D. Lacko, J. Čeněk, S. Popelka, P. Ugwitz, H. Řádová, M. Fabianová, A. Šašinková, J. Brančík and M. Jankovská in Psychological Reports
Footnotes
Acknowledgments
We would like to thank Prof. Holmqvist for consultation on the eye-tracking task, and the HUME Lab–Experimental Humanities Laboratory, Masaryk University, for providing us with the necessary machine time and equipment.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This publication was supported by the Czech Science Foundation (GC19-09265J: The influence of socio-cultural factors and writing system on perception and cognition of complex visual stimuli).
