Abstract
Objectives:
Previous studies have widely demonstrated that inhibition of return (IOR) with audiovisual targets decreases due to audiovisual integration (AVI). It is currently unclear, however, whether the impaired AVI in children with ADHD has effects on IOR. The present study used the cue-target paradigm to explore differences between the IOR of audiovisual targets and the IOR of visual targets in ADHD and typically developing (TD) children.
Method:
A total of 81 native Chinese speakers aged 6 to 13 years were recruited, including 38 children with ADHD and 43 age- and sex-matched TD children.
Results:
The results showed a smaller magnitude of IOR with audiovisual targets than with visual targets in both groups. Importantly, the reduction of IOR in audiovisual conditions was significantly smaller in children with ADHD than in TD children. Race model analyses further confirmed that differences in IOR between ADHD and TD are due to deficits of audiovisual integration in ADHD.
Conclusion:
The results indicated that children with ADHD have impaired audiovisual integration, which consequently has a smaller impact on IOR than in TD children.
Highlights
Audiovisual integration is impaired in ADHD
Due to audiovisual integration, audiovisual inhibition of return is lower than visual inhibition of return in both ADHD and typically developing children
Deficits in audiovisual integration in ADHD account for the significant variation in inhibition of return across groups
Introduction
ADHD is a neurodevelopmental disorder associated with inattention, hyperactivity, and impulsivity (Tsai, 2003). These symptoms typically arise during childhood, with approximately 11% of children receiving a diagnosis of ADHD (Visser et al., 2014). Individuals with ADHD are impaired in cognitive function (Coghill et al., 2014; Fair et al., 2012), including executive functioning and attention (Read et al., 2020). In terms of cognitive function, it is now generally accepted that problems in inhibition represent a core deficit among children with ADHD (Barkley, 2006; Nigg, 2006). One approach used to study inhibitory mechanisms involves the inhibition of return (IOR) task (Klein, 1988, 2000; Klein & MacInnes, 1999), which is based on a phenomenon first documented by Posner and Cohen (1984). In the cue-target paradigm, the IOR effect occurs when the stimulus onset asynchrony (SOA) is longer than 250 ms (Posner & Cohen, 1984). This is because attention has been directed away from the cued location for a sufficient period, resulting in a longer reaction time (RT) in the cued condition relative to the uncued condition (Posner et al., 1985).
Previous studies have compared visual IOR (visual cue and visual target) between children with ADHD and typically developing (TD) children and found that the manifestation of IOR in ADHD was nearly identical to that of controls (C. S. Li et al., 2003), indicating that the inhibitory attention mechanism subserving IOR is not entirely compromised in children with ADHD. Behavioral results have also shown that IOR effects can be reduced by multisensory integration (MSI): the IOR effects in the audiovisual target condition were noticeably smaller than those in the single visual target condition (Tang et al., 2019). MSI refers to the set of processes by which information arriving from the individual sensory modalities interacts and converges into a coherent and meaningful representation (Talsma et al., 2010), and audiovisual integration (AVI) is one of its common types. However, AVI occurs conditionally: attending to the auditory and visual modalities simultaneously (bimodal-divided attention) is necessary for AVI (Talsma et al., 2007). AVI can be reduced or even abolished when attention is focused on modality-specific targets (selective attention; Mozolic et al., 2008; Talsma et al., 2007; Tang et al., 2016). It is important to note that bimodal-divided attention can impact not only AVI but also the IOR with audiovisual targets. Compared to visual selective attention, AVI is more efficient under bimodal-divided attention, resulting in a greater IOR for visual targets than for audiovisual targets (Tang et al., 2019). To summarize, AVI can affect IOR specifically when attention is distributed across the senses.
Recent evidence suggests that perceptual salience, a distinct perceptual quality isolating some items from their background (van der Burg et al., 2008b, 2011), plays a role in the interaction between AVI and IOR (Peng et al., 2021). More specifically, IOR reduces perceptual salience at previously attended spatial locations (i.e., valid cue locations), resulting in a longer RT for valid cues relative to invalid cues (Prime & Ward, 2006; van Koningsbruggen et al., 2010). The IOR effects exist equally in the AV target and V target conditions. However, in the AV target condition, the attended auditory stimulus can be integrated with a simultaneous visual stimulus and thus enhance the perceptual salience (van der Burg et al., 2008a, 2011). The audiovisual IOR may vanish when the increase in perceptual salience from audiovisual integration surpasses the decrease in perceptual salience caused by IOR. In short, AVI can modulate the perceptual salience of targets and help the audiovisual target resist being inhibited in the cue-target paradigm (Tang et al., 2019).
Contrary to previous reports, children with ADHD, unlike typically developing (TD) children of similar age, do not have normal AVI (McCracken et al., 2019). The impaired AVI in ADHD is possibly caused by sensory integration disorder. A well-organized sensory system can integrate input from multiple sources (American Academy of Pediatrics, 2012). However, sensory neurons in children with ADHD do not signal or function efficiently (Ayres, 1979), which leaves children with ADHD unable to properly receive and process sensory information (Dunn & Bennett, 2002). Consequently, children with ADHD have difficulties processing visual (Martin et al., 2008) and auditory stimuli (Fu et al., 2022; Ghanizadeh, 2009), which leads to AVI issues (Molholm et al., 2020). For example, individuals with ADHD showed AVI deficits in processing multiple sensory streams in parallel (Schulze et al., 2021), and patients with ADHD gained no benefit from a multimodal source (Michalek et al., 2014). In addition, the neural substrates that underlie attention and multisensory processing share many commonalities, including the superior colliculus, fronto-parietal, and temporo-parietal networks (Dionne-Dostie et al., 2015). Therefore, patients with ADHD showed anomalies in AVI (Michalek et al., 2014; Panagiotidi et al., 2017; Schulze et al., 2021), resulting in poorer attention and a smaller enhancement of perceptual salience than in controls. This makes it reasonable to assume that, in the exogenous cueing paradigm, AVI is less able to help the audiovisual target resist inhibition in individuals with ADHD.
The present study aims to investigate the difference between the IOR effect of audiovisual (AV) targets and the IOR effect of visual (V) targets in children with ADHD and controls under conditions of divided-modalities attention. We apply the cue-target paradigm and manipulate the factors of group (ADHD, TD), cue validity (cued, uncued), and target type (audiovisual, visual, auditory). Based on the findings of previous research, we expected audiovisual integration to be impaired in children with ADHD relative to controls. We assume that, because of this deficit, the reduction of the audiovisual IOR effect produced by audiovisual integration would be smaller in children with ADHD than in TD children.
Method
Participants
A total of 49 children with ADHD aged 6 to 13 years were initially recruited for this study. These children were matched on chronological age with a control group of 48 TD children. However, 11 children with ADHD and 5 TD children were excluded for poor performance (an RT or accuracy z score greater than three). The remaining 38 children with ADHD (mean age: 8.74 ± 1.94 years; 31 boys and 7 girls) and 43 TD children (mean age: 9.09 ± 1.86 years; 29 boys and 14 girls) were included in the statistical analysis. The female/male ratio aligns with the incidence rate of ADHD (Huss et al., 2008; Rucklidge, 2010). No significant differences between the ADHD and TD groups were found in age (t(79) = 0.84, p = .402) or sex ratio (χ2(1) = 2.10, p = .147). Based on G*Power (Faul et al., 2007) and a prior behavioral study (Tang et al., 2019), a sample size of N = 24 per group was adequate to detect the expected effect size (f = 0.25, alpha = .05, power = 0.9), demonstrating that adequate power could be achieved in this sample.
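As a quick check on the group-matching statistics above, the sex-ratio test can be recomputed from the reported counts (31 boys and 7 girls with ADHD; 29 boys and 14 girls in the TD group). The sketch below assumes a Pearson chi-square test without continuity correction, which the text does not state explicitly; the function name is illustrative only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts taken from the text: boys/girls in the ADHD group, boys/girls in the TD group.
chi2, p = chi_square_2x2(31, 7, 29, 14)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

Under that assumption, the result agrees with the reported χ2(1) = 2.10, p = .147.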
Children diagnosed with ADHD were recruited from the Children’s Hospital of Soochow University. The diagnosis was based on clinical interviews conducted by qualified psychiatrists according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, 5th ed. (DSM-5). Furthermore, all of the children diagnosed with ADHD satisfied the requirements of the Swanson, Nolan, and Pelham-IV Rating Scale (SNAP-IV), which was filled out by the children’s parents. The SNAP-IV has been found to have good validity and reliability in earlier research (Gau et al., 2009; Hall et al., 2020).
Every child diagnosed with ADHD fulfilled the following requirements: (a) a diagnosis of ADHD by qualified psychiatrists based on the DSM-5 and moderate to severe scores on the SNAP-IV; (b) right-handedness; (c) normal or corrected-to-normal hearing and visual acuity; (d) normal intelligence quotient (IQ > 25%) as determined by the Raven Standard Reasoning Test (RSRT); (e) drug-naïve status; (f) no history of other neurological or mental conditions; and (g) no history of any other common comorbid conditions, such as anxiety, depression, or learning disabilities. The inclusion criteria for the TD group were the same as for the ADHD group except for the ADHD diagnosis. ADHD symptoms in the TD group were also ruled out on the basis of parent and teacher reports, as well as observation during the whole experiment. TD children included in this study had to meet all of the following conditions: (a) never having been diagnosed with a neuropsychological disorder, including ADHD; (b) the absence of academic or behavioral problems; (c) placement in the age-appropriate grade in school with no history of special education services; and (d) no other chronic conditions, neurological disorders (e.g., epilepsy), use of medications, or other primary psychiatric diagnoses (e.g., depression, anxiety, and psychosis). All parents of minors gave their informed consent in compliance with the Helsinki Declaration. This study was approved by the academic committee of the Department of Psychology at Soochow University in Suzhou, China.
Apparatus and Stimuli
Participants were tested in a dimly lit, sound-isolated laboratory. Visual stimuli were presented on a 27 in. (61 cm × 37 cm) LCD monitor (120 Hz, 2,560 × 1,440 screen resolution) with a black background (0.4 cd/m2) at a distance of 60 cm from the participants. The fixation stimulus, which had a luminance of 155.2 cd/m2, comprised a white fixation cross measuring 0.05° × 0.05° of visual angle and white boxes measuring 1° × 1° at 4.5° eccentricity on either side. One box was filled with white as the peripheral cue to highlight its location. The central cue, produced by bolding the fixation cross (0.1° × 0.1° of visual angle), was intended to draw attention back to the central position. The target stimuli were classified as visual (V), auditory (A), or audiovisual (AV). The visual target stimulus was a checkerboard pattern (0.8° × 0.8° of visual angle) composed of white (155.2 cd/m2) and red (27.5 cd/m2). The auditory target stimulus was a 1000 Hz pure tone (65 dB, 100 ms, with linear rise and fall times of 10 ms) delivered through stereo headphones (model: HyperX). The audiovisual target consisted of the visual and auditory target stimuli presented together at the same spatial location.
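For readers who want on-screen sizes, the visual angles above can be converted to centimeters at the 60 cm viewing distance. This is a minimal sketch under standard flat-screen geometry; it assumes that eccentricity is measured from fixation to the center of a box, which the text does not specify, and the function names are hypothetical.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """Extent (cm) of a stimulus centered on the line of sight that subtends angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def eccentricity_to_cm(angle_deg, distance_cm):
    """Offset (cm) from fixation for a point at angle_deg eccentricity (assumed geometry)."""
    return distance_cm * math.tan(math.radians(angle_deg))


viewing_distance_cm = 60  # viewing distance stated in the text
print(f"1 deg box side: {visual_angle_to_cm(1.0, viewing_distance_cm):.2f} cm")
print(f"0.8 deg checkerboard side: {visual_angle_to_cm(0.8, viewing_distance_cm):.2f} cm")
print(f"4.5 deg eccentricity: {eccentricity_to_cm(4.5, viewing_distance_cm):.2f} cm")
```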
Design and Procedure
The current investigation was based on Experiment 2 of Tang et al. (2019). A mixed design of 2 (group: ADHD, TD) × 2 (cue validity: cued, uncued) × 3 (target type: visual, auditory, audiovisual) was used. Following one practice block of 30 trials, participants completed four experimental blocks of 90 trials each. About 20% (72 trials) of the total trials were catch trials (no target appearance). Thus, there were 48 trials for each experimental condition. The entire experiment lasted approximately 25 minutes.
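The trial counts above follow from simple arithmetic, sketched here for clarity. The variable names are illustrative, and the even split of target trials across the six cue validity × target type cells relies on the equal trial numbers described in the procedure.

```python
blocks = 4
trials_per_block = 90
catch_trials = 72
cue_levels = 2     # cued, uncued
target_levels = 3  # visual, auditory, audiovisual

total_trials = blocks * trials_per_block
target_trials = total_trials - catch_trials
trials_per_condition = target_trials // (cue_levels * target_levels)
catch_proportion = catch_trials / total_trials

print(total_trials, target_trials, trials_per_condition, catch_proportion)
```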
The flow of a single trial in the experimental phase is shown in Figure 1. To make sure covert attention was at fixation, a 750 ms gap was added at the start of each trial. Each trial commenced with a 50 ms visual peripheral cue, which was followed by a 150 or 250 ms brightening of the central fixation cross. Following another gap of 150 to 250 ms, a visual, auditory, or audiovisual target, or no stimulus (catch trials), appeared for 100 ms. The stimulus onset asynchrony (SOA) between the peripheral cue and the target was randomly set to 400 or 600 ms, which prevented participants from predicting target onset on the basis of a fixed SOA.

Figure 1. Illustration of targets and procedure. Sequence of events and their durations in the uncued condition with an audiovisual target.
Throughout the experiment, participants were instructed to maintain their gaze on the central fixation cross. There were two trial types: cued and uncued trials. In the cued condition, the cue and the target appeared at the same location. In the uncued condition, they appeared at opposite locations (Figure 1). The numbers of cued and uncued trials were the same. There were three target modalities: A, V, and AV. The numbers of A, V, and AV trials were also the same. Participants were instructed to respond to target stimuli by pressing the keyboard key “B” as rapidly and accurately as possible with the index finger of the dominant hand. In catch trials, no target appeared, and thus no response was required. Feedback on the number of correct responses was provided after each block. Because no target was presented in the catch trials, only trials containing target stimuli were included in the subsequent accuracy analysis.
Data Analysis
The stimuli were presented and the responses were recorded using E-Prime (version 3.0.3.9). For each participant, only go trials with a correct response were used in the RT analysis (Van der Stoep et al., 2017). Incorrect responses were removed first (9.67% of the data in the ADHD group and 6.31% of the data in the TD group). Then, RTs shorter than 100 ms or longer than 1,000 ms, as well as RTs that differed from the mean RT by more than 3 standard deviations, were removed because they were assumed to result from anticipation or from not attending to the task (Van der Stoep et al., 2017). These outliers were infrequent: out of a total of 288 trials, 3.68% of the data were removed, corresponding to 3.78% in the ADHD group (on average 11 trials per participant) and 3.59% in the TD group (on average 10 trials per participant).
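The two-step RT trimming described above can be sketched as follows. This is an illustration rather than the authors' script: it assumes that the 3 SD rule is applied after the absolute cutoffs and is computed over whatever set of RTs is passed in (for example, one participant or one condition), which the text does not specify. The sample RTs are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, low=100, high=1000, sd_cutoff=3.0):
    """Drop RTs outside [low, high] ms, then RTs beyond sd_cutoff SDs of the mean."""
    in_bounds = [rt for rt in rts if low <= rt <= high]
    if len(in_bounds) < 2:
        return in_bounds  # the SD is undefined for fewer than two values
    m, sd = mean(in_bounds), stdev(in_bounds)
    if sd == 0:
        return in_bounds
    return [rt for rt in in_bounds if abs(rt - m) <= sd_cutoff * sd]


# Hypothetical correct-response RTs (ms) with one anticipation and two slow responses.
rts = [300, 310, 320, 330] * 5 + [50, 950, 1200]
kept = trim_rts(rts)
print(len(rts), len(kept))
```

In this example, 50 ms and 1,200 ms fall outside the absolute bounds, and 950 ms is then removed by the SD rule.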
The processed data were analyzed by JASP (Version 0.18.1). The accuracy and average RT data were compared using a 2 (group: ADHD, TD) × 2 (cue validity: cued, uncued) × 3 (target type: visual, auditory, audiovisual) repeated-measures ANOVA. The partial eta-squared (η2p) effect size was computed for the ANOVA. To test for the presence of a speed-accuracy trade-off (SAT), Pearson’s correlation analysis was conducted on accuracy and average RT in each condition. The IOR effect was calculated by subtracting the RTs in the uncued condition from the RTs in the cued condition (IOR effect = RTcued − RTuncued). Then a comparison was drawn for the IOR effect with the 2 (group: ADHD, TD) × 3 (target type: A, V, or AV) repeated measure ANOVA.
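The IOR computation defined above (IOR effect = RTcued − RTuncued) can be sketched per target type. The RT values below are hypothetical and serve only to illustrate the subtraction; positive values indicate slower responses at the cued location.

```python
from statistics import mean


def ior_effect(cued_rts, uncued_rts):
    """IOR effect in ms: mean cued RT minus mean uncued RT."""
    return mean(cued_rts) - mean(uncued_rts)


# Hypothetical RTs (ms) for one participant, keyed by (target type, cue validity).
rts = {
    ("V", "cued"): [460, 470, 480], ("V", "uncued"): [390, 400, 410],
    ("A", "cued"): [500, 510, 520], ("A", "uncued"): [490, 500, 510],
    ("AV", "cued"): [400, 410, 420], ("AV", "uncued"): [380, 390, 400],
}

ior = {t: ior_effect(rts[(t, "cued")], rts[(t, "uncued")]) for t in ("V", "A", "AV")}
# Reduction of IOR from visual to audiovisual targets.
ior_reduction = ior["V"] - ior["AV"]
print(ior, ior_reduction)
```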
To investigate the amount of speedup in the multisensory condition compared with the unimodal condition, the amount of relative multisensory response enhancement (rMRE) was calculated for each participant and each cue validity condition (cued and uncued) by using the following formula (Tang et al., 2019; Van der Stoep et al., 2017). The rMRE represents the enhancement effect of the multisensory response (Mishler & Neider, 2016). A paired t-test was used to compare rMRE between the two cue validity conditions (Stevenson et al., 2014; Van der Stoep et al., 2017).
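A minimal sketch of an rMRE computation is given here. It assumes the definition commonly used in the multisensory literature: the fastest unimodal RT minus the audiovisual RT, expressed as a percentage of the fastest unimodal RT. Both that definition and the use of one summary RT (mean or median) per condition are assumptions, and the function name is hypothetical.

```python
def rmre(rt_a, rt_v, rt_av):
    """Relative multisensory response enhancement in percent (assumed definition).

    rt_a, rt_v, rt_av are summary RTs (ms) for the auditory, visual, and
    audiovisual conditions of one participant in one cue validity condition.
    """
    fastest_unimodal = min(rt_a, rt_v)
    return (fastest_unimodal - rt_av) / fastest_unimodal * 100.0


# Hypothetical summary RTs: the audiovisual response is 38 ms faster than the
# faster unimodal response (visual, 380 ms).
print(rmre(rt_a=400, rt_v=380, rt_av=342))
```

A positive value indicates that audiovisual responses were faster than the faster unimodal response; zero indicates no enhancement.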
To examine whether any speeding-up in the multisensory condition could be explained by statistical facilitation or by multisensory integration, we analyzed the race model with reference to previous research using the following formula (Laurienti et al., 2006; J. Miller, 1982, 1986; Tang et al., 2019).
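The race model comparison can be sketched as follows. The sketch assumes that the race model prediction is Miller's bound, min(1, P(RT_A < t) + P(RT_V < t)); the cited work may instead use the independent-race form P_A + P_V − P_A × P_V, so this choice should be read as an assumption. Function names and sample RTs are hypothetical.

```python
from bisect import bisect_left


def ecdf(sorted_rts, t):
    """Proportion of RTs strictly below t, an estimate of P(RT < t)."""
    return bisect_left(sorted_rts, t) / len(sorted_rts)


def race_model_difference(rt_a, rt_v, rt_av, t_max=1000, step=10):
    """Return (t, CP_AV - CP_race) for t = step, 2 * step, ..., t_max."""
    a, v, av = sorted(rt_a), sorted(rt_v), sorted(rt_av)
    result = []
    for t in range(step, t_max + 1, step):
        race_bound = min(1.0, ecdf(a, t) + ecdf(v, t))  # assumed: Miller's bound
        result.append((t, ecdf(av, t) - race_bound))
    return result


# Hypothetical RTs (ms) for one participant in one cue validity condition.
diffs = race_model_difference(
    rt_a=[400, 450, 500], rt_v=[380, 420, 460], rt_av=[300, 320, 340]
)
violations = [(t, d) for t, d in diffs if d > 0]
print(len(diffs), violations[:3])
```

Positive differences mark time bins in which the audiovisual CDF exceeds the race model prediction; in the analysis described here, such differences are then tested against zero across participants.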
Over the RT range of 0 to 1,000 ms, the race model analysis evaluates the difference between the audiovisual cumulative distribution function (CDF) and the race model CDF every 10 ms (Laurienti et al., 2006). The race model is considered to be violated, indicating a multisensory integration effect, if the actual CPAV for a specific reaction time range is significantly larger than the predicted CPRace model (Ulrich et al., 2007). The resulting p-values were Bonferroni corrected for the number of tests within a condition (n = 100, as there were one hundred quantiles) using the formula $p_{\text{corrected}} = 1 - (1 - p)^{n}$ (Van der Stoep et al., 2015).
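The correction stated above, read as 1 − (1 − p) raised to the power n, can be written as a one-line function. This sketch only restates that formula with n = 100 tests per condition; the function name is illustrative.

```python
def corrected_p(p, n=100):
    """Corrected p-value for n tests: 1 - (1 - p) ** n, as stated in the text."""
    return 1 - (1 - p) ** n


# An uncorrected p of .0005 stays just below .05 after correction for 100 tests.
print(round(corrected_p(0.0005), 4))
```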
The difference between the CDF of the audiovisual condition and the CDF of the race model indicates the magnitude of multisensory integration (J. Miller, 1982; Raab, 1962; Ulrich et al., 2007). Within the window of significant violation, the peak of the race model inequality curve and the positive area under the curve (pAUC) can be used to estimate the size of the integration effect under the various conditions (Yang et al., 2014).
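The peak and pAUC measures can be sketched from a list of difference values (CPAV minus CPRace model) sampled every 10 ms. The sketch assumes that pAUC is the sum of the positive differences multiplied by the bin width (a rectangle rule); the exact integration method is not given in the text, and the input values are hypothetical.

```python
def peak_violation(differences):
    """Largest value of the difference curve (CP_AV minus CP_race)."""
    return max(differences)


def positive_auc(differences, step_ms=10):
    """Area under the positive part of the curve, in probability x ms (assumed rule)."""
    return sum(d for d in differences if d > 0) * step_ms


# Hypothetical difference values for four consecutive 10 ms bins.
curve = [-0.10, 0.02, 0.05, -0.03]
print(peak_violation(curve), positive_auc(curve))
```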
Results
Accuracy (ACC)
Accuracies were submitted to a 2 (group: ADHD, TD) × 2 (cue validity: cued, uncued) × 3 (target type: visual, auditory, audiovisual) mixed ANOVA. The accuracies of responses to different conditions in ADHD and TD children are shown in Table 1. The results showed a significant main effect of target type, F(1.432, 113.131) = 14.38, p < .001, η2p = .15, and a post hoc analysis with Bonferroni correction showed that the accuracy for the audiovisual condition (94.4%) was significantly higher than that for the visual condition (89.7%), t(78) = 5.24, pBonf < .001, Cohen’s d = 0.52, 95% CI [0.26, 0.78], and the auditory condition (91.1%), pBonf = .001, Cohen’s d = 0.36, 95% CI [0.11, 0.61], and the difference in accuracy between the visual condition (89.7%) and the auditory condition (91.1%) was not significant, t(78) = 1.65, pBonf = .303. The main effect of group was significant, and the accuracy for the ADHD group (90.5%) was significantly lower than that for the TD group (93.9%), F(1, 79) = 4.30, p = .041, η2p = .05. The interaction between cue validity and target type was significant, F(2, 158) = 6.11, p = .003, η2p = .07, and the interaction between cue validity and group was also significant, F(1.432, 113.131) = 5.47, p = .012, η2p = .07. No other significant results were detected in accuracies.
Table 1. Mean Reaction Times (RTs, in ms), Accuracy (ACC, Percent Correct), and Standard Deviation (SD) for All Combinations of Target Type (A, V, and AV) and Cue Validity (Cued and Uncued) in Each Participant Group.
Reaction Times (RTs)
Correct RTs were analyzed with a 2 (group: ADHD, TD) × 2 (cue validity: cued, uncued) × 3 (target type: visual, auditory, audiovisual) repeated measures ANOVA (see Table 1 and Figure 2). The p values of all main effects and interactions were corrected with the Greenhouse-Geisser correction. The three-way repeated-measures analysis of variance (ANOVA) revealed a significant main effect of cue validity, F(1, 79) = 192.30, p < .001, η2p = .71, with slower RTs on the cued trials (432 ms) than on the uncued trials (395 ms), which indicates a significant IOR. The main effect of target type was significant, F(1.515, 119.717) = 127.35, p < .001, η2p = .62. Further pairwise comparisons with Bonferroni correction indicated that RTs were significantly faster for audiovisual targets (367 ms) than for visual targets (424 ms) and auditory targets (448 ms), which reveals a significant redundant signals effect (audiovisual targets and visual targets: t(80) = 10.92, pBonf < .001, Cohen’s d = 0.61, 95% CI [0.43, 0.79]; audiovisual targets and auditory targets: t(80) = 15.54, pBonf < .001, Cohen’s d = 0.87, 95% CI [0.65, 1.09]; visual targets and auditory targets: t(80) = 4.62, pBonf < .001, Cohen’s d = 0.26, 95% CI [0.11, 0.40]). The main effect of group was also significant, F(1, 79) = 8.68, p = .004, η2p = .10, with slower RTs for children with ADHD (442 ms) than for TD children (385 ms). Both the interaction between cue validity and target type, F(1.861, 147.018) = 116.97, p < .001, η2p = .60, and the interaction between target type and group, F(1.515, 119.717) = 29.76, p < .001, η2p = .27, were significant. Importantly, the interaction between the three factors was also significant, F(1.861, 147.081) = 4.45, p = .015, η2p = .05.

Figure 2. Mean reaction times in the ADHD and TD groups. Error bars represent standard errors of the mean.
For children with ADHD, the main effect of cue validity was significant, F(1, 37) = 106.36, p < .001, η2p = .74, indicating that RTs in the uncued condition (424 ms) were faster than those in the cued condition (460 ms). There was a significant main effect of target type, F(1.548, 57.279) = 101.73, p < .001, η2p = .73, indicating that RTs on auditory trials (497 ms) were slower than those on visual trials (433 ms, p < .001) and audiovisual trials (395 ms, p < .001). The interaction between target type and cue validity was also significant, F(1.545, 57.176) = 53.45, p < .001, η2p = .59. Simple main effect analysis revealed a significant difference between the cued and uncued trials in the V condition (469 ms vs. 398 ms, p < .001), the A condition (503 ms vs. 491 ms, p = .020), and the AV condition (407 ms vs. 384 ms, p < .001).
For TD children, the main effect of cue validity was significant, F(1, 42) = 92.56, p < .001, η2p = .69, indicating that RTs in the cued condition (403 ms) were longer than those in the uncued condition (366 ms). The main effect of target type was significant, F(1.379, 57.934) = 58.23, p < .001, η2p = .58, with RTs significantly longer for the visual target (415 ms) than for the audiovisual target (339 ms, pBonf < .001), and significantly longer for the auditory target (399 ms) than for the audiovisual target (339 ms, pBonf < .001). Furthermore, a significant interaction was observed between cue validity and target type, F(2, 84) = 68.87, p < .001, η2p = .62. For the visual target, RTs were significantly slower in the cued condition (460 ms) than in the uncued condition (371 ms), p < .001; for the auditory target, RTs were significantly slower in the cued condition (405 ms) than in the uncued condition (394 ms), p = .047; for the audiovisual target, RTs in the cued condition (346 ms) were significantly slower than those in the uncued condition (333 ms), p = .036.
Correlation Between Reaction times (RTs) and Accuracy (ACC)
Regarding the speed-accuracy trade-off (SAT), previous evidence showed that variations in RT and ACC are negatively correlated when an SAT is present (Heitz, 2014; Vandierendonck, 2021). We analyzed the correlation between RTs and ACC for each condition in both the ADHD and TD groups (Table 2). In the ADHD group, the two measures were significantly negatively correlated in the A-cued condition (Pearson r = −.365, p = .024) and the correlation was marginally significant in the A-uncued condition (Pearson r = −.319, p = .051). In the TD group, there was a significant correlation in the V-cued condition (Pearson r = −.362, p = .017). No other significant correlations were detected.
Table 2. Correlation Between Mean Reaction Times (RTs, in ms) and Accuracy (ACC, Percent Correct) for All Combinations of Target Type (A, V, and AV) and Cue Validity (Cued and Uncued) in Each Participant Group.
Standard Deviation of Reaction Times (RTSD)
A repeated measures ANOVA was conducted on mean RTSD with the factors of target type, cue validity, and group (Table 1). The main effect of group was marginally significant, F(1, 79) = 3.83, p = .050, η2p = .05: children with ADHD (129 ms) showed a marginally larger RTSD than TD children (116 ms). Additionally, the main effect of target type was significant, F(1.763, 139.291) = 79.02, p < .001, η2p = .50. The interaction between cue validity and target type was also significant, F(2, 158) = 6.92, p = .001, η2p = .08.
Inhibition of Return (IOR)
The IOR effect (the RT in the cued condition minus the RT in the uncued condition) was calculated for each condition. Then, IOR effects were submitted to a 2 (group: ADHD vs. TD) × 3 (target type: visual vs. auditory vs. audiovisual) mixed ANOVA (see Figure 3a). The results showed a significant main effect of target type, F(1.861, 147.016) = 116.97, p < .001, η2p = .60. Further multiple comparisons showed that the IOR effect for the visual condition (80 ms) was significantly larger than that for the audiovisual condition (18 ms), t(80) = 12.52, pBonf < .001, Cohen’s d = 1.78, 95% CI [1.29, 2.26], and the auditory condition (11 ms), t(80) = 13.87, pBonf < .001, Cohen’s d = 1.97, 95% CI [1.46, 2.48], whereas the difference in IOR effect between the audiovisual condition (18 ms) and the auditory condition (11 ms) was not significant, t(80) = 1.36, pBonf = .530. The main effect of group was not significant, F(1, 79) = 0.18, p = .671, η2p = .002. The interaction between group and target type was significant, F(1.861, 147.016) = 4.45, p = .015, η2p = .05.

Figure 3. (a) Magnitudes of the inhibition of return (IOR) effect for visual, auditory, and audiovisual targets and (b) relative multisensory response enhancement (rMRE) magnitudes under differing cue validity and group conditions. Error bars represent the standard errors of the mean.
To compare the IOR effect between the ADHD and TD groups, a series of independent-samples t tests with Bonferroni correction was conducted. The IOR effect showed no significant difference between the ADHD group and the TD group in any of the three modalities (auditory: 12 ms versus 11 ms, pBonf = 1; visual: 71 ms versus 89 ms, pBonf = .280; audiovisual: 22 ms versus 13 ms, pBonf = 1). These results indicated that children with ADHD demonstrate an IOR effect that is broadly similar to that observed in controls.
In the ADHD group, further pairwise comparisons with Bonferroni correction indicated that the IOR effect was significantly greater for visual targets (71 ms) than for auditory targets (12 ms) and audiovisual targets (22 ms): visual targets and auditory targets, t(37) = 8.16, pBonf < .001, Cohen’s d = 1.69, 95% CI [0.96, 2.42]; visual targets and audiovisual targets, t(37) = 6.58, pBonf < .001, Cohen’s d = 1.36, 95% CI [0.67, 2.05]. The difference between auditory targets and audiovisual targets was not significant, t(37) = 1.58, pBonf = 1, Cohen’s d = 0.33, 95% CI [−0.94, 0.29]. In the TD group, a similar pattern was detected: the IOR effect for the visual target (89 ms) was significantly larger than that for the auditory target (11 ms, t(42) = 11.57, pBonf < .001, Cohen’s d = 2.25, 95% CI [1.46, 3.03]) and the audiovisual target (13 ms, t(42) = 11.27, pBonf < .001, Cohen’s d = 2.19, 95% CI [1.42, 2.97]).
The IOR effect for AV targets was smaller than that for V targets in both the ADHD and TD groups. More importantly, the reduction of IOR from visual to audiovisual targets was significantly greater in the TD group (76 ms) than in the ADHD group (47 ms), t(79) = 3.12, p = .003, Cohen’s d = 0.70, 95% CI [0.24, 1.14].
A one-sample t-test was performed on the rMRE in each cue validity condition for the two groups (comparing the means with 0). The results are shown in Figure 3b. In the TD group, the rMRE was significantly larger than 0 for both valid targets (rMRE = 10.3%, p < .001) and invalid targets (rMRE = 4.0%, p = .008), which indicates a redundant signals effect. In the ADHD group, however, the rMRE was significantly larger than 0 only for valid targets (rMRE = 9.2%, p < .001) and not for invalid targets (rMRE = 0.5%, p = .799).
A 2 (cue validity: cued, uncued) × 2 (group: ADHD, TD) repeated measures ANOVA was conducted on rMRE. The main effect of cue validity was significant, revealing that rMRE was significantly smaller for the uncued condition than for the cued condition, F(1, 79) = 39.95, p < .001, η2p = .34. The result demonstrated that the redundant signals effect was smaller for the invalid target condition than for the valid target condition in both the ADHD group (pBonf < .001) and the TD group (pBonf = .001). No other significant main effects or interaction effects were observed (ps > .05).
Race Model Violation
In the response time interval of 0 to 1,000 ms, the cumulative probability values of responses in the different cue validity conditions were calculated for the ADHD and TD groups in each 10 ms interval: visual $P(RT_{V} < t)$, auditory $P(RT_{A} < t)$, and audiovisual $P(RT_{AV} < t)$. The differences between the actual audiovisual cumulative distribution probability (CPAV) and the predicted cumulative distribution probability of the race model (CPRace model) for the different conditions were subjected to a one-sample t-test (compared with 0) in each 10 ms interval.
The results are shown in Figure 4. In the uncued condition, the difference values for invalid targets were negative in both the ADHD group and the TD group, indicating no audiovisual integration, so further analyses were not conducted. In the cued condition, valid targets showed significant violations of the race model for 80 ms (360–430 ms), with a peak of 5.1% at 390 ms, in the ADHD group, and for 280 ms (280–550 ms), with a peak of 7.64% at 320 ms, in the TD group. After the Bonferroni correction was applied to the obtained p-values, valid targets showed significant violations of the race model for 40 ms (310–320 and 350–360 ms) in the TD group, whereas no significant violations of the race model were detected in the ADHD group. In the cued condition, the peak was smaller and the time window was narrower in the ADHD group, suggesting that children with ADHD had less audiovisual integration.

Figure 4. Analysis of the race model for different cue validity conditions and groups. The horizontal axis indicates the time windows in which the race model was significantly violated, with the actual cumulative probability of AV (CPAV) being significantly larger than the cumulative probability predicted by the race model (CPRace model).
Independent-sample t-tests were performed on the pAUC values and peaks; however, no significant differences were detected between the ADHD and TD groups (ts < 1).
Discussion
Using the cue-target paradigm, we examined the impact of audiovisual integration on the IOR effect in children with ADHD and compared it with that in TD controls, extending IOR research to this population. The IOR effect for the unimodal visual target was larger than that for the bimodal audiovisual target, and this decrease was larger in TD children than in children with ADHD. The results also showed poorer audiovisual integration in ADHD. Taken together, the findings show that children with ADHD exhibit weaker audiovisual integration, which in turn has less influence on the IOR with audiovisual targets.
In the present study, RTs for auditory targets were, surprisingly, slower than RTs for visual targets in children with ADHD, a pattern that was not found in the TD group. On the one hand, this can be explained by previous findings that children with ADHD have abnormalities in auditory processing and rely more on brain areas related to visual processing (Fassbender & Schweitzer, 2006). Children with ADHD are unable to dynamically adjust cognitive resources in response to changing task demands due to damage to the prefrontal cortex and anterior cingulate cortex (E. K. Miller, 2000; E. K. Miller & Cohen, 2001). An ERP study also found that the physiological basis of auditory processing deficits in children with ADHD may lie in the absence of the N2 anterior contralateral component (N2ac), the EEG component of selective attention in auditory space, and that the extent of the deficit predicts longer RTs (Fu et al., 2022). On the other hand, the slower auditory RTs may be due to a speed-accuracy trade-off (SAT). In our data, the significant negative correlation for auditory targets in the ADHD group suggests that an SAT was present in auditory target trials. It appears that the auditory condition was challenging for the children with ADHD: under the instruction to respond as rapidly and accurately as possible, they traded off reaction time against accuracy in the auditory condition. Conversely, the visual conditions were relatively easier and did not require such trade-offs. These results provide further evidence that children with ADHD have abnormalities in auditory processing (Fu et al., 2022), but not in visual processing (Fassbender & Schweitzer, 2006).
As hypothesized, children with ADHD showed deficits in audiovisual integration. More specifically, the race model results indicated that among TD children, audiovisual integration occurred in the valid cue condition but not in the invalid cue condition. In children with ADHD, however, audiovisual integration occurred in neither the valid nor the invalid cue condition. These results are consistent with previous studies: findings from a spatial temporal order judgment (TOJ) task also showed deficits in individuals with high ADHD traits, who have a narrower integration time window than those with low ADHD traits (Panagiotidi et al., 2017). This may be related to the conditions under which audiovisual integration occurs. Multisensory integration effects depend on the multisensory objects being fully attended, that is, on both the visual and auditory modalities being attended (Talsma et al., 2007). However, children diagnosed with ADHD typically struggle with effective attention management and are more easily distracted in certain contexts (Caldani et al., 2019; van der Stelt et al., 2001; van Mourik et al., 2007). In addition, the present results appear to show a more severe deficit than previously reported, probably because we tested children rather than adults. Of the children diagnosed with ADHD, approximately 50% will have symptoms that persist into adulthood (Sadock et al., 2000). Studies have suggested that there is a substantial decline over time in the number of individuals who retain clinically significant symptoms of ADHD and that inattention symptoms are more likely than hyperactive symptoms to persist into adulthood (Biederman et al., 2000). Thus, one possible explanation for the more severe audiovisual integration deficit in children with ADHD is that symptoms are more prevalent in childhood than in adulthood.
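To make the race model logic concrete, the following is a minimal sketch of the race model inequality test, in which the cumulative probability of a response to an audiovisual target at time t is compared with the sum of the unimodal cumulative probabilities (capped at 1). A positive difference is usually taken as evidence of integration. The function names, the time points, and the RT values are hypothetical and serve only as an illustration; they are not the data or the analysis pipeline of the present study.

```python
from bisect import bisect_right


def ecdf(rts, t):
    """Empirical cumulative probability P(RT <= t) for a list of RTs (ms)."""
    ordered = sorted(rts)
    return bisect_right(ordered, t) / len(ordered)


def race_model_violation(rt_a, rt_v, rt_av, times):
    """Return P(RT_AV <= t) minus the race model bound at each time point.

    The bound is min(1, P(RT_A <= t) + P(RT_V <= t)). Values above zero
    mean the audiovisual responses are faster than the race model allows.
    """
    differences = []
    for t in times:
        bound = min(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))
        differences.append(ecdf(rt_av, t) - bound)
    return differences


# Hypothetical RTs (ms) for one participant in one cue condition.
rt_a = [420, 450, 480, 510, 540]   # auditory targets
rt_v = [400, 430, 460, 490, 520]   # visual targets
rt_av = [330, 350, 370, 390, 410]  # audiovisual targets

times = [350, 400, 450]            # assumed evaluation points (ms)
violations = race_model_violation(rt_a, rt_v, rt_av, times)
print(violations)
```

In practice the differences would be computed per participant and cue condition and then tested against zero at the group level; the sketch stops at the per-participant step.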
The current study found no significant difference between the ADHD and TD groups in IOR for any of the three target modalities. This is consistent with previous studies, although those involved only the visual modality (visual cue and visual target; C. S. Li et al., 2003). We extended this work and found that children with ADHD also showed no deficits in cross-modal IOR (visual cue and auditory target) or bimodal audiovisual IOR (visual cue and audiovisual target). For audiovisual targets, audiovisual integration occurs and decreases the IOR effect (Tang et al., 2019). In the current study, a comparable pattern was found in both the ADHD and TD groups: the IOR effect elicited by audiovisual targets was reduced. This is because audiovisual integration enhances the perceptual salience of the target (van der Burg et al., 2011), whereas IOR attenuates it (Prime et al., 2003; Satel et al., 2013). More importantly, however, the TD group showed a greater reduction in IOR than the ADHD group. This is because, compared with TD children, the poorer audiovisual integration in ADHD yields a smaller increase in perceptual salience (Panagiotidi et al., 2017), while IOR reduces perceptual salience to the same extent (C. S. Li et al., 2003). These results confirm the hypothesis and suggest that deficits in audiovisual integration in ADHD account for the significant group difference in the reduction of IOR.
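The comparison described above can be sketched as follows, taking the IOR effect as the mean RT in the cued condition minus the mean RT in the uncued condition, and the reduction of IOR as the visual IOR minus the audiovisual IOR. All RT values and names below are hypothetical and chosen only to illustrate the arithmetic; they do not reproduce the study's data.

```python
from statistics import mean


def ior_effect(rt_cued, rt_uncued):
    """IOR magnitude (ms): mean cued RT minus mean uncued RT."""
    return mean(rt_cued) - mean(rt_uncued)


def ior_reduction(visual, audiovisual):
    """Drop in IOR from visual to audiovisual targets (ms).

    Each argument is a (rt_cued, rt_uncued) pair of RT lists.
    """
    return ior_effect(*visual) - ior_effect(*audiovisual)


# Hypothetical group-level RTs (ms): (cued, uncued) per target modality.
td_visual = ([520, 540], [480, 500])
td_audiovisual = ([450, 470], [440, 460])
adhd_visual = ([600, 620], [560, 580])
adhd_audiovisual = ([560, 580], [535, 555])

td_reduction = ior_reduction(td_visual, td_audiovisual)
adhd_reduction = ior_reduction(adhd_visual, adhd_audiovisual)
print(td_reduction, adhd_reduction)
```

In this made-up example the visual IOR is the same in both groups, but the audiovisual IOR shrinks more in the TD group, which is the pattern the paragraph describes.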
In our study, the multisensory response enhancement effect was larger in the cued condition than in the uncued condition in both the ADHD and TD groups. This result is inconsistent with prior findings in adults (Tang et al., 2019; Van der Stoep et al., 2015). According to the hypothesis of perceptual sensitivity induced by exogenous cues, exogenous spatial attention can reduce perceptual sensitivity at cued locations compared to uncued locations (Peng et al., 2019; Van der Stoep et al., 2015). Consequently, perceptual sensitivity tends to be weaker at the cued location than at the uncued location. Consistent with inverse effectiveness, audiovisual integration is more beneficial for weaker stimuli than for stronger ones (Otto et al., 2013; Senkowski et al., 2011). Thus, the advantage of audiovisual integration is greater for stimuli presented at cued locations than for those at uncued locations. Moreover, the rMRE effect in the uncued condition was present only in TD children and not in children with ADHD, which suggests that invalid cues impede the generation of multisensory response enhancement in children with ADHD. This may be because, in children with ADHD, the multisensory information produces no interaction effect and they respond on the basis of a single dominant modality. Using the Colavita paradigm, Qian et al. (2024) found that school-aged children with ADHD showed a larger Colavita effect, suggesting that children with ADHD have a visual dominance effect. Other studies have also found visual dominance effects in ADHD groups (X. Li et al., 2024). Therefore, we speculate that, owing to an overreliance on visual information, children with ADHD show a visual dominance effect in the uncued condition.
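As an illustration of the measure discussed above, the sketch below computes rMRE under a commonly used definition: the gain of the audiovisual median RT over the faster of the two unimodal median RTs, expressed as a percentage of that faster unimodal RT. This definition is an assumption here, since the exact formula is not restated in this section, and the RT values are hypothetical.

```python
from statistics import median


def rmre(rt_a, rt_v, rt_av):
    """Relative multisensory response enhancement in percent.

    Assumed definition:
    (min(median A, median V) - median AV) / min(median A, median V) * 100
    """
    fastest_unimodal = min(median(rt_a), median(rt_v))
    return (fastest_unimodal - median(rt_av)) / fastest_unimodal * 100


# Hypothetical RTs (ms) for one participant in one cue condition.
rt_a = [480, 500, 520]   # auditory targets, median 500
rt_v = [430, 450, 470]   # visual targets, median 450
rt_av = [390, 405, 420]  # audiovisual targets, median 405

enhancement = rmre(rt_a, rt_v, rt_av)
print(round(enhancement, 2))
```

A value near zero would mean that responses to audiovisual targets were no faster than responses to the faster unimodal target, which is what reliance on a single dominant modality would predict.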
We also identified a deficit in sustained attention in children with ADHD. RTSD tended to be larger in the ADHD group than in the TD group. This implies that an underlying problem for children with ADHD may be difficulty in maintaining optimal performance, a phenomenon also reported in previous studies (Börger et al., 1999). Children with ADHD show increased intraindividual variability across a wide range of tasks, and controlled observations suggest that they perform inconsistently on behavioral and neurocognitive tests (Kofler et al., 2008; Lipszyc & Schachar, 2010; Tamm et al., 2012; Willcutt et al., 2008). Continuous performance tasks are widely used to examine attention and have revealed that children with ADHD are less accurate and slower at detecting visual targets. They often exhibit higher rates of omission errors, slower response speeds, and greater response variability than healthy controls (Huang-Pollock et al., 2017; Stern & Shalev, 2013; Wang et al., 2011). Other researchers have also reported attentional deficits in children with ADHD, including significantly impaired sustained attention and visual processing speed (McAvinue et al., 2015). Recent research emphasizes that ADHD should be viewed as a heterogeneous disorder because it is associated with a heterogeneous set of neuropsychological characteristics (Kofler et al., 2019; Martel et al., 2011; Wolfers et al., 2020). These findings underline the complexity and heterogeneity of the disorder (Kofler et al., 2019; Luo et al., 2019).
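The variability measure can be sketched as follows, assuming that RTSD denotes the standard deviation of a participant's RTs across trials and that group values are averages of the per-participant RTSDs. The participant labels and RT values are hypothetical.

```python
from statistics import mean, stdev


def rtsd(trial_rts):
    """Intraindividual variability: sample SD of one participant's RTs (ms)."""
    return stdev(trial_rts)


def group_rtsd(participants):
    """Mean of the per-participant RTSDs for a group.

    `participants` maps a participant label to that participant's RT list.
    """
    return mean(rtsd(rts) for rts in participants.values())


# Hypothetical trial-level RTs (ms) for two participants per group.
td_group = {
    "td_01": [400, 420, 440],
    "td_02": [450, 460, 470],
}
adhd_group = {
    "adhd_01": [350, 450, 550],
    "adhd_02": [400, 480, 560],
}

td_variability = group_rtsd(td_group)
adhd_variability = group_rtsd(adhd_group)
print(td_variability, adhd_variability)
```

Because RTSD tends to grow with mean RT, some analyses additionally divide it by the mean RT (the coefficient of variation); that step is omitted here.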
This study has limitations. In contrast to previous studies that used electroencephalography (EEG), the present study did not record pre-response neural signals. For example, research on adults with neurological malformations has used a rapid serial visual presentation (RSVP) paradigm and event-related potential (ERP) techniques (Talsma et al., 2007) to identify changes in the early (50–100 ms) and late integration time windows prior to a participant’s button press. Our behavioral measures could not capture such pre-response neural changes. Future studies could therefore use EEG to explore whether children with ADHD show deficits in early versus late integration. Additionally, because the symptoms of ADHD change over development, cross-sectional and longitudinal studies should also be considered in future research on how multimodal processing develops in individuals with ADHD.
Conclusion
In conclusion, inhibition of return is smaller for audiovisual targets than for visual targets in both children with ADHD and TD children. However, this reduction is attenuated in children with ADHD because of their impaired audiovisual integration.
Footnotes
Acknowledgements
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
Conceptualization, H.Z., Y.C., S.C., A.W., and X.T.; Methodology, H.Z., J.S., A.W., and X.T.; Formal analysis, H.Z. and J.S.; Investigation, H.Z., Y.C., and S.C.; Writing—original draft preparation, H.Z., Y.C., and S.C.; Writing—review & editing, H.Z., S.C., and A.W.; Resource, Y.C. and S.C.; Software, H.Z. and A.W.; Visualization, H.Z. and A.W.; Validation, H.Z., S.C., and A.W.; Funding acquisition, Y.C., S.C., X.T., and A.W. All authors have read and agreed to the published version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Major Program of Philosophy and Social Sciences in Jiangsu Province (2024SJZD137), the Suzhou Science and Technology Development Plan [People’s Livelihood Science and Technology: SKY2022113], the Humanities and Social Sciences Research Project of Soochow University (22XM0017) and Interdisciplinary Research Team of Humanities and Social Sciences of Soochow University (2022). Y.C. was supported by the Maternal and Child Health Scientific Research Project of Jiangsu Province (F202147). S.C. was supported by Gusu Health Talent Project of Suzhou City (GSWS2020049) and Suzhou Science and Technology Development Plan [Innovation in medical and health technology (SKY2022173)]. X.T. was supported by Natural Science Foundation of Liaoning Province (2022-MS-312).
Institutional Review Board Statement
This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the Academic Committee of the Department of Psychology, Soochow University, China.
Informed Consent Statement
Informed consent was obtained from all parents of children according to the Declaration of Helsinki.
