Abstract
Motion-defined transparency is the perception of two or more distinct moving surfaces at the same retinal location. We explored the limits of motion transparency using superimposed surfaces of randomly positioned dots defined by differences in motion direction and colour. In one experiment, dots were red or green and we varied the proportion of dots of a single colour that moved in a single direction (‘colour-motion coherence’) and measured the threshold direction difference for discriminating between two directions. When colour-motion coherences were high (e.g., 90% of red dots moving in one direction), a smaller direction difference was required to correctly bind colour with direction than at low coherences. In another experiment, we varied the direction difference between the surfaces and measured the threshold colour-motion coherence required to discriminate between them. Generally, colour-motion coherence thresholds decreased with increasing direction differences, stabilising at direction differences around 45°. Different stimulus durations were compared, and thresholds were higher at the shortest (150 ms) compared with the longest (1,000 ms) duration. These results highlight different yet interrelated aspects of the task and the fundamental limits of the mechanisms involved: the resolution of narrowly separated directions in motion processing and the local sampling of dot colours from each surface.
Introduction
Motion transparency refers to the perception of two or more superimposed surfaces moving smoothly in different directions across the same spatial location. Crowds of people walking in different directions on a busy street as viewed from above, or a flock of birds passing in front of drifting clouds are some everyday examples. In the laboratory, motion transparency is typically studied using overlapping sets of randomly positioned, moving dots (random dot kinematograms or RDKs). The representation of such multiple velocity fields at the same region of space is a challenging computational feat, both for the visual system as well as for models that attempt to simulate it (Braddick, Wishart, & Curran, 2002; Garcia & Grossman, 2009; McDonald, Clifford, Solomon, Chen, & Solomon, 2014; Qian, Andersen, & Adelson, 1994; Snowden, Treue, Erickson, & Andersen, 1991; Snowden & Verstraten, 1999). As such, the constraints by which the perception of transparency collapses and the separate moving surfaces are instead misperceived as a single or ‘coherent’ pattern comprise an important focus of research (Suzuki & Watanabe, 2009; Takemura, Tajima, & Murakami, 2011; Williams & Sekuler, 1984).
Our understanding of motion transparency is shaped by a few key conceptual and phenomenological aspects that underlie its importance and complexity as a problem in vision. One such aspect concerns the local and global interactions at play and the neural machinery that supports them. Snowden et al. (1991) recorded from cells in macaque V1 and V5/MT and reported that V1 neurons responded to their preferred direction under conditions of motion transparency, whereas V5/MT neurons showed a suppressed response compared with conditions where only the preferred direction of motion was in their receptive field. They proposed a hierarchical stream for motion processing to account for transparency that involves neural computations in both V1 and V5/MT.
Qian et al. (1994) examined the early psychophysical constraints for perceiving transparent motion and reported that when dot motion stimuli are balanced with opposing or orthogonal motions at a very fine, local level, observers perceive nontransparent flicker rather than transparently moving surfaces. On the basis of these results and recordings from single units in V1 and V5/MT, these authors suggested that at an early (local) stage of processing at the spatial scale of V1 receptive fields, the information from moving dots is pooled together. In this scenario, transparency only occurs when the local motions are unbalanced. Signals from V1 neurons feed forward onto a later stage (V5/MT) where receptive fields are larger and multiple directions of motion can be represented by neurons with overlapping receptive fields (Qian & Andersen, 1994; Treue, Hol, & Rauber, 2000). Functional magnetic resonance imaging studies in humans have also indicated that haemodynamic activity in area V5/MT+ is modulated according to whether transparent motion of two surfaces or motion of a single surface is perceived (Garcia & Grossman, 2009; Muckli, Singer, Zanella, & Goebel, 2002). Subsequent findings have highlighted the importance of global motion operations in transparency. For example, McOwan and Johnston (1996) found that whether a stimulus was perceived as transparent or not depended on how the grouping of low-level motion signals affected the global percept. Edwards and Greenwood (2005) reported that signal strength was important in setting a limit for how many motions can be perceived; a finding they proposed highlights the role of cortical areas involved in processing the global properties of a stimulus.
The nature of the computations underlying motion transparency can be informed by their tendency to produce an illusory exaggeration of the angular difference between the two component motion directions, known as the ‘direction illusion’ or ‘direction repulsion’ (Marshak & Sekuler, 1979; Mather & Moulden, 1980). The repulsion between the directions of motion (at acute veridical angular separations) of the two surfaces can often be quite large, up to around 20° (Benton & Curran, 2003; Braddick et al., 2002; Curran & Benton, 2003; Farrell-Whelan, Wenderoth, & Brooks, 2012; Grunewald, 2004; Hiris & Blake, 1996; Marshak & Sekuler, 1979; Rauber & Treue, 1999; Wiese & Wenderoth, 2007). This has been explained in terms of mutual inhibition between neural populations representing the two directions (Marshak & Sekuler, 1979; Mather & Moulden, 1980; Perry, Tahiri, & Fallah, 2014; Rauber & Treue, 1999; Treue et al., 2000; Wilson & Kim, 1994); however, there is also evidence to suggest that the illusion could result from estimation of target surface motion relative to inferred background motion (Dakin & Mareschal, 2000; Farrell-Whelan et al., 2012). The role of local and global interactions in determining direction repulsion is not entirely resolved, with some authors maintaining that it derives from early, local processes (Grunewald, 2004; Hiris & Blake, 1996; Marshak & Sekuler, 1979; Wiese & Wenderoth, 2007) and others that it has a later, more global locus (Benton & Curran, 2003; Curran & Benton, 2003; Curran, Clifford, & Benton, 2009; Kim & Wilson, 1996, 1997; Wilson & Kim, 1994). More recent reports suggest that it probably arises from activity at multiple stages (Chen, Maloney, & Clifford, 2014; Wiese & Wenderoth, 2010).
Another key aspect of motion transparency concerns how the component motions are segregated into surfaces and, in particular, how other features associated with the surfaces (such as depth, speed, spatial frequency content or colour) aid in the segregation and subsequent discrimination of the motion directions, or vice versa (Perry & Fallah, 2012; Snowden & Verstraten, 1999; Vigano, Maloney, & Clifford, 2017). Clues to the structure of the underlying mechanisms can be revealed by probing the influence of the associated features upon the perception of motion transparency or, in the opposite case, by probing the influence of motion transparency upon the perception of the associated features. These questions are particularly pertinent when surfaces serve as the unit guiding the global allocation of either feature- or surface-based visual attention (e.g., Ernst, Boynton, & Jazayeri, 2013; Mitchell, Stoner, Fallah, & Reynolds, 2003; Perry & Fallah, 2012; Reynolds, Alborzian, & Stoner, 2003; Stoner & Blanc, 2010; Valdes-Sosa, Cobo, & Pinilla, 2000) or in understanding the neural coding of colour-motion conjunctions (e.g., Seymour, Clifford, Logothetis, & Bartels, 2009; Zhang, Qiu, Zhang, Han, & Fang, 2014).
Differences in speed between the two surfaces are known to reduce direction repulsion (Braddick et al., 2002; Curran & Benton, 2003; Marshak & Sekuler, 1979; Perry et al., 2014), arguably because the targeted direction-selective mechanisms differ according to their speed tunings and thus show a reduced overlap in the population response. Although placing the two surfaces at different depth planes causes a reduction in suppression in V5/MT neurons (Bradley, Qian, & Andersen, 1995), it does not alter direction repulsion (Hiris & Blake, 1996), possibly because the transparent surfaces are already associated with some inferred difference in depth (Curran, Hibbard, & Johnston, 2007; Hiris & Blake, 1996; Qian et al., 1994). Despite this divergence in their influence upon direction repulsion, both speed (Perry et al., 2014) and stereoscopic depth (Curran et al., 2007) can improve observers’ direction discrimination when they differ in the two surfaces. Similarly, a difference in colour between surfaces does not reduce direction repulsion, but does reduce the time needed to process and report both directions (Perry & Fallah, 2012). In feature binding, colour (or contrast polarity) is generally more accurately combined with motion when the moving patterns are perceived as transparent (Clifford, Spehar, & Pearson, 2004; Moradi & Shimojo, 2004; Vigano, Maloney, & Clifford, 2014; although see Wu, Kanai, & Shimojo, 2004, for a dramatic case of transparency in central vision inducing colour-motion misbinding in the periphery). 
Together, these results suggest that the rapid segregation of superimposed dot fields into transparent surfaces (incorporating both their speed and direction) can allow for the appropriate assignment of other features associated with the surfaces, such as colour (Clifford et al., 2004; Moradi & Shimojo, 2004; Perry & Fallah, 2012; Perry et al., 2014; Vigano et al., 2014), as long as the surface representations are stable over time (Moradi & Shimojo, 2004; Vigano et al., 2014, 2017). This complicated interrelationship between colour and surface segregation was the impetus for this study: A novel opportunity to probe the directional limitations of motion transparency could be provided by systematic manipulations of colour coherence.
Here, we asked participants to discriminate the directions of two RDKs defined by different colours and indicate which of the two directions was more ‘clockwise’. This judgment of the relative (rather than absolute) directions ensures that both directions of motion must be encoded to perform the task accurately and can be used to quantify motion transparency (cf. Braddick et al., 2002). We carried out our measurements in two experiments. In one we varied the colour-motion coherence, or the proportion of dots within each RDK assigned to one or the other colour (red or green), and measured the threshold direction difference required for transparency (Experiment 1a). A high colour-motion coherence signifies that most dots of the same colour (e.g., red) are drifting in the same direction, whereas a low colour-motion coherence signifies that a similar number of dots of the same colour drift in the two different directions (at 0% coherence, each RDK contains equal numbers of dots of each colour). In the other experiment, we did the opposite: We varied the direction difference between the RDKs and measured the threshold colour-motion coherence required for transparency (Experiment 1b). Thresholds were estimated at three different stimulus durations in order to place the system under further pressure to tease apart the underlying mechanisms and gauge their time course. A series of control conditions similar to those used by Braddick et al. (2002) and Curran et al. (2007) also allowed us to establish a baseline direction difference threshold when the motions were not transparent, but rather occupied adjacent, nonoverlapping spatial locations.
Using this type of stimulus, we predicted that performance should be determined by two different factors. At one extreme, when the two surfaces are fully segregated by their colour (high colour-motion coherence), the limit in performance as measured by transparency should reflect an observer’s ability to discriminate the component directions of motion: hence the directional limit of transparency. At very small direction differences (e.g., 15–20° or less), transparency should collapse and a single pattern motion will be perceived, despite the colour differences in the two surfaces. At the other extreme, when the surfaces are weakly differentiated by colour (e.g., for colour-motion coherences of 50% or less), performance should be impaired, even for large directional separations. In this case, performance is limited by the sampling of dot colours from each surface.
Methods
Participants
Informed written consent for participation was obtained from a total of seven adult participants (four female): the same four in Experiments 1a and 1b and five in the control conditions. All were experienced psychophysical observers with normal or corrected-to-normal visual acuity and normal trichromacy. Two of the authors (I. M. and R. T. M.) took part in all procedures while the remaining participants were naïve to the theoretical motivation for the study. All experimental protocols were approved by the University of Sydney Human Research Ethics Committee and met the requirements of the Code of Ethics of the World Medical Association (Declaration of Helsinki).
Apparatus
Stimulus generation, experiment control and the recording of participants’ responses were run under MATLAB™ 7.13 (The MathWorks, Ltd, Natick, MA), incorporating elements of the PsychToolbox 3.0 (Brainard, 1997; Kleiner, 2013; Pelli, 1997). Stimuli were displayed on a Sony Trinitron 20SE monitor (resolution: 1024 × 768 pixels; refresh rate: 75 Hz) driven by an NVIDIA GeForce GTS 240 graphics card. The display was calibrated using a ColourCAL colourimeter (Cambridge Research Systems, Rochester, UK) and linearised using look-up tables in software. Participants were positioned at a viewing distance of 57 cm, where one pixel subtended 2.2 arcmin.
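As a rough check on the viewing geometry described above, the following Python sketch converts the reported pixel size into the angular extent of the display. The function names are illustrative only, and the calculation assumes square pixels and a small-angle approximation, neither of which is stated in the text.

```python
import math

ARCMIN_PER_PIXEL = 2.2       # reported angular size of one pixel
VIEWING_DISTANCE_CM = 57.0   # reported viewing distance
RESOLUTION = (1024, 768)     # reported display resolution in pixels


def pixels_to_degrees(n_pixels, arcmin_per_pixel=ARCMIN_PER_PIXEL):
    """Convert a length in pixels to degrees (small-angle approximation)."""
    return n_pixels * arcmin_per_pixel / 60.0


def pixel_pitch_cm(arcmin_per_pixel=ARCMIN_PER_PIXEL,
                   distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of one pixel implied by its angular size."""
    return distance_cm * math.tan(math.radians(arcmin_per_pixel / 60.0))


width_deg = pixels_to_degrees(RESOLUTION[0])
height_deg = pixels_to_degrees(RESOLUTION[1])
print(f"display extent: {width_deg:.1f} x {height_deg:.1f} deg")
print(f"implied pixel pitch: {pixel_pitch_cm() * 10:.3f} mm")
```

At a viewing distance of 57 cm, 1 cm on the screen subtends very nearly 1° of visual angle, which makes this distance convenient for converting between screen units and degrees.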
Stimulus and Procedure
In Experiments 1a and 1b, two spatially and temporally overlapping RDKs were translated linearly along two different trajectories. Each RDK consisted of 250 dots drifting at a constant speed of 5 pixels/frame (13.5°/s) within a circular annulus centred on a black fixation square. The distance from the fixation square to the inner edge of the annulus was 2.28°; this central region was devoid of dots to prevent tracking of any individual dot. The outer radius of the annulus was 12.35° and both edges of the annulus were tapered with a raised cosine contrast ramp of 0.85°. The circular dots (diameter, 0.46°) were either red or green and were set to each participant’s isoluminant point using the method of minimum heterochromatic flicker (Ives, 1912; Walsh, 1953). The participants’ task across all experiments was to fixate the black square and indicate, by pressing one of two keys, whether the RDK containing the greater proportion of red dots was drifting in a more clockwise or a more counter-clockwise direction than the RDK containing the greater proportion of green dots. The base direction (i.e., the direction that bisected the two RDKs’ directions) was selected at random on each trial. Across different blocks, the stimuli were presented at full contrast for short (150 ms) or long (1,000 ms) durations (the authors I. M. and R. T. M. also undertook an intermediate duration of 300 ms). To minimise transients, stimulus contrast was ramped from zero to full contrast and back across a raised cosine temporal window that lasted half the nominal stimulus duration.
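The stimulus geometry above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, and the angle convention (degrees increasing counter-clockwise, so that the more clockwise direction is the smaller angle) is an assumption of the sketch.

```python
FRAME_RATE_HZ = 75.0       # reported refresh rate
ARCMIN_PER_PIXEL = 2.2     # reported pixel size
SPEED_PX_PER_FRAME = 5.0   # reported dot speed


def speed_deg_per_s(px_per_frame=SPEED_PX_PER_FRAME):
    """Dot speed in degrees of visual angle per second."""
    return px_per_frame * FRAME_RATE_HZ * ARCMIN_PER_PIXEL / 60.0


def rdk_directions(base_deg, offset_deg, red_clockwise):
    """Return (red_direction, green_direction) in degrees.

    The two directions lie symmetrically about the base direction.
    Assumed convention: angles increase counter-clockwise.
    """
    half = offset_deg / 2.0
    cw = (base_deg - half) % 360.0
    ccw = (base_deg + half) % 360.0
    return (cw, ccw) if red_clockwise else (ccw, cw)


def red_is_more_clockwise(red_deg, green_deg):
    """True if red lies clockwise of green (signed difference in [-180, 180))."""
    diff = (red_deg - green_deg + 180.0) % 360.0 - 180.0
    return diff < 0


red, green = rdk_directions(base_deg=5.0, offset_deg=30.0, red_clockwise=True)
print(red, green, red_is_more_clockwise(red, green))
print(f"speed: {speed_deg_per_s():.2f} deg/s")
```

With the rounded pixel size of 2.2 arcmin, the conversion gives about 13.75°/s, close to the reported 13.5°/s; the small gap presumably reflects rounding of the pixel size.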
Two parameters were independently manipulated (see Figure 1). In Experiment 1a, the colour-motion coherence of the RDKs was systematically varied. This was done in fixed steps of 10% ranging from 40% to 90% colour-motion coherence for all participants (the authors I. M. and R. T. M. also performed a 100% coherence condition that formed part of the control experiments; detailed later in this article). Colour-motion coherence determined the proportion of dots of a given RDK that were correctly assigned their colour. For instance, at 100% colour-motion coherence, all dots in the ‘red’ RDK were correctly assigned the colour red. At 50% colour-motion coherence, 50% of the dots were correctly assigned the colour red, while the remaining 50% of the dots were randomly assigned their colour in equal proportions of red and green. Two interleaved adaptive staircases varied the direction offset of the two RDKs (to a maximum of 90°) at each fixed colour-motion coherence to converge on a threshold performance of 80.3% correct according to the Weibull (i.e., Quick, 1974) psychometric function, following the Bayesian ‘Psi’ procedure of Kontsevich and Tyler (1999).
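The colour assignment rule described above can be illustrated with a short Python sketch. The function names and the rounding of dot counts are assumptions made for illustration; the original stimulus code is not given in the text.

```python
import random


def assign_colours(n_dots, coherence, nominal="red", other="green", rng=None):
    """Assign a colour to each dot of one RDK.

    A fraction `coherence` of the dots takes the nominal colour; the
    remainder is split equally between the two colours, as described in
    the text. How fractional dot counts are rounded is an assumption.
    """
    rng = rng or random.Random()
    n_signal = round(n_dots * coherence)
    n_rest = n_dots - n_signal
    n_rest_nominal = n_rest // 2
    colours = ([nominal] * (n_signal + n_rest_nominal)
               + [other] * (n_rest - n_rest_nominal))
    rng.shuffle(colours)  # randomise which dots carry which colour
    return colours


def expected_nominal_fraction(coherence):
    """Expected proportion of dots carrying the RDK's nominal colour."""
    return coherence + (1.0 - coherence) / 2.0


dots = assign_colours(250, 0.4, rng=random.Random(0))
print(dots.count("red") / len(dots), expected_nominal_fraction(0.4))
```

Under this rule, the proportion of dots carrying the nominal colour is (1 + c)/2 for a coherence c, so 40% coherence corresponds to 70% red dots in the ‘red’ RDK and 0% coherence to equal numbers of red and green dots.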
Schematic single frames from two extreme example stimuli. Two random dot kinematograms (only 10 dots in each RDK are illustrated for clarity), whose colour-motion coherence could vary (abscissa) or whose direction difference could vary (ordinate) were used. The dotted black arrow indicates the base (i.e., bisecting) direction between the two motion directions, which was selected at random on each trial. Note that this line, the dashed circular outlines and the arrowheads are for illustration only and were not present in the actual stimulus. (a) High direction difference but low colour-motion coherence. In this example, the two RDKs are weakly segregated by colour (20% colour-motion coherence), though the direction difference is large (75°). Here, performance is mainly limited by the sampling of dot colours from each surface. (b) High colour-motion coherence (80%) but low direction difference (15°). In this example, the two RDKs are nearly perfectly segregated in terms of their colour, and performance is mainly limited by direction discrimination. The stacked bar plots show the proportions of red or green dots for each direction represented. In both (a) and (b), the correct response would be that the direction of the predominantly red RDK is ‘more clockwise’ than the direction of the predominantly green RDK.
In Experiment 1b, we systematically varied the angular direction offset between the two RDKs from low (15°) to high (90°) in fixed 15° steps. Similarly, two interleaved adaptive Psi staircases were used at each fixed direction difference to converge on a threshold colour-motion coherence (yielding 80.3% correct performance). A schematic example of one frame of the different stimulus types is shown in Figure 1. In each experiment, participants completed a minimum of three runs consisting of two interleaved staircases (30 trials each), yielding two estimates of the threshold direction difference (1a) or colour-motion coherence (1b) required for accurate colour-motion binding per condition, per run. Each staircase also provided a standard error of the estimate that we converted into threshold 95% confidence intervals.
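The 80.3% criterion can be related to the Weibull function with a small Python sketch. The text does not report the exact parameterisation, so the guess rate (0.5, for a two-alternative task) and the lapse rate (0.02) below are assumptions; under these values, performance at the threshold parameter is about 80.3% correct. The conversion from standard error to a 95% interval uses a normal approximation, which is likewise an assumption.

```python
import math


def weibull_2afc(x, alpha, beta, guess=0.5, lapse=0.02):
    """Proportion correct at stimulus level x.

    alpha is the threshold (scale) and beta the slope. `guess` and
    `lapse` are assumed values, not parameters reported in the text.
    """
    return guess + (1.0 - guess - lapse) * (1.0 - math.exp(-(x / alpha) ** beta))


def ci95(threshold, standard_error):
    """Normal-approximation 95% confidence interval (assumed conversion)."""
    half_width = 1.96 * standard_error
    return threshold - half_width, threshold + half_width


# Hypothetical threshold of 20 deg with a standard error of 2 deg.
print(f"{weibull_2afc(20.0, alpha=20.0, beta=3.0):.3f}")
print(ci95(20.0, 2.0))
```

At x = alpha the slope parameter drops out of the expression, so the performance level that defines threshold depends only on the assumed guess and lapse rates.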
Control Conditions
We implemented two control conditions similar to those used by Braddick et al. (2002) to establish baseline measurements of direction difference thresholds with RDKs that did not share the same spatial location (and hence would not be perceived as transparent). Participants performed the same task as Experiments 1a and 1b (i.e., discriminated the directions of the RDKs by indicating whether the red RDK direction was more clockwise or more counter-clockwise to the green RDK direction). Apart from the modifications described later, the stimuli and procedure were otherwise identical to those of Experiment 1. Colour-motion coherence was fixed at 100%, and again two interleaved staircases were used to converge upon the threshold direction difference in all control conditions. The control conditions took the form of a 3 (spatial arrangement) × 3 (duration) design. The following were the three spatial arrangements:
(1) A centre-surround stimulus where one RDK occupied a central region within an annular surround. The radius of the central region extended to 6.2°. The (target) colour in the centre position was alternated across interleaved, separate blocks of trials.
(2) A ‘bipartite’ stimulus whereby the display region was divided in half and the two RDKs occupied opposite halves. The orientation of the (illusory) midline varied randomly across trials (as did the base direction between the two motions).
(3) The transparent motion display, identical to that of Experiment 1a.
In separate blocks, all participants performed the direction discrimination task at each of the spatial arrangements, at stimulus durations of 150, 300 or 1,000 ms.
Results
Experiment 1a: Direction Difference Thresholds at Fixed Colour-Motion Coherences
Figure 2 plots the threshold direction differences for each participant and the group average as a function of different colour-motion coherences, for each duration condition. All participants showed a similar general trend. Thresholds for accurately discriminating the two directions of motion tended to decrease with increasing colour-motion coherence and were lower for longer stimulus duration presentations. Conversely, at the shortest duration (150 ms) and lowest colour-motion coherence level (40%), participants were performing nearer to chance (according to 1,000 simulated staircases with random responding; indicated by the dashed horizontal line).
Results of Experiment 1a: direction difference thresholds at fixed values of colour-motion coherence. Each panel gives the individual participant data, while the rightmost bottom panel gives the group mean for 150 and 1,000 ms conditions. Threshold values are the mean of those obtained from each run of the staircase. Error bars give the 95% confidence intervals of the thresholds for each participant, while for the group-averaged data they give ±1 standard error. The dashed horizontal line indicates chance direction discrimination. Note that data for the 100% colour-motion coherence conditions performed by participants I. M. and R. T. M. are the same as those from the control experiment.
Experiment 1b: Colour-Motion Coherence Thresholds at Fixed Direction Differences
Figure 3 gives the threshold colour-motion coherence values for each participant and the group average as a function of RDK direction differences. For long presentation durations (white symbols), the colour-motion coherence required to accurately distinguish the two directions was low (roughly 30%) at direction differences of 45° or greater. This remained fairly constant across direction differences for participants H. R. and R. M., while participants I. M. and R. T. M. displayed a clearer decrease in performance at smaller direction differences. When presentation duration was reduced to 150 ms (black symbols), overall performance was worse (colour-motion coherence thresholds were higher) across all direction differences and all participants showed a steady increase in threshold as the direction difference decreased. At the smallest direction difference (15°), direction discrimination performance was at chance (indicated by the dashed horizontal line) at durations of 150 ms for all participants and 300 ms where it was measured for participants R. T. M. and I. M.
Results of Experiment 1b: colour-motion coherence thresholds at fixed direction differences for the four participants and the group average (for two durations only). Threshold values are the mean of those obtained in each run of the staircase. The dashed horizontal line indicates chance performance as estimated from the mean of 1,000 simulations of the staircase with random responding. Error bars give the 95% confidence intervals of the thresholds for each participant, while for the group-averaged data they give ±1 standard error.
Probing the Limits of Motion Transparency
In order to quantify the limits of motion transparency perception, the data obtained in Experiments 1a and 1b were replotted on the same graph. All direction difference values (whether measured thresholds or fixed values) therefore varied along the ordinate while all colour-motion coherence values (whether measured thresholds or fixed values) varied along the abscissa (Figure 4). When replotted in this format, the data display a general ‘L’ shape with two sections: a horizontal portion reflecting more or less constant direction differences across most colour-motion coherence levels and a vertical section reflecting more or less constant colour-motion coherence for most direction difference levels. The shape of this curve suggests that performance is limited by a critical direction difference for motion transparency perception until colour-motion coherence levels approach roughly 30% (long durations) or 50% (short durations), at which point performance is limited by the difficulty in identifying the predominant colours of the motion-defined surfaces.
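The two-limit account described above can be caricatured as a hard-edged ‘L’ in Python. This is an idealisation for illustration only: the approximate limits (15° for direction, and 30% or 50% coherence for long or short durations) are taken from the text, whereas the function name and the sharp boundaries are assumptions, since the measured curves bend smoothly between the two limbs.

```python
D0_DEG = 15.0                       # approximate directional limit (from the text)
C0 = {"short": 0.50, "long": 0.30}  # approximate coherence limits (from the text)


def limiting_factor(direction_diff_deg, coherence, duration="long"):
    """Name the factor that limits performance, or None if neither does.

    Idealised: each limit is treated as a sharp boundary.
    """
    too_close = direction_diff_deg < D0_DEG
    too_mixed = coherence < C0[duration]
    if too_close and too_mixed:
        return "both"
    if too_close:
        return "direction resolution"
    if too_mixed:
        return "colour sampling"
    return None


print(limiting_factor(75.0, 0.20))           # large offset, weak colour cue
print(limiting_factor(10.0, 0.80))           # strong colour cue, small offset
print(limiting_factor(45.0, 0.40, "short"))  # same stimulus fails at 150 ms
```

The last call illustrates the duration effect: a stimulus that lies inside the supported region at the long duration falls outside it at the short duration because only the coherence limit shifts.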
Data from Experiments 1a and 1b with direction difference (measured thresholds and fixed values) plotted on the ordinate against colour-motion coherence (measured thresholds and fixed values) on the abscissa for each individual participant and the group-averaged data. Solid lines are fits through the data for the different stimulus durations. Error bars give the 95% confidence intervals of the measured thresholds: vertical for direction differences or horizontal for colour-motion coherences (or ±1 SE in the case of the group data).
Table 1. Asymptotic Ordinate (d0) and Abscissa (c0) Values in the Curve Fits to the Data.
What limits performance on this task? When the colour-motion coherence is high, this corresponds to two surfaces fully segregated by colour and drifting in different directions. In this case, performance should only be limited by the observer’s ability to discriminate the directions of the two transparent motions. When colour-motion coherence is low, the two surfaces are poorly segregated by their colour and performance is limited by the inability to determine which surface contains more red (or green) dots (i.e., dot sampling). From Figure 4, it can be observed that the direction differences tend to converge to similar asymptotes at high colour-motion coherences across the different durations (around 15°; compare d0 values in Table 1), but colour-motion coherences remain higher at shorter compared with longer durations, even at high direction differences (c0 values in Table 1, about 50% and 30% colour-motion coherence, respectively). This was confirmed in paired t tests across the fitted asymptotic values for the four participants, where there was a significant difference between colour-motion coherence asymptotes at the shortest (150 ms) and longest (1,000 ms) durations, t(3) = 6.35, p = .008, but not for the direction difference asymptotes, t(3) = 1.56, p = .22 (although admittedly in this latter result there is a possible danger of a Type II error, since in two of the four participants, the asymptote in the short duration condition is two to three times greater than that of the long duration condition).
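The reported p values can be checked from the t statistics alone, because Student's t distribution with 3 degrees of freedom (four participants, paired) has a closed-form cumulative distribution. The Python sketch below uses only the standard library; the function names are illustrative, and the `paired_t` helper is generic because the individual asymptote values are not reproduced here.

```python
import math


def t_cdf_df3(t):
    """Cumulative distribution of Student's t with 3 degrees of freedom."""
    x = t / math.sqrt(3.0)
    return 0.5 + (math.atan(x) + x / (1.0 + x * x)) / math.pi


def two_tailed_p_df3(t):
    """Two-tailed p value for a t statistic with 3 degrees of freedom."""
    return 2.0 * (1.0 - t_cdf_df3(abs(t)))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


for t in (6.35, 1.56):
    print(f"t(3) = {t:.2f}, two-tailed p = {two_tailed_p_df3(t):.3f}")
```

Both values agree with those reported above to two significant figures (about .008 and .22).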
Control Conditions
To determine the extent to which discrimination in transparent motion is worse than for nontransparent motion, we ran a control experiment where the motions were not superimposed but instead occupied adjacent spatial regions, and colour-motion coherence was fixed at 100%. Note that, although they were no longer transparent, the two motions in the centre-surround annular display should still be prone to the same interactions underlying direction repulsion, as many studies using this type of stimulus arrangement have shown (e.g., Chen et al., 2014; Kim & Wilson, 1997; Takemura et al., 2011; Wiese & Wenderoth, 2010). Data from five participants as well as the group average in the control experiment are plotted in Figure 5.
Direction difference thresholds for the control experiment using spatially segregated motions in a centre-surround or bipartite display, as well as in the transparent motion case. Colour-motion coherence was fixed at 100% in all conditions. Threshold values are the mean of those obtained in each run of the staircase. The dashed horizontal line indicates chance direction discrimination as estimated from the mean of 1,000 simulations of the staircase with random responding. Error bars give the 95% confidence intervals of the thresholds for each participant, while for the group-averaged data they give ±1 standard error. Note that the data for the transparent motion condition at each duration for participants I. M. and R. T. M. are the same as those given in Figure 2.
Overall, Figure 5 shows a decrease in direction difference threshold with increasing stimulus duration similar to that in Experiment 1a, and generally higher thresholds in the transparent condition compared with the spatially segregated (centre-surround and bipartite) conditions. A 3 × 3 (spatial arrangement × duration) factorial repeated-measures analysis of variance (ANOVA) indicated a significant main effect of duration, F(2, 8) = 5.67, p = .029; but not of spatial arrangement, F(2, 8) = 0.97, p = .42. There was also a significant interaction between duration and spatial arrangement, F(4, 16) = 4.21, p = .016. These results indicate that if there is any perceptual ‘cost’ of transparency compared with spatially segregated motions, it is more apparent at very short durations. We looked at this further by submitting the data at the shortest duration tested (150 ms) to a separate one-way ANOVA, which indicated a significant effect of spatial arrangement at this duration, F(2, 8) = 9.19, p = .009. Post hoc tests using Tukey’s honest significant difference procedure indicated no difference in direction discrimination thresholds between the two spatially segregated displays (p = .40). Thresholds were significantly higher in the transparent motion display than in the bipartite display (p = .007), although the difference from the centre-surround display did not reach significance (p = .052).
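The degrees of freedom in the 3 × 3 analysis follow from the design with five participants, and the p values for the two main effects can be checked with a closed-form expression that holds when the numerator degrees of freedom equal 2. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes an uncorrected repeated-measures ANOVA (no sphericity correction), which the text does not specify.

```python
def rm_anova_df(n_participants, levels_a, levels_b):
    """Degrees of freedom (effect, error) in a two-way repeated-measures ANOVA."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_s = n_participants - 1
    return {
        "A": (df_a, df_a * df_s),
        "B": (df_b, df_b * df_s),
        "AxB": (df_a * df_b, df_a * df_b * df_s),
    }


def p_value_f_df1_2(f, df2):
    """Upper-tail p value of an F statistic with 2 numerator df.

    Valid only for df1 = 2, where the F survival function reduces to
    (1 + 2F/df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


print(rm_anova_df(n_participants=5, levels_a=3, levels_b=3))
print(f"duration: p = {p_value_f_df1_2(5.67, 8):.3f}")
print(f"spatial arrangement: p = {p_value_f_df1_2(0.97, 8):.2f}")
```

The interaction term has 4 numerator degrees of freedom, so its p value is not covered by this shortcut and would need a general F distribution routine.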
Discussion
We used the discrimination of the directions of two superimposed moving random dot fields to quantify perceptual motion transparency. The limits of motion-defined transparency for differently coloured surfaces were estimated by varying the colour-motion coherence between surfaces and measuring the required direction difference (Experiment 1a), or by varying the direction difference between surfaces and measuring the required colour-motion coherence (Experiment 1b). We found that colour segregation cues for the two surfaces could only go so far in aiding direction discrimination: Even for very high colour-motion coherences, the lower limit of motion transparency is about a 15° angular direction difference across different stimulus durations (150–1,000 ms). Conversely, direction discrimination was no longer supported at around 50% colour-motion coherence at the shortest duration (150 ms) and 30% for the longest duration (1,000 ms), where performance collapsed. In agreement with these results, a control experiment using spatially segregated (and hence nontransparent) motions at 100% colour-motion coherence further revealed that the perceptual cost of transparency is most apparent at very short stimulus durations (150 ms).
In both Experiments 1a and 1b, the perceptual direction discrimination performed is the same, but the quality of the sensory information upon which the discrimination is based varies in different ways. At low colour-motion coherences, it may be trivial to distinguish between the two directions when the difference is large, but limitations in dot sampling make it very difficult to assign the correct colours to the motion-defined surfaces (i.e., which one is ‘more red’). In other words, under these circumstances a given local sample of dots may have an ambiguous (or even opposite) assignment of colour compared with the global surface as a whole: there may be more (or the same amount of) green dots travelling in the ‘red’ surface’s direction. Although the colours may be correctly assigned to the directions in these crude local samples, they contribute to a noisy global representation of each surface because the visual system is unable to sample every dot across the display (especially at very short durations), and the relative direction judgement becomes very difficult. At very low direction differences, motion transparency collapses and the task becomes impossible even when the surfaces are highly differentiated by colour. This may reflect fundamental limitations of the motion-processing system in resolving narrowly separated directions that cannot be overcome even with the addition of colour segregation information (Mitchell et al., 2003; Perry & Fallah, 2012; Suzuki & Watanabe, 2009). Although it does not change the main thesis of this article, an alternative method of measuring directional limits on motion transparency such as that employed by Braddick et al. (2002, Experiment 2) could potentially be used to circumvent such limitations of local dot sampling. They required subjects to set the orientation of two lines to correspond to the perceived direction of two superimposed random dot patterns translating in different directions. 
They found that for small angles between the motions, up to around 22.5°, observers often set the two lines to the same orientation, indicating that the motion was not perceived as transparent. To investigate the role of colour in the segregation of bi-vectorial motion stimuli, one could compare performance on this task between conditions in which colour coherence also varied. The key difference from our task is that it would not require observers to identify the (predominant) colour associated with each direction of motion, so it could in principle be performed over the full range of colour coherence (0–100%). Any effect of colour coherence on performance could then be attributed to its role in segregating (or integrating) the stimulus into two (or one) discrete sets of dots (cf. Perry & Fallah, 2012; Vigano et al., 2014, 2017).
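The local-sampling argument made earlier in this discussion can be illustrated with a small binomial calculation. The sketch below is not the analysis used in this study. It assumes, purely for illustration, that a colour-motion coherence c means each dot on the nominally ‘red’ surface is red with probability (1 + c)/2, so that c = 0 gives an even colour split and c = 1 gives a fully red surface. The function names and this mapping are hypothetical and may differ from how the stimuli were actually constructed. Given a local sample of n dots from the ‘red’ surface, the code returns the probability that the sample contains more red than green dots (correct), equal numbers (ambiguous), or more green than red dots (reversed).

```python
from math import comb


def p_red_on_red_surface(coherence):
    """Assumed mapping from coherence (0..1) to the probability that a dot
    on the nominally 'red' surface is red. This mapping is an illustration,
    not the stimulus definition used in the experiments."""
    return 0.5 * (1.0 + coherence)


def local_sample_outcomes(n_dots, coherence):
    """Exact binomial probabilities that a local sample of n_dots from the
    'red' surface looks correct (more red), ambiguous (tie) or reversed
    (more green)."""
    p = p_red_on_red_surface(coherence)
    correct = tie = reversed_ = 0.0
    for n_red in range(n_dots + 1):
        prob = comb(n_dots, n_red) * p ** n_red * (1.0 - p) ** (n_dots - n_red)
        if 2 * n_red > n_dots:
            correct += prob
        elif 2 * n_red == n_dots:
            tie += prob
        else:
            reversed_ += prob
    return correct, tie, reversed_


if __name__ == "__main__":
    # Compare small and large local samples at a low and a high coherence.
    for coherence in (0.3, 0.9):
        for n_dots in (4, 20):
            c, t, r = local_sample_outcomes(n_dots, coherence)
            print(f"coherence={coherence:.1f} n={n_dots:2d} "
                  f"correct={c:.3f} tie={t:.3f} reversed={r:.3f}")
```

Under these assumptions, small samples at low coherence frequently yield tied or reversed colour assignments, which is the kind of noisy local evidence described above. Larger samples, which might correspond to longer viewing durations, reduce that ambiguity.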
Colour is known to be an effective cue to surface segregation in motion-defined transparency (Croner & Albright, 1997, 1999; Snowden & Edmunds, 1999), though the precise nature and locus of this colour advantage is not well understood (Edwards & Badcock, 1996; Li & Kingdom, 2001; Perry & Fallah, 2012; Poom & Börjesson, 2005; Snowden & Edmunds, 1999). Furthermore, the segregation of coherent transparent motion from incoherent motion noise in the responses of V5/MT neurons is improved by colour, albeit not enough to account for the improvements as measured psychophysically (Croner & Albright, 1999). While features traditionally held to be processed in the V5/MT ‘dorsal’ stream (such as speed, stereoscopic depth and spatial frequency) are thought to be integrated with direction early in processing, ‘ventral’ stream features (such as colour or contrast polarity) are thought to have more of a role in the allocation of attention to the surfaces, guiding their segregation (Bradley et al., 1995; Croner & Albright, 1999; Ernst et al., 2013; Fallah, Stoner, & Reynolds, 2007; Hibbard & Bradshaw, 1999; Li & Kingdom, 2001; Perry et al., 2014; Poom & Börjesson, 2005; Snowden & Edmunds, 1999; Snowden & Rossiter, 1999). In particular, this helps explain why fully differentiating surfaces by colour (100% colour-motion coherence) can improve the accuracy and reduce the time needed for direction discrimination, without affecting the perceived direction difference (i.e., the direction repulsion; Perry & Fallah, 2012).
Although we did not measure attentional variables directly in this study, the data do afford us some room to speculate on the role of feature-based attention in light of the fundamental limitations of the motion-processing system revealed and its role in the binding of colour with motion. Global feature-based attention to a target colour (e.g., ‘red’) may guide the segregation of differently coloured surfaces and promote the appropriate assignment (or ‘binding’) of their associated global direction (Clifford, 2010; Moradi & Shimojo, 2004; Reynolds et al., 2003; Stoner & Blanc, 2010; Vigano et al., 2014; Vigano, Maloney, & Clifford, 2015). When direction differences reach around 15° or less, the two directions are prone to interference or mutual inhibition purely in the motion pathway that is immune to any advantage conferred by a colour-based attentional gain (Perry et al., 2014). Similarly, much higher colour-motion coherences were required for accurate direction discrimination at shorter durations. This is consistent with evidence that feature-based attention operates at a slower time scale than its spatial analogue (Hayden & Gallant, 2005; Liu, Stevens, & Carrasco, 2007): Any advantages conferred by feature-based attention in surface segregation can only emerge if adequate time is made available for feature-based attention to engage (cf. Snowden & Edmunds, 1999). The processing of colour information appears to precede motion (Arnold, 2005; Arnold & Clifford, 2002; Moutoussis & Zeki, 1997; Viviani & Aymoz, 2001), indicating that it should be available early in processing to guide surface segregation (Perry & Fallah, 2012). Accordingly, the time required to discriminate the directions of two motion-defined surfaces that are entirely differentiated by colour (100% colour-motion coherence) is substantially shorter (an average of about 840 ms) than when the surfaces are the same colour (an average of approximately 1,500 ms; Perry & Fallah, 2012).
Feature-based selection of one surface over the other in performing the direction discrimination could operate in the form of targeted feedback from higher posterior parietal and frontal cortical centres responsible for the allocation of attention (both spatial and feature-based; Egner et al., 2008; Freedman & Assad, 2009; Liu, Slotnick, Serences, & Yantis, 2003; Ogawa & Komatsu, 2009), to earlier visual areas with finer resolution maps of feature tunings, such as V1 (Bisley & Goldberg, 2010). Reciprocal connections between early visual cortex and parietofrontal centres, including feedback, are a key component in many models of visual feature binding (Bouvier & Treisman, 2010; Clifford, 2010; Di Lollo, Enns, & Rensink, 2000; Hochstein & Ahissar, 2002; Juan & Walsh, 2003; Shipp, Adams, Moutoussis, & Zeki, 2009; Vigano et al., 2015). Feature-based attention would increase the gain of cells throughout visual cortex selective for the target colour (e.g., red) in a spatially global manner (Bichot, Rossi, & Desimone, 2005; Boynton, 2005; Hayden & Gallant, 2005; Maunsell & Treue, 2006; Saenz, Buracas, & Boynton, 2002, 2003; Serences & Boynton, 2007). Amongst this population of targeted neurons would be those that have been identified throughout primate early visual cortex to show dual selectivity to combinations of colour and motion (e.g., Gegenfurtner, Kiper, & Fenstemaker, 1996; Gegenfurtner, Kiper, & Levitt, 1997; Leventhal, Thompson, Liu, Zhou, & Ault, 1995; Shipp et al., 2009). Targeted enhancement of such ‘double duty’ cells strengthens the representation of the target colour surface, segregating it and its (globally averaged) associated direction from the other surface (Clifford, 2010; Vigano et al., 2015). Once the surfaces are segregated, a comparison of the two directions can be made based on the global activity of direction-selective cells (Braddick et al., 2002; Perry & Fallah, 2012; Stoner & Blanc, 2010).
Any influence of feature-based attention on the direction discrimination task will still be limited, however, by the strength or quality of the inputs: As we show here, 50% or 30% colour-motion coherences or less prove to be too noisy to support the task at durations of 150 or 1,000 ms, respectively, while a direction difference of 15° appears to be the lower limit for motion transparency, even when the surfaces are otherwise very strongly differentiated by colour (high colour-motion coherence).
The demands of the direction discrimination task performed in Experiment 1 are complex: The surfaces must be segregated, the colour and motion inputs integrated and a relative judgement of the two directions must be made. In principle, the effects of colour-tuned feature-based attention in the direction discrimination task could apply at a number of points in this process. Attention could act at the level of activity of early spatiotemporal filters selective for the colour, direction and spatial frequency of the surfaces; in the consolidation of the surfaces into a durable, posticonic short-term memory trace that provides the substrate for the subsequent relative direction judgement; or on the perceptual decision itself (Awh, Vogel, & Oh, 2006; Bundesen, Habekost, & Kyllingsbaek, 2011; Reeves & Sperling, 1986; Smith & Ratcliff, 2009; Sperling & Weichselgartner, 1995; White, Rolfs, & Carrasco, 2015). Any such interactions occur at a very fine time scale upon which we can only speculate on the basis of the present results. However, existing models of attention and perceptual decision-making could allow these processes to be isolated at such fine temporal scales by taking into account both response accuracy and reaction time distributions. These models treat the decision as a stochastic diffusion process, whereby noisy sensory evidence for the different stimulus alternatives accumulates until a decision criterion or boundary is reached that determines the behavioural choice (e.g., Ratcliff & Rouder, 2000; Ratcliff & Smith, 2004; Smith & Ratcliff, 2004, 2009; Smith & Sewell, 2013).
Such models are neurally principled (Gold & Shadlen, 2007; Huk & Meister, 2012; Smith, 2010; Smith & Ratcliff, 2004) and have found wide application in describing both patterns of behavioural responses in simple perceptual tasks (e.g., Palmer, Huk, & Shadlen, 2005; Ratcliff & Rouder, 2000; Sewell & Smith, 2012; Smith, Ellis, Sewell, & Wolfgang, 2010; Smith, Ratcliff, & Wolfgang, 2004; Vigano et al., 2017) and single-unit recordings made in areas implicated in attention, oculomotor control and decision-making, such as the lateral intraparietal area, the frontal eye fields and the superior colliculus (e.g., Basso & Wurtz, 1998; Churchland, Kiani, & Shadlen, 2008; Hanes & Schall, 1996; Hanks, Ditterich, & Shadlen, 2006; Huk & Shadlen, 2005; Kiani & Shadlen, 2009; Ratcliff, Cherian, & Segraves, 2003; Ratcliff, Hasegawa, Hasegawa, Smith, & Segraves, 2007; Roitman & Shadlen, 2002; Shadlen & Newsome, 2001; Yang & Shadlen, 2007). These recordings are typically made while monkeys perform forced-choice psychophysical discriminations, such as judgements on the direction of transparent motion fields (e.g., Hanks et al., 2006; Roitman & Shadlen, 2002). In general, these studies indicate that the firing rates of cells gradually increase as the stimulus information favours a choice target presented within their receptive fields. The rate of this build-up in firing is proportional to the strength of the stimulus information. In particular, the behavioural response (usually a saccade) is made shortly after the firing rate reaches a specific level, suggestive of a decision criterion; this level is about the same across grades of task difficulty and reaction times (Churchland et al., 2008; Roitman & Shadlen, 2002). 
In the direction discrimination task used here, feature-based attention to the target colour could manifest itself at the decision level, either by speeding up the rate at which evidence for the target surface is sampled (Perry & Fallah, 2012) or by improving the strength or quality of the global surface representations upon which the decision is based (Smith & Ratcliff, 2009; Vigano et al., 2014, 2015, 2017). Either way, decreasing the colour-motion coherence has the effect of adding noise to the decision process and delaying or disrupting the evidence accumulation of the attended-colour surface. Because of their ability to jointly account for accuracy, response times and single-unit activity, decision models of the type described earlier offer important tools for future investigations in isolating the processes underlying performance in motion transparency, especially with respect to the role of colour.
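The stochastic diffusion process referred to above can be sketched as a random walk between two decision boundaries. The following is a minimal illustration under stated assumptions, not a model fitted to the present data. The drift rate stands in for the strength of the sensory evidence (for example, a higher colour-motion coherence or a larger direction difference), and the boundary, noise level, time step and timeout are arbitrary values chosen for the example. The function names `simulate_trial` and `summarise` are hypothetical.

```python
import random


def simulate_trial(drift, boundary=1.0, noise_sd=1.0, dt=0.002,
                   max_time=5.0, rng=random):
    """Accumulate noisy evidence until it reaches +boundary (correct choice)
    or -boundary (error). Returns (correct, decision_time_in_seconds)."""
    evidence, elapsed = 0.0, 0.0
    step_sd = noise_sd * dt ** 0.5  # diffusion noise scales with sqrt(dt)
    while elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
        if evidence >= boundary:
            return True, elapsed
        if evidence <= -boundary:
            return False, elapsed
    # Timeout (an assumption of this sketch): choose by the sign of the evidence.
    return evidence > 0.0, elapsed


def summarise(drift, n_trials=300, seed=0):
    """Return (proportion correct, mean decision time) over simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_trial(drift, rng=rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in trials) / n_trials
    mean_time = sum(time for _, time in trials) / n_trials
    return accuracy, mean_time


if __name__ == "__main__":
    # A weak and a strong evidence signal (arbitrary illustrative values).
    for drift in (0.5, 2.0):
        accuracy, mean_time = summarise(drift)
        print(f"drift={drift:.1f} accuracy={accuracy:.2f} "
              f"mean decision time={mean_time:.2f} s")
```

In this sketch, lowering the drift rate both reduces accuracy and lengthens decision times. This is why such models are fitted jointly to accuracy and reaction time distributions rather than to thresholds alone.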
To summarise, the perception of motion transparency represents a considerable challenge for the visual system. While colour can improve the segregation of surfaces and subsequent discrimination of the associated directions (Perry & Fallah, 2012; Stoner & Blanc, 2010), this advantage diminishes at direction differences of 15° or less across a range of stimulus durations, even for surfaces fully differentiated by their colour. Furthermore, segregation of the surfaces by their colour reaches its limit at colour-motion coherences of around 50% for short duration stimuli (150 ms) or 30% for longer duration stimuli (1,000 ms), even at high direction differences. These two findings highlight different yet interrelated aspects of the task and the fundamental limits of the mechanisms involved: the resolution of narrowly separated directions in motion processing and the local sampling of dot colours from each surface. Feature-based attention to the colour of a target surface might guide surface segregation and the subsequent assignment of direction, though further work is needed to pinpoint the precise locus of these effects. We hope the relative direction discrimination task introduced here will prove useful for that work.
Footnotes
Acknowledgements
The authors thank the participants of this experiment for their time and effort in taking part in the study. The authors also acknowledge the helpful suggestions of anonymous reviewers on an earlier version of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by an Australian Research Council Future Fellowship Grant FT110100150 (to C.W.G.C.), an Australian National Health and Medical Research Council Grant APP1027258 (to C.W.G.C.) and the Australian Research Council Centre of Excellence in Vision Science. I. M. is supported by a Leverhulme Project Grant RPG-2013-218.
