Abstract
During the observation of a single object, orientation and spatial frequency are jointly coded in an early stage of visual processing, as is evident from studies on the aftereffects of specific combinations of both these features. However, they become independent in the decision-making stage because observers can identify one feature while ignoring the other. Does this separability expand into the perception of ensemble representations? This study investigated the effect of the spatial frequencies of Gabor patches on orientation averaging. In the experiment, the average orientations of all eight patches composed of either homogeneous (i.e., eight 3 cycles/degree or 0.8 cycles/degree patches) or heterogeneous (i.e., four 3 cycles/degree and four 0.8 cycles/degree patches) spatial frequencies were stably estimated if the orientation varied within the range of ±7.7° around the true mean. However, when the range was extended to ±14°, we found that the averaging performance was better in the homogeneous lower spatial frequency than in the homogeneous higher spatial frequency and heterogeneous spatial frequency conditions. These results suggest that an ensemble perception of orientation is modulated by spatial frequency components.
It is crucial for human beings to gain abundant information about the environment while conserving limited cognitive resources (Rosch, 1978). To maintain the cognitive economy, the human brain must employ minimal attention and manage enormous visual information within an extremely brief period. Humans use ensemble perception, which involves the acquisition of summary representations from scattered visual stimuli, as a practical method for cognitive economy. It is a feasible short-term process wherein the human visual system receives discrete stimuli, detects similar or redundant values of their features, and integrates them, to finally produce summary representations such as seeing the flight direction of a flock of birds (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2005; Robitaille & Harris, 2011; Whitney & Leib, 2018). Previous studies reveal human observers’ accuracy in determining the average of both lower- and higher-order visual features, such as brightness (Bauer, 2009), location, and the centroid among multiple stimuli (Alvarez & Oliva, 2008; Vishwanath & Kowler, 2003). Similar results have been observed for orientation (Dakin, 2001; Dakin & Watt, 1997; Parkes et al., 2001), hue (Webster et al., 2014), size (Ariely, 2001; Chong & Treisman, 2003, 2005), motion direction (Sweeny et al., 2013; Watamaniuk & Mckee, 1998), and family facial resemblance (Bai et al., 2015; Haberman et al., 2015).
However, it remains unclear how the visual system allows a target feature to interact with or filter out other features in an ensemble computation. For example, do combined visual components interfere with the computation of a group orientation? Our objective was to investigate the effect of the interaction between multiple visual features within objects during an ensemble perception. Orientation and spatial frequency, in particular, are basic components with selective sensitivity characteristics at the primary stage of visual processing. The spatial distribution of these features plays an important role in the perception of texture (Dakin, 2001; Dakin & Watt, 1997; Parkes, et al., 2001) and the spatial layout of scenes (Alvarez & Oliva, 2009; Brady et al., 2017). Moreover, studies on these features are well established (Blakemore et al., 1970; Blakemore & Campbell, 1969; De Valois et al., 1974; De Valois, Albrecht et al., 1982; De Valois, Yund et al., 1982; Hancock et al., 2010). Furthermore, it is helpful to estimate the effect of one feature on the other in an ensemble computation because each variable can be manipulated independently.
Differences in feature processing are determined by integrality and separability (Garner, 1974a; Garner, 1974b; Garner, 1976). The attributes of an object are represented in an integral dimension (i.e., Euclidean metric space) if observers cannot disregard other properties when perceiving or attending to one feature (e.g., saturation and brightness). However, an object's attribute is represented in a separable dimension (i.e., city-block metric space) if observers can perceive and attend to one feature without considering others (e.g., size and orientation). It is extremely difficult to generate independent summary statistical representations of either saturation or brightness from colored stimuli because these indivisible characteristics are coded and subsequently processed as whole color information, whose properties are dissociated only by cognitive effort (Algom & Fitousi, 2016; Garner, 1978; Garner & Felfoldy, 1970; Jones & Goldstone, 2013). On the other hand, for size and orientation in a separable dimension, each summary statistical representation is simultaneously produced (Attarha & Moore, 2015; Yörük & Boduroglu, 2020); summary representations of two or more features can be separately generated if distinct attentional resources are allocated to them. Intriguingly, the orientation and spatial frequency within a single object are jointly coded (e.g., Burr & Wijesundra, 1991; Caelli & Moraglia, 1985; Gilinsky, 1968; Phillips & Wilson, 1984). For instance, the aftereffect of contrast sensitivity is maximized when both the orientations and spatial frequencies of test gratings are identical to those of adapting gratings; the aftereffect selective to spatial frequencies weakens if the orientations of the test and adaptation gratings are inconsistent (Blakemore & Campbell, 1969; Blakemore & Nachmias, 1971). Although these components interact at the stage of sensory input, they become independent of each other at the recognition stage.
Chua (1990) suggested that one aspect of the features could be individually extracted. He asked observers to pay attention to either spatial frequency or orientation in a Gabor patch before the test blocks and required the observers to identify the patch's frequency or orientation using numerical labels assigned to each test patch in advance. In this judgment experiment, the prior allocation of distinct attentional resources allowed observers to discriminate between the range of orientations and spatial frequencies (Chua, 1990).
We can hypothesize that the average orientation is distinguished from the spatial frequency component because observers are instructed in advance to attend solely to the orientation of the patches. If human observers can judge a single orientation without any influence of spatial frequency, they might recognize the average orientation in the same manner. In contrast, if the two visual components interact during an ensemble computation, the average estimate will vary with the spatial frequency condition. In this study, the orientations of eight Gabor patches were averaged by observers. There were three spatial frequency conditions: homogeneous higher spatial frequency (HSF) (3 cycles/degree), homogeneous lower spatial frequency (LSF) (0.8 cycles/degree), and heterogeneous spatial frequencies (four 3 cycles/degree and four 0.8 cycles/degree patches in the test set). To test the stability of performance, the orientation variability was small (a range of ±7.7° around the true mean) in Experiment 1 and expanded (to ±14°) in Experiment 2. The averaging task was conducted using the method of adjustment, and the point of subjective equality (PSE) was collected from every trial. Relative angular errors seemed more informative than absolute angular errors because they capture both the anisotropy of global bias and the spread of individual errors. Hence, we fitted the von Mises distribution, also called the circular normal distribution, to the errors between the physical mean and the participant's estimate, and obtained two parameters. The first was the mean direction μ, the position on the circle at which all errors were averaged. The second was the concentration parameter κ, which represents how broadly the errors scattered around the mean direction μ. The concentration parameter plays a role similar to the inverse of the variance of the Gaussian distribution, such that a higher concentration represents errors clustered more tightly around μ.
In the present experiments, these parameters reflect, respectively, the bias of the estimated average relative to the physical average of the eight orientations and the precision of the estimated average. Considering these parameters independently indicates which type of feature interaction appears. In particular, we focus on the concentration parameter because previous studies on orientation ensembles have already shown that observers can identify the mean accurately (Dakin, 2001; Parkes et al., 2001). If spatial frequency can be filtered out in orientation ensemble perception, there will be no effect of spatial frequency on the concentration parameter. If orientation and spatial frequency interact in ensemble perception, the concentration parameter for the orientation ensemble will be modulated by spatial frequency.
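The two parameters described above can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes that orientation errors are axial (period 180°) and are therefore rescaled onto the full circle before fitting, and it estimates κ with the piecewise approximation given by Fisher (1993) rather than an exact maximum-likelihood fit. The function name and the example values are hypothetical.

```python
import math

def fit_von_mises(errors_deg, period_deg=180.0):
    """Return (mean direction in degrees, concentration kappa).

    Assumption: errors are axial with period `period_deg`, so they are
    rescaled to cover the full circle before the circular statistics
    are computed. kappa is expressed on that rescaled circle (radians).
    """
    scale = 2.0 * math.pi / period_deg  # maps one period onto 2*pi
    angles = [e * scale for e in errors_deg]
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    mu_deg = math.atan2(s, c) / scale   # mean direction, back in degrees
    r = math.hypot(c, s)                # mean resultant length, 0..1

    # Piecewise approximation of the kappa estimate (Fisher, 1993).
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        denom = r ** 3 - 4 * r ** 2 + 3 * r
        kappa = math.inf if denom <= 0 else 1.0 / denom
    return mu_deg, kappa

# Hypothetical errors: tightly clustered versus widely scattered responses.
mu_narrow, kappa_narrow = fit_von_mises([-4, -2, 0, 2, 4])
mu_wide, kappa_wide = fit_von_mises([-12, -6, 0, 6, 12])
```

In this sketch a bias in the responses moves μ away from zero, whereas wider scatter lowers κ, so the two indices can change independently of each other.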
General Method
Participants
Twenty naive observers participated in Experiment 1 (age range 19–43 years; 10 men and 10 women) and another 20 naive observers participated in Experiment 2 (age range 18–36 years; 11 men and 9 women), all of whom had normal or corrected-to-normal vision. We calculated the sample size in advance using G*Power 3.1.9.4 for Windows 10 (Faul et al., 2007) with the following parameters: effect size f = 0.35, α = 0.05, power (
Apparatus and Stimuli
Observers individually viewed stimuli on a 17-inch cathode ray tube monitor (EIZO FlexScan E57Ts) at a constant distance of 57 cm in a dark room. As illustrated in Figure 1, the stimuli for the two experiments were Gabor patches with spatial frequencies of 3 cycles/degree (HSF) and 0.8 cycles/degree (LSF), which were created in MATLAB R2019b (MathWorks, Natick, MA) using the Psychtoolbox extensions (Brainard, 1997). We fixed the Michelson contrast of the LSF patches at 0.4, and each observer matched the contrast of the HSF patches to that level using the method of adjustment over 36 trials before the orientation-averaging task. This procedure was conducted for every observer to prevent a detectability bias favoring one of the spatial frequencies in peripheral vision. In the contrast calibration task, a pair of HSF and LSF patches was presented horizontally at a visual angle eccentricity of

Figure 1. Examples of stimuli in Experiment 1 (Top) and Experiment 2 (Bottom). Eight patches were randomly oriented within the range of ±7.7° (Experiment 1) and ±14° (Experiment 2) around the true mean. Column 1: a set of homogeneous Gabor patches of higher spatial frequency (HSF; 3 cycles/degree). Column 2: a set of homogeneous patches of lower spatial frequency (LSF; 0.8 cycles/degree). Column 3: a set of heterogeneous patches of HSF and LSF. SF: spatial frequency.
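As a rough illustration of the stimuli, the sketch below computes the luminance of a Gabor patch at one point and the Michelson contrast of a luminance pair. It is written in Python for illustration only (the stimuli themselves were created in MATLAB with Psychtoolbox). The envelope width `sigma_deg`, the normalized mean luminance, and the orientation convention are assumptions, not values taken from the study.

```python
import math

def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)

def gabor_luminance(x_deg, y_deg, sf_cpd, ori_deg, contrast,
                    sigma_deg=0.5, mean_lum=0.5):
    """Luminance of a Gabor patch at (x_deg, y_deg), in degrees of visual angle.

    Assumptions (hypothetical): Gaussian envelope with SD `sigma_deg`,
    luminance normalized to 0..1, and `ori_deg` giving the direction of
    luminance modulation (0 deg yields vertical stripes).
    """
    theta = math.radians(ori_deg)
    # Position along the axis of luminance modulation.
    u = x_deg * math.cos(theta) + y_deg * math.sin(theta)
    envelope = math.exp(-(x_deg ** 2 + y_deg ** 2) / (2 * sigma_deg ** 2))
    carrier = math.cos(2 * math.pi * sf_cpd * u)
    return mean_lum * (1 + contrast * envelope * carrier)

# Peak luminance at the center of an LSF patch with carrier contrast 0.4.
center_lsf = gabor_luminance(0.0, 0.0, sf_cpd=0.8, ori_deg=0.0, contrast=0.4)
# First trough of the carrier lies half a cycle away from the center.
trough_lsf = gabor_luminance(0.5 / 0.8, 0.0, sf_cpd=0.8, ori_deg=0.0, contrast=0.4)
```

The `contrast` argument here is the contrast of the carrier grating. Because the Gaussian envelope attenuates the troughs, the contrast measured from the extreme pixels of a windowed patch can be lower than this nominal value.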
Procedure
The task was to indicate the average orientation of the eight patches. Observers looked at a fixation point for a random duration of 800–1200 ms that varied across trials, and then a circular configuration of the patches was presented for 250 ms. After the test stimuli, the observers adjusted a black bar with a
Experiment 1
Experiment 1 aimed to measure the sensitivity of average orientation estimation and to clarify whether it differed between spatial frequencies. The spatial frequency factor, a within-subject factor, included three conditions: the HSF, LSF, and SF-mixed conditions. We added the SF-mixed condition to ascertain whether integrating orientations across different frequencies incurs a cognitive cost. Even if average orientation estimation were equivalent between the homogeneous HSF and LSF conditions, this would not necessarily imply that averaging orientations across mixed frequencies is effortless. The orientation range of the eight patches was ±7.7° around the true mean. The experimental session was composed of three blocks of 54 trials, and the spatial frequency condition was randomized across trials (see Figure 1).
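The trial structure described here can be sketched as follows. The text states only that the eight orientations varied randomly within the range around the true mean, so the uniform sampling used below is an assumption, as are the function names. The error function wraps the difference between a response and a reference orientation into the interval from −90° to 90°, because orientation repeats every 180°.

```python
import random

def make_orientation_set(true_mean_deg, half_range_deg, n_patches=8, rng=random):
    """Draw patch orientations around a true mean.

    Assumption: offsets are sampled uniformly within +/- half_range_deg.
    """
    return [true_mean_deg + rng.uniform(-half_range_deg, half_range_deg)
            for _ in range(n_patches)]

def signed_axial_error(response_deg, reference_deg):
    """Signed difference between two orientations, wrapped to [-90, 90)."""
    return (response_deg - reference_deg + 90.0) % 180.0 - 90.0

rng = random.Random(0)                       # fixed seed for reproducibility
orientations = make_orientation_set(45.0, 7.7, rng=rng)
# The arithmetic mean is adequate here because the spread is narrow and
# the set does not straddle the 0/180 deg wrap-around point.
sample_mean = sum(orientations) / len(orientations)
error = signed_axial_error(50.0, sample_mean)  # hypothetical response of 50 deg
```

For sets that straddle the wrap-around point, the sample mean would need to be computed with circular statistics instead of the arithmetic mean.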
Data Analysis
We used relative angular errors against the sample mean orientations. Data from every observer rejected a circular uniform distribution from Watson’s (1962)
Results and Discussion
The mean contrast of every observer in the preliminary task of contrast calibration is plotted in Figure 2 (light gray plot). The mean angular errors of all observers for the three spatial frequency conditions are plotted in Figure 3a, and the mean concentrations of all observers are shown in Figure 3b. The frequency conditions had no significant effect on the mean angular errors,

Figure 2. Michelson contrast of higher spatial frequency (HSF; 3 cycles/degree) patches in Experiments 1 and 2. The horizontal dotted line represents the contrast of lower spatial frequency (LSF; 0.8 cycles/degree) patches, which is constant through all trials in the contrast calibration task. Each colored circle represents the mean contrast of every observer. Diamonds represent the mean of all observers. The bottom of the box represents the first quartile, and the top of the box represents the third quartile. The center horizontal bars represent the median. The ends of the two vertical bars represent the maximum and minimum values, respectively, which do not include outliers. Back curves represent the probability density.

Figure 3. (a) Angular errors of average orientations in Experiment 1. Each colored circle represents every observer's mean angular error. Diamonds represent the mean of all observers. The bottom of the box represents the first quartile, and the top of the box represents the third quartile. The center horizontal bars represent the median. The ends of two vertical bars represent the maximum and minimum values, respectively, which do not include outliers. Back curves represent the probability density. There was no significant effect of the spatial frequencies on the angular errors. (b) Mean concentration parameters
Experiment 2
Experiment 2 extended the orientation range of the patches to examine the robustness of orientation averaging. To test the equivalence in performance, another 20 naive observers participated in Experiment 2, where we again used the HSF, LSF, and SF-mixed conditions as a within-subject factor, with eight patches randomly varying within a range of ±14° around the true mean orientation (see Figure 1). The numbers of trials and blocks were identical to those in Experiment 1.
Data Analysis
One observer's data in the SF-mixed condition did not reject a circular uniform distribution according to Watson’s (1962) U² test,
Results and Discussion
The mean contrast of every observer in the preliminary task of contrast calibration is plotted in Figure 2 (dark gray plot). Figure 4a shows the mean angular errors of every observer. The repeated-measures ANOVA revealed no significant effect of the spatial frequency conditions on the mean angular errors,

Figure 4. (a) Angular errors of average orientations in Experiment 2. The symbols are the same as in Figure 3a. There was no significant effect of the spatial frequencies on the angular errors. (b) Mean concentration parameters
General Discussion
This study aimed to clarify the interrelationship between orientation and spatial frequency during ensemble perception. Using two averaging tasks, we examined whether the independent recognition of two visual features within a single item generalizes to ensemble perception. The mean angular error, an index of directional bias, did not show a significant effect of spatial frequency. This result means that observers had accurate representations of average orientation regardless of the spatial frequency condition, which is consistent with previous studies (e.g., Dakin, 2001; Dakin & Watt, 1997; Haberman et al., 2015; Parkes et al., 2001; Solomon, 2010). The chief finding was that the concentration parameter, an index of the reliability of responses, was affected by the spatial frequency content of the stimuli, and this effect became clear when the orientation variability of the patches increased. We interpret this as showing that responses converge (i.e., higher reliability) or disperse (i.e., lower reliability) depending on the spatial frequency condition. In particular, the higher concentration in the LSF condition implies consistent averaging performance, whereas the lower concentration in the other spatial frequency conditions indicates that observers were more likely to estimate an incorrect average orientation on a given trial. The finding that orientation could not be decoupled from spatial frequency in average estimation suggests that the summary representation of orientation is computed at an early stage. As mentioned in the introduction, specific combinations of the two features are jointly coded, and this effect is often revealed as a behavioral phenomenon such as the aftereffect (Blakemore & Campbell, 1969; Blakemore & Nachmias, 1971; Hancock et al., 2010).
The calculation of average orientation might have used these elementary representations rather than the outputs of later stages, in which individual items are separately processed as recognizable objects.
To account for the current results, interactive processing across different channels requires consideration. The primary visual cortex (V1), in which orientations are detected per spatial frequency, utilizes population coding by which a valid orientation is obtained by summing the outputs of different orientation filters (Chen et al., 2006; Hubel & Wiesel, 1968; Livingstone & Hubel, 1984; Paradiso, 1988). Because orientation signals from different visual field locations are largely independent, multiple orientation signals in an ensemble perception should simply be pooled and averaged. Based on this assumption, we need to consider why the concentration parameter differed significantly between the three conditions, with a higher concentration of errors in the LSF condition in Experiment 2. The present finding seems incongruent with single-orientation studies showing that observers generally discriminate orientation better at higher spatial frequencies (Burr & Wijesundra, 1991). This inconsistency may relate to the orientation bandwidth per spatial frequency. Phillips and Wilson’s (1984) psychophysical study showed that lower spatial frequency channels had a broader orientation bandwidth, whereas higher spatial frequency channels had a narrower bandwidth. These characteristics mean that a detector in the former channels probabilistically responds to orientations distant from (i.e., dissimilar to) its optimum, whereas a detector in the latter channels responds only to the optimum or to closely adjacent (i.e., similar) orientations (Phillips & Wilson, 1984). Hence, the higher detectability of orientation at higher spatial frequencies is explained by the orientation bandwidth. At the same time, the bandwidth might influence the strength of correlation between orientation detectors within the same spatial frequency channel.
For example, when a detector tuned to one orientation responds, another detector tuned to a slightly different orientation tends to respond as well, and vice versa. This co-occurrence should arise more readily in the lower spatial frequency channels. If higher order detectors capture the co-occurrence of these lower spatial frequency signals, orientation integration across different visual locations could be strongly facilitated.
Under this assumption, in Experiment 1, where the orientation variability of the set was small, the concentration could have been almost constant regardless of the spatial frequency condition. On the other hand, when the variability was large in Experiment 2, the HSF condition required more resources to integrate orientations because each orientation detector in the higher spatial frequency channels was not sensitive to dissimilar orientations, owing to the narrow response bandwidth in V1. However, in the LSF condition, signals from different orientations might have been easily integrated because of the broad response bandwidth in V1.
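The bandwidth argument can be made concrete with a toy model. It is a deliberately simplified sketch and not a model fitted to the present data: detectors are given Gaussian orientation tuning, and the two bandwidth values are hypothetical placeholders for a broadly tuned (lower spatial frequency) and a narrowly tuned (higher spatial frequency) channel.

```python
import math

def detector_response(stim_deg, preferred_deg, bandwidth_deg):
    """Response (0..1) of a detector with Gaussian orientation tuning.

    Assumption: tuning is Gaussian in the wrapped orientation difference,
    with `bandwidth_deg` as its standard deviation.
    """
    d = (stim_deg - preferred_deg + 90.0) % 180.0 - 90.0
    return math.exp(-d ** 2 / (2 * bandwidth_deg ** 2))

def coactivation(stim_deg, pref_a_deg, pref_b_deg, bandwidth_deg):
    """Joint activation of two detectors driven by the same stimulus."""
    return (detector_response(stim_deg, pref_a_deg, bandwidth_deg)
            * detector_response(stim_deg, pref_b_deg, bandwidth_deg))

BROAD_DEG = 20.0   # hypothetical bandwidth for a lower-SF channel
NARROW_DEG = 8.0   # hypothetical bandwidth for a higher-SF channel

# Two detectors whose preferred orientations differ by 14 deg, one
# stimulus at 0 deg: the broad channel co-activates them more strongly.
co_broad = coactivation(0.0, 0.0, 14.0, BROAD_DEG)
co_narrow = coactivation(0.0, 0.0, 14.0, NARROW_DEG)
```

With these placeholder values the broadly tuned pair is co-activated more strongly than the narrowly tuned pair, which is the kind of correlated signal that higher order detectors are proposed to exploit when integrating dissimilar orientations.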
The results for the SF-mixed condition in the current study remain somewhat ambiguous, even if we assume that the contrast of patches of either frequency was overestimated in the contrast calibration task. Why was the concentration in the SF-mixed condition modulated by orientation variability? Note that in the current study, unlike the orientations, which spanned a relatively small range, the test spatial frequencies were quite different (approximately a two-octave difference), implying that correlational processing across two distant spatial frequencies may be less effective. Therefore, higher order detectors may incur an additional cost when pooling signals from two spatial frequencies, which may account for the poor concentration in Experiment 2. The lack of concentration impairment in the SF-mixed condition of Experiment 1 may be due to a ceiling effect. Since orientation averaging at each spatial frequency is efficient, the cost of pooling across spatial frequency channels may be reduced. These interpretations of the present findings are speculative, and further quantitative examination is necessary. Our research objective of determining whether spatial frequency, as a nontarget feature, influenced ensemble perception of orientation, as a target feature, was attained by the result of Experiment 2. The next step is to explore the factors that affect the average estimation of orientation.
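The octave difference mentioned above follows directly from the two test frequencies, since one octave corresponds to a doubling of spatial frequency. A minimal check, with a hypothetical function name:

```python
import math

def octave_difference(sf_high_cpd, sf_low_cpd):
    """Number of octaves (doublings) separating two spatial frequencies."""
    return math.log2(sf_high_cpd / sf_low_cpd)

# HSF = 3 cycles/degree, LSF = 0.8 cycles/degree, as used in both experiments.
octaves = octave_difference(3.0, 0.8)
```

The result is roughly 1.9 octaves, which is the basis for describing the separation as approximately two octaves.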
Considering our visual environment, for instance, the object surface repeats similar orientations to form a pattern, such as a texture gradient. In this case, the visual system simultaneously captures different spatial frequencies that smoothly shift on a surface, as well as different orientations. Integrating information from a broad area should allow observers to perceive detailed textures and spatial scales. However, with respect to these fundamental visual components and their channel structures, a question emerges: are signals from remote channels distantly related to each other even when both near (lower spatial frequency in the visual angle) and distant (higher spatial frequency in the visual angle) features constitute a continuum on a surface? Further investigation is needed to test the rule of interaction and the mechanisms that the visual system uses to produce ensemble representation.
Existing studies have reported observers’ accurate estimations of the ensemble mean for a variety of visual features. However, the interrelationship of multiple features within an object has often been left out of consideration. This study focused on the interaction during ensemble perception between two visual components that have been considered separable features in Garner’s (1976) theory, and revealed that they could not be fully decomposed. This result suggests that a separable or integral relationship found for a specific feature combination during single-object observation does not always apply to ensemble observation. Some studies have already revealed that summary statistics on multiple dimensions could be obtained independently, but other studies reported that multiple summary representations were more impaired than those on a single dimension (e.g., average size and speed by Emmanouil & Treisman, 2008; average size and orientation by Attarha & Moore, 2015; Yörük & Boduroglu, 2020). When and how is a summary statistical representation affected by other components? Whether a feature interaction is observed may depend on the conditions that experimenters set up or the analytical index on which they focus.
Perceiving summary statistical representations contributes to cognitive economy and to the rapid comprehension of ever-changing visual scenes; however, the possibility that their accuracy and reliability change with the situation should be kept in view. If they do change depending on the situation, further research should address the factors behind the change and the processes that generate it.
Conclusion
The study's chief finding was that the reliability of the average orientation estimation was influenced by the spatial frequency ranges. The effect was prominent when the orientation variability of the test stimuli was expanded. In this case, the reliability typically decreased throughout the three spatial frequency conditions; however, it was relatively high if the test stimuli consisted of a homogeneous lower spatial frequency compared with that of the homogeneous higher spatial frequency and heterogeneous spatial frequencies. These results indicated that the two basic visual features in multiple stimuli might have been coded interactively, and these representations were used to calculate the average at lower order stages of visual processing prior to the observer's separate consciousness of properties inside each stimulus.
Footnotes
Author Contributions
Takebayashi and Saiki developed the study concept and contributed to its design. Building the experimental programs, data collection, data analysis, interpretation, and drafting of the manuscript were performed by H. Takebayashi under the supervision of J. Saiki. All authors approved the final version of the manuscript for submission. The research complies with the Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans. All participants provided written informed consent. All experiments were approved by the Institutional Review Board of Kyoto University. None of the experiments reported in this article have been formally registered. Neither the data nor the materials have been made available on a permanent third-party archive; requests for the data or materials can be sent via email to the lead author at hikari.takebayashi@gmail.com (H. Takebayashi).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Japan Society for the Promotion of Science (grant numbers 16H01727, 20J20010).
