Abstract
The visual world is flooded with ambiguity. Generally, people can resolve the ambiguity almost instantaneously, as when they distinguish at a glance whether a maiden in a portrait by Picasso is in profile or facing front. However, perception of the same reality, though relatively stable at the individual level, can vary dramatically from person to person, manifesting idiosyncratic perceptual biases. What drives the heterogeneity of human vision as reflected in the resolution of visual ambiguity? Using the twin method, we demonstrated a significant genetic contribution to individual differences in the visual disambiguation of bistable biological-motion stimuli but not inanimate motion stimuli. These findings challenge the prevailing view that the way the human brain makes sense of visual input is largely shaped by a person’s perceptual history. Rather, the visual perception of biologically salient information can be guided by adaptive mental “priors” that are genetically transmitted.
Visual perception is believed to be qualitatively similar across individuals, unlike psychological traits such as personality and intelligence. Nevertheless, given the ambiguous nature of the visual environment, it is not uncommon for an identical physical appearance to give rise to divergent visual experiences and evoke radically different percepts for different observers. For example, in the case of 3-D object perception, a pyramid and a triangular prism may share the same 2-D projection when viewed from particular perspectives. Imagine how many 3-D structures can be reconstructed from the same retinal images. In this light, visual perception is not merely a simple reflection of sensory input but relies crucially on how the brain is equipped to overcome ambiguities and make sense of visual objects.
Ambiguous visual patterns that are open to two exclusive interpretations, known as bistable stimuli, have been widely used to study a range of visual processes and the neural circuits involved in the construction of conscious awareness (Leopold & Logothetis, 1999; Long & Toppino, 2004; Sterzer, Kleinschmidt, & Rees, 2009). A fascinating aspect of bistable perception is that even when the two interpretations are equally compatible with the stimulus’s physical property, they are not guaranteed to come to the viewer’s mind at equal chance, especially during the initial stage of perception (Stanley, Forte, Cavanagh, & Carter, 2011). For a certain bistable pattern, people may have a general inclination to experience one possible perceptual state over the other (Dobbins & Grossmann, 2010). At the individual level, however, observers may come to different solutions and exhibit distinct yet stable perceptual biases as if they have certain inherent “perceptual traits” (Brascamp, Kanai, Walsh, & van Ee, 2010; Mamassian & Wallace, 2010; Raemaekers, van der Schaaf, van Ee, & van Wezel, 2009). For some observers, the percept is a stochastic distribution of the two interpretations, while others may have varying degrees of perceptual bias toward a specific interpretation. What disposes people to perceive things in such differentiated manners? How do they develop various perceptual biases for different categories of stimuli?
A putative and intensively studied factor that biases bistable perception is visual experience (Brascamp et al., 2008; Harrison & Backus, 2010; Morikawa & McBeath, 1992). It has been reported that acquired visual norms, ranging from the statistics of the visual environment (Dobbins & Grossmann, 2010) to the viewer’s recent perceptual history (Daelli, van Rijsbergen, & Treves, 2010; Maloney, Dal Martello, Sahm, & Spillmann, 2005; Zhang et al., 2012), are associated with the outcomes of visual disambiguation. For example, the phenomenon that people generally take the “viewing-from-above” perspective more often than the “viewing-from-below” perspective in Necker cube perception is consistent with people’s experience that objects usually appear on surfaces below eye level (Dobbins & Grossmann, 2010). On a shorter time scale, presentation of an unambiguous prime can significantly bias the perception of and the cortical responses to the succeeding bistable stimulus (Zhang et al., 2012). Apparently, learning-based mechanisms are involved in the formation of both the long-term group-level perceptual bias and the transient individual-level bias in bistable perception. What remains unknown is the source of individual variation in the perceptual bias intrinsic to the visual system. Do genes, as distinct from the effect of environment, contribute to the variance of such perceptual bias? If so, is there a common genetic basis for the biases related to different categories of bistable stimuli?
To investigate these questions, we examined the heritability of perceptual traits related to the perception of bistable animate (i.e., biological) and inanimate motion patterns. Both of these stimuli can elicit perceptual biases that are stable within individuals but vary across individuals. Using these stimuli allowed us to explore not only the role of genes in individual variation in bistable perception in terms of the intrinsic bias, but also whether the formation of such bias is supported by a domain-specific or a domain-general genetic mechanism. Heritability was estimated using the twin method, whereby the contributions of genes and the environment to individual differences in a given trait can be dissected. Specifically, we recruited both monozygotic (MZ) and dizygotic (DZ) twin pairs. While both twins share their environments to the same extent, MZ twins share more genes than DZ twins (100% vs. 50% on average). Therefore, if MZ twins are considerably more similar than DZ twins on a given trait, this suggests that genes play an important role in determining the variance of this trait.
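The twin logic described above can be made concrete with Falconer's classical approximation, which converts the two twin correlations into rough variance shares. This is only an illustrative sketch: the study itself estimated heritability by maximum-likelihood model fitting, not by these formulas, and the function name `falconer_estimates` is hypothetical. The inputs are the Experiment 1 intraclass correlations reported in the Results.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance shares from twin correlations (Falconer's formulas).

    Assumes equal shared environments for MZ and DZ pairs, and that MZ
    twins share 100% and DZ twins on average 50% of segregating genes.
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share
    c2 = 2.0 * r_dz - r_mz    # shared-environment share
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Experiment 1 correlations for the facing-direction bias.
estimates = falconer_estimates(r_mz=0.50, r_dz=0.22)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

With these inputs the rough heritability is about .56, close to the model-based estimate of 53.8%. The shared-environment term comes out slightly negative, which is what happens whenever the MZ correlation is more than twice the DZ correlation.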
Method
Participants
One hundred and sixty same-gender twin pairs (a total of 320 participants, 184 female, 136 male; age range = 15–27 years, mean age = 18.5 years), consisting of 82 pairs of MZ twins and 78 pairs of DZ twins, were recruited for payment from a twin database (Beijing Twin Study) maintained by the Institute of Psychology, Chinese Academy of Sciences (IPCAS). All twin pairs participated in Experiment 1, a random subgroup of 44 twin pairs (22 MZ and 22 DZ) participated in Experiment 2, and another subgroup of 85 twin pairs (41 MZ and 44 DZ) participated in Experiment 3. Zygosity was determined by DNA genotyping on nine short-tandem-repeat loci, with near-100% classification accuracy. All the participants were naive to the aim of the study and gave written informed consent. The study was approved by the institutional review board of the IPCAS.
Stimuli and apparatus
The experiments were programmed using MATLAB (The MathWorks, Natick, MA) together with extensions for the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Stimuli were point-light animations that mimicked human walking or sphere rotation. Point-light walkers (Johansson, 1973) were generated by orthographically projecting the head and the main joints of a human walker onto a 2-D plane (depth orders = 0). The stimulus appeared as 15 white dots moving against a gray background with a gait cycle of 1 s (Troje, 2002; see Fig. 1). Without any depth cues, the 3-D orientation of the walker (facing toward or away from the viewer) is physically ambiguous. A point-light sphere was also created by randomly distributing 100 white dots on the surface of a virtual sphere, which rotated around the horizontal axis at 90°/second, with the apex being perceived as moving toward or away from the observer (i.e., the frontal surface moving downward or upward).
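The depth ambiguity of the rotating sphere follows directly from orthographic projection, and a short sketch can show why. The code below is not the authors' MATLAB implementation: the uniform sampling scheme, the unit radius, and all function names are assumptions made for illustration. It builds a 100-dot sphere, rotates it about the horizontal axis at 90°/s, and checks that a depth-mirrored sphere rotating in the opposite direction yields the same 2-D frame.

```python
import math
import random


def random_sphere_points(n, rng):
    """Draw n points uniformly on the surface of a unit sphere."""
    points = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)             # depth coordinate
        phi = rng.uniform(0.0, 2.0 * math.pi)  # angle around the depth axis
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def rotate_about_horizontal_axis(point, angle_deg):
    """Rotate a 3-D point about the x (horizontal) axis."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x,
            y * math.cos(a) - z * math.sin(a),
            y * math.sin(a) + z * math.cos(a))


def frame(points, t_seconds, deg_per_second):
    """Orthographic projection (depth dropped) of the sphere at time t."""
    angle = deg_per_second * t_seconds
    rotated = (rotate_about_horizontal_axis(p, angle) for p in points)
    return [(x, y) for x, y, _ in rotated]


rng = random.Random(0)
dots = random_sphere_points(100, rng)
mirrored = [(x, y, -z) for x, y, z in dots]  # same dots, depth reversed

frame_a = frame(dots, 0.25, 90.0)       # one rotation direction
frame_b = frame(mirrored, 0.25, -90.0)  # opposite direction, mirrored depth
max_diff = max(abs(pa - pb)
               for a, b in zip(frame_a, frame_b)
               for pa, pb in zip(a, b))
print(max_diff)
```

Because the two 3-D configurations project to the same dots on the screen, nothing in the image specifies which rotation direction is the correct one.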

Fig. 1. A schematic representation of a single frame from a biological-motion sequence used in Experiment 1 (middle) and the two feasible interpretations of the stimulus, either moving toward the observer (left) or away (right).
Procedure
Each trial started with a moving point-light walker (6.34° × 1.79°; Experiment 1) or a rotating sphere (diameter = 2.69°; Experiment 2) displayed for 1 s. The stimuli were presented at the center of the screen, and observers sat at a viewing distance of 80 cm. Observers were required to report the perceived facing direction of the walker or rotation direction of the sphere in terms of depth (i.e., moving toward or away) after the stimulus disappeared by pressing one of two keys. To avoid potential interference from the previous trials, we randomized the initial frame of the point-light motion sequence in each trial and also slightly offset the sequence in a random direction from the center of the screen. In addition, the facing direction of the walker (in Experiment 1) and the rotation direction of the sphere (in Experiment 2) were rotated −5°, −2.5°, 0°, 2.5°, or 5° with respect to the normal orientation of the screen, to obtain five different exemplars of the stimuli. Each exemplar was repeated eight times, which generated 40 trials in total for each experiment. These trials were presented in random order. The intertrial interval was 1 s.
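The trial structure described above (five orientations, eight repetitions each, random order, random initial frame) can be sketched as follows. This is a hypothetical reconstruction in Python rather than the original MATLAB code; the dictionary keys and the representation of the initial frame as a phase within the 1-s gait cycle are assumptions, and the small positional offset is omitted because its size is not reported.

```python
import random

ORIENTATIONS_DEG = [-5.0, -2.5, 0.0, 2.5, 5.0]  # five stimulus exemplars
REPETITIONS = 8                                  # repetitions per exemplar


def build_trials(rng):
    """Return the 40 trials of one experiment in random order."""
    trials = []
    for orientation in ORIENTATIONS_DEG:
        for _ in range(REPETITIONS):
            trials.append({
                "orientation_deg": orientation,
                # Random starting point within the motion cycle (assumed
                # here to be expressed as a phase between 0 and 1).
                "start_phase": rng.random(),
            })
    rng.shuffle(trials)
    return trials


trials = build_trials(random.Random(1))
print(len(trials))
```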
Experiment 3 consisted of two blocks. In each block, a point-light biological-motion sequence (3.17° × 0.90°) was displayed for 60 s, with the facing direction deviating by −30° or 30° from the normal orientation of the screen. Observers were not told about the bistable nature of the stimuli and were required to press one of two keys when they initially perceived the facing direction of the walker and each time when it “reversed” in depth.
Genetic-modeling analysis
Using intraclass correlation analysis, we measured the resemblance between the members within MZ and DZ twin pairs, respectively. Heritability of the behavioral traits was then estimated using the ADE genetic model if the correlation between the members within MZ twin pairs was more than twice the correlation within DZ twin pairs; otherwise the ACE model was adopted (Neale & Maes, 2004). The ADE model assumes that the variance of a trait arises from an additive genetic factor (A), a nonadditive genetic factor (D), and a factor that combines both the unique environment and measurement error (E). The ACE model assumes that trait variance arises from a common environmental factor (C), as well as genetic (A) and unique environment (E) factors. After fitting the full ADE/ACE model to the data, we also separately tested the AE, CE, and E submodels. Chi-square statistics were used to examine the goodness of fit for each model and to compare the submodels with the saturated models to assess the contribution of the dropped factors. Subsequently, we estimated the heritability of a trait using the best model selected based on both the goodness of fit and parameter parsimony according to the Akaike information criterion (AIC).
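The decision steps in this paragraph can be summarized in a few helper functions. This is a minimal sketch under stated assumptions, not the Mx analysis itself: the paper does not say which form of the intraclass correlation was computed, so a one-way random-effects version for pairs is assumed, and the AIC is taken as χ² minus twice the degrees of freedom, a convention that reproduces the AIC values reported in the Results. All function names are hypothetical.

```python
def intraclass_correlation(pairs):
    """One-way random-effects ICC for twin pairs (two members per pair)."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair mean square: spread of pair means around the grand mean.
    ms_between = 2 * sum(((a + b) / 2 - grand_mean) ** 2
                         for a, b in pairs) / (n - 1)
    # Within-pair mean square: differences between the two co-twins.
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / n
    return (ms_between - ms_within) / (ms_between + ms_within)


def choose_full_model(r_mz, r_dz):
    """ADE if the MZ correlation exceeds twice the DZ correlation, else ACE."""
    return "ADE" if r_mz > 2.0 * r_dz else "ACE"


def aic_from_chi_square(chi_square, df):
    """AIC in the chi-square-based form: chi-square minus 2 * df."""
    return chi_square - 2.0 * df


# Toy data: perfectly concordant pairs give an ICC of 1.
icc_identical = intraclass_correlation([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])

# Correlations and fit statistics reported for Experiment 1.
model_exp1 = choose_full_model(r_mz=0.50, r_dz=0.22)
aic_exp1 = aic_from_chi_square(chi_square=3.37, df=4)
print(icc_identical, model_exp1, round(aic_exp1, 2))
```

Lower AIC values indicate a better trade-off between fit and parsimony, which is why a submodel with fewer parameters can be preferred when dropping a factor does not significantly worsen the fit.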
For all data submitted to the intraclass correlation and the genetic-modeling analysis, we controlled for age, gender, and zygosity through multivariable regression. The modeling analysis was performed using the statistical package Mx (http://www.vcu.edu/mx/).
Results
Experiment 1: genes contribute to the perception of animate bistable motion
In Experiment 1, we examined the genetic contribution to the perceptual bias for a type of animate bistable stimulus (i.e., biological motion). Facing-direction bias was measured by taking the proportion of “facing-toward-viewer” responses among the total trials minus .5, which resulted in an index ranging from −.5 to .5 (0 indicates no bias, .5 indicates an extreme bias toward the viewer, and −.5 indicates an extreme bias away from the viewer). On average, observers showed a facing-toward-viewer bias (mean of the facing-direction-bias index = .26) with considerable individual variation (SD = .26), which is consistent with previous reports (Vanrie, Dekeyser, & Verfaillie, 2004). Moreover, the individual-level facing-direction bias was stable within the test session, as evidenced by the highly significant correlation between the first and the second half of the trials (r = .81, p < .001).
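The facing-direction-bias index defined above is simple to compute from the response record. The sketch below assumes responses are coded as the strings "toward" and "away", which is an illustrative choice; the function names are hypothetical. It also shows how the index can be computed separately for the first and second half of the trials, the two values that are then correlated across observers for the stability check.

```python
def bias_index(responses):
    """Proportion of 'toward' responses minus .5 (range -.5 to .5)."""
    toward = sum(1 for r in responses if r == "toward")
    return toward / len(responses) - 0.5


def split_half_indices(responses):
    """Bias index for the first and the second half of the trials."""
    half = len(responses) // 2
    return bias_index(responses[:half]), bias_index(responses[half:])


# Hypothetical observer: 30 'toward' responses out of 40 trials.
responses = ["toward"] * 30 + ["away"] * 10
overall = bias_index(responses)
first_half, second_half = split_half_indices(responses)
print(overall, first_half, second_half)
```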
Crucially, the intraclass correlation analysis revealed that the similarity of members within MZ twin pairs (rMZ = .50, 95% confidence interval [CI] = [.32, .64], p < .001) was more than double the similarity within DZ twin pairs (rDZ = .22, 95% CI = [−.01, .42], p = .03; rMZ vs. rDZ: p = .02; Fig. 2a); this indicates a substantial genetic contribution to the variance of the phenotype (Plomin, DeFries, McClearn, & McGuffin, 2008). To quantify the respective contributions from genes and the environment, we then submitted the data to the ADE genetic model, which decomposes the variance of a measure into the variance of A, D, and E effects using maximum-likelihood estimation (see the Method section and Table 1 for more details of genetic modeling). Heritability of the facing-direction bias (i.e., the percentage of the overall variance attributable to the genetic component) was 53.8% (95% CI = [36.5%, 67.1%]); goodness of fit of the best-fitting AE model: χ2(4) = 3.37, p = .50, AIC = −4.63.

Fig. 2. Mean intraclass correlation coefficients (ICCs) for the members of monozygotic (MZ) and dizygotic (DZ) twin pairs (vertical bars) and the genetic and environmental contributions to the variance in behavioral traits (horizontal bars). Results for the facing-direction bias for bistable biological-motion perception in Experiment 1 are presented in (a). Two subsamples of the observers from Experiment 1 also participated in Experiments 2 and 3, respectively. For Experiment 2 (b), results are shown for the motion-direction bias for nonbiological-motion perception and the facing-direction bias for biological-motion perception; for Experiment 3 (c), results are shown for the duration of the initial perceptual state for bistable biological-motion perception.
Table 1. Goodness-of-Fit Statistics for the Full and the Best-Fitting Models for the Phenotypes of Facing-Direction Bias (Experiment 1), Motion-Direction Bias (Experiment 2), and the Duration of the Initial Perceptual State for Biological-Motion Perception (Experiment 3)
Note: See the Method section for explanations of the models. The chi-square difference between the full and best-fitting model did not reach significance for any of the three variables—facing-direction bias: Δχ2(1) = 0.72, p = .40; motion-direction bias: Δχ2(2) = 0, p = 1.0; onset-state duration: Δχ2(1) = 0.09, p = .76. AIC = Akaike information criterion.
To further explore whether genes were responsible for the strength of this bias, we derived the absolute-bias-strength index by combining the directions of the facing bias and normalized the extent of the bias to a range from 0 (no bias) to 1 (extreme bias). This index again showed a large individual variance (M = .64, SD = .34) and a higher similarity between the members of MZ than DZ twin pairs (rMZ = .34, 95% CI = [.14, .52], p = .001; rDZ = .01, 95% CI = [−.22, .23], p = .48; rMZ vs. rDZ: p = .01), with 30.8% of the variance accounted for by genetic factors (95% CI = [10.1%, 49.0%]), goodness of fit of the AE model: χ2(4) = 5.49, p = .24, AIC = −2.52.
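One straightforward way to obtain such a strength index is to discard the sign of the facing-direction bias and rescale it. The exact normalization is not spelled out in the text, so the mapping below (absolute value multiplied by 2) is an assumption, and the function name is hypothetical.

```python
def absolute_bias_strength(bias):
    """Map a signed bias in [-.5, .5] to a strength in [0, 1].

    Assumed normalization: the direction is dropped (absolute value)
    and the result is doubled so that an extreme bias equals 1.
    """
    return 2.0 * abs(bias)


# A toward-viewer bias of .25 and an away bias of -.25 are equally strong.
print(absolute_bias_strength(0.25), absolute_bias_strength(-0.25))
```

Under this mapping, two observers with opposite but equally extreme biases receive the same score, so the index captures how strongly an observer is biased regardless of direction.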
Experiment 2: no genetic influence on the perception of inanimate bistable motion
The heritability of the facing-direction bias provides novel evidence that the perceptual resolution of ambiguous biological motion, a type of bistable stimulus, has a genetic basis. As we mentioned before, reconstructing the 3-D world from 2-D images is fundamental to human vision. Does the finding from Experiment 1 reflect the inherent nature of a general mechanism underlying bistable motion perception or a more specific mechanism engaged in the disambiguation of bistable biological motion? To address this question, we employed point-light spheres that could be perceived as either rotating toward or away from the viewer due to the kinetic depth effect. If genes determine the individual differences in domain-general processes of 3-D reconstruction from motion, it would be expected that the perception of depth-ambiguous spheres is also under genetic influence. Otherwise, genes should not account for the individual variation in generic inanimate bistable motion perception.
The index of the motion-direction bias was measured in accordance with the facing-direction bias as the proportion of “rotating-toward-viewer” responses minus .5 (0 = no bias; .5 = extreme bias toward the viewer; −.5 = extreme bias away from the viewer). Similar to findings for biological-motion perception, results for inanimate-motion perception showed a toward-viewer bias in general (M = .25) with a large individual variance (SD = .39), and the individual-level bias was also highly correlated between the first and the second half of the trials (r = .98, p < .001). However, in contrast to the facing-direction bias, the intraclass correlation for the motion-direction bias was not evident either within MZ pairs (rMZ = −.04, 95% CI = [−.44, .38], p = .56) or within DZ pairs (rDZ = .12, 95% CI = [−.30, .51], p = .28; Fig. 2b), which suggests that the motion bias was not heritable (heritability = 0%, 95% CI = [0%, 0%]), goodness of fit of the E model: χ2(5) = 3.82, p = .58, AIC = −6.18.
Notably, the absence of genetic effects was not due to the smaller sample size. Looking at the facing-direction bias in biological-motion perception of these same twin pairs yielded results similar to those obtained from the whole sample in Experiment 1 (Fig. 2b). There was a significant intraclass correlation within MZ twin pairs (rMZ = .56, 95% CI = [.19, .79], p = .003) and a marginally significant correlation within DZ twin pairs (rDZ = .31, 95% CI = [−.11, .64], p = .07; rMZ vs. rDZ: p = .17), and genes could explain 54.4% (95% CI = [21.6%, 75.2%]) of the observed variance, estimated by the best-fitting genetic model, goodness of fit of the AE model: χ2(4) = 1.89, p = .87, AIC = −6.11. Hence, the results so far clearly converged to suggest that genes influence the specific processing related to biological-motion perception instead of the general structure-from-motion process in terms of perceptual bias.
Experiment 3: stability of the onset state of ambiguous biological-motion perception is heritable
After a brief viewing of a bistable stimulus, people usually experience a definite percept representing the first conscious representation of the ambiguous object (i.e., the onset or initial state), which will last for several seconds and thereafter lapse into a spontaneous alternation between the two possible interpretations (i.e., the sustained-rivalry stage). The disambiguation process during the initial stage of bistable perception can be characterized by two features. One is the predominance of a conscious state (e.g., the facing-direction bias in biological-motion perception); the other is the stability of the initial state (e.g., the first percept usually lasts longer than the subsequent ones; Pressnitzer & Hupe, 2006). The second feature may reflect the continuation of the disambiguation process, during which the brain actively figures out the alternative interpretation of the bistable stimuli (Long & Toppino, 2004). The first two experiments demonstrated a genetic effect on bistable perception specifically for biological motion with respect to state dominance. In Experiment 3, we examined whether the stability of the onset perceptual state, a different aspect of visual disambiguation, was also influenced by genes in ambiguous biological-motion perception.
The stability of the onset state was measured by the duration from the beginning of the initial percept (first key press) to the time point it switched (second key press) or to the end of the block if no switch occurred. The average switch time was 40.5 s, with a standard deviation of 17.3 s. Similar to the results for the facing-direction bias, the correlation within MZ twin pairs (rMZ = .31, 95% CI = [.01, .56], p = .02) was more than twice the correlation within DZ twin pairs (rDZ = .07, 95% CI = [−.22, .36], p = .31; rMZ vs. rDZ: p = .14) for the duration of the initial perception of bistable biological motion (Fig. 2c), which suggests that genes contribute to the variance of this trait. The estimated heritability was 26.1% (95% CI = [0.4%, 48.4%]), goodness of fit of the AE model: χ2(4) = 1.35, p = .85, AIC = −6.65. These results demonstrate that not only the predominance of perceptual state, but also the broader process of visual disambiguation during the initial stage of bistable biological-motion perception is under genetic influence.
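The duration measure defined at the start of this paragraph can be written out as a small function. This sketch assumes key-press times are recorded in seconds from block onset and uses the 60-s block length given in the Method section; the function name is hypothetical, and how the two blocks were combined into one score per observer is not specified in the text, so that step is left out.

```python
BLOCK_DURATION_S = 60.0  # length of one block in Experiment 3


def onset_state_duration(key_press_times, block_duration=BLOCK_DURATION_S):
    """Duration of the first percept within one block.

    Runs from the first key press (initial percept reported) to the
    second key press (first reversal), or to the end of the block if
    no reversal was reported. Returns None if no key was pressed.
    """
    if not key_press_times:
        return None
    presses = sorted(key_press_times)
    start = presses[0]
    end = presses[1] if len(presses) > 1 else block_duration
    return end - start


# Hypothetical blocks: one with reversals, one without.
with_switch = onset_state_duration([2.5, 30.0, 41.0])
without_switch = onset_state_duration([3.0])
print(with_switch, without_switch)
```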
Discussion
The visual perception of ambiguous stimuli is usually characterized by idiosyncratic perceptual biases. Although previous studies have mostly examined and explained perceptual biases in the context of visual experience (Daelli et al., 2010; Dobbins & Grossmann, 2010; Harrison & Backus, 2010; Maloney et al., 2005; Morikawa & McBeath, 1992), the current study underscores the contribution of genes to individual differences in the perceptual bias in bistable perception. Specifically, our results show that genes influence the initial perceptual state as well as its duration in the perception of depth-ambiguous biological motion. Furthermore, the heritability of the perceptual bias is specific to biological motion and is not attributable to the general process of structure-from-motion perception. Together, these results provide clear evidence for the influence of genes on the construction of visual perception, particularly on how the human brain resolves the ambiguity in biologically salient information.
Our results are in line with the proposition that the human brain is inherently wired to efficiently detect and identify information of great biological significance (Mahon, Anzellotti, Schwarzbach, Zampini, & Caramazza, 2009; New, Cosmides, & Tooby, 2007). According to the domain-specific hypothesis (Caramazza & Shelton, 1998; Downing, Chan, Peelen, Dodds, & Kanwisher, 2006), the cognitive and neural architecture underlying the processing of various categories of information may be different and partially specialized (Barrett & Kurzban, 2006; Kanwisher, 2010). Recent studies have demonstrated that genes may play a substantial role in some of the highly specialized processes, such as the perception of faces (Wilmer et al., 2010; Zhu et al., 2010). Similar to faces, biological motion enjoys privileged visual processing that engages a specialized neural network (Grossman & Blake, 2002; Vaina, Solomon, Chowdhury, Sinha, & Belliveau, 2001); however, it is still unknown whether the perception of biological motion is heritable. The findings that newborns exhibit a spontaneous preference for biological-motion patterns over other inanimate motions argue for the existence of an innate device tuned to biological-motion signals (Bardi, Regolin, & Simion, 2011; Simion, Regolin, & Bulf, 2008). Beyond these findings, the current study reveals the heritability of a perceptual trait directly related to biological-motion perception (i.e., the perceptual bias elicited by the depth-ambiguous biological-motion patterns).
Previous research has consistently found a remarkable group-level bias toward perceiving the depth-ambiguous biological motions as facing toward the viewer, and this finding indicates that such bias may, to some extent, result from the social and biological significance of approaching versus receding living organisms (Brooks et al., 2008; Schouten, Troje, Brooks, van der Zwan, & Verfaillie, 2010; Vanrie et al., 2004). Consistent with this hypothesis, other results show that people are more sensitive to point-light walkers facing the viewer than to their depth-opposite counterparts (Doi & Shinohara, 2012; Wang & Jiang, 2014). Although the exact neural mechanism that mediates the facing-direction bias has not been elucidated, recent evidence suggests that the disambiguation of depth-ambiguous biological motion is supported by specialized neural substrates (Jackson & Blake, 2010). The heritability of the facing-direction bias, combined with the aforementioned evidence, suggests that the perception of bistable biological motion is guided by a domain-specific genetic mechanism, which may be responsible for resolving the ambiguity in the actions of other biological entities.
The observed genetic effect on the perception of ambiguous biological motion also extends the understanding of the role of genes in bistable perception. Despite the tremendous progress in research on bistable perception (Leopold & Logothetis, 1999; Long & Toppino, 2004; Sterzer et al., 2009), the genetic bases for individual variation in the disambiguation of bistable figures remain largely unexplored. The only available behavioral genetic studies reported substantial heritability of the switch rate in binocular rivalry (Miller et al., 2010), as well as in other forms of bistable perception during the sustained-rivalry stage (Shannon, Patrick, Jiang, Bernat, & He, 2011). The current study, on the other hand, demonstrated the heritability of the idiosyncratic perceptual bias and highlights a critical genetic influence on the observer’s innate inclination to resolve the ambiguity in one way or another, especially during the onset stage of bistable perception when the conscious representation of the ambiguous stimulus initially emerges.
Our finding that the onset bias in inanimate motion perception is not heritable is in general compatible with the observation from a sustained-rivalry study that the predominance of grating stimuli is not heritable (Miller et al., 2010), and the genetic effects that we obtained with biological motion lend support to the hypothesis that a domain-specific rule regulates the perceptual bias in bistable perception. Admittedly, however, there are apparent differences between the onset and the sustained-rivalry stages of bistable perception, as prolonged viewing will lead to the fading of the onset perceptual bias or to a sustained predominance bias that lacks evident correlation with the onset bias (see Stanley et al., 2011, for a review). Hence, the observations of the onset bias might not directly relate to those obtained from the sustained-rivalry studies (Miller et al., 2010). From an interpretive perspective, the initial stage of bistable perception, from the emergence to the fade-out of the first percept, is highly associated with the disambiguation process during which a proper interpretation of the biological signals may contribute to the successful detection of potential threat or interpersonal interaction, and to the effective inhibition of the unfavorable percept. Such evolutionary pressure may account for the heritability of the facing-direction bias and the duration of the onset percept that relate to the initial disambiguation stage of bistable biological-motion perception. By contrast, the sustained-rivalry stage involves additional mechanisms, including adaptation (Long & Toppino, 2004) and attentional switching (Miller, Ngo, & van Swinderen, 2011), and may have less to do with visual disambiguation. 
To better characterize the role of genes in bistable perception, future studies should test whether the domain-specific genetic mechanism can be extended to the perceptual bias in the sustained stage of bistable perception, or whether the onset and sustained-rivalry stages have distinct genetic bases.
In sum, although visual experience is demonstrably essential to the construction of perception for ambiguous stimuli, we provide the first evidence for the effect of genes on individual variation in visual-ambiguity resolution. The heritability of the perceptual bias specifically elicited by biological-motion stimuli suggests that mental “priors” about biologically meaningful entities may have been built innately and guide conscious perception at its initial forming stage. This is an important step toward understanding why people are born to see certain objects but not others with inherent bias, not only from the cognitive but also from the genetic perspective.
Footnotes
Acknowledgements
We thank Nikolaus Troje for providing the biological-motion stimuli and two anonymous reviewers for their helpful comments.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This research was supported by the National Basic Research Program of China (Grant No. 2011CB711000), the National Key Technologies R&D Program of China (Grant No. 2012BAI36B00), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB02010003), and the National Natural Science Foundation of China (Grant Nos. 31100733 and 31070903).
