Abstract
When we experience our environment, we do so by combining sensory inputs with expectations derived from our prior knowledge, which can lead to surprising perceptual effects such as small objects feeling heavier than equally weighted large objects (the size–weight illusion (SWI)). Interestingly, there is evidence that the way in which the volume of an object is experienced can affect the strength of the illusion, with a SWI induced by exclusively haptic volume cues feeling stronger than a SWI induced with only visual volume cues. Furthermore, visual cues appear to add nothing over and above haptic size cues in terms of the strength of the induced weight illusion – findings which are difficult to reconcile with work using cue-conflict paradigms where visual cues usually dominate haptic cues. Here, virtual reality was used to place these senses in conflict with one another. Participants (N = 22) judged the heaviness of identically weighted cylinders across three conditions: (1) objects appeared different sizes but were physically the same size, (2) objects were physically different sizes but appeared to be the same size, or (3) objects looked and felt different sizes from one another. Consistent with prior work, haptic size cues induced a larger SWI than that induced by visual size differences. In contrast to prior work, however, congruent visual and haptic size cues yielded a still larger SWI. These findings not only add to our understanding of how different modalities combine to influence our hedonic perception but also showcase how virtual reality can develop novel cue-conflict paradigms.
The size–weight illusion (SWI) describes the effect whereby small objects feel heavier than large objects of the same mass. First described by Charpentier (1891), it has been a consistent topic of study for cognitive scientists for the past 100 years. The psychology and physiology underpinning this powerful perceptual effect, however, remain something of a mystery. It is present across the lifespan, having been demonstrated in both young (Pick & Pick, 1967) and old individuals (Buckingham, Reid, & Potter, 2018). It seems largely unaffected by different types of brain injury (Buckingham, Bieńkiewicz, Rohrbach, & Hermsdörfer, 2015; Li, Randerath, Goldenberg, & Hermsdörfer, 2011) and even appears to be experienced by some animals (Jablonski et al., 2015). In addition, it is cognitively impenetrable and does not stem from variations in lifting behaviour (Flanagan & Beltzner, 2000; Grandy & Westwood, 2006).
As with all compelling perceptual phenomena, many scientists have proposed explanations at a range of mechanistic levels (for review, see Buckingham, 2014). Cases have been made to explain the SWI in terms of lifting behaviour (Davis & Roberts, 1976), low-level integration of information (Anderson, 1970) and even suggestions that the effect may be a consequence of an evolutionary mechanism to detect the ‘throwability’ of objects (Zhu & Bingham, 2011). One popular, more cognitive explanation for this phenomenon is that the SWI reflects the contrastive way in which a lifter’s expectations about an object’s likely weight or density are integrated with sensory input for the subjective experience of an object’s weight (Flanagan, Bittner, & Johansson, 2008; Peters, Ma, & Shams, 2016). In this context, the illusion-causing expectations derive from the lifter’s experience of how size and weight covary in commonly lifted objects. This explanation is consistent with experiments showing that the SWI can be reduced, and eventually reversed, with experience in an environment with a negative size–weight correlation (Flanagan et al., 2008) and also that the felt heaviness of a single unchanging object can be changed by priming individuals to expect to be lifting a larger or smaller object than they actually do (Buckingham & Goodale, 2010; Buckingham, Ranger, & Goodale, 2011a). The exact mechanism by which expectations modulate the sensory input to induce the illusory misperception of weight is, however, still unclear.
Although most SWI experiments induce the illusion with visual volume cues alone, by having participants lift the objects with a handle, it has been well established that the SWI can be induced with size cues delivered through a range of modalities in a variety of ways. For example, a brief preview allowing a participant to see or feel the size of an object prior to lift-off can readily induce a robust SWI (Buckingham & Goodale, 2010; Buckingham, Ranger, & Goodale, 2011b). Indeed, even ‘substituted’ senses can induce the SWI, as has recently been shown in the case of human echolocation, in which blind individuals’ perceptions of object weight are modulated by their ability to establish object size by decoding the echoes from sounds made towards objects in their environment (Buckingham, Milne, Byrne, & Goodale, 2015). It is worth noting that, in all of these cases, the SWI tends to vary in strength, suggesting that the reliability or robustness of the illusion-inducing size cue modulates the strength of the SWI.
In the context of evaluating object size, and a range of other perceptual tasks, vision is typically considered as the most reliable sensory modality – an assertion backed up by a large body of empirical work stretching back more than 100 years (Bowditch & Southard, 1882; Ernst & Banks, 2002; Rock & Victor, 1964). The dominance of vision over the other senses, however, is difficult to reconcile with the findings of Ellis and Lederman’s (1993) thorough examination of the magnitude of SWI across a range of different configurations. The authors showed that the SWI experienced by individuals lifting blindfolded (i.e., experiencing only haptic volume cues) was substantially larger than that induced when lifting the objects with a string (i.e., experiencing only visual volume cues). Furthermore, they found that combining visual and haptic volume cues yielded an equivalent illusion to haptic volume cues alone. It is this puzzling set of findings which motivates this work.
One way to reconcile the findings of Ellis and Lederman (1993) with the body of work highlighting visual dominance over haptics might be to examine the psychophysical techniques employed by the researchers. Ellis and Lederman used a method of eliminating one sense to evaluate the strength of the remaining sense as an inducer. By contrast, more contemporary work highlighting visual dominance over haptics (for review, see Ernst, 2006) has used clever manipulations with mirrors and graphical displays to place the senses in conflict with one another. While these techniques are suitable for static perceptual tasks, they are often difficult to scale up to dynamic, unconstrained, perceptual tasks, in which participants are able to freely experience the dynamics of the objects whose properties they are judging – a factor which is clearly key for experiencing the full-strength SWI (Buckingham, 2014; Ellis & Lederman, 1993). Recently, however, immersive virtual reality (VR) has provided new avenues to bridge the gap between conventional psychophysical paradigms and more real-world tasks (Wilson & Soranzo, 2015). Indeed, the SWI paradigm has been used as a test bed for the use of VR in human factors settings, with scientists demonstrating that the strength of the SWI increases with the degree of immersion in a virtual environment (Heineken & Schulte, 2007). Here, I describe a novel ‘mixed reality’ setup where the positions of objects are tracked, and their kinematic information is used to deliver a real-time visualisation of their movements. Critically, this setup allows for full control over the visual properties of the objects while they are being interacted with, such that a visuo-haptic conflict can be induced in a reasonably naturalistic task.
This setup was used to conduct a follow-up of the surprising findings of Ellis and Lederman (1993), by examining the relative strength of the SWI induced by visual size cues, haptic size cues and combined visual and haptic size cues.
Materials and methods
Participants
In total, 24 undergraduate students from the University of Exeter were tested in this study (three females, mean age = 21.0 years, SD = 1.0). Two participants were removed from the final analysis due to protocol errors during testing, leaving a sample of 22. Participants gave written informed consent prior to testing, and all procedures were approved by the local research ethics board.
Materials
Participants lifted and judged the weight of two different sets of black polylactic acid (PLA) cylinders printed on an Ultimaker 2 (Ultimaker B.V., Geldermalsen, the Netherlands) 3D printer. The first trio of cylinders were all 7.5-cm high and had three different diameters (small = 5 cm, medium = 7.5 cm and large = 10 cm). The second trio of cylinders all had the same dimensions as the medium cylinder – 7.5 cm in diameter and 7.5-cm tall. All the cylinders were filled with packing foam and lead shot to weigh 486 g, with the centre of mass balanced around the centre of the object. The cylinders had plastic mounts integrated in the middle of their top surface to allow the easy attachment of a 10-cm-tall rod with a series of retro-reflective cylindrical markers attached (Figure 1). Participants wore an Oculus Rift CV1 (Oculus VR, Irvine, CA) head-mounted display (HMD), through which they saw the cylinders in a bespoke immersive VR game environment programmed in the Unity game engine (Unity Technologies, San Francisco, CA; v5.6.0f). The environment was a simplified simulacrum of the testing laboratory, in which movements of the various dynamic elements in the scene were conveyed at a very low latency. To achieve this effect, an eight-camera Optitrack Flex 13 (NaturalPoint Inc., Corvallis, OR) motion capture system tracked the position of a series of unique 5/6-marker-configuration rigid bodies attached to the Oculus headset, straps fastened around the wrists of the participants, the table surface and each of the three SWI-inducing cylinders. The positions of the rigid bodies were tracked at 120 Hz on a Toshiba Portege i7 laptop using Motive software (v1.10; NaturalPoint Inc.) and streamed through a local area network (LAN) cable to a desktop PC with an AMD Radeon™ 8-GB RX480 (Advanced Micro Devices, Inc., Santa Clara, CA) graphics card running Microsoft Windows 10. This PC rendered the virtual environment inside the Oculus headset through a standalone Microsoft Windows build of each of the game environments (compiled with Unity v5.6.0f), using the Optitrack Unity Plugin (v1.0.1).

Figure 1. The mixed-reality setup used in this study, in which participants lifted and judged the weights of physical objects in virtual reality. The positions of the rigid body marker configurations attached to the Oculus HMD, the wrist straps and the test cylinders were tracked with Optitrack cameras and Motive software.
The positions of the HMD, the table and wrist straps were tracked and rendered in equivalent relative positions to one another. Graphically, the HMD’s position was defined as the scene viewing camera, allowing participants to freely move their head and see visual changes concomitant to their head movement. The table was rendered as a dark grey/brown rectangle of roughly the same proportions as the real table surface, and the wrist positions were rendered as small orange spheres. The three SWI-inducing cylinders were rendered as white cylinders with shading and cast shadows from the default light source position. These cylinders were displayed 10 cm below the position of the rigid bodies (which were physically 10 cm above the objects’ top surfaces to ensure participants would not bump them when grasping the objects), so that the virtual cylinders’ bases appeared on the surface of the virtual table when the physical cylinders rested on the physical table. To ensure the virtual cylinders approximately matched their physical counterparts, a virtual cylinder was manually adjusted in size by the experimenter until it was felt to match the size of the large physical cylinder. The medium and small cylinders were then created by copying this large cylinder within Unity and applying a scaling factor of 0.75 and 0.5, respectively (as was done to create the physical stimuli). The Unity (i.e., development) versions of the environments, as well as the Motive rigid body configuration files, can be found online (https://osf.io/2x3ju/).
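The geometry described above can be illustrated with a minimal Python sketch. It is not the Unity code used in the study (that is available at the OSF link); the names `place_virtual_cylinder` and `VirtualCylinder` are hypothetical, and the sketch assumes that the scaling factor applies to diameter only, that the rendered top surface sits exactly 10 cm below the tracked rigid body, and that the table surface is at height 0.

```python
from dataclasses import dataclass

MARKER_OFFSET_M = 0.10     # rigid body sits 10 cm above the cylinder's top surface
CYLINDER_HEIGHT_M = 0.075  # every cylinder is 7.5 cm tall
LARGE_DIAMETER_M = 0.10    # diameter of the large cylinder

# Scale factors relative to the large cylinder, as given in the text.
# Assumption: the factor is applied to the diameter only.
SCALE = {"large": 1.0, "medium": 0.75, "small": 0.5}


@dataclass
class VirtualCylinder:
    diameter_m: float
    top_y_m: float   # height of the rendered top surface
    base_y_m: float  # height of the rendered base


def place_virtual_cylinder(size: str, rigid_body_y_m: float) -> VirtualCylinder:
    """Derive the rendered cylinder from the tracked rigid-body height."""
    top = rigid_body_y_m - MARKER_OFFSET_M
    base = top - CYLINDER_HEIGHT_M
    return VirtualCylinder(LARGE_DIAMETER_M * SCALE[size], top, base)


# A cylinder resting on a table at height 0 has its rigid body at 17.5 cm.
resting = place_virtual_cylinder("small", rigid_body_y_m=0.175)
print(resting)
```

With the physical cylinder resting on the table, the computed base height is zero to within floating-point error, which is the property the 10-cm display offset is meant to guarantee.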
Procedure
Participants were seated in front of the table in a height-adjustable chair, and the experimenter attached the wrist straps to which the wrist rigid bodies were attached. The participants then placed the Oculus HMD on their head and were instructed to tighten the straps such that it was comfortable. Through the HMD, participants saw small orange spheres representing the position of their wrists and the three test cylinders atop the virtual table. The experimenter then asked the participant to confirm that the HMD was fitted appropriately and the view of the scene was in focus. On each trial, all three objects were visible to the participant, but one of the objects was placed centrally in front of the participant (a comfortable distance from their trunk), with the other two out of reach. The participant was then asked to reach out with their preferred hand to the object which was closest to them and to pick it up and lift it a short distance off the table. No instructions about the mode of lifting were given other than to ensure that during the reach towards the object, their hand was kept close to the table surface, so as not to bump the rigid bodies attached to the physical objects’ top surface (NB the virtual object was presented only as a cylinder). After each lift, the participant was asked to give a verbal rating of how heavy the object felt using a numerical scale with no upper or lower limits, where larger numbers would represent heavier-feeling weights (i.e., an arbitrary magnitude estimation; Zwislocki & Goodman, 1980).
All the participants lifted the objects in three different blocked conditions (Figure 2), presented in a counterbalanced order. In Condition 1, participants lifted the small, medium and large cylinders and observed lifts of congruently sized virtual objects (i.e., the small physical object appeared small in VR, whereas the large physical object appeared large in VR), allowing participants to both see and feel the diameter of the cylinder concurrently. In Condition 2, participants also lifted the small, medium and large cylinders, but this time, each of the rigid bodies was associated with the same image such that all the cylinders presented through the HMD were the same, medium, size. Thus, in this condition, size differences were only experienced with the hand. In Condition 3, the rigid bodies were attached to three medium-sized cylinders, which were associated with images of the large, medium and small cylinders. In this condition, therefore, size differences were only seen, but not felt. Participants were not given any information as to the nature of the conditions nor the possibility of a visuo-haptic mismatch.
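The three conditions can be summarised as a mapping from each tracked object to the size that is felt and the size that is seen. The following Python sketch is only a compact restatement of the design; the names `CONDITIONS` and `size_cues` are hypothetical and do not come from the study's software.

```python
SIZES = ("small", "medium", "large")

# For each condition: the physical (felt) and virtual (seen) size of each of
# the three tracked objects, in matching order.
CONDITIONS = {
    1: {"felt": SIZES, "seen": SIZES},            # congruent vision and touch
    2: {"felt": SIZES, "seen": ("medium",) * 3},  # size differences felt only
    3: {"felt": ("medium",) * 3, "seen": SIZES},  # size differences seen only
}


def size_cues(condition: int) -> dict:
    """Report which modalities carry size differences in a condition."""
    design = CONDITIONS[condition]
    return {modality: len(set(design[modality])) > 1 for modality in ("felt", "seen")}


for number in CONDITIONS:
    print(number, size_cues(number))
```

Laid out this way, Conditions 2 and 3 each remove the size differences from exactly one modality while leaving the other unchanged relative to Condition 1.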

Figure 2. Images of the physical (felt) and virtual (seen) stimuli used in each of the conditions in this study. Although no effort was made in the figure to capture the physical stimuli from the viewer’s perspective, in Condition 1, the objects looked and felt approximately the same size. In the other conditions, this visual-haptic congruence was broken.
In each condition, participants lifted each of the three objects 10 times apiece (total of 30 lifts) in one of the three pseudo-randomised orders, taking approximately 20 min. Participants were given a short break in between conditions during which they removed the headset. Prior to undertaking the experimental trials, participants had the task explained to them, and then, they were given several practice lifts of the medium-sized cylinder outside of the virtual environment. The perceptual ratings in each condition were transformed into z-scores within subject to account for individual differences in the range of scale used to rate the felt heaviness of the cylinders. These z-scores were then examined in a 3 (object size) × 3 (condition) repeated-measures analysis of variance (ANOVA). Violations of sphericity were addressed with the Greenhouse–Geisser correction. Significant interactions were followed up with Bonferroni-corrected t-tests on the relevant difference scores. All analyses were performed in jamovi, version 0.9.1.11. Data for individual participants, as well as the group-level analysis, can be found online (https://osf.io/236zg/).
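The normalisation and the illusion-magnitude metric can be sketched in a few lines of Python. This is an illustration rather than the jamovi workflow used for the reported analysis. It assumes that each participant's ratings were standardised across all of their trials (standardising within each condition separately would force every condition mean to zero) and that the sample standard deviation was used; neither detail is stated in the text. The function names and the toy ratings are invented.

```python
from statistics import mean, stdev


def zscore_within_subject(ratings):
    """Standardise one participant's ratings.

    ratings: list of (condition, size, rating) tuples for ONE participant.
    Assumption: z-scores are computed over all of that participant's trials.
    """
    values = [rating for _, _, rating in ratings]
    centre, spread = mean(values), stdev(values)
    return [(cond, size, (rating - centre) / spread) for cond, size, rating in ratings]


def swi_magnitude(z_ratings, condition):
    """Mean z-score for the small object minus that for the large object."""
    small = [z for cond, size, z in z_ratings if cond == condition and size == "small"]
    large = [z for cond, size, z in z_ratings if cond == condition and size == "large"]
    return mean(small) - mean(large)


# Invented ratings for a single participant, for illustration only.
toy = [
    (1, "small", 8.0), (1, "medium", 5.0), (1, "large", 2.0),
    (3, "small", 6.0), (3, "medium", 5.0), (3, "large", 4.0),
]
z = zscore_within_subject(toy)
print(swi_magnitude(z, 1), swi_magnitude(z, 3))
```

A positive magnitude indicates a conventional illusion, in which the small object is rated as heavier than the large one.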
Results
Although we observed no main effect of condition, F(1.56, 32.83) = 1.11, p = 0.33, η² = 0.05, we did observe a significant main effect of object size, F(1.47, 30.84) = 204.25, p < 0.001, η² = 0.907, as well as a condition by size interaction, F(2.76, 57.94) = 28.99, p < 0.001, η² = 0.58. We examined this interaction (illustrated in Figure 3a) by calculating a metric of the magnitude of the SWI in each condition by subtracting the rating given to the large object from the rating given to the small object (Figure 3b). We then compared these difference scores in each condition to one another with paired-sample t-tests. These tests indicated that the SWI experienced by participants in Condition 1 was significantly larger than that experienced in Condition 2, t(21) = 4.25, p < 0.001, Cohen’s d = 0.905, or Condition 3, t(21) = 8.37, p < 0.001, Cohen’s d = 1.785. Furthermore, the SWI in Condition 2 was significantly larger than the SWI in Condition 3, t(21) = 3.87, p < 0.001, Cohen’s d = 0.825. The outcomes of the statistical analyses remained unchanged (i.e., all p values remained <0.001) with the removal of the clear outlier which can be seen in Condition 2 of Figure 3b.
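The reported effect sizes for the paired comparisons are consistent with the common paired-samples convention d = t / √n, where n = df + 1 = 22. The Python sketch below recomputes them from the reported t values. That jamovi computed the values in exactly this way is an assumption, and the small discrepancies in the third decimal place presumably reflect rounding of the published t statistics. The `bonferroni` helper illustrates the usual multiply-and-cap correction and is likewise not taken from the study's analysis files.

```python
from math import sqrt


def cohens_dz(t: float, df: int) -> float:
    """Effect size for a paired-samples t-test: d = t / sqrt(n), with n = df + 1."""
    return t / sqrt(df + 1)


def bonferroni(p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_comparisons)


# (reported t, reported d) for each pairwise comparison of SWI magnitude.
reported = {
    "Condition 1 vs 2": (4.25, 0.905),
    "Condition 1 vs 3": (8.37, 1.785),
    "Condition 2 vs 3": (3.87, 0.825),
}

for label, (t_value, d_reported) in reported.items():
    print(label, round(cohens_dz(t_value, 21), 3), d_reported)
```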

Figure 3. (a) The average normalised heaviness ratings for the objects given in each condition. (b) The magnitude of the size–weight illusion (average normalised rating given to small object – average normalised rating given to large object) experienced in each condition. In Figure 3a, error bars show 95% confidence intervals; in Figure 3b, positive values indicate the presence of a conventional size–weight illusion (i.e., small objects felt heavier than large objects) and circles indicate individual participants’ data points.
Discussion
This study describes an experiment using immersive VR to place the senses in conflict with one another to evaluate the strengths of visual and haptic cues to size as inducers for the SWI. Participants lifted and judged the weights of three cylinders which either differed in their visual and felt size (Condition 1), differed in only their physical (felt) size with no visual indication of the size differences (Condition 2) or differed only in their visual size with no haptic indication of the size differences (Condition 3).
In all conditions, almost all of our participants experienced a robust SWI (Figure 3b). Consistent with prior work (Ellis & Lederman, 1993), it was noted that the SWI induced with haptic volume cues was substantially larger than the SWI induced with visual volume cues. Indeed, in this study, the haptic SWI was found to be over twice as large as the visual SWI (1.53 vs 0.76 arbitrary units). Thus, in the context of the SWI, it would seem that our sense of touch provides a more reliable signal to object size. It would be easy to imagine that this finding may simply be a consequence of the relatively low-fidelity visual environment in which participants were making their judgement. It is, however, worth noting that the same visual environment was used across all conditions, and the stripped-back graphical background might well have served to emphasise the differences in size between the objects.
The apparent dominance of touch information over visual information in this work is surprising in the context of much of the other work on the psychophysics of visual/touch interactions, where it has been shown that vision is generally considered the more reliable sensory input and is thus given more weight in cue-conflict tasks (Ernst, 2006). It is worth noting that the SWI is a phenomenon which scientists have had trouble incorporating into traditional perceptual frameworks, to such an extent that it has been referred to as ‘anti-Bayesian’ (Brayanov & Smith, 2010). More recent attempts to incorporate the SWI into Bayesian frameworks have emphasised the role of size as a cue to density (Peters et al., 2016; Wolf, Tiest, & Drewing, 2018), capitalising on the observation that small hand-held objects are generally denser than large hand-held objects (Peters, Balzer, & Shams, 2015). This effect, however, appears to be reasonably high level (i.e., modulated by some form of memory), and it is thus unclear why the inducing modality would affect the strength of the subsequent weight illusion. It is worth noting that our paradigm did not explicitly evaluate participants’ judgements of object size, which might well vary as a function of modality in VR (Wuillemin, van Doorn, Richardson, & Symmons, 2005). There is, however, no study to our knowledge suggesting that modality in the context of physical or virtual environments might influence relative size judgements, which seems key to affecting the magnitude of the illusory weight differences in the SWI.
Furthermore, this dominance of haptic over visual cues to size in a dynamic task is particularly surprising in light of various studies showing the suppression of external tactile stimulation which occurs when the limbs are in motion (Colino, Buckingham, Cheng, van Donkelaar, & Binsted, 2014; Shergill, Bays, Frith, & Wolpert, 2003) – under these circumstances, it is counter-intuitive for the tactile (specifically the static size of the grip aperture) information to be weighted so strongly. Clearly, the functional role of sensory suppression, in the context of perceiving object properties, during object manipulation warrants further study.
In contrast to the findings of Ellis and Lederman (1993), it was found that the SWI induced with concurrent visual and haptic cues to size (i.e., a ‘natural’ lift) was substantially larger than that induced by haptic cues alone (2.23 vs 1.53 arbitrary units). Indeed, the magnitude of this multimodal SWI (Condition 1) was numerically very close to the sum of the haptic (Condition 2) and visual (Condition 3) variants of the SWI (1.53 + 0.76 = 2.29 arbitrary units). These findings suggest that, in the context of virtual visual cues and real-world haptic cues that affect weight perception, these sensory inputs are combined in an essentially additive fashion – a conclusion which is reconcilable with cognitive ‘expectation’ (Flanagan et al., 2008; Ross, 1969) or information integration (Anderson, 1970; Masin & Crestoni, 1988) theories of the SWI. Future work using more high-level object properties to induce illusions, such as material (Buckingham, Cant, & Goodale, 2009; Ellis & Lederman, 1999) or object identity (Buckingham & MacDonald, 2016), using the methods outlined in this article, will go some way to furthering our understanding of this phenomenon. In addition, it is unclear to what degree the effects reported in this article are due to the fact that only the position of the hands, and not their aperture during lifts, was rendered. This limitation was largely due to the technical challenges associated with visualising the complexity of individual digits’ movements with optical motion capture – a clear next step for future research which would allow us not only to evaluate the effect of grasp kinematics in VR but also to examine the effects of manipulations of factors such as hand size and embodiment.
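The additivity of the visual and haptic contributions can be checked directly from the three group-level magnitudes reported in this Discussion. The short Python sketch below uses only those values; the function name `additivity_gap` is hypothetical, and the comparison is descriptive rather than a statistical test of additivity.

```python
def additivity_gap(combined: float, haptic: float, visual: float):
    """Return the additive prediction and how far the combined SWI departs from it."""
    predicted = haptic + visual
    return predicted, combined - predicted


# Group-level SWI magnitudes in arbitrary (z-score) units, as reported above.
predicted, gap = additivity_gap(combined=2.23, haptic=1.53, visual=0.76)
print(round(predicted, 2), round(gap, 2), round(abs(gap) / predicted, 3))
```

On these figures, the combined illusion falls slightly short of the additive prediction, by a few per cent of the predicted value.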
The final point to discuss is the viability of using immersive VR to answer fundamental questions of perception. The enthusiasm for using VR to answer questions about human perception must, of course, be tempered until we have a better understanding of whether the human sensorimotor system operates in a similar fashion in VR to how it operates in the real world, which may readily account for the differences between the outcomes of this work and those of the real-world paradigm of Ellis and Lederman (1993). More broadly, however, perceptual science has struggled with the tension between traditional static psychophysical methods and ecological studies of perception. In the former, the methods of experiencing and reporting changes in the physical properties of stimuli are tightly constrained, with modern-day visual psychophysics studies requiring participants to sit in a head-fixed orientation and respond to images on a computer screen. In the latter, inspired by the work of James Gibson (1962, 1979), the manipulations and measurement paradigms focus on elucidating the information which can be garnered from ecologically relevant variables – a topic which has been particularly fruitful in the study of weight perception (Amazeen & Turvey, 1996; Zhu, Shockley, Riley, Tolston, & Bingham, 2013). While it is clearly difficult to bridge the conceptual gap between these two very philosophically different viewpoints, immersive VR might offer a way to bridge the methodological gap, allowing for tight control of visual manipulations in a context which allows the observer unfettered access to the information inherent in the task, rather than the cues provided by the experimenter – a distinction which has recently been shown to be important in the context of how humans perceive images of objects versus real objects themselves (Snow, Skiba, Coleman, & Berryhill, 2014; Squires, Macdonald, Culham, & Snow, 2016).
To summarise, this work has confirmed that the SWI can be induced in VR and has highlighted that, in a virtual environment, haptic cues to object size induce a larger illusion than visual cues to object size. Importantly, when combined, these cues elicit a stronger SWI in VR than when the size differences are experienced by either modality in isolation.
Acknowledgements
This research project would not have been possible without the support of the Experimental Psychology Society’s Small Grants Scheme. The author thanks all the participants who volunteered their time to take part in this study, Tobias Eibeck for his assistance with data collection, as well as Lucia van Eimeren for her comments on an earlier draft of the manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
