Abstract
This study investigates the crossmodal associations between naturally occurring sound textures and tactile textures. Previous research has demonstrated the association between low-level sensory features of sound and touch, as well as higher-level, cognitively mediated associations involving language, emotions, and metaphors. However, stimuli like textures, which are found in both modalities, have received less attention. In this study, we conducted two experiments: a free association task and a two-alternative forced choice task using everyday tactile textures and sound textures selected from natural sound categories. The results revealed consistent crossmodal associations reported by participants between the textures of the two modalities. Participants tended to associate more sound textures with tactile surfaces (e.g., wood shavings and sandpaper) that were rated as harder, rougher, and intermediate on the sticky-slippery scale. While some participants based the auditory-tactile association on sensory features, others made the associations based on semantic relationships, co-occurrence in nature, and emotional mediation. Interestingly, the statistical features of the sound textures (mean, variance, kurtosis, power, autocorrelation, and correlation) did not show significant correlations with the crossmodal associations, indicating a higher-level association. This study provides insights into auditory-tactile associations by highlighting the role of sensory and emotional (or cognitive) factors in prompting these associations.
A crossmodal association is defined as a non-arbitrary, non-redundant association between features of two or more sensory modalities that is consistent over time and population (Spence, 2011). The most popular demonstration of this phenomenon is the “bouba-kiki” effect, where the sound “bouba” is associated with smooth shapes and “kiki” with angular shapes. This effect is seen almost universally across cultures (Ćwiek et al., 2022; Koriat & Levy, 1979; Kovic et al., 2010; Maurer et al., 2006; Parise & Spence, 2012). Notably, this effect was also found in people of the Himba tribe in Namibia who do not have a written script (Bremner et al., 2013) and in pre-verbal children and infants (Asano et al., 2015; Imai & Kita, 2014; Maurer et al., 2006). These associations between different sensory stimuli can be between specific sub-modalities; for example, visual stimuli of a higher luminance are consistently associated with sound stimuli of higher loudness (Lewkowicz & Turkewitz, 1980). Numerous crossmodal associations have been found across all sensory modalities, including olfactory and gustatory stimuli (Deroy et al., 2013; Gal et al., 2011; Sakamoto & Watanabe, 2016; Seo et al., 2010; Simner et al., 2010). Despite their prevalence and the intensive research on them, the origins of these associations are not clearly known. Three accounts have been proposed: the co-occurrence of sensory features in the environment (Parise et al., 2014), shared linguistic, semantic, or emotional features (Martino & Marks, 2000; Palmer et al., 2013; Spence, 2020a), or innate synesthetic wiring in the brain (Deroy et al., 2013; Maurer & Maurer, 1988; Ramachandran & Hubbard, 2001). In this study, we are interested in the crossmodal associations between textural stimuli from sound and touch.
We used tactile textures encountered in everyday life, like cotton, thermocol, wood, etc., and for sound textures, we used recordings of naturally occurring sounds with overlapping events, like rain, fire, insect buzzing, etc.
The inter-relationship between touch and sound is dramatically demonstrated in the “Parchment Skin” Illusion (Jousmäki & Hari, 1998). Here, subjects were asked to rub their hands together while they heard an altered sound feedback of the same event. The auditory feedback was manipulated to selectively enhance high-frequency components, which led the participants to report that their hands felt rougher, like parchment paper. This demonstrated that the perception of tactile roughness is influenced by simultaneous auditory stimuli, although roughness in the auditory domain is mainly a temporal feature, and in the tactile domain it is spatiotemporal (Di Stefano & Spence, 2022). In everyday life, it is difficult to disentangle the effect of sound and touch in identifying objects, as each interaction between the skin and an object produces sound. Participants use this auditory information to reliably identify many material properties based on the sound features that arise from touching the objects (Klatzky & Lederman, 2010). Thus, sound and touch co-occur in most natural situations to form a multisensory perception.
Many psychophysical studies have shown interactions between the perception of vibrotactile frequency and sound frequency (Bensmaïa & Hollins, 2003; Bernard et al., 2022; Convento et al., 2019; Crommett et al., 2017; Guest et al., 2002; Jousmäki & Hari, 1998; Schroeder & Foxe, 2005; Zampini et al., 2003). Importantly, Yau et al. (2009) showed that the processing of vibrotactile stimuli on the skin and pure tones in the ear interact in a frequency-dependent manner. Also, most people associate a rounder shape (2D and 3D objects presented for tactile exploration in the hand) with the word “bouba” and a spikier shape with the word “kiki” (Bottini et al., 2019; Fryer et al., 2014; Hamilton-Fletcher et al., 2018; Ramachandran & Hubbard, 2001). Using materials from everyday use, smooth textures are associated with words like “bouba,” “lula,” and “maluma,” whereas rougher textures are associated with words such as “kiki,” “takete,” and “ruki” (Etzi et al., 2016). The association between some features of tactile stimuli can be attributed to phonetic features of the sound stimuli. Plosive consonants are shown to be associated with curved objects, and fricative consonants are associated with rougher materials (Lo et al., 2017). In the Japanese language, voiced consonants were associated with roughness, and voiceless consonants were associated with smoothness (Sakamoto & Watanabe, 2017). Winter et al. (2022) proposed an analogous relation between the broken airflow while pronouncing trilled /r/ and the discontinuous feeling while touching the surface perturbations of a rough surface. They found consistent features in the phonetics of words denoting roughness in 332 languages worldwide and over 6,000 years. There are associations at an even higher abstract level. For example, musical features are systematically associated with tactile metaphors and adjectives.
Higher pitches were rated as significantly sharper, rougher, harder, colder, drier, and lighter than lower pitches (Eitan & Rothschild, 2011). Films with different emotional content were systematically associated with physical tactile textures; tragedy was associated with granite, marble, and glass, and elements of comedy were associated with glass pebbles, plasticine, and slime (Iosifyan, 2020). Also, there exist correspondences between features of abstract art and tactile textural adjectives (Albertazzi et al., 2016).
Thus, associations between sound and touch are observed between stimuli with a wide variety of features and complexity. In this paper, we are interested in the associations between sound and tactile textures. Textures are amodal percepts that can be found in many sensory modalities. The origin of the word derives from weaving and refers to the quality of woven material, a purely tactile quality. Later, it was used to refer to visual features, sound features, and complex sensory experiences like taste (Djonov & Van Leeuwen, 2011). In sound, textures are a class of stimuli that are commonly found in the environment, like rain falling, crickets chirping, flowing water, fire, etc. They arise from the superimposition of multiple similar, small auditory events. Unlike single auditory events like spoken words which are characterized by small bursts of energy above the background, textures are defined by constant amplitude and frequency variation over time (McDermott & Simoncelli, 2011).
In the tactile sense, textures are defined by multiple surface properties, like roughness, softness, friction, temperature, etc. In perceptual rating studies, it was found that humans have a multidimensional perceptual space to perceive tactile textures, which can be explained by variations in three or four major perceptual axes. Most studies find that these major axes are: “hard-soft,” “rough-smooth,” “sticky-slippery,” and “hot-cold” (Hollins et al., 1993, 2000; Okamoto et al., 2013; Picard et al., 2003). In the brain, the perception of tactile texture is constructed from inputs from three peripheral streams of information: slowly adapting (SA), rapidly adapting (RA), and Pacinian (PC), each conveying different sensory information like form, roughness, vibration, etc. (Friedman et al., 2004). In the primary somatosensory cortex, this information is combined to create the perception of texture (Lieber & Bensmaïa, 2019).
Studies of crossmodal interactions and associations between textures in sound and touch have mostly used controlled and simple stimuli. In a speeded classification task (Guest et al., 2002), participants were asked to rate the roughness of sandpapers of different grit sizes while they listened to audio feedback of pure tones of different frequencies. Their roughness discrimination was influenced by the frequency of auditory feedback. Even though the participants touched a complex stimulus, they were asked for their roughness ratings alone. The sound stimuli were also simple pure tones. In another study examining the crossmodal mapping between sound timbre and tactile texture in young children (3–6 years old), sinusoidal tones were associated with smoother textures and sawtooth tones with rougher textures (both at 311 Hz) (Wallmark & Allen, 2020). Most of these studies concentrated on one component of texture, like roughness, or used only pure tones. However, the tactile experience of texture is multidimensional, so in this study, we explore the crossmodal associations between naturally occurring sound textures and naturally occurring tactile textures.
We conducted two experiments, a free association task and a two-alternative forced choice task, using naturally occurring sound textures and naturally occurring tactile textures. As in previous studies, we chose the tactile textures from everyday materials (Etzi et al., 2016; Picard et al., 2003; Sakamoto & Watanabe, 2016, 2017) and the sound textures from various natural categories described in the published literature (Mishra et al., 2021). These sound textures are recorded from nature and show textural quality as defined by the statistical parameters of McDermott & Simoncelli (2011), for example, the sound of wind moaning, water bubbling, geese cackling, etc. We show that there are consistent crossmodal associations between textures of the two modalities. Participants associated a higher number of sound textures with tactile surfaces that were rated as harder, rougher, and intermediate on the sticky-slippery scale; however, the associations varied depending on the class of sound textures. We did not find many significant correlations between the associations and the statistical features of sound textures derived from the sound texture synthesis algorithm developed by McDermott & Simoncelli (2011).
We interpret our results in the context of perceptual similarity between stimuli from different senses. Helmholtz famously declared that there cannot be any similarity between stimuli of different sensory modalities (Spence & Di Stefano, 2022). However, later research has shown consistent mapping or associations between stimuli from different modalities (Marks, 1978). These are present in combinations of almost all sensory modalities and are described as various concepts like harmony, congruency, sensory intricacy, etc. (Spence, 2022; Spence & Di Stefano, 2022). When two stimuli from different modalities are reported to have an association, it could be because of the perception of similarity between them. This perception of similarity could happen for multiple reasons: phenomenological similarity, amodal property, relative positioning, common affective quality, analogical reasoning, or statistical learning (Di Stefano & Spence, 2023). In our study, we show that the crossmodal associations between sound and tactile textures show a variety of strategies, mainly emotional mediation and cognitive strategies.
Materials and Methods
All study protocols were approved by the Institute Ethics Committee at the Indian Institute of Information Technology, Gandhinagar, following the guidelines established by the Indian Council of Medical Research (approval number: IEC/2022-2023/F/LL/007). We performed experiments on a total of 75 participants. We chose the number of participants based on previous crossmodal association studies (n = 25, Etzi et al., 2016; n = 30, Falcón et al., 2019; n = 34, Iosifyan, 2020; n = 20, 60, Sakamoto & Watanabe, 2016, 2017). We conducted two experiments: (i) crossmodal matching task (n = 30) and (ii) two alternate forced choice task (2AFC) (n = 25). Prior to the tasks, we conducted two pilot experiments to select the appropriate stimuli. Before each experiment, participants were provided with an information sheet detailing the task, and they gave their written consent.
Stimuli Selection Pilot: Tactile Texture Selection & Rating
In order to select tactile textures for the study, we conducted a pilot grouping experiment with five participants (age 21–23, mean = 21.2, SD = 1.10; male = 20%). Seventy-four tactile textures were obtained from everyday materials such as cotton, textured sofa covers, and metal pieces (refer to Supplemental Figure S1 for a complete list). The textures were cut into 3 cm × 3 cm squares and affixed on a base made of MDF.
The tactile textures were placed behind a black, rectangular cardboard box. Participants placed their fingers through a hole at the bottom of the box. This ensured that the participants could not see the textures. Participants were asked to touch the textures with their fingers and group them based on their perceived similarity. Textures were presented one by one. When participants decided on a grouping, they verbally reported their grouping. After every 10 textures or upon request, participants were allowed to revisit their groups. They could also reassign textures to different groups. Finally, participants were asked to provide a description for each group. They had the freedom to repeat this process as many times as desired, and there was no time limit. Each participant was required to create at least two groups, with each group containing at least one texture.
The grouping data were then used to construct a 73 × 73 similarity matrix, where the similarity between two textures was determined by the number of times they were grouped together. Multidimensional scaling (MDS) was performed on the similarity matrix to identify perceptual groupings among the textures. The resulting MDS plot (Supplemental Figure S2) visually displayed four distinct clusters. Using this number of clusters, a hierarchical clustering analysis was conducted (Supplemental Figure S3). From the resulting dendrogram, five textures were selected from each cluster. Within each cluster, they were chosen from branches of the dendrogram that were as far apart as possible, ensuring that the chosen tactile textures were sufficiently diverse. In total, twenty textures were selected for the subsequent experiments (Table 1).
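As an illustration of the first step described above, the following minimal sketch builds a co-occurrence similarity matrix from free-sorting data. It is not the authors' code: the function names, the 0-based texture indices, the toy groupings, and the conversion to a dissimilarity by dividing by the number of participants are all assumptions made for the example.

```python
from itertools import combinations


def similarity_matrix(groupings, n_textures):
    """Count, for each pair of textures, how many participants grouped them together.

    groupings: one entry per participant; each entry is a list of groups,
               and each group is a list of 0-based texture indices.
    """
    sim = [[0] * n_textures for _ in range(n_textures)]
    for participant_groups in groupings:
        for group in participant_groups:
            for a, b in combinations(sorted(set(group)), 2):
                sim[a][b] += 1
                sim[b][a] += 1
    # Every participant trivially places a texture with itself.
    for i in range(n_textures):
        sim[i][i] = len(groupings)
    return sim


def to_dissimilarity(sim, n_participants):
    """Rescale co-occurrence counts to dissimilarities in [0, 1] (an assumed convention)."""
    return [[1 - count / n_participants for count in row] for row in sim]


# Hypothetical toy data: 3 participants sorting 4 textures.
toy_groupings = [
    [[0, 1], [2, 3]],
    [[0, 1, 2], [3]],
    [[0], [1], [2, 3]],
]
sim = similarity_matrix(toy_groupings, n_textures=4)
dis = to_dissimilarity(sim, n_participants=len(toy_groupings))
```

A dissimilarity matrix of this kind is the usual input to MDS and hierarchical clustering routines in standard statistics packages; those steps are left to such a package here.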
List of tactile textures presented in the matching task.
The column “Index on board” refers to the x,y position (from the top left corner) of the texture on the texture board presented in the crossmodal matching task. The positions were chosen randomly (the board is shown in Figure 1).
Rating Tactile and Sound Textures. In this pilot, participants (N = 15, mean age = 20.32, SD = 1.52, male = 56%) rated the selected tactile textures on three salient perceptual dimensions: Hard-Soft (H-S), Rough-Smooth (R-S), and Slippery-Sticky (Sl-St) (Hollins et al., 1993; Okamoto et al., 2013). Participants were seated as in the previous experiment and touched the samples one by one with their dominant hand while holding a computer mouse with their other hand. We used a Visual Analog Scale to collect the ratings. A continuous line with adjective labels on each end was displayed on a computer screen. The order in which the scales appeared on the screen was randomized for each texture, and the presentation side of each adjective was also randomized across participants (e.g., “extremely hard” appeared on the left end of the scale for half the participants and on the right for the others). Before the rating task, participants familiarized themselves with the 20 textures by exploring them individually. We calculated the average distance from “extremely hard,” “extremely rough,” and “extremely slippery” for each participant, resulting in measures for the perceptual dimensions of each texture. The values obtained are presented in Supplemental Table S4. We then performed the rating task for the sound textures with the same parameters and methodology (N = 30, mean age = 20.72, SD = 1.65, male = 56.6%). The mean values for each of the three perceptual axes are shown in Supplemental Table S5.
Sound Texture Stimuli. For the sound stimuli, we selected a set of 98 textural sounds from a publicly available dataset (McDermott & Simoncelli, 2011). These sounds were provided as .wav files, each with a duration of 7 s. Based on previous work by Mishra et al. (2021), which highlighted similar sound statistics within thematic groups such as animals, people, environment sounds, and mechanical sounds, we randomly chose three sounds from each of these groups using a random number generator from random.org. In total, we selected 15 sounds, which are listed in Table 2.
List of sound texture stimuli used in the matching task.
Experiment 1: Crossmodal Matching Task
This experiment comprised a quantitative matching task and qualitative data collection. For the quantitative task, participants (N = 30, age = 18–31, mean = 22.4, SD = 3.42, male = 46.7%) were seated at a comfortable distance from the texture board (Figure 1) containing 20 textures arranged in a 5 × 4 matrix. For each trial, a sound texture was played through the headphones, and the participants had to select a texture on the board that best matched the sound they heard. To indicate their responses, participants pointed to the textures they found similar.

Texture stimuli presented in the crossmodal matching task.
The textures were pasted on the board with 1.5 cm between each texture horizontally and vertically. The participants could not see them as the board was placed under a cardboard box. They had to insert their hand through a hole in the box to explore each texture one by one. They were instructed to explore as they would a surface in their daily life. Before the experiment began, participants familiarized themselves with the textures on the board for 5 min. Participants wore noise-canceling headphones (SONY WH-1000XM4) during the entire experiment. During the experiment, they could listen to the sound and touch the textures as many times as needed. The qualitative data was collected at the end of the experiment, where the participants were asked to provide qualitative descriptions of the intuition behind their associations.
Experiment 2: Two Alternate Forced Choice Task
The 2AFC experiment was conducted to confirm the findings of the previous task in a different set of participants (N = 25, mean age = 20.32, SD = 1.52, male = 56%). The ten sounds for which the participants showed a significant preference for certain tactile textures were chosen (Figure 2A). For each sound, the most similar and the most dissimilar textures were presented as the tactile stimuli. Two tactile textures were placed in 3 cm × 3 cm square-shaped slots, separated by a 2 cm gap. In each trial, a sound file was played, and both tactile textures were presented to the participant's fingers. Participants were asked to choose the texture they felt was most similar to the sound. A keyboard was placed next to the setup for recording the responses. Participants indicated their choice by pressing “f” or “j” for the left texture or right texture, respectively. Each sound was repeated six times, and the order of sounds and presentation of textures (left or right) were randomized. The textures were not visible to the participants, as they were covered by a cardboard box, and the participants placed their index and middle fingers through a hole at the bottom of the box. They also wore noise-canceling headphones (SONY WH-1000XM4) during the entire experiment.

(A) Association frequencies of tactile and sound texture associations. Each number indicates the number of times the sound texture indicated on the y-axis is associated with the tactile texture indicated on the x-axis. Darker colors represent higher frequencies. (B) p-values for Cochran's Q test conducted on the distribution of associated tactile textures for each sound. This tests whether the frequencies of choosing sound-texture pairs (within a sound) were evenly distributed or whether some textures were chosen significantly more often than others. *p < .05, **p < .01, ***p < .001. Effect size is given by eta-squared (η²_Q). η²_Q = 0.01 indicates a small effect size, η²_Q = 0.06 indicates a medium effect size, and η²_Q = 0.14 indicates a large effect size.
Results
All data were analyzed in MATLAB R2022a and RStudio. Cochran's Q tests were conducted with the R package RVAideMemoire (version 0.9-81-2) and chi-square tests with the R package stats (version 3.6.2). The Kullback-Leibler divergence (KL divergence) was computed in MATLAB R2022a.
Crossmodal Matching Between Sound and Tactile Textures
Participants performed a crossmodal matching task by listening to audio files of sound textures through headphones and selecting a tactile texture that they perceived as similar to the sound. A total of 20 different tactile textures were presented. For each sound texture, we calculated the number of times each tactile texture was chosen across the 30 participants, called the association frequency (Figure 2A). Our results indicated that certain tactile textures were preferred over others for specific sound textures, which is shown as a darker color in the matrix plot in Figure 2A. We conducted Cochran's Q tests on the association frequencies for each sound. Cochran's Q tests whether a set of options is chosen equally often; here, it tests whether the tactile textures were chosen equally often for a given sound. Among the 15 sound textures, ten exhibited a statistically significant preference for a particular tactile texture (Cochran's Q test, p < .012; the p-values and effect sizes are given in Figure 2B), while five sound textures (insects during the day in the South, Wind moaning, Applause, Bath being drawn, and Surf hitting beach) did not show any preference.
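The test described above can be sketched as follows. This is a minimal illustration of the textbook Cochran's Q statistic, not the RVAideMemoire implementation used in the study; the function names and the toy table (six hypothetical participants, three textures) are assumptions, and the p-value uses the standard closed-form chi-square tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_k x^k / (2k+1)!!
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def cochran_q(table):
    """Cochran's Q for a 0/1 table: rows = participants, columns = textures.

    Returns (Q, degrees of freedom, p-value).
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    numerator = k * (k - 1) * sum((c - n / k) ** 2 for c in col_totals)
    denominator = sum(r * (k - r) for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    q = numerator / denominator
    return q, k - 1, chi2_sf(q, k - 1)


# Hypothetical toy data for one sound: each row marks the texture a participant chose.
toy_choices = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
q, df, p = cochran_q(toy_choices)
```

When every participant selects exactly one texture per sound, as in this toy table, Q reduces to a chi-square goodness-of-fit statistic against a uniform distribution over the textures.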
Furthermore, we found that some tactile textures were associated with a greater number of sound textures compared to others. For instance, “wood shavings” were reported as similar to eight different sound textures, while ten other tactile textures showed no association with any sound texture at all. Figure 3A shows the tactile textures that were associated with many sound textures; we counted a tactile texture as associated with a sound texture when it was chosen more than four times for that sound (five is half of the highest association frequency). We observed a significant correlation between the hardness rating of the tactile textures and the number of associated sound textures (r = −0.488, p = .029). However, the ratings on the other perceptual scales, “rough-smooth” (r = −0.113, p = .635) and “slippery-sticky” (r = −0.134, p = .572), did not correlate significantly (Figure 3B). This shows that harder tactile textures are associated with more sound textures.
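The correlation reported above can be illustrated with a short sketch. It assumes a Pearson correlation across the 20 tactile textures (the text does not name the coefficient), and the function names are hypothetical. The second helper converts a correlation into the usual t statistic with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Reported hardness correlation, assuming n = 20 tactile textures.
t_hardness = t_statistic(-0.488, 20)
```

Under that assumption, the reported r = −0.488 corresponds to a t statistic of about −2.37 on 18 degrees of freedom.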

(A) The bar graph shows the number of sound textures for which tactile textures were associated more than four times. The x-axis represents the number of sound textures. For example, wood shavings were associated with eight sound textures more than four times. (B) Correlation between perceptual ratings of tactile textures and the number of associations with sound textures. Each graph shows the correlation between one of the tactile perceptual features (hard-soft, rough-smooth, sticky-slippery). The y-axis represents the number of times a texture was chosen (count), and the x-axis represents their ratings on the perceptual dimension. The trend line is plotted in blue. The grey area around the trend line represents the 95% confidence interval.
Confirmatory Experiment: Two Alternate Forced Choice Task
To confirm the correspondences observed in the initial matching task, we conducted a 2AFC task. Participants were presented with two tactile textures and asked to select the one that was most similar to the sound texture they heard. We performed this experiment for the ten sound textures that showed a significant preference in the matching task, each paired with the two tactile textures found to be the most similar and the most dissimilar to it in that task.
The results of the 2AFC task consistently showed that participants chose one tactile texture over the other for each sound texture, ranging from 74.6% to 87.3%, with an average of 81.43% (Figure 4A). Chi-square tests confirmed the statistical significance of all these choices (Figure 4B). Also, participants consistently chose the tactile texture that had been found most similar in the previous matching experiment for all ten sound textures, confirming the findings of that experiment.
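A chi-square test of this kind can be sketched as a goodness-of-fit test against a 50/50 split. The sketch assumes that trials are pooled across participants (25 participants × 6 repetitions = 150 trials per sound); the count of 131 is illustrative rather than taken from the data, and the function name is hypothetical.

```python
import math


def two_choice_chi_square(n_first, n_second):
    """Chi-square goodness-of-fit test against equal choice probabilities (df = 1)."""
    total = n_first + n_second
    expected = total / 2
    chi2 = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    # For df = 1, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


n_trials = 25 * 6          # participants x repetitions per sound, from the Methods
n_similar = 131            # illustrative count, roughly 87% of 150 trials
chi2, p = two_choice_chi_square(n_similar, n_trials - n_similar)
```

With these illustrative counts the statistic is about 83.6, far beyond the df = 1 critical value of 3.84 at the .05 level.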

(A) Percentage of a texture being chosen in the 2AFC task. Blue bars represent the highest-rated texture (similar) in the correspondence task, and red represents the lowest-rated (dissimilar). (B) Chi-square test results for each of the sound textures. The table shows significant differences in choosing one tactile texture over the other for all sound textures.
Influence of Tactile and Sound Perceptual Dimensions
Next, we examined whether these correspondences were influenced by the major perceptual dimensions of tactile textures (hard-soft, rough-smooth, slippery-sticky). We categorized participant ratings from the pilot experiment (see section Rating Tactile and Sound Textures in Materials and Methods for details) for each tactile texture's dimensions into five levels (1–5). For each sound texture, we conducted chi-squared tests to determine if the correspondence frequency was influenced by the ratings of the tactile texture on each of the three perceptual dimensions (Table 3). Only those sounds that exhibited significant preferences for specific tactile textures were included in this analysis.
The total number of tactile textures chosen for each sound texture for particular perceptual dimensions for all participants, with associated p-values for the chi-squared test.
Effect sizes are given by Cramer's V. V ≤ 0.2 indicates a weak effect, V ≤ 0.6 indicates a moderate effect, and V > 0.6 indicates a strong effect.
Note: All p-values are Dunn–Šidák corrected. *p < .05, **p < .01, ***p < .001.
The results indicated that all three dimensions were significant for all sounds: Rough-Smooth (p < .001), Hard-Soft (p < .05), and Slippery-Sticky (p < .001) (see Table 3 for the chi-square values).
In the hard-soft scale, the majority of sounds were associated with tactile textures rated as extremely hard. Only “geese cackling” and “bubbling water” were associated with soft textures. Similarly, in the rough-smooth scale, most sounds were associated with rough tactile textures, except for “bubbling water” and “cat lapping milk.” In the slippery-sticky scale, most sounds were associated with tactile textures rated at an intermediate level 3, except for “bubbling water,” which correlated with textures rated as 4 (Table 3).
We wanted to know if the associations were caused by the perceptual ratings of the sound textures alone. We used the perceptual ratings of the sound textures on the three axes collected in the pilot experiment (see section Rating Tactile and Sound Textures in Materials and Methods for details) (Supplemental Table S5). We then computed the KL divergence to assess whether the distribution of the sound texture ratings differed from the distribution of the ratings of the tactile textures associated with each sound in Experiment 1. We found a high similarity between the distributions for most of the associations in the “rough-smooth” and “hard-soft” axes (Supplemental Figure S6). So, the participants associated sounds with tactile textures because they rated them both similarly, at least in these two axes.
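The comparison above can be illustrated with a minimal KL divergence sketch. The text does not say how the two rating distributions were constructed, so the example assumes histograms over five rating levels; the counts, the pseudocount smoothing, and the function names are all hypothetical.

```python
import math


def normalize(counts, pseudocount=0.5):
    """Turn histogram counts into probabilities, smoothing empty bins (an assumption)."""
    smoothed = [c + pseudocount for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for two discrete distributions."""
    divergence = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf   # p puts mass where q has none
        divergence += pi * math.log(pi / qi)
    return divergence


# Hypothetical histograms over five rating levels for one sound.
sound_ratings = normalize([0, 3, 5, 2, 0])       # ratings of the sound itself
tactile_ratings = normalize([1, 4, 4, 1, 0])     # ratings of the textures chosen for it
d = kl_divergence(sound_ratings, tactile_ratings)
```

A value near zero indicates that the two distributions are similar. Note that KL divergence is asymmetric, so the direction of the comparison should be stated when reporting it.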
Finally, we explored whether the correspondences were correlated to any statistical features of the sound textures. We used the sound synthesis algorithm from McDermott and Simoncelli (2011) to calculate the statistical features of sound textures. We calculated a set of statistics that described the sound textures (mean, variance, kurtosis, power, autocorrelation, and correlation). Since the algorithm calculates these statistics for 30 frequency bands, resulting in a very large number of features, we used the mean values for the six statistics. We correlated this with the association frequency we computed for the sound-tactile texture associations. We did not find significant correlations in most of the associations. Only 6 out of 60 correlations were significant (Table 4).
Correlation between sound statistics and sound-tactile texture association frequency.
*p < .05, **p < .01, ***p < .001.
Discussion
We show that participants reported crossmodal associations between naturally occurring sound textures and tactile textures. They associated surfaces that were rated as hard, rough, and of intermediate slipperiness with more sound textures. The statistical features of the sound textures did not correlate with the crossmodal associations. Each sound texture category has a different pattern of association with tactile textures, which are explained below.
Let us first examine the category of “liquid sounds.” In our data, only one of the three water sounds (bubbling water) exhibited significant correspondences with tactile textures. The “bubbling water” consistently correlated with materials that were highly slippery (rated 2 out of 5, with none of the textures rated as 1, which is extremely slippery). No other sound was matched with tactile textures that had higher values on the “slipperiness” ratings. Furthermore, this sound texture was only correlated with extremely smooth and soft tactile textures. Another example is the sound of “cat lapping milk,” which is classified as an animal sound but shares some properties of a liquid sound due to the sound of lapping milk. This sound was associated with tactile textures rated as slippery (2 out of 5), smooth, and at an intermediate level between hard and soft. The participants heard a liquid sound and attributed it to materials rated as slippery, soft, and smooth. Interestingly, animal sounds such as “cat lapping milk,” “geese cackling,” and “horse buggy” were also associated with materials with high “slipperiness” ratings.
In contrast, mechanical sounds (such as “radio static,” “electric adding machine,” and “pneumatic drill”) exhibited an association profile that contrasted with that of the “liquid sounds.” These sounds are produced through mechanical interactions between two solids or, in the case of “radio static,” electronically mimicking such an interaction. Sound textures in this category were consistently associated with tactile textures rated as extremely hard, rough, and scoring at an intermediate level on the slippery-sticky scale. These sounds show a clear preference for harder and rougher materials.
Sounds from another category, “people sounds” (e.g., “fast breathing”), were associated with tactile textures that were extremely hard and rough and with an intermediate rating on the slippery-sticky scale. Thus, even though there is an overall preference for rough, hard, and intermediate slippery textures, the preference differs between sound groups. The starkest difference is found between the associations of liquid sounds and those of mechanical sounds, as mentioned above.
Another observation is that the harder tactile textures are associated with a greater number of sound textures. For example, wood shavings and sandpaper are associated with eight sounds, while pillow cotton is associated with fewer sounds. This could be because, in a sound-producing event, harder surfaces are more likely than soft materials to produce sounds that are louder and easily distinguishable from the background noise (Gaver, 1993).
The overall preference for harder, rougher, and intermediately slippery tactile textures could be due to a biased sampling of tactile textures. This is unlikely, as we selected from clearly separated groups in our initial grouping experiment of 73 textures (refer to the selection of tactile textures in Materials and Methods). Another possibility is that the sound textures might themselves be rated as rough, hard, and of intermediate slipperiness. This is unlikely, as the sound textures were rated across a wide range on the “hard-soft” and “rough-smooth” axes (range: 0.2–0.6). However, most sound textures were rated as intermediate (0.5) on the “slippery-sticky” axis (Supplemental Table S5). So, there is a possibility that the associations lie mainly in the intermediate range of the “sticky-slippery” axis because the sound textures used were mostly in that range. The qualitative responses collected after the grouping experiment show that participants used diverse strategies (see below).
Perceptual Similarity and Crossmodal Associations
To understand the nature of these crossmodal associations, we interpret the qualitative results from the perspective of the perception of similarity. The concept of similarity is a fundamental aspect of perception (Goldstone & Barsalou, 1998; Tversky, 1977), and there are multiple ways in which similarity between stimuli of two modalities can be perceived (Di Stefano & Spence, 2023). It can range from a similarity in phenomenological experience to a conceptual connection between the stimuli. Alistair (2013) classified the perception of similarity into “property-based,” when two objects share phenomenological qualities, and “isomorphism-based,” when properties are mapped between the two objects. Property-based similarity focuses on the stimulus features (including amodal features) and the emotional attribution of the stimulus. Roughness is an amodal feature common to both sound and tactile textures. In our experiments, we did not study the perception of roughness separately, so it is difficult to delineate the contribution of the amodal quality of roughness to these associations, although the perceptual ratings show a strong influence of roughness ratings.
We found emotional mediation in many of the participants' responses. For example, participants reported that “sounds of water always feel smooth or polished to me” or “this sounds like water, I love water, and this feels good.” Participants noted that “if it is pleasant, I am choosing the soft one” or “when the sound is noisy and bad, then I feel the sound is hard, irregular; which are okay and not that noisy, I am choosing sounds that are soft.” This is similar to an earlier study by Etzi et al. (2016), which showed that everyday materials similar to those used in our study were associated with adjectives like bright, loud, feminine, and beautiful, as well as a wide range of emotions such as sadness, happiness, comfort, joy, and relief. Emotional mediation is one of the most common accounts of crossmodal similarity perception and has been seen in various domains (Guetta & Loui, 2017; Spence, 2020a, 2020b).
The next class of crossmodal similarity perception is “isomorphism-based.” Here, stimuli from two modalities can be mapped to a common structural scale, which leads to the perception of similarity (Spence, 2020a; Spence & Di Stefano, 2022). The associations could have occurred if the participants had mapped the perceptual features of the sound and tactile textures onto the same common scale. In our experiment, we tested both stimuli on three axes (hard-soft, rough-smooth, slippery-sticky). The results of the KL divergence analysis show that the distribution of the perceptual ratings of the sound textures is similar to that of the sound-tactile texture association ratings on the “hard-soft” and “rough-smooth” axes. So, the participants could have used the ratings of the sound and tactile textures to report these associations. However, we cannot determine which perceptual scales the participants used to report the associations, so it is difficult to conclude whether they used a common structural scale.
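As an illustration of the kind of comparison a KL divergence analysis makes, the following minimal sketch computes the divergence between two discrete rating distributions. The ratings, the five-level binning, the smoothing constant, and the function names are all hypothetical and are not taken from the data of this study; the binning and estimation details of the reported analysis may differ.

```python
import math
from collections import Counter


def rating_distribution(ratings, levels=(1, 2, 3, 4, 5), eps=1e-9):
    """Turn a list of ratings into a smoothed probability distribution.

    eps is a small assumed constant that avoids zero probabilities,
    which would make the KL divergence undefined.
    """
    counts = Counter(ratings)
    raw = [counts.get(level, 0) + eps for level in levels]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(p, q):
    """KL divergence D(P || Q) in bits for two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical ratings on a 1-5 "hard-soft" scale (illustrative only).
sound_ratings = [2, 2, 3, 3, 3, 4, 2, 3]
matched_texture_ratings = [2, 3, 3, 3, 4, 4, 2, 3]
unrelated_ratings = [5, 5, 4, 5, 5, 4, 5, 5]

p = rating_distribution(sound_ratings)
q_similar = rating_distribution(matched_texture_ratings)
q_different = rating_distribution(unrelated_ratings)

# A smaller value means the two rating distributions are more alike.
print(kl_divergence(p, q_similar))
print(kl_divergence(p, q_different))
```

A low divergence indicates only that two sets of ratings are distributed similarly; it does not by itself show that participants used a shared scale when making their choices.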
Furthermore, “isomorphism-based” similarity can occur because of implicit or explicit cognitive criteria (Spence & Di Stefano, 2022). Despite the absence of phenomenological similarity between the two stimuli, the similarity can be inferred cognitively. An example of this is when participants imagined scenarios related to the origin of the sound or interactions with solid materials that could have created the sound. For instance, some responses were, “if it drops on the floor what kind of sound it makes” or “on which surface it would drop on” or “imagined water falling on a tile making a similar sound.” Some participants reported that when they heard the sound of a bath being drawn, they imagined the sound of the liquid hitting a tile and associated the sound with the texture of acrylic. These associations are formed not because of phenomenological similarity but because of cognitive factors. For example, one participant associated the sounds of “fire” with trees burning and chose “wood shavings” in their association. Similarly, for the “geese cackling” sound, a participant chose textures related to trees because they reasoned that birds live in trees. In another case, “geese cackling” was associated with cotton because the participant reasoned that birds have soft feathers and therefore chose a soft texture. These observations suggest that the associations are driven by higher-level relationships between sound and touch, in line with the “semantic coding hypothesis” (Martino & Marks, 2000), which proposes that stimuli from two sensory modalities can be associated if they share a meaning or property. Previous studies have shown that participants were faster and more accurate in identifying images of animals when animal sounds were presented with the corresponding animal picture compared to when only the sound or picture was presented (Hein et al., 2007; Molholm et al., 2004).
Thus, participants, without any prompting, are able to use cognitive strategies to report associations between sound and tactile textures.
Limitations
Although significant crossmodal associations are shown in this study, they might not be generalizable, as these associations might depend on the stimuli presented in this experiment. This is especially true in the 2AFC experiment, where the participant has to choose one of two options. Even though we selected the tactile stimuli systematically based on previous studies, the set might be insufficient to capture all possible tactile qualities. The generalizability of the results might also be limited because of the low number of participants in the matching task and the two-alternative forced-choice test. However, post-hoc power analysis showed moderate to high power (between 0.72 and 0.86 for the 2AFC) for the tasks (Supplemental Tables S7 and S8). We could not calculate the post-hoc power for Cochran's test as we could not find a suitable power test (see Figure 2B for effect sizes). Another limit on the generalizability of this study is that many emotional and cognitive strategies were used for the crossmodal association. These could depend on many sociocultural, educational, and linguistic factors that our study did not control for. Lastly, the demonstrated associations could arise either from a strong similarity between similar stimuli or from a strong feeling of dissimilarity between far-apart stimuli. Our study does not distinguish between these two possibilities.
We also found that the variations in six statistical features that best describe the sound textures did not correlate with the pattern of associations. These are low-level features of the sound stimuli and cannot be directly compared with the tactile perceptual ratings. We have not looked at the low-level statistical features of tactile textures, as they are multidimensional and technically challenging to study. It is difficult to quantify the roughness, viscosity, slipperiness, temperature, etc., of naturally occurring textures. Interestingly, Yau et al. (2009) have proposed that a signature for tactile textures could be the spectral signature of skin vibrations when participants stroke a fine texture, which they call a “timbre” quality (Bensmaïa & Hollins, 2003). According to their argument, fine textures are perceived by decoding the spectral quality of the vibrations produced in the skin. It will be interesting to see whether crossmodal associations between complex textures correspond to the statistics of the “tactile timbre,” akin to the frequency of vibration in vibrotactile stimulation.
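To make the notion of low-level statistical features concrete, the sketch below computes a few such summary statistics for a one-dimensional signal. It is a simplified illustration under stated assumptions: it operates on the raw samples of a synthetic sine wave rather than on recorded sound textures, the function name texture_statistics is hypothetical, and the cross-band correlation feature is omitted because it requires a filter-bank decomposition that is not shown here. It should not be read as the procedure used in this study.

```python
import math


def texture_statistics(signal, lag=1):
    """Summary statistics of a 1-D signal (simplified, illustrative only)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    # Mean squared amplitude of the raw signal.
    power = sum(x * x for x in signal) / n
    # Non-excess kurtosis: fourth central moment over squared variance.
    kurtosis = (sum(c ** 4 for c in centered) / n) / (variance ** 2)
    # Normalized autocorrelation at the given lag (biased estimator).
    autocorrelation = sum(
        centered[i] * centered[i + lag] for i in range(n - lag)
    ) / (n * variance)
    return {
        "mean": mean,
        "variance": variance,
        "power": power,
        "kurtosis": kurtosis,
        "autocorrelation": autocorrelation,
    }


# Synthetic test signal: 5 full cycles of a sine wave over 200 samples.
signal = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
stats = texture_statistics(signal)

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Each sound would yield one such feature vector, which could then be compared with the pattern of crossmodal associations, for example through a correlation analysis.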
Conclusion
We show that participants report crossmodal correspondences between naturally occurring tactile textures and sound textures based on multiple strategies. Departing from the unidimensional stimuli used in the past, we have shown that associations exist between complex tactile and sound stimuli. Participants prefer to associate natural sound textures with everyday materials that are hard, rough, and moderately slippery/sticky. However, they show category-specific preferences such that mechanical sounds are associated with hard, rough, and intermediately slippery/sticky textures, and water sounds are associated with soft, smooth, and slippery textures. This pattern of association is not correlated with the statistical features of the sound textures, ruling out a low-level association, at least in terms of sound textures. The crossmodal associations shed light on the perception of similarity between these two sensory modalities, where we found many instances of emotional mediation and cognitive reasoning. We believe that the sensory pathways in sound and touch are yet to be fully explored, and more studies are needed using controlled textural stimuli. Recent advances in algorithms to create textures, along with advances in 3D-printed and organic haptics technologies, can produce textures with specific textural properties (Heeger & Bergen, 1995; Kuroki et al., 2021; Metzger et al., 2021; Sahli et al., 2020; Tymms et al., 2018). Further experiments with these artificially created textures can uncover feature-based interactions between textures from different modalities. Also, more work is needed to understand the hierarchy of strategies (sensory, emotional, cognitive) that underlie the perception of similarity between audiotactile stimuli (Armary et al., 2018).
Supplemental Material
Supplemental material, sj-docx-1-pec-10.1177_03010066231224557 for Crossmodal associations between naturally occurring tactile and sound textures by Vanalata Bulusu and Leslee Lazar in Perception
Footnotes
Acknowledgement
We would like to acknowledge Mr. Rakesh Srisai Kottu for his help with the statistical analysis.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the IIT Gandhinagar Internal Research Funds.
Supplemental Material
Supplemental material for this article is available online.
References
