Abstract
Traditionally, virtual reality (VR) exposure-based treatment concentrates primarily on the presentation of a high-fidelity visual experience. However, adequately combining the visual and the auditory experience provides a powerful tool to enhance sensory processing and modulate attention. We present the design and usability testing of an auditory–visual interactive environment for investigating VR exposure-based treatment for cynophobia. The distinctive feature of our application is its use of 3D sound, which allows a fear-inducing stimulus to be presented and spatially manipulated in both the auditory and the visual modality. We conducted an evaluation test with 10 participants who fear dogs to assess the capacity of our auditory–visual virtual environment (VE) to generate fear reactions. The specific perceptual characteristics of the dog model that were implemented in the VE were highly arousing, suggesting that VR is a promising tool to treat cynophobia.
Introduction
Diverse audio-based applications involving 3D sound have been implemented in the last few years. However, the majority of these applications have been designed for blind persons (e.g., Refs. 8,9) and/or individuals with a severe disability. 10 Auditory augmentation of visual environments is known to improve presence and immersion. 11 In our application, however, the auditory information is not used as a way to supplement the visual information. Because strong emotional reactions can easily be elicited through audition, 12 we want to fully exploit the potential of 3D audio to increase the realism and richness of the immersive environment, and also to understand how different sensory information should be combined to trigger affective reactions in a virtual environment (VE). Humans are easily distracted by an irrelevant auditory event when their visual attention is focused elsewhere. 13 This implies that endogenous orienting does not suppress auditory exogenous effects and that audition can therefore serve as a strong cue to induce emotional reactions in individuals immersed in a visual environment. Furthermore, animal sounds seem to have a stronger alerting effect than other sounds. 14
We want to know whether dog phobia can be ameliorated with VR treatment. To achieve this goal, we need to be able to generate emotional reactions in individuals facing virtual dogs. The primary aim of the current study is to identify the situations in which emotional reactions can be evoked in individuals who fear dogs. A secondary aim is to test the impact of features that can be manipulated in VR only (e.g., visual contrast of the scene, the presence or absence of visuals when there is a sound, sound source localization control, and dog behavioral control). We present here the steps that guided our choice of the different components of the VEs in which participants are immersed and confronted with virtual dogs.
VR exposure therapy aims to reduce fear and anxiety responses. It consists of a progressive confrontation with feared situations, with the objective of triggering habituation. We have chosen to develop gradation techniques that vary along several dimensions to manipulate the emotional component of the stimulus. We used different dog sounds, with or without accompanying visuals, graded in emotional valence through the manipulation of light and the composition of the visual environment. Previous research has suggested that several perceptual features, such as movement and the physical appearance as well as auditory aspects of the phobic animal, are critical in controlling fear behavior.15–17 Creating the content of the environment required research into several concepts, which are presented in the Methods section. We want to investigate whether adequately combining the visual and the auditory experiences provides a powerful tool to modulate attention and to make the exposure more efficient.
To conduct our study, we needed to gather information on the fear of dogs to design our VE and animated dogs. We also needed to select participants reporting an unusual fear of dogs. We therefore conducted a two-stage screening process. On the basis of the pre-evaluation, we selected one dog model and invited 10 participants fearful of dogs to take part in the evaluation procedure of our application.
Methods
Pre-evaluation
Fear of dogs screening
We devised a questionnaire to explore the fear of dogs (possible range for this dog phobia score: 0–42). This questionnaire consisted of two sections. The first section asked four yes/no questions about reactions to dogs: “Do you fear dogs more than other people do?” “Do you endure the presence of dogs with anxiety?” “Are you afraid of a specific dog breed and if yes, which one?” “Does the size of the dog have an effect on your fear?” The second section comprised 14 questions rated on a scale of 0 (no fear) to 3 (extreme fear), assessing fear in response to the size of a dog, the activity level of a dog, and the physical restraint of a dog (e.g., leash).
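The stated range of 0–42 is what 14 items rated from 0 to 3 give when summed. The following minimal sketch illustrates that arithmetic; it assumes the score is a plain unweighted sum of the 14 rated items and that the four yes/no questions are not scored, and the function and constant names are hypothetical.

```python
from typing import Sequence

N_ITEMS = 14      # rated items in the second section of the questionnaire
MAX_RATING = 3    # 0 = no fear ... 3 = extreme fear
MAX_SCORE = N_ITEMS * MAX_RATING  # upper bound of the score range


def dog_phobia_score(ratings: Sequence[int]) -> int:
    """Return the total score, assumed here to be an unweighted sum."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r not in range(MAX_RATING + 1) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(ratings)


lowest = dog_phobia_score([0] * N_ITEMS)
highest = dog_phobia_score([MAX_RATING] * N_ITEMS)
print(lowest, highest, MAX_SCORE)
```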
One hundred and fifteen individuals (54 females) participated in this screening (Fig. 1). The mean age of the sample was 31.7 years (SD=9.7). The mean dog phobia score on the preliminary questionnaire was 10.9 (SD=9.3). There was no difference between the scores of males and females.

Fig. 1. Distribution of the dog phobia score in a sample of 115 individuals.
Twenty-four individuals responded “yes” to the first question (“Do you fear dogs more than other people do?”). These individuals had scores ranging from 8 to 34 (mean=23.3, SD=9.3). Because some of these respondents had low total scores, this yes/no question alone was not fully discriminative, and the total score was used to select dog-fearful participants.
Forty-eight individuals reported that they were afraid of a specific breed, naming pit bull (22 times), Doberman (10), Rottweiler (8), German shepherd (7), and bulldog (3). Seventy individuals reported that the size of a dog had an impact on their emotional reaction.
Virtual dogs selection
To validate the threatening dog model on which our exposure protocol would be based, we first built eight different dog models and animated them with a growling movement. We used the following dog breeds: boxer, German shepherd, pit-bull, Staffordshire, Doberman, miniature pinscher, malamute, and Great Dane. Ten randomly chosen participants who took part in the fear of dogs screening were invited to evaluate these eight animated dogs. They were asked to rate the valence and arousal of animations of these different dog models. On the computer screen, each of the eight dogs was presented for 1 second. At the picture offset, the self-assessment rulers for valence and arousal were presented. The presentation of dog models was pseudorandomized.
Surprisingly, the pit-bull model was not judged as the most negatively valenced and arousing. The Doberman model was the most uniformly rated and was therefore selected as the threatening stimulus for the exposure sessions (Fig. 2A).

Evaluation procedure
The aim of this evaluation was to assess the emotional reactions that were evoked during immersion. The session started with a training immersion in a dog-free environment, so that the dog-fearful participant could get used to the setup and learn to navigate easily. Afterward, the participant was gradually exposed to anxiogenic situations with virtual dogs, using a variety of animations to simulate the dynamic behavior of the dogs. Each evaluation session lasted about an hour and a half and included an interview, a training immersion without virtual dogs, an immersion with virtual dogs (10 minutes), and a thorough debriefing.
Virtual environment
Several animations of the dog model were developed: lying, walking, sitting, and jumping. The dog model could growl and bark, and the experimenter could control the dog animations with keys. The dog's barking and growling were spatialized in 3D. We used three different textures to modulate the arousing effect of the dog model: malamute, miniature pinscher, and Doberman (Fig. 2A).
The VE was a daylight outdoor scene composed of connected gardens surrounding two large houses, with a light fog floating in the air (Fig. 2B). The contrast was manipulated at a precise point during the navigation to reach low luminance contrast (heavy fog), making it hard to visually distinguish the surroundings. The ambient audio environment was a peaceful outdoor sound with singing birds and distant voices.
VR setup
The sessions took place in an acoustically damped, soundproof recording studio with the light switched off. The visual scene was presented on a 300×225 cm stereoscopic passive screen, corresponding to 90×74 degrees at the viewing distance of 1.5 m, and was projected with two F2 SXGA+ Projection Design projectors. Participants wore polarized stereoscopic viewing glasses (Fig. 2C).
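The reported field of view follows from the screen dimensions and the viewing distance. As a quick check, the sketch below computes the subtended angles, assuming a flat screen and a viewer centered in front of it (the function name is hypothetical).

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Angle subtended by a flat extent seen from a centered viewpoint."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


horizontal = visual_angle_deg(300, 150)  # screen width at 1.5 m
vertical = visual_angle_deg(225, 150)    # screen height at 1.5 m
print(round(horizontal), round(vertical))
```

Under these assumptions the rounded values agree with the 90×74 degrees given in the text.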
The most natural audio technique for VR applications is binaural rendering over headphones, which relies on head-related transfer functions (HRTFs). An HRTF is a set of filters measured on an individual or artificial head and used to reproduce all the directional cues involved in auditory localization. 18
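In the time domain, binaural rendering amounts to filtering a mono source with one impulse response per ear. The sketch below illustrates this principle only: it uses a direct convolution and two toy impulse responses that are hypothetical and not measured data, and it is not the rendering engine used in the study.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct (full) linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Filter a mono signal with a left and a right impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (hypothetical) for a source on the listener's right:
# the right ear receives the sound first and at full level, while the left
# ear receives it two samples later and attenuated by half.
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
print(left, right)
```

The interaural delay and level difference encoded in the two toy filters are the kind of directional cues that measured HRTFs capture in much finer detail.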
The auditory scene was presented through Sennheiser HD650 headphones and the sound stimuli were processed through binaural rendering using a selected nonindividual HRTF of the LISTEN HRTF database (
Participants
Participants who obtained a total dog phobia score of 20 or above on the 14-item pretesting questionnaire were invited to take part in a diagnostic interview with a clinical psychologist that was based on the Mini International Neuropsychiatric Interview. 19
Ten selected dog-fearful individuals (see details in Table 1) were then invited to further participate in exposure sessions aiming at assessing their emotional reactions when facing virtual dogs. None were under medical treatment. Participants had no history of any anxiety disorder nor did they currently meet the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for any psychiatric disorder. The study was carried out in accordance with the Declaration of Helsinki. The participants were given information about the procedure and signed an informed consent form.
The population was homogeneous in terms of age and education. The dog phobia score refers to the score obtained during the pre-evaluation screening. Trait anxiety varied among participants, ranging from low (26) to high (62).
Questionnaires and interview measures related to the exposure sessions
We used the State Trait Anxiety Inventory (STAI) to measure the anxiety levels. 20 This scale differentiates between the temporary condition of state anxiety and the more general and long-standing quality of trait anxiety. The state portion of the STAI was used upon arrival at the laboratory and after completion of the exposure session.
A 22-item cybersickness scale was used to assess the level of discomfort immediately after exposure. 21 The presence questionnaire from the I-group 22 was presented after immersion.
Before exposure, participants were asked to rate their apprehension of virtual dogs on a three-point scale (1=not afraid, 2=quite afraid, 3=very afraid).
During the exposure immersion, interventions of the experimenter were avoided to maximize engagement. Interventions concerned only anxiety ratings that were collected during the exposure with the Subjective Unit of Distress (SUD), a self-report measurement of anxiety on a 0–100-point scale. 23 The participant was asked about his/her level of anxiety at the beginning, when facing the first dog (the brown dog; SUD1); in the middle, when facing the second dog (the dark dog; SUD2); and at the end of the immersion (SUD3).
During exposure, the experimenter rated the participant's reaction in front of each visual dog. Ratings ranged from 1 to 6 (1=dog not noticed; 2=dog noticed, no specific reaction; 3=dog noticed, verbal reaction; 4=dog noticed, behavioral reaction; 5=dog noticed, verbal and behavioral reaction; 6=dog noticed, flight or freeze).
The debriefing focused on the emotional reactions evoked by the different events and by the encounter with each virtual dog.
Procedure
All participants first took part in a training session completed in the garden scene described previously, without fog and in which no dogs were present. During training, the experimenter interacted with the participant to assist him/her in his/her first navigation.
The participant was then invited to experience the second immersion aiming to expose him/her to the scene with virtual dogs. He/she was instructed that there was a frog somewhere in the environment and that his/her task was to explore it to find the frog. The frog was an auditory-visual object and could be both seen and heard. The participant was informed that the frog would be found in the surroundings of a dog. Therefore, he/she had to go as close as possible to each dog encountered to check whether there was a frog nearby. The sound spatialization played a major role in this scenario, as the participant could rely on the auditory information to locate both the dogs and the frog.
The participant started in the middle of an open square, in which there were several benches and a large tree. A brown dog (Fig. 3A) was sitting behind the fences surrounding the square. When the participant walked close to the fences, the dog reacted by jumping and barking (visual and audio exposure with movement toward the participant, who is separated from the dog by a fence). A dark dog (Fig. 3B) was sitting in a corner of the square; it stood up and growled when the participant approached it (visual and audio exposure with movement toward the participant).

The participant had to continue looking for the frog by walking to a second square connected to the first one by a small alley. At this point, the fog became thicker and made it impossible to see anything clearly. The participant could hear a dog coming toward him/her, and then moving away from him/her (looming and receding sound, auditory-only exposure, with movement toward the participant). When the participant reached the second square, he/she could see on his/her right-hand side a white dog (Fig. 3C) lying down and moving its head peacefully from side to side (visual-only exposure, no reaction to the participant). At this point, the intensity of the fog returned to the initial level, making it easy to see again. The dark dog (Fig. 3D) was met again, but now standing in front of the fence behind a house, and next to it was the frog. When the participant approached the frog, he/she could simultaneously hear the sound of the frog, see the dog standing up, and see and hear it growling (auditory and visual exposure with movement toward the participant).
Results
The 10 participants completed the evaluation and succeeded in the task. There was no significant difference between men and women. Cybersickness symptoms were experienced during and after exposure, with different levels of sickness depending on the participant (see Table 2). Presence scores were all satisfactory, except for two participants (F3 and M1). These two participants with low presence scores denied any fear reaction, but demonstrated small behavioral reactions when facing virtual dogs (see Table 3).
State 1 refers to the state anxiety score measured before immersion. State 2 refers to the state anxiety score measured after immersion. SUD1 refers to the SUD collected after the encounter with the first dog (brown dog). SUD2 refers to the SUD collected after the encounter with the second dog (dark dog). SUD3 refers to the SUD collected at the end of the immersion.
SUD, subjective unit of distress
Dog 1 refers to the brown dog. Dog 2 refers to the first encounter with the dark dog. Dog 3 refers to the white dog. Dog 4 refers to the second encounter with the dark dog.
The self-rated fear of virtual dogs before immersion was very low: most of the participants thought they would not be afraid of a dog model. Apprehension of virtual dogs was not linked to the dog phobia score, suggesting a resistance to emotional involvement in the VE. However, actual reactions to virtual dogs (see Table 3) demonstrated that virtual dogs were powerful enough to generate at least surprise reactions. Furthermore, state anxiety scores differed significantly before and after completion of the exposure (Wilcoxon test, Z=2.7, p<0.01). They were higher after exposure than before, suggesting an emotional effect of the immersion.
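For readers unfamiliar with the paired comparison reported here, the following sketch shows how a Wilcoxon signed-rank Z statistic can be computed. It assumes the plain normal approximation without continuity or tie correction (the settings actually used in the study are not stated), and the before/after scores are made-up numbers for illustration only, not the data of Table 2. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def wilcoxon_z(before: Sequence[float], after: Sequence[float]) -> float:
    """Signed-rank Z (normal approximation, no tie or continuity correction)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero diffs
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd


def two_sided_p(z: float) -> float:
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical state anxiety scores for 10 participants (illustration only).
before = [30, 28, 35, 40, 32, 27, 45, 38, 33, 29]
after = [29, 30, 38, 44, 37, 33, 52, 46, 42, 39]

z = wilcoxon_z(before, after)
p = two_sided_p(z)
print(round(z, 2), round(p, 4))
```

With only 10 pairs, an exact test is often preferred over the normal approximation; the approximation is used here only because the text reports a Z value.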
There was a significant correlation between the presence scores and the self-reported fear levels (SUDs) taken during navigation (correlation with SUD1: r=0.9, p<0.001; correlation with SUD2: r=0.7, p<0.05; correlation with SUD3: r=0.7, p<0.05), as well as a significant correlation between the presence scores and the apprehension of virtual dogs scores (r=0.7, p<0.05) (see Fig. 4 for illustration of these data). The participants who anticipated fear in relation to virtual dogs were indeed those with the highest presence scores (see Fig. 4A). The lowest presence scores were observed in the group of participants anticipating no fear, suggesting again a resistance to any emotional involvement in the VE. Those with the lowest presence scores were also the ones reporting less discomfort when they encountered the first virtual dog (see Fig. 4B). These observations confirm the consistency of our measurements, underlining the potential link between emotion and presence. 24

The different dogs evoked different reactions (see Table 3), which were mainly idiosyncratic (Friedman test, χ²(3, N=10)=10.06, p=0.02; Kendall's coefficient of concordance W=0.33). The first dog encountered, the brown dog, was an auditory-visual object, which jumped when the participant came close to it. The participant was, however, protected from the dog by a fence. The second dog encountered, the dark dog (visual and audio exposure with movement toward the participant), generated the strongest reactions, with several participants freezing or virtually fleeing when facing the growling dog. This time, the participant was not protected by a fence. The third dog encountered, the white dog, was a visual-only object that did not react to the participant. It provoked the smallest reactions. Then, the dark dog was encountered again; it was an auditory-visual object animated in reaction to the approach of the participant. However, this time, the dark dog was associated with success in the task, since it was standing close to the frog to be found. Indeed, fear reactions were overall milder than during the first encounter with this dark dog. The reduction of the reactions in front of this dog may also reflect a habituation already taking place. During debriefing, participants reported that the dark dog was more frightening than the white dog and the brown dog, suggesting a strong impact of texture attributes corresponding to the fur of the dogs.
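The Friedman statistic and Kendall's W reported above are linked by W = χ²/(N(k−1)), where N is the number of participants and k the number of conditions. The sketch below checks the reported values for consistency, assuming that the "(10, 3)" in the test report denotes N=10 participants and 3 degrees of freedom (that is, k=4 dog encounters). The function names are hypothetical, and the closed-form tail probability is valid for 3 degrees of freedom only.

```python
import math


def kendalls_w(chi_square: float, n_subjects: int, n_conditions: int) -> float:
    """Kendall's W derived from a Friedman chi-square statistic."""
    return chi_square / (n_subjects * (n_conditions - 1))


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


w = kendalls_w(10.06, n_subjects=10, n_conditions=4)
p = chi2_sf_3df(10.06)
print(round(w, 3), round(p, 3))
```

Under these assumptions, W comes out at about 0.335 and p at about 0.018, in line with the reported W=0.33 and p=0.02. A low W indicates weak agreement between participants in how they ranked the four encounters.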
One participant reported a fear level of zero at all three SUD measurements during the evaluation. He commented at debriefing that his immersion was prevented by the lack of realism of the visual environment, which contrasted with the high realism of the audio. He stated that, as a result, he perceived the scene as if he were watching something behind a window pane.
After the participants had encountered the dark dog for the first time, the scene was covered with a thick fog and they were exposed to an auditory dog (looming and receding barking sound). After that, the participants walked past the white dog, which provoked modest fear reactions. Unfortunately, we did not rate the behavioral reactions of the participants to the events preceding the encounter with the white dog. However, at debriefing, this particular moment was described as quite stressful.
Finally, we evaluated the success of our application through the link between the presence score and the objective measure score (i.e., the score summarizing the participants' observable behaviors when encountering virtual dogs; see Table 3). Interestingly, these two scores were not linked. This means that although some participants showed resistance to emotional involvement in the application, they still reacted when they saw the virtual dogs. For example, participant M1 had a very low presence score (21) and reported no anxiety when encountering the virtual dogs (SUD measures of 0). He also anticipated no fear of virtual dogs. However, he had some reactions during the scenario, particularly in response to the dark dog, to which he showed a verbal and a behavioral reaction. This is a very promising result, as it highlights the fact that it is possible to provoke some emotional reactions even in participants who exhibit a resistance to the VR scenes.
Discussion and Conclusion
We have developed an application aiming at adequately combining visual and auditory experiences during immersion in a VE, so as to modulate attention and emotion of dog-fearful participants. It has been suggested that animal fears may be based on aversion to certain attributes. 15 We first explored the fear of dogs to screen the different features that are potentially frightening, and we evaluated different virtual dog models. We included the attributes that obtained the highest scores in the preliminary screening in our VE to maximize emotional responses. A scenario was constructed with the selected dog model presented with three different textures and different levels of activity. The VE also included 3D sound: an ambient sound accompanied the whole scene, and barking and growling sounds were spatialized at precise points in the environment.
We were able to generate situations in which emotional reactions can be evoked in individuals who fear dogs. First, SUD scores were moderately high (overall mean of 33; see Table 2), demonstrating that participants felt anxious in the proximity of a virtual dog. Second, the state anxiety score measured after the immersion was significantly higher than the one measured before (see Table 2). Third, verbal and sometimes behavioral reactions were noted when participants were confronted with the dogs (see Table 3). Fear reactions were expressed differently according to the participant. They could be verbal reactions of surprise, discomfort, and/or fear, and they could include postural reactions and moving hastily away from the dogs. Participants tended to avoid approaching dogs spontaneously and needed encouragement to approach them to complete the scenario's objectives. Fear reactions were stronger when participants went closer to dogs, especially when the dog had an animated behavior associated with the distance of the participant. Furthermore, all participants, even those who had low presence scores and those who did not report any apprehension toward virtual dogs, reported that the virtuality of the dog could be easily forgotten and that they experienced at least some unpleasant feeling when they met the dogs.
We could also identify the different fear-evoking features that can be manipulated in VR only. The visual fear-evoking perceptual properties that we included in our VE (for example, the speed and suddenness of movement of the dogs) proved to be very efficient. The white dog, which did not react to the approach of the participant, evoked the lowest reaction (see Table 3). Interestingly, this dog was also the only visual-only dog; the three other dogs were audio-visual and all provoked at least verbal reactions. The importance of the audio features was also reported by the participants during the debriefing. They said that barking and growling were highly challenging features. They reported that dogs growling or barking were at least as frightening as visual dogs. Several participants spontaneously reported that auditory stimulations were extremely arousing and that they purposefully adapted their navigation to avoid walking with a growling or barking sound behind them.
This evaluation confirms the powerful role of auditory stimuli in the generation of fear reactions. 3D sound is therefore extremely promising to drive the progressive exposure of the participants. However, incorporating real-time updated 3D sound into VR technologies raises several practical issues. Although there is a consensus that presence is improved by 3D sound, little is known about how an auditory VE should be designed so that it does not interfere with the visual VE, 25 but enhances the sensorial realism of the simulation. In a previous experiment involving navigation within a visual and auditory VE, several patients with agoraphobia reported that the auditory and visual worlds could not fulfill a sense of realism when presented together. 26 This effect is due to the fact that it is still computationally easier to achieve a high resolution and realism in an auditory VE than in a visual VE. The sensation of coherence at the semantic level between the visual scene and the auditory scene is crucial to obtain a multisensory integration of the two sensory channels. 27 This semantic coherence should be carefully tuned so as to enhance the emotional experience of the participants and to avoid the pitfall of a dual perception mode in which the auditory and visual channels are perceived separately.
During this evaluation, we worked with a nonphobic sample, which demonstrated fear reactions to virtual dogs. Our results should generalize to other populations, including groups who need to face dogs in their everyday activity (e.g., postmen) and would benefit from training. Few studies have examined the treatment of cynophobia, but one of them suggests that adults presenting with a marked fear and avoidance of dogs achieve significant benefit from brief exposure-based treatments. 28 These treatments might benefit from the use of VR, which would aim at desensitizing fear responses to specific perceptual characteristics of the animal, such as those we used in our evaluation.
Footnotes
Acknowledgments
This research was supported by the EU IST FP6 FET Open project CROSSMOD and by the ARC INRIA NIEVE. We thank Feryel Znaïdi, Aymeric Faye, David Doukhan, Nicolas Bonneel, and Khoa-Van N'Guyen for their help with the technical and practical aspects of the evaluations.
Author Disclosure Statement
The authors report no financial or other conflicts of interest relevant to the subject of this article.
