Abstract
Objective:
In this study, the impact of including a bone conduction transducer in a three-channel spatialized communication system was investigated.
Background:
Several military and security forces situations require concurrent listening to three or more radio channels. In such radio systems, spatial separation between three concurrent radio channels can be achieved by delivering separate signals to the left and right earphone independently and both earphones simultaneously. This method appears to be effective; however, the use of bone conduction as one channel may provide both operational and performance benefits.
Method:
Three three-channel communication systems were used to collect speech intelligibility data from 18 listeners (System 1, three loudspeakers; System 2, stereo headphones; System 3, stereo headphones and a bone conduction vibrator). Each channel presented signals perceived to originate from separate locations. Volunteers listened to three sets of competing sentences and identified a number, color, and object spoken in the target sentence. Each listener participated in three trials (one per system). Each trial consisted of 48 competing sentence sets.
Results:
Systems 2 and 3 were more intelligible than System 1. Systems 2 and 3 were overall equally intelligible; however, the intelligibility of all three channels was significantly more balanced in System 3.
Conclusion:
Replacing an air conduction transducer with a bone conduction transducer in a multichannel audio device can provide a more effective and balanced simultaneous monitoring auditory environment.
Application:
These results have important design and implementation implications for spatial auditory communication equipment.
Introduction
Audio communication interfaces traditionally involve the use of air microphones (e.g., handheld telephones or boom microphones) and earphones or loudspeakers for sound reception and delivery, respectively. A typical communication headset connected to a personal radio, portable computer, or cellular phone incorporates a noise-canceling boom microphone and a monophonic set of earphones. This type of headset works very well with single-channel radios, which still dominate current personal communication systems. Unfortunately, when such a headset is used with a multichannel radio and two or more messages arrive at the listener simultaneously, they are heard as originating at the same point in the head and greatly mask each other. As a result, very little, if any, of the delivered information can be interpreted by the listener.
However, if the headset is equipped with stereophonic earphones, it can be used quite effectively to monitor two-channel radio traffic because it directs one channel to the left earphone and the other channel to the right earphone of the listener. The two channels are heard as originating separately in the left and right ear of the listener and, because of their spatial separation, the messages received mask each other much less than if they were heard together in the center of the head (see Bronkhorst & Plomp, 1988, 1992; Brungart & Simpson, 2002; Drullman & Bronkhorst, 2000; Duquesnoy, 1983; Festen & Plomp, 1990; Freyman, Helfer, McCall, & Clifton, 1999; Hawley, Litovsky, & Colburn, 1999; Peissig & Kollmeier, 1997; Plomp, 1976; Plomp & Mimpen, 1979).
In military settings, there is a pressing need to simultaneously monitor more than two communication channels. For example, Vause, Abouchacra, Letowski, and Resta (2001) described a communication network used in a command-and-control vehicle whereby the commander received messages through six communication channels and listened to them through a stereophonic headset. In such a network, when the commander frequently receives three or more simultaneous messages, even a stereophonic headset can become ineffective, and commanders frequently resort to turning off some of the channels temporarily to attend to potentially more critical messages (Vause et al., 2001).
Possible solutions to monitoring multichannel communications include using a built-in prioritization system that blocks secondary channels when primary channels are active, a spatial sound field display incorporating multiple loudspeakers surrounding the listener, or a headset-based processing system capable of spatializing and externalizing phantom sound sources representing individual communication channels. These headset-based spatial audio systems are called three-dimensional (3-D) audio headsets, and they process incoming multichannel signals by convolving them with the head-related transfer function (HRTF) of the listener with unoccluded ears (Gripper, McBride, Osafo-Yeboah, & Jiang, 2007; Wightman, 1990). As a result, individual talkers sound as though they are located outside of the head and in different locations. Spatial separation between virtual sound sources decreases the amount of both (a) physical (energetic) masking of one signal by another by improving the signal-to-noise ratio in one ear and spectral differences between ears and (b) attentional (informational) masking, thus allowing the listener to focus on a specific direction from which the sound stream of interest arrives, given the different interaural intensity and time differences for each stream (Brungart & Simpson, 2002; Edmonds & Culling, 2006; Shinn-Cunningham, 2002; Zurek, 1993).
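The spatialization step described above can be illustrated with a minimal sketch. It assumes that the listener's HRTF is available as a pair of time-domain head-related impulse responses (HRIRs), one per ear. The function names and the toy impulse responses are hypothetical and are not taken from any of the cited systems.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one virtual sound source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs (hypothetical values) for a source on the listener's left:
# the right-ear response is delayed by two samples and attenuated,
# mimicking interaural time and intensity differences.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono = [1.0, 2.0, 3.0]
left, right = spatialize(mono, hrir_left, hrir_right)
```

In a real 3-D audio headset, each communication channel would be convolved with measured HRIRs for its assigned direction, and the per-ear results would be summed across channels.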
Several researchers have demonstrated the performance enhancements that result when vocal signals are spatially separated. For instance, experiments conducted by Abouchacra, Breitenbach, Mermagen, and Letowski (2001); Bolia, Nelson, and Morley (2001); Drullman and Bronkhorst (2000); and MacDonald, Balakrishnan, Orosz, and Karplus (2002) all show that when multiple speakers’ voices are presented simultaneously, listeners are better able to attend to the information of a specific speaker when the voices are separated in space. Because they tend to improve signal reception by the listener, 3-D audio systems have become an attractive means of presenting multiple streams of auditory signals (Begault & Wenzel, 1993; Doll & Hanna, 1995; King & Oldfield, 1997; Ricard & Meirs, 1994). However, even though 3-D audio systems have become more popular over time, they are still relatively rare across various military platforms and are not used by dismounted soldiers and field-operating security forces because of the complexity of the technology required to produce 3-D signals.
In the case of three-channel communication systems, there is an effective spatialization technique that is much simpler than 3-D audio processing: Signals from one channel are delivered to both earphones simultaneously, and signals from the two other channels are delivered to the left and right earphone independently. In this case, one channel is heard as originating in the middle of the head and the two other channels as originating from the left and right side of the listener, respectively. This technique was effectively used by Vause et al. (2001), who reported an 85% preference rate for the three-channel communication interface for a group of 20 military commanders conducting simulated operations. However, such an interface is difficult to implement for dismounted or security operations, in which two radios are sometimes used concurrently. Connecting a single headset to two radios would require a special attachment and would be very impractical. An attractive alternative seems to be the use of a bone conduction (BC) transducer (vibrator) to be permanently located on the operator’s head and connected to one of the radios. When the other radio needs to be used, it would transmit one or two additional channels through a headset.
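The earphone-only routing described above (one channel to the left earphone, one to the right earphone, and one to both) can be sketched as a simple mixer. The function name and sample values are hypothetical. The sketch assumes equal-rate sample lists and applies no gain compensation to the common channel.

```python
def mix_three_channels(left_only, right_only, common):
    """Route three mono channels onto a stereo headset.

    left_only  -> left earphone
    right_only -> right earphone
    common     -> both earphones (heard in the middle of the head)
    """
    n = max(len(left_only), len(right_only), len(common))

    def pad(samples):
        # Zero-pad shorter channels so all three have the same length.
        return list(samples) + [0.0] * (n - len(samples))

    l, r, c = pad(left_only), pad(right_only), pad(common)
    left_ear = [a + b for a, b in zip(l, c)]
    right_ear = [a + b for a, b in zip(r, c)]
    return left_ear, right_ear


left_ear, right_ear = mix_three_channels([1.0, 1.0], [2.0, 2.0], [0.5, 0.5])
```

Because the common channel reaches both ears at the same per-ear level as the side channels, it is heard as louder than either side channel unless its gain is reduced.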
In addition to the ease of implementation in three-channel communication systems as mentioned already, BC transmission has a number of other attractive properties as a communication channel. For instance, it can be used regardless of the presence of earphones or hearing protectors and can be used inconspicuously when the vibrator is hidden under the hair, helmet, or hat of the user (Henry & Letowski, 2007; Walker, Stanley, Iyer, Simpson, & Brungart, 2005). Therefore, it can be implemented as the primary communication channel, and the two other channels can be added when needed by the additional two-channel headset. This type of configuration also assures great flexibility of the three-channel communication system since it can be easily configured on the spot as a one-, two-, or three-channel system depending on the operational needs of the user. However, there remains a need to study the effectiveness of BC vibrators in multichannel communication headsets. In response to this need, in the present article, we describe an experiment designed to investigate the use of a BC vibrator as one of the speech channels in a three-channel communication system and compare it with a three-channel loudspeaker and three-channel earphone configuration. We predict that the configuration including the BC vibrator will prove to be equally or more effective than the other two systems.
Method
Participants
A total of 18 normal-hearing listeners participated in the study. Normal hearing was defined as pure-tone hearing thresholds at or better than 20 dB HL at audiometric octave frequencies from 250 Hz through 8000 Hz. In addition, the difference between hearing thresholds for the left and right ear was no more than 5 dB at each test frequency. The group consisted of 10 female and 8 male listeners between the ages of 18 and 30 years. The participants had no previous experience in multichannel speech communication studies.
Instrumentation
The study was conducted in a large Industrial Acoustics Company 143 M audiometric booth for sound field testing. Instrumentation for the study included (a) a portable IBM PC with a four-channel audio card and multichannel audio software, (b) proprietary software incorporating the Synchronized Sentence Set (S3) test for signal delivery and data collection, (c) a pair of TDH-39 testing earphones, (d) an Oiido BC vibrator (Model SD-02), (e) an array of three loudspeakers with spatial separation of sound sources, (f) a four-channel power amplifier, and (g) a KEMAR (Knowles Electronic Manikin for Acoustic Research) manikin and calibration equipment needed to measure sound pressure levels at the ear of the listener.
Selection of the TDH-39 earphones was based on their bandwidth, which was similar to that of the Oiido SD-02 BC vibrator. This characteristic was important for ensuring effective equalization of both transducers. The frequency responses of both transducers were equalized with the procedure developed by MacDonald and Tran (2007). Prior to the experiment, the levels of speech signals delivered by the three loudspeaker channels, left and right earphones, and BC vibrator were set to be equally loud. In this step of the systems’ calibration, three people not participating in the study served as the listeners.
Stimulus Materials
The S3 test (Abouchacra, 2000; Abouchacra et al., 2001; Abouchacra, Letowski, Besing, & Koehnke, 2009) is a four-channel test that was developed at the U.S. Army Research Laboratory–Human Research and Engineering Directorate (ARL-HRED) for measuring speech intelligibility (SI) in multichannel communication systems. The test consists of 2,034 sentences (10 syllables each) constructed from 104 token phrases and recorded by four male talkers. All sentences are structured in the following format: “[Name], write the number [number] on the [color] [object].” Table 1 shows all options for each item of the sentences. The recordings were made in carefully controlled conditions, such that during presentation, the corresponding elements of concurrent sentences occur simultaneously. The S3 companion software enables presentation of up to 4 synchronized sentences (one through each channel) with 1 sentence designated as the target (T sentence) and up to 3 sentences designated as competing (C sentence). In the current study, a three-channel variant of the S3 test was used.
Table 1. Words for Synchronized Sentence Set Sentences
Communication Systems
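Performance data were collected for three different three-channel communication systems. The systems were configured as follows:
System 1: three loudspeakers placed to the left, to the right, and in front of the listener.
System 2: stereo earphones, with one channel delivered to the left earphone, one to the right earphone, and one to both earphones.
System 3: stereo earphones and a BC vibrator, with one channel delivered to the left earphone, one to the right earphone, and one to the BC vibrator.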

Figure 1. Bone conduction vibrator placement. The condyle location has been determined to be one of the most receptive bone conduction vibrator locations (McBride, Letowski, & Tran, 2008).
Procedures
Testing environment
All testing took place inside a large acoustically treated sound booth used for testing auditory perception in a sound field. Background noise level in the room was below 40 dBA during all data collection, as indicated by sound pressure measurements taken with a KEMAR manikin and its accompanying calibration equipment while the listener was absent. Inside the booth was an array of three spatially separated loudspeakers. The first and second speakers were placed to the left and right side of the listener (−90° and +90°, respectively), and the third speaker was placed in front of the listener. Each loudspeaker was set to present the S3 sentences at an average level of 65 dB SPL measured at the listener’s location. Depending on the system tested, listeners also wore testing earphones and a BC vibrator placed on the left condyle. Both the earphones and BC vibrator were calibrated prior to data collection to deliver speech signals as loud as those presented through the individual loudspeakers. All of the transducers used in this study were verified to operate in phase. Consequently, in the System 3 configuration, all three channels were equally loud; however, in the System 2 configuration, the common channel, which was delivered to both ears, was actually louder than the left- and right-side channels.
The computer monitor and keyboard were placed inside the booth for use during data entry; however, the tower containing the central processing unit (CPU) was located outside the booth to reduce background noise. A graphical user interface (GUI) with dropdown lists was used to fill in the words of the S3 sentences. A screenshot of the GUI is shown in Figure 2.

Figure 2. Synchronized Sentence Set participant response screen.
Participant’s tasks
During the study, three different S3 sentences spoken by different male talkers were presented simultaneously through the three channels of the communication system. One of the sentences always began with the call sign Troy and was designated as the T sentence. For each communication system, participants were instructed to attend to the sentence containing the target name Troy and to record its contents (i.e., number, color, and object) regardless of the channel through which the sentence was presented. The two other competing sentences were to be treated as maskers and ignored. The responses were recorded using the GUI dropdown lists. Listeners were instructed to fill in as many of the blanks as they could distinguish for the sentence beginning with Troy only. Each participant listened to the S3 materials with all three communication systems in a counterbalanced order to reduce the impact of learning effects. A block of 48 presentations constituted a trial for each communication system. Within each trial, the T sentence was presented 16 times through each of the three channels, in random order.
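The trial structure described above can be sketched as follows. The source states only that system order was counterbalanced and that each channel carried 16 T sentences per 48-presentation trial. Cycling through all six system orderings and the names used here are assumptions made for illustration.

```python
import itertools
import random

SYSTEMS = ("System 1", "System 2", "System 3")
CHANNELS = (1, 2, 3)
PRESENTATIONS_PER_TRIAL = 48


def system_orders(n_listeners):
    """Assign each listener one of the 6 possible system orderings in rotation."""
    orders = list(itertools.permutations(SYSTEMS))
    return [orders[i % len(orders)] for i in range(n_listeners)]


def target_channel_sequence(rng):
    """Return 48 target-channel labels, 16 per channel, in shuffled order."""
    per_channel = PRESENTATIONS_PER_TRIAL // len(CHANNELS)
    sequence = [ch for ch in CHANNELS for _ in range(per_channel)]
    rng.shuffle(sequence)
    return sequence


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
orders = system_orders(18)
sequence = target_channel_sequence(rng)
```

With 18 listeners, this assignment uses each of the six orderings exactly three times.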
Prior to the experimental session, each listener was familiarized with the S3 materials and the listening task using the three loudspeaker setup. The familiarization session ended when the listener felt comfortable with responding to the task. Also, when the listener used System 2 or System 3 for the first time, a short practice trial was provided to familiarize the listener with in-the-head virtual sound sources. The task took each participant approximately 1.5 hr to complete. No performance differences were noted for any individual participant during the course of the experiment.
Results
The mean percentage of correct responses and corresponding standard deviations for each of the three systems are shown in Table 2. Figure 3 graphically illustrates the mean percentage of sentences entirely correct for each system along with the corresponding standard error bars, and Figure 4 depicts the breakdown of the sentence data for all three systems. Based on the data, it appears that System 3 is the most effective configuration, followed by System 2 and then System 1. The same pattern is apparent for each component of the sentence identified (i.e., number, color, object) as for the entire sentence.
Table 2. Mean Data for the Three Multichannel Communication Systems (in percentages)
Note. Standard deviations shown in parentheses.

Figure 3. Mean percentage of the sentences entirely correct for the three-channel systems overall with standard error bars.

Figure 4. Bar graphs of mean system performance data per channel with standard error bars.
An ANOVA was performed on the data for the entire sentence for the three systems to determine whether the apparent differences between the systems were statistically significant. Prior to the statistical tests, all percentage scores were transformed into rationalized arcsine units (rau; Studebaker, 1985) to eliminate potential ceiling effects characteristic of the percentage scale. From the ANOVA results based on an alpha of .05, there is enough evidence to conclude that the performance data for these three systems differ significantly, F(2, 51) = 5.40, p < .01. A Tukey HSD test was used to compare the means for each system to determine where the differences were observed. The post hoc analysis showed that listeners’ performance with System 1 was significantly lower than for either System 2 (p = .03) or System 3 (p = .01). The difference between Systems 2 and 3 was not statistically significant (p = .90).
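The analysis steps above can be sketched with the standard library alone. The rau function follows the rationalized arcsine transform of Studebaker (1985), and the F statistic is the usual one-way ANOVA ratio. The function names are hypothetical, and the counts in the example are placeholders, not the study data.

```python
import math


def rau(correct, total):
    """Rationalized arcsine units for `correct` responses out of `total` items."""
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder counts (NOT the study data): 18 listeners per system, 48 items each.
placeholder = [
    [20 + (i % 5) for i in range(18)],
    [28 + (i % 5) for i in range(18)],
    [30 + (i % 5) for i in range(18)],
]
scores = [[rau(c, 48) for c in group] for group in placeholder]
f_stat, df_b, df_w = one_way_anova(scores)
```

With three systems and 18 scores per system, the degrees of freedom are 3 − 1 = 2 and 54 − 3 = 51, which matches the reported F(2, 51).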
Also of interest was the performance per channel for each of the three systems, which is shown in Table 2. After converting the scores for the entire sentence data into rau units, we performed ANOVAs to determine whether there were significant differences between the channels of each system. These tests indicated no difference between the three individual channels for Systems 1 and 3, F(2, 51) = 0.61, p = .55, and F(2, 51) = 0.62, p = .54, respectively. However, there were significant differences between the three channels in System 2, F(2, 51) = 18.92, p < .01. The post hoc Tukey HSD test revealed that the scores for Channel 1 (left ear) and Channel 2 (right ear) were significantly higher (p < .01 in both cases) than the score for Channel 3 (both ears). The difference between the scores for the left and right ear was not significant (p = .08).
Discussion
The objective of this study was to investigate the use of BC as a channel in multichannel communication systems. The results showed that there are differences in the performance of the communication systems evaluated. Notably, listeners performed significantly worse when using the system that represented a natural listening environment (System 1) than with either of the other two multichannel communication systems. This is most likely because with System 1, all three signals are transmitted to both ears, which results in their masking one another; however, in the other two systems, some of the signals arrive at only one ear, which reduces the amount of energetic masking, thus making them easier to segregate and understand. On the basis of the channel analysis for System 1, the SI values of the three channels were similar to one another; however, the intelligibility of each channel and the system overall was poor and unacceptable per military communication standards (MIL-STD-1472G).
The channel analysis for System 2 provided evidence that the dual-ear channel was less effective than both of the single-ear channels, such that messages sent to the dual-ear channel appeared to be more difficult to isolate. The difference in the effectiveness of signals transmitted through the dual-ear channel versus those for the single-ear channels is the result of two opposing effects. First, the signal presented through the dual-ear channel is louder than each of the signals presented to the single-ear channels because of binaural loudness summation. At moderate suprathreshold levels, binaural loudness summation results in a loudness increase equivalent to raising the level of the monaural signal by about 6 dB (e.g., Epstein & Florentine, 2009; Gigerenzer & Strube, 1983; Irwin, 1965; Moore & Glasberg, 2007; Zwicker & Zwicker, 1991). The increase in the overall loudness of the binaural signals results in approximately a 5% increase in binaural SI; however, this value depends to some degree on the listening conditions and base monaural intelligibility (Davis, Haggard, & Bell, 1990; Harris, 1965; Kaplan & Pickett, 1982).
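For a sense of scale, a 6-dB level difference can be converted to linear ratios with the standard decibel definitions. The helper names below are illustrative only.

```python
def db_to_pressure_ratio(level_db):
    """Sound pressure (amplitude) ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 20.0)


def db_to_power_ratio(level_db):
    """Power (intensity) ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 10.0)


pressure_ratio = db_to_pressure_ratio(6.0)  # close to a doubling of sound pressure
power_ratio = db_to_power_ratio(6.0)        # close to a fourfold increase in power
```

In these terms, the loudness gain attributed to binaural summation is comparable to roughly doubling the sound pressure of the monaural signal.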
Second, the resulting increase in audibility and SI of the binaural signal is reduced by an increase in energetic masking caused by the synergetic masking of two additional signals. The lower SI scores for the dual-ear channel shown in Table 2 indicate that the increase of masking clearly negated the advantage caused by binaural summation of loudness, thus making the intelligibility of the dual-ear channel significantly poorer than the intelligibility of the single-ear channels. This lack of effectiveness of the dual-ear channel may be even more detrimental in real operational conditions, when potential equipment shortcomings are magnified because the attentional resources of the users are limited. This means that in some operational conditions, this type of three-channel system may be no more effective than a two-channel system.
The difference in SI between the left and right ear channels in System 2 is puzzling and hard to explain. However, the difference could potentially be caused by similarity between the three voices used in the study and the random assignment of C signals to the three channels. Three different male voices were used in the study, and one of them was always assigned to T sentences. For each presentation, the voice speaking the T sentence was randomly assigned to one channel and the two masking voices were randomly assigned to the two other channels. If one of the two masking voices was more similar in quality to the T sentence voice and appeared more often in one of the channels than in another, theoretically, this similarity could cause some imbalance in listeners’ responses for the left and right channels (intelligibility for the common channel would not be affected by this behavior). If both masking voices were assigned equally to each of the channels, the left and right ear channels’ SI would not be affected by voice similarity. It is possible that the randomized assignment of the masking voices to the channels of System 2 resulted in a more asymmetric voice distribution than in the case of System 3 or that the voices did not differ sufficiently in their quality, which in turn affected the data collected.
The three-channel communication system with a BC channel (System 3) performed as well as the three-channel system with the headset only (System 2). However, intelligibility of the BC channel in System 3 was significantly better than that of the dual-ear channel in System 2, and the intelligibilities of all three channels of System 3 were very similar. This finding indicates that the use of a BC interface as one of the channels in multichannel transmission is promising, especially in situations in which the ears are covered by the headset or hearing protectors. Such conditions include noisy environments in which ear coverage limits the masking effects of surrounding noise. However, occlusion of the ears creates small, poorly tuned resonating chambers of the external ear canals, making the BC signal louder in comparison to listening to BC signals with unoccluded ears. The amount of gain is frequency dependent and can be as large as 20 dB at 250 Hz. This gain would need to be taken into consideration if signals presented through an air conduction (AC) headset are as important as those transmitted via BC, since the increase in loudness of the BC signals could effectively mask the AC signals.
One additional advantage of a BC channel is its use in situations in which soldiers or security personnel need to simultaneously operate two radios (e.g., short-range and long-range radios). A BC vibrator can be connected to one of the radios, and the other radio can be handheld at the ear or interfaced with a headset. This use allows seamless concurrent listening to two radios without a need to switch back and forth between the radios.
BC-transmitted signals can also prove useful for airmen and mounted soldiers using encapsulated or similar helmets that incorporate the communication headset and additional or built-in earmuffs or earplugs. This combination can be considered a form of double hearing protection. Simpson, Bolia, McKinley, and Brungart (2002, 2005) reported that participants in their study were unable to use auditory information in localizing audiovisual targets when using double hearing protection in high-noise environments. This type of localization situation may occur when spatially distributed alert and warning audio signals are presented in the cockpit of an aircraft. The authors hypothesized that this situation can be caused by masking of the target audio signal by direct leakage of the signal through the bone structure of the head, which cannot be localized. This problem can be resolved by delivering HRTF-processed audio signals to two BC vibrators (left and right) connected to the alert or warning system of the aircraft or vehicle. Researchers have demonstrated that such strong BC signals can be localized (MacDonald et al., 2006) and can overcome sound field–generated masking signal leaking through the bones of the head.
Conclusions
Within the constraints of this research, the data obtained in this study support the value of using a BC interface for one of the channels in multichannel communication systems, even in the case of concurrent use of two one-channel radios. The three-channel AC-BC reproduction proved to be more intelligible than the AC loudspeaker system and more balanced than the AC earphone system. In addition, the SI scores obtained for the AC-BC system meet the requirements of military communication standards. In summary, the AC-BC system appears to be the most effective of the three-channel communication systems tested in this study overall.
It is also important to stress that the results obtained indicate that for a multichannel communication system, bone-conducted channels can perform as well as air-conducted channels. It should also be noted that the differences in intelligibility between BC and AC communication systems reported in previous single-channel studies (Gripper et al., 2007; Osafo-Yeboah, Jiang, Gripper, & Lyons, 2006) were not observed in this study.
The findings support the potential for a multichannel communication system that combines both air and bone conduction. Such a system would use the intelligibility advantages of AC interfaces and the situation awareness advantages of BC interfaces depending on the operational conditions. When compared with AC, BC is as effective for delivery of speech signals in multichannel communication systems; thus selection of audio equipment should be based on the task and environment. Further research is needed to determine the limitations of BC communication in more realistic and adverse listening conditions.
Key Points
Three three-channel communication systems were used to deliver spatially separated competing sentences to listeners to assess the impact of replacing a traditional air conduction channel with a bone conduction (BC) channel.
The Synchronized Sentence Set (S3) was used to present 48 sets of competing sentences to the listeners, who were tasked with identifying the components of the sentence that began with the name Troy.
Speech intelligibility in the sound field loudspeaker condition (System 1) was the poorest and was lower than the acceptable levels documented in military communication systems standards (MIL-STD-1472G).
Intelligibility of speech delivered through System 3 (with BC vibrator) was slightly but not significantly better than that of System 2 (without BC vibrator).
The three channels of System 3 provided significantly less variation in speech intelligibility than did the channels of System 2. This finding combined with better operational utility of System 3 for field operations makes the BC channel well suited for multichannel speech communication.
Footnotes
Misty Blue is an assistant professor in the Department of Biomedical, Industrial, and Human Factors Engineering at Wright State University in Dayton, Ohio. She received her PhD in industrial and systems engineering with a concentration in human machine systems engineering from North Carolina Agricultural and Technical State University in 2006.
Maranda McBride is an associate professor of management in the School of Business and Economics at North Carolina Agricultural and Technical State University in Greensboro. She received her PhD in industrial and systems engineering with a concentration in human machine systems engineering from North Carolina Agricultural and Technical State University in 2003.
Rachel Weatherless is a member of the Perceptual Sciences Branch at the Army Research Laboratory, Human Research and Engineering Directorate. She received an MS in computer science from Towson University in 2005.
Tomasz Letowski is a senior research scientist at the Army Research Laboratory, Human Research and Engineering Directorate. He received a PhD in acoustics and telecommunications from Wroclaw Technical University in 1973 and a DSc degree in technical sciences from Warsaw Technical University in 1986.
