Abstract
Introduction
Research on the effects of background speech in open-plan offices has shown negative effects of this task-irrelevant sound on both performance and self-reported wellbeing [1, 2]. In view of these findings, it is quite alarming to note that the open-plan office “solution” is frequently used [3]. When no actions are taken concerning these noise problems, decreased productivity and well-being can result in substantial costs for organizations and society [4].
According to Virjonen and colleagues [5], typical ways to combat distraction by background speech include the installation of more sound-absorbing material in walls and ceilings and the use of masking techniques (i.e., playing back sound that makes more distracting sound less audible, thereby reducing the properties of the sound that, when perceived, give rise to disruption of task performance). However, absorption panels in the ceiling are not sufficient to reduce distraction from speech at adjacent desks [6]. An explanation for this comes from the speech intelligibility account. Hongisto [7] states that if speech intelligibility (measured by the speech transmission index; STI) is not reduced enough, then the sounds will retain their potency to disrupt performance. For example, if absorption panels are added, the levels of both speech and background noise are reduced, so the signal-to-noise ratio remains unchanged. However, the reverberation time is also reduced. As a consequence, STI can even increase when sound-absorbing panels are added [8].
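The reasoning above (unchanged signal-to-noise ratio, shorter reverberation time, higher STI) can be illustrated with a minimal sketch. It uses the textbook modulation transfer function for reverberation and steady noise and averages over modulation frequencies in a single band. It is a simplified approximation, not the standardized STI procedure: octave-band weighting and auditory corrections are omitted, and the function names and the example values (0.8 s, 0.4 s, 5 dB) are assumptions chosen only for illustration.

```python
import math

# Modulation frequencies (Hz) conventionally used for STI (one-third-octave steps).
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15,
             4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_transfer(f_mod, rt_s, snr_db):
    """Modulation reduction caused by reverberation and steady noise."""
    reverb_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt_s / 13.8) ** 2)
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


def simplified_sti(rt_s, snr_db):
    """Unweighted, single-band STI approximation in the range 0..1."""
    indices = []
    for f_mod in MOD_FREQS:
        m = modulation_transfer(f_mod, rt_s, snr_db)
        if m >= 1.0:
            apparent_snr = 15.0  # no modulation loss: clip at the upper limit
        else:
            apparent_snr = 10.0 * math.log10(m / (1.0 - m))
            apparent_snr = max(-15.0, min(15.0, apparent_snr))
        indices.append((apparent_snr + 15.0) / 30.0)
    return sum(indices) / len(indices)


# Hypothetical room before and after adding absorption: same SNR, shorter reverberation.
before = simplified_sti(rt_s=0.8, snr_db=5.0)
after = simplified_sti(rt_s=0.4, snr_db=5.0)
print(f"STI before: {before:.2f}, after: {after:.2f}")
```

In this sketch, the reverberation term grows toward 1 as the reverberation time shortens, so with the signal-to-noise ratio held constant the estimated STI rises.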
Further, the results regarding the effects of sound-masking systems on performance and subjectively reported annoyance are unclear and stem mainly from laboratory settings (e.g., [9–13]). Differing results do not necessarily conflict with one another; rather, they may demonstrate how poorly sound-masking technology is standardized compared to other acoustic treatments, such as sound-absorbing panels or walls. Researchers use different masker levels in terms of sound intensity. They may also choose different spectra within the background sound to mask. Moreover, the content used within the mask can differ markedly between studies. The inconsistent effects of masking could perhaps also be partly attributed to the use of different tasks to evaluate the effects of masking on performance, as some tasks are not sensitive to speech distraction at baseline [14]. With this in mind, it is relevant to discuss suitable masking noise candidates. Some noise sources are rated as unpleasant. For example, pink noise is commonly perceived as an undesirable sound [15] and hence may not be appropriate to use as a masker. Some insight into the preference for different sounds as a mask has been provided by Hongisto and colleagues [16]. They tested spectrally different pseudorandom noises and found that their participants were most satisfied with sounds having more emphasis on low frequencies and a slope of −7 dB per octave. However, there is a need to develop and test sounds with more high-frequency content, which may mask speech sounds better and still be perceived as satisfactory. One such candidate is nature sound [9, 17]. It has, for example, been speculated that more natural sounds are more easily perceived as inherently belonging to the environment than pseudorandom noises and are therefore more acceptable, although their sound spectra may not be very smooth [16].
Given that nature sounds are rated as more acceptable than other masking sounds, the current study tests whether natural sounds work as an effective masker in terms of attenuating the potential distraction of cognitive performance by background sound and reducing the subjective task load ordinarily produced by that sound.
Another way in which open-plan office workers attempt to reduce the influence of distraction within the office environment is through wearing headphones. Merely wearing headphones (without playing music) may create an aura of privacy whereby co-workers may assume that the individual is busy and are less likely to interrupt them. Moreover, headphones reduce the loudness and intelligibility of background speech, thereby possibly reducing the subjective intrusiveness of the sound. Masking by headphones may also involve personal control over the sound, which may provide better satisfaction than, for example, global masking with ceiling loudspeakers, which is not under personal control of the individuals. This is an important issue as research shows that those who perceive control over a sound demonstrate higher satisfaction and greater performance [18]. Simply wearing headphones could be a cost-effective and simple means by which distraction could be reduced in noisy environments. As such, the current study explores whether merely wearing headphones is a sufficient way of reducing distraction to cognitive performance and subjective task load produced by task-irrelevant sound.
One of the most commonly employed empirical tools to test auditory distraction within this field of research is the serial short-term memory task. In this task, the participants are typically presented visually with a sequence of items (digits or letters), one by one, and are required to recall them in the same order as they were presented. The to-be-recalled items are either presented in silence or against a background of sound. When some types of sounds are presented, recall is impaired. The magnitude of disruption appears to be insensitive to differences in the sound’s intensity: sound at the intensity of a whisper [48 dB(A)] is just as distracting as sound presented at the intensity of a shout [76 dB(A)] [19]. It should be noted, however, that if the instructions for the task allow free recall (in contrast to serial recall), the intensity of the sound appears to be more important [20]. Moreover, the magnitude by which serial recall is disrupted by background sound is not influenced by the similarity in meaning between the background speech and the to-be-recalled items [21]. It is only in the context of free recall that background speech (e.g., names of animals) similar in meaning to the to-be-recalled items (e.g., names of other animals) is more disruptive than semantically unrelated speech (e.g., names of tools) [22, 23]. Instead, disruption to serial recall is a function of the acoustic variability of the background sound [24]. Acoustically variable sound (such as the sound sequence “k l m v r q c”) is more disruptive than acoustically invariable sound (“m m m m m m m”). This finding is called the changing-state effect [24]. Moreover, a deviant sound embedded into an otherwise repetitive sequence (e.g., “m m m K m m m”) also produces distraction. This is called the deviation effect [25].
In view of these findings, broadband noise presented simultaneously with the background speech, which reduces the acoustic variability and the distinctiveness of individual elements in the sound stream, should protect against the effects of background speech on serial short-term memory. This is because adding a mask reduces the perception of change between the successively presented sounds within the sound stream. The addition of the masking stimulus reduces not only the changing-state effect but also the deviation effect, since for an item to be processed as deviant, successive changes within the auditory stream must be processed. The addition of broadband noise, music [15], spring water [9], or additional played-back speech [10, 26–29] as a masker reduces the disruption of serial short-term memory. However, even though masking sound protects performance, the acceptance ratings of the masking sounds are typically not high and people tend to prefer silent conditions [7]. This discrepancy between performance data and subjective evaluations of masking sounds has also been reported in other studies (e.g., [10, 31]). These findings underscore the necessity to consider both performance data and subjective ratings when evaluating the effectiveness and appropriateness of a certain sound as a masker in work environments. Whilst a masker comprising multiple voices can protect performance from distraction to a greater extent than a single-voice mask, the ratings of workload are typically still higher with multiple-voice masking than in quiet [10]. In the current study, we investigate the effectiveness of a multiple-voice masker in comparison with other maskers, and whether these maskers attenuate distraction so that performance and perceived workload are comparable to a quiet condition.
The aim of this experimental study was to compare the effects of two masking sounds (nature sound and speech sound) and the effects of wearing headphones as a device for attenuating the acoustic background upon subjective workload and cognitive performance (as gauged by the serial short-term memory task). We selected five sound conditions for this comparison: a one-voice background speech condition (hypothesized to be the most disruptive), a condition with nature sound as a masker superimposed on the one-voice background speech, a condition with multiple voices as a masker superimposed on the one-voice background speech, a condition wherein one voice was presented as background speech but was recorded in a way that simulates the wearing of headphones, and a quiet control condition.
Method
Participants
The participants were 30 students (14 females) within the age range of 19–31 years (M = 24.03 years, SD = 2.98 years) from Uppsala University. All participants reported normal hearing and vision. The participants were recruited at different campuses via direct invitation and were informed that participation was voluntary. The participants received either a cinema ticket or course credits after the completion of the experiment.
Sound conditions
The five sound conditions are presented in Table 1. The first condition was a quiet control condition, wherein the task was performed without background speech. The background sound in the room was less than 25 dBA. The other four conditions were always presented with background speech, which consisted of an audio recording of one female voice. The recording of the voice (earlier made in an anechoic chamber) was binaurally recorded again in a room of dimensions 9.5×9.1×3.1 meters to add normal room reverberation and background noise to the voice. The room’s reverberation times (RT20) in octaves from 125 Hz to 8 kHz were 0.72, 0.72, 0.71, 0.71, 0.76, 0.71, and 0.61 s. The sound source (a speaker) was placed three meters in front of an artificial head (Head Acoustics HMS IV). For conditions 3–5, headphones (Sennheiser HD 202) were mounted on the artificial head to create a simulation of wearing headphones. Wearing headphones attenuates the sound level mainly for frequencies above 1 kHz; the insertion losses measured using pink noise were 0.4, −1.0, 0.8, 8.6, 14.3, 21.9, and 23.1 dB at octave frequencies from 125 Hz to 8 kHz. The background voice level was reduced by 2.2 dBA with headphones (see Table 1). For conditions 4–5, the binaural recordings of the background voice were mixed with masking sound. The masking setups were simulated by a computer program. The first masking sound condition consisted of nature sound masking with headphones, namely bird twitter in combination with rippling water (sound condition 4). The second masking sound condition consisted of 7-voices masking with headphones, which was created by adding seven female voices together (sound condition 5). The conditions in Table 1 are described by means of the Speech Transmission Index (STI) [32, 33], which measures the intelligibility of the background female voice between 1.00 (perfect intelligibility) and 0.00 (no intelligibility at all).
The STI values were derived using the indirect method from measured signal-to-noise ratios (SNR) and reverberation times. The A-weighted sound pressure level of the voice was computed using the real speech level [33], which considers only the active parts of a speech signal. For the maskers, the equivalent levels were used. Sound pressure levels in octave bands for the background speech (with and without headphones) and the two maskers are presented in Fig. 1. The 7-voices masker, used in sound condition 5, masks the background voice across the whole frequency region, whereas the nature sound, used in sound condition 4, masks the background voice above 1 kHz. Speech intelligibility is reduced substantially by both maskers; STI decreases from 0.63 to 0.28 and 0.05 using the nature sound and the 7 voices as masker, respectively (see Table 1). To show the effects of masker modulation on the intelligibility of the background voice, short-time STI was also computed by taking the mean of the octave SNRs for 18 ms time segments [10]. The short-time STI values indicated higher speech intelligibility than the standardized STI; STI increased from 0.28 to 0.38 for the nature sound masking with headphones (sound condition 4) and from 0.05 to 0.13 for the 7-voices masking with headphones (sound condition 5).
All the voice signals used (both as maskers and as background voice) consisted of audio samples from seven female audiobook narrators and were constructed as follows. First, short segments (30–60 sec) from monophonic nonfiction audiobooks (MP3 48 kbps CBR 22.1 kHz 32-bit) were converted to WAV format (22.1 kHz 32-bit) and normalized in terms of A-weighted level. These segments were then, for each narrator, combined into 2-minute-long speech signals. The murmur of speech was created by merging the seven different narrator voices. When combined, the short segments were ordered differently for the background speech signals and the 7-voice masker. The stimuli were finally spliced into parts adapted to the length of each sequence of numbers presented in the serial recall task (the recall phases were completed in quiet). All simulated sound conditions were played back through headphones (Sennheiser HD 202) at a level of around 55 dBA, except for the masking condition with seven voices, which reached 63.1 dBA. This meant that all participants wore headphones during the entire experiment. This method ensured that all participants heard the sounds from the same direction and distance.
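As background to why merging several level-normalized voices raises the overall level, the following minimal sketch shows the energetic (incoherent) summation of sound pressure levels. It assumes uncorrelated signals of equal level, which is an idealization. It is not a reconstruction of the mixing procedure used in the study, and the function name is hypothetical.

```python
import math


def sum_levels_db(levels_db):
    """Energetic (incoherent) sum of sound pressure levels given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Seven uncorrelated sources of equal level: the total rises by 10*log10(7) dB
# relative to a single source (assumption: identical levels, no coherence).
increase = sum_levels_db([0.0] * 7)
print(f"Level increase for seven equal sources: {increase:.2f} dB")
```

Under these assumptions, seven equal sources yield an increase of roughly 8.5 dB over one source, which gives a sense of the order of magnitude by which a multi-voice masker can exceed a single voice.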
Dependent measures
Serial short-term memory task
The participants were told to remember a string of eight sequentially presented one-digit numbers drawn from the set 1–9 and then to recall them in the correct order. Each digit was presented for 350 milliseconds on the computer screen, with a blank interval of 400 milliseconds between digits. Participants were told to write down the sequences immediately (i.e., without any retention interval) in exactly the same order as they were presented. The numerical sequences were random, with the constraints that no sequence began with a 1 and that no more than two digits were presented in canonical (e.g., 1, 2) or reverse-canonical (e.g., 3, 2) sequence. The same (pseudo)random sequences were used for all participants. If the participants could not remember a given digit, they were told to write an “x” for that digit in the answer box. Each block consisted of 10 sequences, and only numbers written at their correct positions were scored as correct. The maximum score for each sound condition was thus eighty points (10 sequences of 8 digits).
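The strict positional scoring described above can be sketched as follows. This is a minimal illustration rather than the scoring script used in the study: the function names and the example sequences are hypothetical, and responses are assumed to be strings of digits with “x” marking a forgotten item.

```python
def score_trial(presented, recalled):
    """Count digits recalled at exactly the position where they were presented."""
    # An "x" placeholder never matches a digit, so it scores zero for that position.
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def score_block(trials):
    """Sum the trial scores over one block of (presented, recalled) pairs."""
    return sum(score_trial(presented, recalled) for presented, recalled in trials)


# Hypothetical trial: one omitted digit ("x") and two digits swapped at the end.
print(score_trial("38527146", "38x27164"))
```

In the example, the two transposed digits both count as errors even though the participant remembered their identity, which is what makes this scoring sensitive to memory for order. A block of ten perfectly recalled eight-digit sequences gives the maximum of 80 points.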
Perceived work load
The participants completed the short version of the NASA Task Load Index (NASA TLX) questionnaire after they finished their work in each sound condition. The questionnaire consisted of six statements which the participants assessed on a scale from 0–100: mental demands, physical demands, time pressure, own performance, effort needed, and how much frustration they felt. The questionnaire was translated by the authors from English to Swedish, and the translation was then reviewed by a person fluent in both English and Swedish. The items were averaged into a single index (mean task load), and Cronbach’s alpha was acceptable (α = 0.85).
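The two computations mentioned above, averaging the six items into a mean task load and checking internal consistency with Cronbach’s alpha, can be sketched as follows. This is a minimal illustration using the standard formula for alpha. The function names and the ratings are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def mean_task_load(ratings):
    """Average the item ratings (0-100) of one respondent into a single index."""
    return mean(ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings from four respondents on the six items.
ratings = [
    [70, 20, 60, 50, 65, 55],
    [40, 10, 35, 30, 45, 25],
    [85, 30, 80, 60, 90, 70],
    [55, 15, 50, 45, 50, 40],
]
print([mean_task_load(row) for row in ratings])
print(round(cronbach_alpha(ratings), 2))
```

Alpha approaches 1 when respondents who rate one item high also rate the others high, which is the justification for collapsing the items into one mean score.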
Procedure
Data collection took place in a sound-proof lab room at Uppsala University where the background noise (ventilation noise) was less than 25 dBA. The participants undertook the study on a laptop (HP Probook 350) and wore headphones (Sennheiser HD 202) during the entire experiment. When the participants arrived, they received information in accordance with the ethical guidelines (e.g., about voluntary participation). Subsequently, the experimenter gave instructions about the task and explained that they should type in their answers on the numeric keypad. They were also instructed to work as quickly and accurately as possible and to ignore any sound presented via the headphones. After five practice trials, the experiment proper started. During each sound condition, a test block with ten different trials was presented. One trial took about 7.5 seconds. In the conditions with sound, the sound was presented synchronously with the to-be-remembered sequence. No sound was presented during the recall phase of each trial. A dialog box was presented on the screen instructing the participants to write down their answers. No time limit was set, so the participants could take the time they needed to answer. Between test blocks, participants answered the NASA TLX questionnaire. To reduce the risk of order effects, the sound conditions were counterbalanced with two Latin squares (where the order in the second Latin square was the opposite of the first), resulting in ten different presentation orders of the sound conditions. The whole procedure took about 30 minutes.
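The counterbalancing scheme can be illustrated with a short sketch. The source does not state which Latin square was used, so a simple cyclic square is assumed here, and the function names are hypothetical. The second square is obtained by reversing each order of the first.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square with conditions numbered 1..n."""
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def counterbalanced_orders(n):
    """Combine a Latin square with its row-reversed mirror into 2n orders."""
    first = cyclic_latin_square(n)
    second = [list(reversed(order)) for order in first]
    return first + second


orders = counterbalanced_orders(5)
for order in orders:
    print(order)
```

With five conditions this yields ten distinct orders in which every condition occupies every serial position exactly twice. Reversing the orders additionally balances which conditions precede and follow one another.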
Statistical analyses
To test the hypotheses of this experiment, we used repeated measures analyses of variance (RM-ANOVA) with sound (5 levels) as the within-subjects factor. As follow-up tests, we used Tukey t-tests with Bonferroni-corrected p-values for multiple comparisons.
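For readers unfamiliar with the analysis, a one-way repeated measures ANOVA can be sketched in a few lines. The sketch below uses the standard partitioning of the sums of squares. It is not the software used in the study, it applies no sphericity correction, and the function name and the toy data are hypothetical.

```python
def rm_anova(data):
    """One-way repeated measures ANOVA.

    data: list of subjects, each a list with one score per condition.
    Returns the F ratio, its degrees of freedom, and the error mean square.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions
    grand_mean = sum(sum(row) for row in data) / (n * k)
    condition_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_condition = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subject = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    # Between-subject variability is removed from the error term.
    ss_error = ss_total - ss_condition - ss_subject

    df_condition = k - 1
    df_error = (n - 1) * (k - 1)
    ms_error = ss_error / df_error
    f_ratio = (ss_condition / df_condition) / ms_error
    return {"F": f_ratio, "df_effect": df_condition,
            "df_error": df_error, "MSE": ms_error}


# Hypothetical toy data: three subjects, three conditions.
print(rm_anova([[1, 2, 3], [2, 4, 3], [3, 3, 6]]))
```

With 30 participants and 5 sound conditions, the degrees of freedom in this scheme are 4 and 116.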
Results
Serial recall performance
As shown in Fig. 2, the participants were less distracted by background speech when it was masked by nature sound through headphones (i.e., sound condition 4; see Table 1). This conclusion was supported by a repeated measures analysis of variance with sound condition as the independent variable and serial recall performance as the dependent variable, F(4, 116) = 9.07, MSE = 60.0, p < 0.001, ηp2= 0.24. The largest difference between conditions was around 25% (i.e., between quiet and speech). Bonferroni-corrected t-tests showed that the only condition that protected against distraction was the nature sound masking through headphones (see Table 2). Whilst there was no difference between the nature sound masking condition and the quiet condition, the participants performed better in quiet compared to all other sound conditions. Moreover, performance was better in the headphones with nature sound masking condition than in the speech-only condition (i.e., condition 2).
Perceived work load
The participants perceived lower work load in quiet (condition 1) compared to all other sound conditions, with a difference of 9–13 points between the best and the worst condition (Fig. 3). A repeated measures analysis of variance with the 5 sound conditions as within-subjects factor revealed a significant effect on perceived work load, F(4, 116) = 11.6, MSE = 66.3, p < 0.001, ηp2= 0.29. Table 3 reports Bonferroni corrected probability values for the difference between conditions. The participants perceived lower work load in the quiet condition compared to the other sound conditions. There was also a close to significant difference between the nature sound masking with headphones condition and the speech condition, and between the headphones only condition (i.e. condition 3) and the speech condition, suggesting that headphones and nature sound masking with headphones may have decreased workload to some degree in comparison with no protection against the background voice.
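The reported effect sizes can be checked against the reported F ratios, because partial eta squared follows directly from F and its degrees of freedom. The sketch below applies that standard identity to the two values given in this section. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported above: serial recall and perceived work load, both with df = (4, 116).
print(round(partial_eta_squared(9.07, 4, 116), 2))  # 0.24
print(round(partial_eta_squared(11.6, 4, 116), 2))  # 0.29
```

Both results agree with the effect sizes reported in the text.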
Discussion
Can headphones and masking sound attenuate distraction by background speech?
The main finding of the experiment reported here is that masking background speech with nature sound through headphones (at STI 0.28) brings performance back to baseline (i.e., performance in quiet). Masking speech with 7 voices through headphones (STI 0.05), in turn, does not bring performance to baseline. Moreover, the studied headphones alone cannot protect against distraction, even though the device attenuates high frequencies of the sound. While headphones (alone) appear to offer some resistance to the effects of background speech on workload, it seems that headphones do not smooth the troughs and peaks of the acoustic signal enough to make the abrupt pitch and loudness changes more temporally constant, which would prevent the automatic encoding of the order of changing-state sounds and, consequently, protect serial short-term memory [35]. From an acoustical perspective, this is not surprising, as the headphones only reduced STI by 0.04 and the sound pressure level by 2.2 dB(A).
Practical implications
Studies dealing with the effects of masking are few and the results are unclear. One consistency, however, is the finding that the effects of noise on performance and on subjective ratings do not always yield the same picture. A typical finding is that negative effects of background noise are found in subjective ratings (e.g., workload and distress), while performance remains largely unharmed by the same sound [9, 13]. The results of the current study also suggest that there is a discrepancy between subjective ratings and performance, but here the difference was in the opposite direction. Headphones alone did not protect performance from disruption, but the participants tended to report a lower workload in this condition. Therefore, although wearing headphones within the open office may serve to protect one’s private sphere by curtailing physical distractions or disturbances from colleagues, there is very little evidence that they will protect performance from distraction by background speech. Moreover, such protection of the private sphere may carry a cost in reduced responsiveness to co-workers’ calls for attention, for example, when an office colleague calls another’s name or dials the telephone extension for their workstation.
A larger consistency between performance and subjective ratings was found when headphones were used together with nature sound masking, which both protected against distraction and appeared to relieve workload. Taken together, the current study points toward the possibility that nature sound is an appropriate masker that is able to both protect performance and shield against workload. The findings are partly consistent with a previous study on masking effects which also found that nature sound (water waves in that case) protects performance [10], but in contrast to the present study, multiple voices were the most efficient masker in that study. There are several possible reasons for this discrepancy. For example, only 7 voices were used in the present study, while the previous study used a larger number (i.e., a recording consisting of nine people talking simultaneously which was multiplied five times to get a more diffuse sound during the experiment). In the current study, we tested nature sounds consisting of bird twitter together with rippling water as a speech masker. Future research should aim to distinguish better between different nature sounds and test which sounds are better than others as maskers in an office setting, and why. Furthermore, it is important to test whether it is really sufficient to add nature sound to an office setting to protect performance.
On this note, a number of limitations of the current study are worth mentioning. We only tested acute noise effects, and longitudinal studies may not show the same results during exposure to the sounds over an eight-hour workday. Moreover, the setting in which the present participants performed the experiment was free from other distracting stimuli that may occur naturally in an office environment, such as distracting sound from printers and ringtones, as well as visual distractions from passers-by. It is therefore uncertain how the sound maskers tested in the current study would work in an environment where other distractions constantly occur on top of speech sounds. Different offices will also have varying acoustic conditions, which makes it hard to generalize which maskers will improve speech privacy in the field. Finally, according to Haapakangas et al. [9], the suggested masking levels for open-plan offices range between 40 and 45 dB (LAeq). In the current study, relatively high masking levels were used, up to 63.1 dB(A), which should not be directly applied to an office setting where masking by loudspeakers is used. However, people may listen through headphones to nature sounds and radio at these levels [36].
While this study has investigated whether the simple act of wearing headphones (Sennheiser HD 202) can attenuate distraction, other types of headphones may be more effective at attenuating distraction. For example, one might expect noise-cancelling headphones to be useful in reducing distraction produced by open-plan offices. However, noise-cancelling headphones tend only to be efficient at reducing predictable steady-state low-frequency noises such as the hum of an airplane engine [37] and are largely ineffective against acoustically complex, unpredictable mid- and high-frequency sounds, such as background conversational speech. In the context of the current study, to-be-ignored background speech was presented via headphones while the participants undertook a visually based task. Since the auditory material was entirely to-be-ignored within the context of the current study, wearing noise-cancelling headphones would not be beneficial. In fact, a case could be made for quite the opposite: noise-cancelling headphones could reduce the potentially masking effects of steady-state external noise on the signal presented via headphones, thereby increasing the acoustic complexity and intelligibility of the signal (if it comprises speech in a language the individual understands). This could reasonably be expected to increase the magnitude of the changing-state effect in the context of tasks that require short-term memory for serial order [24] and the disruption by the meaning of background speech of tasks that require processing of meaning, such as word-processed writing [38]. Future research needs to test a wider range of headphones to determine whether they can reduce distraction from speech.
Conclusions
The current study suggests that the headphones tested here, when used alone, do not protect against the effects of background speech on (short-term memory) performance, even though they may make people feel somewhat relieved of workload. One interpretation of this finding is that headphones can give the person who wears them the false impression that they protect performance from distraction by environmental sounds when, in fact, they do not. Nature sound played back through headphones as a sound mask, however, seems to offer some promise as a potential technique to reduce distraction in the office setting, in terms of both performance measures and subjective workload.
Conflict of interest
The authors have no conflict of interest to report.
Acknowledgments
This study was the basis for Patrik Björkeholm’s Master’s thesis in Psychology. The research was funded by the University of Gävle and Uppsala University. We thank the anonymous reviewers for their constructive comments.
