Abstract
Listening to speech in a noisy background is difficult for everyone. While such listening has historically been considered mainly in the context of auditory processing, the role of cognition has attracted considerable interest in recent years. This has been particularly true in the context of life-span research and the comparison of younger and older listeners. This article will discuss three factors that are important to consider when investigating the nature of cognitive involvement in speech-in-noise (SiN) perception: (1) the listening situation, (2) listener variables, and (3) the role of hearing aids. I argue that a similar level of intelligibility can be achieved with the support of very different skills, or listening strategies, depending on the listening situation and listener. Age as a variable is particularly instructive for this type of research as it is accompanied by auditory as well as cognitive changes. As age-related changes are not uniform, using aging as a tool for the investigation can increase the opportunity to see individual differences in contributing processes and resulting compensation strategies. In addition to highlighting different interaction effects between hearing and cognition, I will argue that our conceptual understanding of the underlying processes can only be furthered if the selection of cognitive tests and experimental procedures in SiN studies follows accepted cognitive models, so that understanding can advance on a conceptual level without reliance on a particular test. Ultimately, a better understanding of the various listening strategies available to listeners, and the conditions under which they are used, may have theoretical as well as practical implications. Theoretically, it may help us better understand phenomena such as listening effort. Practically, it may guide us toward more effective diagnosis and intervention for listening difficulties in everyday life.
Introduction
Listening to speech, particularly under adverse conditions, requires both auditory processing and cognitive engagement. The role of cognition specifically has been of growing interest to speech and hearing scientists as well as clinicians, as evidenced by a number of recent reviews (e.g., Dryden et al., 2017; Füllgrabe & Rosen, 2016; Humes et al., 2012; Mattys et al., 2012). In addition to these reviews and meta-analyses of recent experimental evidence on sensory–cognitive interaction and integration, a number of frameworks have been proposed to integrate the available evidence (e.g., Pichora-Fuller et al., 2016; Rönnberg et al., 2019; Schneider et al., 2010; Wingfield & Tun, 2007). The most prominent among them is the Ease of Language Understanding (ELU) model (Rönnberg et al., 2013, 2019), although other models exist (Schneider & Pichora-Fuller, 2000; Schneider et al., 2010). The ELU model primarily focuses on the roles of working memory (WM) and executive function in speech understanding under adverse listening conditions. A main tenet of the model is that when speech quality is good and a clear match between the input and a phonological representation in memory can be made, lexical access is rapid and resource-free, and speech perception is automatic, with no need for top-down cognitive knowledge. However, when input quality is poor, due either to internal (e.g., hearing loss) or external (e.g., background noise) factors, the match is neither fast nor resource-free, and perception is not automatic. Instead, the degraded auditory information has to be supported by linguistic or other knowledge, making the listening process conscious and effortful. Those interested in a more detailed description, analysis, and critique of the ELU model are directed to Rönnberg and colleagues’ original papers (Rönnberg et al., 2013, 2019) as well as to Wingfield and colleagues’ excellent discussion of the model (Wingfield et al., 2015).
Of particular importance for the current article is the fact that Wingfield and colleagues discuss many of the cognitive concepts employed in the ELU model within the wider field of cognitive psychology. They suggest ways in which the ELU model can be productively connected to cognitive and linguistic models to further elucidate its details.
One aspect of the listening experience that receives comparatively little attention in any of these models, but that might be instructive in understanding the variable relationship between cognition and speech-in-noise (SiN) perception, is the nature of the listening situation. Investigating this relationship systematically is difficult because correlations between speech perception and cognition vary not only with the particular listening test and cognitive test but also with each test’s (often unknown) test–retest reliability. One way of dealing with some of this variability in a study is to restrict variation in the listening situation to a single dimension, for example, changing only the target speech (e.g., Knight & Heinrich, 2017) or only the background noise (e.g., Zekveld et al., 2013). Moreover, to increase the likelihood of seeing a meaningful change in correlations between two listening situations and cognitive function, changes in the listening situation should be substantial rather than gradual. A disadvantage of both of these experimental strategies, that is, extreme contrasts and reduced dimensionality, is that they make it more difficult to extend experimental results to real-life listening situations. Dryden and colleagues (2017) have suggested a multidimensional and finer-grained approach and investigated how previously published associations with cognition fit within it. I will discuss their results in the first section of this article. Following this, I will discuss how the differential contribution of cognition to SiN listening may depend on listener characteristics, with a special emphasis on age and hearing loss.
I aim to show that listening strategies can differ between listening situations that current models all class as SiN perception. Moreover, I will show that listening strategies can differ for identical listening situations depending on the characteristics of the listener. By discussing some of the seemingly contradictory evidence, I will argue that to improve our understanding at a conceptual level, we will need to consider concepts developed within cognitive theories more closely. Finally, I aim to add to the discussion about ways in which this knowledge might be used to advance clinical assessment and rehabilitation.
The Listening Situation
To understand what types of cognitive skills support intelligibility in a particular listening situation, it is important to consider what types of stimuli are presented and in what types of listening environments. Stimulus characteristics discussed here include the nature of target and background sounds, their relationship to each other, and the overall level of performance accuracy they generate.
An example of the differential involvement of cognition in listening was provided by a pair of studies conducted by Heinrich and colleagues (Heinrich et al., 2015, 2016). In both studies, the focus was on the nature of the target signal and how it affects cognitive involvement. Both showed that differences in cognitive performance consistently predicted SiN perception only when the task involved sentence perception, not when it involved the perception of phonemes. When the task was perception of single words, the predictive power of cognition depended on the characteristics of the target words. Only the perception of words from a potentially unlimited set was predicted by individual differences in WM. Conversely, when the words came from a closed set of nine digits, individual differences in cognition did not help to predict SiN performance. For sentence perception, a range of cognitive abilities, including attention, inhibitory control, and WM, predicted performance, with the exact nature of the predictive relationship depending on whether the task comprised simple sentences in modulated noise or sentences masked by a single competing sentence.
Besides the nature of the target signal, the nature of the background sound is another stimulus dimension that determines how cognition is engaged in listening. Most experimental results point to stronger engagement of cognition when more complex maskers are present, but exactly how masker characteristics need to be constructed to engage particular cognitive processes is far from clear. For instance, Nüsse et al. (2018) used a number of fluctuating maskers of various types and combinations. Although all maskers were acoustically complex, only the intelligible single-talker maskers presented in cafeteria noise showed a predictive relationship with cognitive abilities (specifically, lexical abilities). Zekveld et al. (2013) report similar results. They investigated the association between cognitive function and SiN perception for single-talker, fluctuating-noise, and stationary-noise maskers and found correlations with cognition (WM) only in the single-talker condition. Hence, experimental results suggest, first, that the role of cognition in SiN perception differs depending on the type of background sound, with more complex sounds probably engaging more cognitive processes, and second, that the particular cognitive process(es) engaged in listening depend on the specific characteristics of the background sound.
One more characteristic of the listening situation needs to be considered, namely, the relationship between the target and the background sound. A number of studies have shown that the relative level of the two signals, typically expressed as the signal-to-noise ratio (SNR), is also important in determining which cognitive processes are engaged to achieve successful SiN perception. Lower SNRs tend to lead to lower overall SiN performance. This is important to consider because the overall level of performance influences whether and how cognition is engaged in perception, although the exact relationship between the two is far from clear. Some studies show a higher correlation between cognition and SiN perception at higher levels of overall intelligibility (80% vs. 50%; Larsby et al., 2011), while other studies show more complex patterns (Koelewijn et al., 2007).
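For readers less familiar with the measure, the standard decibel definition of SNR is given below. It is a textbook definition rather than a procedure taken from the studies cited here, and the example levels that follow are purely illustrative.

```latex
\mathrm{SNR}_{\mathrm{dB}}
  = 10\log_{10}\!\left(\frac{P_{\mathrm{target}}}{P_{\mathrm{masker}}}\right)
  = 20\log_{10}\!\left(\frac{A_{\mathrm{target}}}{A_{\mathrm{masker}}}\right)
  = L_{\mathrm{target}} - L_{\mathrm{masker}}
```

Here P denotes mean power, A root-mean-square amplitude, and L the level in dB relative to a common reference. For example, speech presented at 65 dB SPL in a 70 dB SPL background yields an SNR of -5 dB; lowering the SNR therefore means making the masker louder relative to the target, or the target quieter relative to the masker.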
Finally, to provide a further illustration of the complex nature of the relationship between listening situation and cognitive engagement, consider the study by Lunner and Sundewall-Thorén (2007), in which the authors found a correlation between WM and SiN recognition in fluctuating noise (but not in stationary noise), a finding that partially contrasts with the study by Zekveld et al. (2013). Two factors might have contributed to the divergent results. First, the studies used different WM tasks, which may inadvertently have tested different aspects of WM. Second, the studies used different groups of listeners: Lunner and Sundewall-Thorén tested older hearing aid (HA) users, while Zekveld et al. tested young normal-hearing listeners. To what extent the characteristics of the cognitive predictor task and/or the listeners contributed to the differences in results between studies is unclear. I would like to argue that only a more theory-guided investigation of the relationship between these factors can help us disentangle these results. Given that cognitive theories have formalized many of the relationships between cognitive variables (e.g., Craik, 2016; Diamond, 2013; McGrew, 2009; Miyake & Friedman, 2012), we need to consider their role more closely in cognitive hearing science.
The Listener
Numerous listener characteristics may play a role in SiN perception. In the following section, I will discuss age, language status, educational status, and cognitive abilities, of which age is the most widely studied listener characteristic in the context of SiN perception. Here, I focus on differences found between listeners described as “young” adults, normally between the ages of 18 years and 30 years, and “older” adults, typically above the age of 60 years. I will concentrate on these “extreme” groups as they allow me to discuss the topic in the clearest way. However, I would like to add that there is mounting evidence that the change in listening strategy already starts in middle age (Goossens et al., 2017). The listening performance of children will not be considered in this article. Interested readers are referred to Leibold and Buss (2019). Note that very little work exists that has explicitly addressed this age group in the context of comparing listening strategies with younger and older adults, even though the field as a whole and theoretical considerations in particular would benefit if the whole life-span perspective were taken into account.
Aging is accompanied by many changes: auditory, cognitive, and linguistic. To what extent each change contributes to the common complaint by older listeners that they find SiN perception challenging is difficult to determine. With respect to auditory factors, some changes, such as reduced audibility of sound, are well known and currently form the basis for audiological interventions such as HAs. Other auditory changes have received less scrutiny but are probably just as important for SiN perception. They occur at many levels of the auditory system and include changes in the cochlea (Humes, 2008), in peripheral neural pathways (Kujawa & Liberman, 2009), and in central pathways (Martin & Jerger, 2005). For an excellent overview of auditory changes associated with aging, the reader is directed to Gordon-Salant and colleagues’ (2010) collection of essays. Similarly, many changes in cognitive function occur in old age; an in-depth discussion of these changes in healthy and diseased aging can be found in Craik and Salthouse’s (2015) handbook. Finally, changes in the two domains are highly correlated (Baltes & Lindenberger, 1997; Lindenberger & Baltes, 1994), which has presented a considerable conundrum to aging researchers. In an attempt to understand this relationship, four hypotheses have been advanced: (1) the sensory deprivation hypothesis, (2) the information degradation hypothesis, (3) the cognitive load hypothesis, and (4) the common-cause hypothesis. The first two hypotheses both suggest that sensory decline causes cognitive decline, but they differ in the time required for the effect to emerge, which is long under the sensory deprivation hypothesis and short under the information degradation hypothesis. The third hypothesis suggests that cognitive decline causes decline in sensory processing. Finally, the common-cause hypothesis does not link sensation and cognition causally but instead suggests that a common factor underlies concurrent changes in both. For a more in-depth discussion, see Schneider and Pichora-Fuller (2000).
Crucial for the current discussion is the question of what listeners do when faced with changes in either or both domains and how they adjust their listening strategies. It seems reasonable to assume that age-related changes in sensory and cognitive processing shift the relative contribution of each to SiN perception toward whatever is most advantageous for a particular listener. Such shifts would produce group-level differences between younger and older listeners. A case in point is a series of studies in which a word-in-noise perception task was individually adjusted to produce the same overall accuracy level for younger and older listeners (Heinrich & Schneider, 2011; Murphy et al., 2000; Schneider, Avivi-Reich, Leung, & Heinrich, 2016). The accuracy with which the perceived words could be recalled from memory was then measured and compared to recall accuracy in quiet. Results showed that, in addition to the expected age-related memory deficit in quiet, memory for words presented in noise was disproportionately impaired in older listeners compared to younger listeners, despite comparable perceptual accuracy. One way to interpret these results is to assume that older listeners had to invest disproportionately more attentional resources than younger listeners during the perception of the words. If the pool of attentional resources is limited (Kahneman, 1973) and shared between perception and memory, then it is conceivable that the increased investment of resources during perception compromised subsequent memory performance. This interpretation is consistent with a meta-analysis by Füllgrabe and Rosen (2016), who found that WM performance, while not a good predictor of SiN perception in young normal-hearing listeners, was a better predictor of SiN perception in older listeners, suggesting that WM (and the associated use of attention) plays an increasingly important role in SiN perception as listeners age.
Apart from spending more attentional resources on SiN perception, older listeners have also been found to use semantic context to the same or a greater extent than their younger counterparts when given the opportunity. In a seminal paper, Pichora-Fuller et al. (1995) showed that, compared with younger listeners, older listeners did disproportionately well in SiN perception when the target sentences contained contextual information than when they did not. The authors suggest two potential interpretations of this result. First, older listeners may allocate more resources to the perception process than younger adults and thus benefit to a greater degree when additional information is available. Second, younger and older listeners have to operate at identical SNRs in everyday listening situations; because of age-related changes in hearing, older listeners therefore tend to perceive fewer words correctly and routinely have to compensate using contextual information. As a result, they may have more practice doing so (for a similar argument concerning cochlear implant users, see Dingemanse & Goedegebure, 2019). In addition, given older adults’ larger vocabulary (Verhaeghen, 2003) and increased confidence in using their linguistic knowledge (Kavé & Halamish, 2015), older listeners may be more prepared to make informed guesses about which word they heard. All three types of behavior would reflect a shift from the more signal-driven, bottom-up listening strategy seen in younger listeners to a more top-down, knowledge-driven strategy in older listeners (Avivi-Reich et al., 2014; Schneider, Avivi-Reich, & Daneman, 2016). As aging is commonly accompanied by hearing loss, it can be difficult to disentangle whether the suggested changes in listening strategy are mainly due to age or to hearing.
One way to investigate this question is to test whether listeners of comparable age but with different degrees of hearing loss show different listening strategies. The next section will explore this question.
Besides hearing loss, language competency and education have also been found to affect the strategies adopted by listeners. For instance, non-native listeners have been found to adapt their listening strategy to the aspects of linguistic knowledge they feel most comfortable with, even if this is not the type of knowledge commonly used by native listeners. One illustration of this strategy is provided by Heinrich et al. (2010), who investigated the effectiveness of a particular acoustic-phonetic cue (English r-resonance) for SiN perception and showed that only native listeners were able to take advantage of this cue. Non-native listeners used different cues to achieve SiN perception in the same situation, and their choice of cue depended on their proficiency in the tested non-native language: non-native listeners with lower proficiency inappropriately attempted to transfer acoustic-phonetic knowledge of their native language to the non-native language, whereas non-native listeners with higher proficiency instead relied on lexical knowledge in their non-native language as a way of maximizing intelligibility.
A final listener characteristic that may affect listening strategies is educational attainment. Although its role in modulating the relationship between cognition and SiN perception has yet to be systematically explored, Knight and Heinrich (2019) provide initial evidence suggesting that it may be an important factor. Inhibition, in addition to WM and attention, is often hypothesized to be important for SiN perception; however, Knight and Heinrich (2019) showed that only listeners with lower educational attainment showed the expected link between better inhibition scores and better SiN perception. In summary, the way in which people listen depends not only on the nature of the listening situation but also on listener characteristics, including age, hearing loss, language status, and educational attainment.
The Effect of HA Use
Apart from considering age in the context of “normal” hearing, that is, in the absence of clinical loss, age also plays an important role in the context of HAs as most HA wearers are of older age. In fact, over 70% of people over the age of 70 have a clinical hearing loss (Action on Hearing Loss, 2019), which is most commonly treated by fitting HAs. HAs amplify sound based on the hearing loss profile of the listener. In the United Kingdom alone, over 4 million people wear HAs. HAs are very likely to affect the way in which people listen. While we know a lot about how individual differences in cognition can affect the extent to which listeners derive benefit from HA use (for a review, see Souza et al., 2015), it is less clear how HA use affects listeners’ reliance on cognitive function to achieve SiN perception. A number of studies have suggested that differences in benefit derived from HA use can be explained by differences in WM (Foo et al., 2007; Lunner, 2003; Lunner et al., 2009; Ng et al., 2014; Rudner et al., 2011). However, given that we know that SiN listening in younger and older listeners without HAs relies on a broad range of cognitive abilities, and that the particular combination of abilities depends both on the cognitive makeup of the listener and the particulars of the listening situation, it seems unlikely that for this particular listener group WM is the only defining cognitive ability. On the other hand, to date it is not known what these other abilities might be, as most studies so far have focused on the relationship between HA listening and WM.
One notable exception is Nüsse et al. (2018), who investigated cognitive measures beyond WM, in particular attention and lexical knowledge. When comparing two groups of older adults (non-HA users with better hearing and HA users with poorer hearing), they found that their measures of cognition predicted SiN perception only in the group of non-HA users with relatively better hearing. In the HA user group, degree of hearing loss was the only predictor of SiN perception. Superficially, this result contrasts with research showing WM to be an important predictor (Ng et al., 2014; Rudner et al., 2009), but it is important to note that this predictive effect was shown either for new HA users (Ng et al., 2014) or for fairly new users (after a 9-week trial; Rudner et al., 2009). Ng and colleagues also showed that the strength of the relationship weakened over time and was no longer significant at 6 months post-fitting. In Nüsse et al.’s study, all HA users had worn their HAs for at least 1 year, and this difference in the interval between HA fitting and testing might explain the conflicting results. However, even if the results of these two studies can be reconciled, they still contrast with the plethora of studies in the literature that report a predictive relationship between individual differences in WM and SiN perception in HA users (Souza et al., 2015). To further complicate matters, it is important to keep in mind that, just as with normal-hearing listeners, the predictive power of cognitive function in HA users interacts with the particular listening situation and was found to be stronger for more complex (modulated) noise (Lunner & Sundewall-Thorén, 2007). In summary, the way in which cognitive function interacts with HA use, and what role the underlying hearing loss plays, is still not well understood. WM may play a role; whether and how this role differs from that in non-HA users, however, is not clear.
Moreover, how other cognitive functions such as inhibition, attention, phonological and lexical processing as well as general comprehension are engaged in listening has not been explored in great detail.
Discussion
Speech perception in noise is a complex task and requires the involvement of bottom-up auditory as well as top-down cognitive processes. The realization of the central role of cognition for SiN perception has led to the development of a new field of research, cognitive hearing science (Arlinger et al., 2009). While the fact that cognition can play a pivotal role is now well established, we still know comparatively little about what this role entails in particular situations. Based on a brief overview of the literature, I would suggest the following considerations.
First, I have shown that the role and extent of cognitive involvement in SiN perception depend on the particular listening situation. Aspects of the listening situation that affect the type and extent of cognitive involvement include the nature of the target signal, the nature of the background sound, the relationship between target and background, and the overall difficulty level. Similarly, listener characteristics such as age, language status, educational status, cognitive abilities, and hearing status affect the nature and extent of cognitive engagement. Aging is a particularly useful dimension to consider when trying to understand listening strategies because it not only leads to group differences between younger and older adults but also increases the variability among listeners, and is therefore likely to produce a greater variety of listening strategies.
While WM undoubtedly plays an important role in SiN perception, it is also clear that WM cannot be the only cognitive function involved. A more comprehensive assessment of cognitive function and its relationship to listening in various situations is needed. However, to make such an assessment most effective, it has to be embedded within standard cognitive theory. Only when we understand how the cognitive skills we assess relate to each other can we fully understand how they relate to SiN perception. Applying the theoretical and experimental knowledge available in cognitive science could help us develop theoretically based predictions about which cognitive functions are involved in SiN listening, and under what conditions. Studies now increasingly consider the role of cognitive functions other than WM in SiN perception, including inhibition (Stenbäck et al., 2016), attention (Heinrich et al., 2015), response control (Heinrich et al., 2016), and vocabulary and language comprehension (Schneider, Avivi-Reich, & Daneman, 2016; Schneider, Avivi-Reich, Leung, & Heinrich, 2016). Understanding the relationship of these concepts to one another and relating them systematically to different listening situations would greatly increase the potential clinical impact of this research. It could allow us to understand which cognitive functions individual listeners engage in the listening situations they encounter most often in their daily lives or find most important to master in the context of a fulfilling life. Based on this, and on an understanding of the influence of the characteristics of the individual listener, we could then develop suitable interventions and offer targeted training.
Second, an improved understanding of the role of cognitive function in SiN perception would allow us to better understand how this role changes with HA use. A more systematic understanding could be a stepping stone toward developing specific and targeted interventions and training programs for new and experienced HA users. It could also inform the selection of HA algorithms that best fit the individual needs of the listener. In line with Meister (2017), I suggest that the connection between cognition and speech perception can open up perspectives for individualized treatment.
Third, listeners in different age groups use different listening strategies to achieve the same level of SiN intelligibility (Avivi-Reich et al., 2015; Schneider, Avivi-Reich, & Daneman, 2016). I would like to posit that this shift in strategy is likely to occur gradually across all listeners and is driven by an individual’s personal mixture of cognitive skills, knowledge, and sensory ability. I would also like to highlight that while we normally concentrate on a small number of listener characteristics and tend to consider them in isolation, it is more likely that a number of these characteristics interact, sometimes compensating for, and at other times interfering with, each other. How these interactions occur, and which cognitive functions interact in which situations, remains largely to be understood.
Acknowledgments
I would like to thank Adriana Hanulikova for continued encouragement, and Astrid van Wieringen, Sarah Knight, and Helen Glyde for constructive comments on earlier versions of this manuscript.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NIHR Manchester Biomedical Research Centre.
