Abstract
BACKGROUND:
Voice banking allows those living with Motor Neurone Disease (MND) to create a personalised synthetic voice. Little is known about how best to support this process.
OBJECTIVE:
To review a dedicated voice banking service with the aim of informing service development.
METHOD:
A retrospective service review of existing health records from neurological services in Sheffield, UK, covering 2018 and 2019. Case notes were reviewed to extract information about use of communication aids, offer of voice banking, and use of synthesised speech. Responses to a routine follow-up survey were also collated.
RESULTS:
Fewer than half of the clients whose notes were reviewed had been informed about voice banking, one in four had completed the voice banking process, around half were using communication aids, and one in ten were using their personalised synthetic voice on a communication aid. The time taken to complete the process varied widely. Those completing the process viewed the personalised voices positively and all were used when created. Support from professionals was noted by some as being key.
CONCLUSIONS:
Voice banking services should be more widely promoted to ensure that individuals can consider voice banking prior to changes in their speech. Research studies should inform how and when those living with MND are introduced to voice banking.
Introduction
Motor Neurone Disease (MND) is a rare progressive neurological disease that is estimated to have a mean incidence of around 1.59 per 100,000 person years worldwide [1] and around 2.06 in England, UK [2]. The disease has many identified forms and around 25% of those diagnosed with MND have the bulbar palsy form which affects speech and swallowing at the start of the disease [3].
Figure 1. Representation of voice options for voice output communication aids: current (solid lines) and emerging (dashed lines) options.
Augmentative and Alternative Communication (AAC) is a global term used for a range of strategies that support communication [4]. The UK National Institute for Health and Care Excellence guideline for motor neurone disease: assessment and management [5] outlines the need for AAC for people living with MND. Of those diagnosed with MND, 95% will develop speech and communication issues at some point in the disease progression, and of these 72% may benefit from AAC [6]. AAC includes communication aids that use speech synthesis technology to produce a synthetic voice output; these are a common intervention for individuals living with MND and are often provided through speech and language therapy services [7]. Those living with MND will use these communication aids in a variety of ways depending on the type, stage and severity of their MND. Touchscreens on tablets, keyboards on physical devices, switches and sensors, and eye-gaze technology are all examples of ways in which individuals can access communication aid software to type messages [8]. Techniques such as word prediction can be used, but rate of communication is often cited as a challenge by those who use communication aids [9].
There are a wide range of communication aids on the market and these are provided with a choice of commercially produced ‘standard’ synthetic voices. Synthesised voices are available in many languages but with a small number of regional accent or dialect options – a reflection of the fact that production of these voices involves hundreds of hours of high-quality voice recording from voice actors. In most applications of this speech synthesis technology (e.g. automated phone messages or voice assistants) this range of voice options is appropriate. When considering communication aids however it is clear that the impact of voice is much more significant as voice is considered part of self-identity. Other authors have highlighted the potential impact of voice personalisation and choice on the adoption and acceptability of communication aids to those using them, including those with MND [10, 11, 12].
Research into personalised synthetic voices including for use with communication aids has been ongoing for many years [13, 14, 15, 16]. The underlying technology on which the synthesis is based has developed over this time and current commercial offerings [17, 18] are largely based on statistical parametric techniques [19]. More recent novel underlying speech synthesis technologies that aim to improve personalisation and expressiveness have also been developed [20, 21, 22], including those based on generative adversarial neural networks [23, 24]. All these processes take recordings from an individual and using specific software then create a synthetic voice that is a version of the individual’s voice and retains as many of the properties of the voice as possible. Recordings created prior to changes in speech production can also include whole messages that can be used in communication aids as recordings, rather than (or as well as) being used to create a personalised synthetic voice – a process known as message banking [25]. The focus of this paper is on personalised synthetic voices and we use the term in this paper to mean synthetic voices which include some characteristics of the end user’s voice. It is clear, however, that synthetic voices can also be personalised in other ways – i.e. based on a user’s preferences for characteristics of the voice. Figure 1 represents the range of current and emerging (dashed lines) options for voice output on communication aids.
This paper reviews a voice banking service delivered by the Neurological Enablement Service (NES) in Sheffield [26] which provides support for creating personalised synthetic voices to individuals including those living with MND. We use the term voice banking to refer to the service and the process of making recordings of words or phrases, ideally prior to changes in speech production, to produce a personalised synthetic voice that is subsequently used on a communication aid. Within the UK, the Motor Neurone Disease Association (MNDA) have been at the forefront of promoting the availability of these providers to those with a diagnosis to complete this as a ‘vocal insurance’ [27] due to the risk of communication difficulties.
Voice banking is a relatively new technology and there is likely to be significant variation in how this is delivered across the UK and elsewhere. In searching the literature we did not find any reviews or evaluations of the delivery of voice banking services by health care professionals. It is important to understand how best to deliver voice banking services, as well as the impact which this intervention may have on those receiving it and how to best optimise these outcomes. The primary aim of the work described in this paper was to review the voice banking service provided to people living with MND in Sheffield in terms of uptake and outcomes in order to inform local service development or commissioning.
A case notes review of records from 2018 and 2019 was completed to extract summative descriptive statistics and qualitative responses to a locally developed survey sent to those completing the voice banking process.
Context of this study
Within Sheffield, the assistive technology service has been offering support for people wanting to complete voice banking since 2016. All individuals with a diagnosis of MND who live in the Sheffield Clinical Commissioning Group region with any swallowing or communication needs are referred to the Neurological Enablement Service (NES) and those with a potential need for a communication aid or who state an interest in voice banking are referred to the assistive technology service within NES. Others with a diagnosis of MND without swallowing or communication needs receive therapy services from the Neurology Outreach Team.
The voice banking service offered within Sheffield during the review period used the ModelTalker system [16, 28]. This system was selected for a number of reasons: of the systems available in 2016 it was deemed, through informal evaluation by the service’s staff, to be the easiest to complete at home and online; initially the process and voice were free, as part of a research project, and when they subsequently became chargeable the cost was funded by the MNDA; the synthesised voice produced by this process was technically compatible with the communication aids provided by the assistive technology service in Sheffield; and finally the results of the process and the support offered by the team at ModelTalker were considered good. At the time that the clients described in this paper completed the process, the requirement was to record 1600 sentences into the ModelTalker system.
On receipt of referrals for support with voice banking an initial assessment appointment was completed by the specialist speech and language therapist from NES. Prior to this appointment basic information about voice banking had been provided to clients by referring healthcare professionals to allow consent to the referral. The initial appointment aimed to provide clients with the information required to enable them to make an informed decision to continue. At this appointment each client was asked what they knew about voice banking and, as appropriate, a discussion took place about: the ModelTalker process, the commitment required from the client, the outcome being a version of their voice, and the support that was available from NES. During this discussion clients were also given an opportunity to hear real examples of personalised synthetic voices from other individuals living with MND who had completed the process as well as examples of the personalised synthetic voice of the second author [17].
If the individual indicated that they would like to complete the ModelTalker process then the service’s therapy assistant delivered a Sennheiser
Figure 2. Flow chart of record identification.
Once the individual’s synthetic voice was downloaded and the process completed, a questionnaire was sent to the client to fill in. The questionnaire was distributed by post, electronically, or by hand at a subsequent appointment. If possible clients completed the questionnaire independently, but if they were unable to do this then the therapist or assistant involved completed it as an interview. The questionnaire (see Appendix 1) was developed and administered as part of routine practice with the aim of supporting the ongoing evaluation of this service offering. The questionnaire was developed in 2016 by the team based on anecdotal feedback from clients and the team’s own experience, and aimed to cover the effort of completing the voice banking process and the client’s perceptions of its outcome. The questions were reviewed after initial use and were felt to provide the required feedback.
In order to identify the appropriate records to review, the following was carried out:
A report was run on the electronic case notes system (SystmOne; TPP, The Phoenix Partnership Ltd, Leeds, UK):
Unique patients with a diagnosis of MND were identified by filtering based on diagnosis on all referrals logged on the TPP system.
Clients where the referral was not actioned were excluded, such as if the client died before being seen.
The records identified were thus for clients who had a diagnosis of MND during 2018–19 in Sheffield and had a referral to one of the neurological services (Fig. 2).
The identified client records were then reviewed by hand to identify:
whether they had an open Speech and Language Therapy (SLT) referral, using the keyword search “speech”;
whether they used AAC, and details of the AAC used (voice output or not);
whether voice banking had been discussed or offered, using a keyword search of each record for “voice banking”, “voice” and “banking”;
whether the offer was accepted or declined;
the date on which the headset was delivered and the account opened;
whether the process was completed and the date of completion;
the date of discharge if the voice was not required once completed;
whether the voice was downloaded on to a device and the date this occurred.
The data were tabulated as date and categorical data and descriptive statistics were produced using spreadsheet software.
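The keyword screening and tabulation steps described above could be sketched as follows. This is a minimal illustration only: the review itself used the case notes system's search and spreadsheet software, so the function name `mentions_voice_banking`, the note texts, and the client labels are all hypothetical and do not represent real client data.

```python
from collections import Counter

# Search terms taken from the method description.
KEYWORDS = ("voice banking", "voice", "banking")


def mentions_voice_banking(note_text: str) -> bool:
    """Return True if any search term appears in the note (case-insensitive)."""
    text = note_text.lower()
    return any(keyword in text for keyword in KEYWORDS)


# Hypothetical, made-up notes for illustration only; not real client data.
example_notes = {
    "client_a": "Discussed voice banking; client would like to proceed.",
    "client_b": "Review of swallowing. Guidance given on diet.",
    "client_c": "Client reports changes in voice. Banking to be discussed.",
}

# Flag each record, then tabulate the flags as categorical data.
flags = {client: mentions_voice_banking(text) for client, text in example_notes.items()}
summary = Counter("mentioned" if flag else "no reference" for flag in flags.values())

for category, count in summary.items():
    print(f"{category}: {count}")
```

A keyword match only flags a record for attention; as in the review, each flagged record would still need to be read by hand to judge whether voice banking was actually discussed, offered, accepted or declined.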
Completed feedback questionnaires were retrieved from the client records of all who completed voice banking. Once retrieved, the questionnaires were anonymous and did not contain identifiable data. The free text answers were transcribed into word processing software, reviewed and categorised by the second author, and illustrative quotes extracted.
Table 1. Case review: AAC and voice banking information
Table 2. Case review: voice banking clients with SLT referrals
Table 3. Case review: outcomes of personal synthesised voice creation
Table 4. Case note review: voice banking offer
Table 5. Descriptive statistics: time to complete
Tables 1–3 summarise the outcome of the client record review. Sixty-two clients had a diagnosis of MND and an open referral in 2018 and 2019; of this sample, 55 had an open SLT referral. Sixteen clients were noted as having attempted voice banking either prior to or within this period and of these 15 had completed the voice banking process. Twenty-nine clients were recorded as using AAC: the majority (16) were recorded as using an aid with a non-personalised synthetic voice, 6 were using their personalised synthetic voice, and the remaining 7 were using paper-based or low-tech AAC systems that did not have a voice output. Those using non-personalised voices included individuals who had significant speech difficulties when referred or who had been using these systems prior to the voice banking service becoming available in 2016. Nine clients completed the voice banking process but were not using their personalised synthetic voice on a communication aid; all 9 were recorded as not experiencing significant speech changes at that point in time. Four clients who were banking their voices were not known to SLT or the wider NES service but were known to the Neurology Outreach Team.
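The proportions quoted elsewhere in this paper follow directly from the counts in the paragraph above. The short sketch below recomputes them; the counts are those reported in the text, while the dictionary labels and the `percentage` helper are illustrative names introduced here rather than part of the original analysis, which was carried out in spreadsheet software.

```python
TOTAL_CLIENTS = 62  # clients with an MND diagnosis and an open referral

# Counts as reported in the text above.
counts = {
    "open SLT referral": 55,
    "using AAC": 29,
    "attempted voice banking": 16,
    "completed voice banking": 15,
    "using personalised voice on AAC": 6,
}

# Breakdown of the clients recorded as using AAC.
aac_breakdown = {"non-personalised voice": 16, "personalised voice": 6, "no voice output": 7}


def percentage(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_CLIENTS} = {percentage(n, TOTAL_CLIENTS):.1f}%")

# Consistency check: the AAC subgroups should add up to the AAC total.
assert sum(aac_breakdown.values()) == counts["using AAC"]
```

Expressed this way, the headline figures of "around half" using AAC, "one in four" completing voice banking and "one in ten" using a personalised voice can be traced back to the underlying counts.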
Records were reviewed to identify if voice banking was discussed or offered by Speech Therapists and this is summarised in Table 4. No reference to voice banking was found in the records for 36 of the 62 clients. Records included notes of a discussion regarding voice banking for 24 clients; of these, 7 could be identified as having been offered the voice banking service but declined. One client did attempt to start voice banking but their condition rapidly deteriorated and they were unable to complete the voice banking process. Two clients were noted by the therapist as not appropriate for voice banking without a recorded discussion with the client about the topic.
The time that it took to complete the voice banking process from delivery of the headset to completion of the voice ranged from 7 weeks to 65 weeks with a mean of 20 weeks and standard deviation of 17 weeks (Table 5).
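As an illustration of how such figures can be derived from the recorded dates, the sketch below computes the time to complete in weeks from pairs of headset delivery and completion dates. The dates are invented for the example and are not the clients' data, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which form was used.

```python
from datetime import date
from statistics import mean, stdev

# Made-up (headset delivered, voice completed) date pairs for illustration only.
records = [
    (date(2018, 1, 8), date(2018, 3, 5)),
    (date(2018, 4, 2), date(2018, 7, 9)),
    (date(2018, 9, 3), date(2019, 3, 4)),
]

# Elapsed time for each client, converted from days to weeks.
weeks = [(completed - delivered).days / 7 for delivered, completed in records]

print(f"range: {min(weeks):.0f} to {max(weeks):.0f} weeks")
print(f"mean: {mean(weeks):.1f} weeks")
# Sample standard deviation (n - 1 denominator); an assumption, see lead-in.
print(f"standard deviation: {stdev(weeks):.1f} weeks")
```

With a small sample and a long right tail, as reported here, the range and standard deviation are sensitive to one or two clients who took much longer than the rest, so reporting the median alongside the mean may also be informative.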
The 15 clients who completed the voice banking process were offered the opportunity to complete a questionnaire and 7 were returned (46% response rate).
The trigger for the client being referred to the voice banking service was reported by respondents as being from a range of sources with the majority from professionals prompting or raising it. One respondent was prompted by friends/family and two clients were prompted independently by noticing changes in their voice and being aware of the service.
All respondents reported finding the ModelTalker process easy or OK; none reported that it was challenging and all respondents reported that they would do it again. Four respondents made positive reference to the support of the therapy assistant:
“professional and cheerful help of therapy assistant and colleagues”
“the extensive inputting was greatly helped by therapy assistant’s encouragement, professional advise and friendliness.”
“made easy by the first class speech therapy assistant from NES”
“(assistant) fun, bite size chunks, when happy left to it”
The clients’ views of the final voice were mixed, with some positive responses:
“Smashing and it sounds like me and if I lose my speech I will still be able to communicate with my family, pets and friends.”
“pleased with the end result and it is fun to hear it played back when reading word documents. The grandchildren love it.”
While others provided a nuanced response:
“OK, definitely sounds like me. It is a bit electronic. I was a little disappointed with the pronounications”
“although the end voice isn’t prefect it sounds a lot more like me than the synthetic voices I heard before”
“if you are going to do it, do it early”
One respondent referred to having been prepared for hearing back the voice:
“(therapy assistant) advised synthetic voice and ‘hearing back’ my own voice would sound different to how I expected. However, my family and I think it is a reasonable representation and we are happy with the result”
Discussion
Voice banking enables people using AAC to have a personalised synthetic voice which can help mitigate the loss of identity in losing your voice [10, 27, 29]. For anyone with a diagnosis of MND, where speech quality will deteriorate with the progression of the disease, voice banking is ideally completed before any changes in the individual’s speech have occurred – as one of the clients in the study stated “if you are going to do it, do it early”. It is clear that many living with MND will experience a mourning process that can provide psychological barriers to uptake of interventions such as this [30]. Likewise Judge et al. [31] identify preparedness as a facet of communication change in MND – and that preparatory tasks such as voice banking can be seen by some as beneficial irrespective of the final outcome of the process.
The results of this work demonstrate that less than half (38%) of those with MND during the sample period appear to have been offered or informed about voice banking and around a quarter (26%) had attempted to complete voice banking. This is despite the presence of a specific service within this locality to support this process, something that is unlikely to be available consistently in other localities. It may be that therapists are discussing voice banking routinely with those living with MND and that this was simply not recorded in the notes. However, this work concurs with Cave and Bloch [29] in identifying the need to better consider how and when those living with MND are introduced to voice banking and to ensure that this is consistently offered.
The data from the case notes review was disappointing. The UK National Institute for Health and Care Excellence guideline Motor neurone disease: assessment and management (2016) outlines the need for AAC for people living with MND and states that assistive technology should be included as part of the wider multi-disciplinary team. As part of a multi-disciplinary team we anticipated that discussions about voice banking would be a regular part of the service offered to people living with MND. Guidance on voice banking from the Royal College of Speech and Language Therapists [32] describes the role of speech and language therapists in the voice banking process and in early intervention. An individual coping with a new diagnosis may not want to discuss the potential of losing their voice and the discussion clearly needs to be sensitively approached and on the individual’s own terms. Introducing voice banking is thus potentially challenging and difficult to incorporate into a standardised process. Further work is indicated to look at how therapists or other professionals around an individual are best placed to introduce this intervention.
As this was a case note review based on keyword searches, it is possible that the notes did not reflect discussions about voice banking and that these discussions were simply not recorded or were recorded in another way. Standardising and mandating the recording of information relating to the voice banking offer within the clinical records would improve the ability to review this service in the future and may also drive uptake in offering this service.
The questionnaire data demonstrates that the outcome of this process is clearly valued by those who completed it; however, those living with MND still have to weigh up the potential benefits of a personalised synthetic voice against the commitment required to create it. The affordance of identity provided by a personalised voice and the psychological impact of hearing a ‘lost’ voice on the individual and family members were identified by Cave and Bloch [29] as factors involved in the uptake of this intervention. Cave and Bloch also found a supplementary factor in the decision making to be the support perceived as required to complete the voice banking process. This aligns with feedback reported here from those who had completed the process and noted the benefit of the “therapy assistant’s encouragement, professional advise and friendliness.” It is clear, however, that there may be many other factors that affect uptake of this intervention and these appear little investigated in the literature.
Cave and Bloch [29] also identify the need for practitioners to consider how a personalised synthetic voice will be used in an AAC device. Of the clients identified in this review no clients had actively rejected or ceased using their personalised synthetic voice in an AAC device once having completed the process: of those that attempted voice banking in this sample, all but one completed it, 38% of those who attempted voice banking were actively using the voice in an AAC device, and those not using it were not using it because they were not yet using powered AAC. This level of uptake potentially reflects the support and preparation that the clients received in use of AAC and the personalised digital voice as well as the prior expectation setting around what the resulting voice would sound like.
There were a small number of clients in the case note review where voice banking appears to have been considered by the therapist but not offered. This is potentially attributable to vocal changes experienced by the client at the point of consideration by the therapist. As well as suggesting the need for early intervention, this situation highlights the potential for new and emerging technologies in offering ‘voice repair’ or ‘voice donation’. These technologies, some of which are already offered on the market, aim to allow clients who are experiencing changes in their voice to potentially still access a personalised digital voice that affords a personal identity.
The process of voice banking can be time consuming. The large range and variance reported in the time taken to complete voice banking likely reflect the challenges of living with a diagnosis of MND: physical health issues, work commitments, support systems, and care and therapy support will all affect the ability of an individual to complete the process. As with any long term condition, a significant proportion of an individual’s day may be spent in managing the disease [30] and realistic expectations of the demands and ability to complete the voice banking process will vary. It is encouraging that new voice banking services are now available that have much lower completion requirements. The service review reported here may also be of use to those designing voice banking processes, who may assume that users would complete the process alone and in short intensive periods and design the system with this use-case in mind; the information reported here would suggest that this is not the case. The support from the therapy assistant was mentioned, unprompted, by four of the seven respondents who returned a questionnaire, and it may be that this support is integral to equitable access to a resource-, knowledge- and time-intensive process such as this.
Ideally those attempting voice banking will have limited speech involvement, and so it was anticipated that clients attempting voice banking who were not known to the speech and language therapy (SLT) team would form a significant proportion of the sample. However, only 6% of the sample were not receiving SLT services and were known to the team for voice banking only. This potentially demonstrates the need for wider promotion of voice banking outside the SLT professional group.
Creer et al. [6] estimated that 72% of people diagnosed with MND would benefit from AAC. Within this sample, 47% were found to be using a communication aid during the sample period of two years. Although the percentage using AAC is lower than that estimated by Creer et al., this can likely be explained in part by the two-year sample period, suggesting that the range of times between onset of condition and need for use of AAC is greater than two years. This service review was not designed to look at the population from an epidemiological point of view; however, it may provide some confidence in the estimate from Creer et al. This review highlights the potential for more in-depth epidemiological work with this population, including the potential for a registry-based study, to provide more definitive data on the rate of uptake of different forms of AAC.
Limitations
As a small retrospective service evaluation, this work was not designed to provide generalisable results and can only be interpreted in this context. As a case notes review, the categorising and interpretation of the notes is subject to judgement. The response rate to the survey was low, as may be expected in routine practice, and the respondents are likely to be self-selecting. As a service evaluation it was not possible to follow up non-respondents, as this was not part of routine practice. A simple qualitative method was used to analyse the survey responses and some responses were interpreted as positive or negative without an external validation process. Whilst we suggest that this method is appropriate to this type and amount of data, a more robust method would be required for any future research studies evaluating these services.
The numbers in the study are a sample of all the people who have MND in Sheffield who were referred in to one of the Sheffield neurological services. Whilst it is expected that this will be the vast majority of individuals living with MND, there will be an unknown number of clients in Sheffield with an MND diagnosis whose records were not accessible for this review. To derive more generalisable results, a future research study could look to include all clients with a diagnosis of MND and collate other measures which were not available in these service data such as ALSFRS scores, the type of AAC, method of AAC use, and demographic data.
Conclusions
This service review covered a two-year period and consisted of a case note review of clients of NHS neurological services in Sheffield, UK who were living with MND. The review provides information that supports the improvement of the voice banking service offered in this area as well as providing a reference that can be used to compare voice banking services in other areas. This information could be used to inform commissioners regarding service requirements and also provides useful indicative results to inform estimates around need for and uptake of AAC within the population of those living with MND.
Around half of those identified as living with MND were using AAC, one in four completed the voice banking process and one in ten were using their personalised synthetic voice on an AAC device. Personalised synthetic voices were seen positively by those completing the voice banking process and can be viewed as a person centred intervention that will support the maintenance of identity for this group of clients for whom loss of speech and the identity that it provides is a high risk.
The review suggested that the support provided by the service was viewed positively and may contribute to the high rates of use of the personalised synthetic voices produced. Voice banking was not recorded as being offered to around half of those living with MND in this sample and further work is recommended to ensure that those who may benefit from voice banking are identified early in the progression of MND, before changes to their speech, and provided with support to allow informed consideration of voice banking.
This was a small, retrospective, opportunistic, service evaluation, and it is suggested that further prospective research studies and more robustly designed service evaluations should be carried out in order to better inform how and when those living with MND are introduced to voice banking and make best use of personalised synthetic voices in communication aids.
Author contributions
CONCEPTION: Simon Judge and Nicola Hayton.
PERFORMANCE OF WORK: Simon Judge and Nicola Hayton.
INTERPRETATION OR ANALYSIS OF DATA: Simon Judge and Nicola Hayton.
PREPARATION OF THE MANUSCRIPT: Simon Judge.
REVISION FOR IMPORTANT INTELLECTUAL CONTENT: Simon Judge and Nicola Hayton.
Ethical considerations
As a service evaluation, ethical review was not required by the UK Health Research Authority. Data were collected and processed in compliance with local data processing regulations.
Acknowledgments
With thanks to Dr Stuart Cunningham who provided support in describing the underlying speech synthesis technology.
Conflict of interest
Simon Judge reports no conflict of interest. Nicola Hayton is employed by Sheffield Health and Social Care Trust of the National Health Service who provide the NES service described in this paper.
Appendix 1: Voice banking questionnaire
What prompted you to enquire about voice banking?
How did you find the process?
What do you think of the result?
Would you do it again? Yes/No
