Abstract
Background:
Speech variations enable us to map the performance of the cognitive processes involved in syntactic, semantic, phonological, and articulatory planning and execution. Speech is one of the first functions to be affected by neurodegenerative diseases such as Alzheimer’s disease (AD). This makes it a highly promising biomarker for detecting the disease before the first symptoms appear.
Objective:
This paper describes the development and validation of a technological prototype that applies automated speech analysis to older people.
Methods:
The prototype uses a mathematical algorithm based on a set of discriminant variables to estimate the probability of developing AD.
Results and Conclusion:
The device may be used at a preclinical stage by non-expert health professionals to estimate the likelihood of AD onset.
INTRODUCTION
Alzheimer’s disease (AD) is an insidious, progressive neurodegenerative disorder that is clinically defined by the impairment of cognitive and functional abilities. Numerous longitudinal studies have reported that cognitive impairment could be detected long before the appearance of prodromal symptoms (between 8 and 15 years prior to the diagnosis of AD) if a sufficiently sensitive measure were available [1–3]. There is currently a global search for markers of the very early, and perhaps reversible, pathological changes related to AD. These markers are sought in individuals who are still cognitively unimpaired, before the first symptoms appear. The subtle cognitive changes of the prodromal phase have been described in various studies addressing a range of cognitive processes. The most widely used markers are an early decline in episodic memory [4] and changes in semantic memory and visuospatial function [5]. The course of this preclinical phase is modulated by factors such as cognitive reserve [6]. This is especially true of individuals whose subjective awareness of their impairment and whose compensatory mechanisms temporarily mask their cognitive deficits, hindering early detection in neuropsychological tests [3].
In recent years it has been argued that speech and language disorders precede even the first memory lapses in patients with AD and may be a good predictor of the disease [7–9]. These alterations have been observed in narrative tasks involving picture description, in which semantic alterations are symptoms of the disease [10, 11], and in reading tasks [12]. Language impairments include the early appearance of semantic paraphasias [13], grammatical and lexical simplification [7], and problems in lexical-semantic access that lead to the early onset of anomic aphasia [13]. Together, these point to a multifactorial alteration of semantic content and of access to it [14]. Alterations have also been detected in verbal fluency [15], prosody [16], and speech rhythm [17–19]. The link between phonetic-phonological factors and the symptoms of AD is not new. Alois Alzheimer himself noted it in 1907 [20], when he described the oral output of patients with the dementia later named after him. He characterized their speech as slurred, slow and fluctuating, monotonous, tremulous, hesitant, and weak, with poor breath control, articulatory apraxia, a low melodic level, and slower rhythm. In short, there appears to be a close link between speech and mental processes [21], so analyzing speech may be a reliable and valid way of assessing an individual’s cognitive state.
Speech involves converting ideas into sound patterns, and variations in spontaneous speech may reveal difficulties in phonological, morphological, lexical, syntactic, or thought processes [22]. Speech alterations are present in the preclinical stages of AD. Speech is one of the first functions affected by neurodegenerative processes, owing to subtle anatomical and histological changes. Many alterations in acoustic parameters and speech rhythm are imperceptible to the human ear. However, advances in automatic speech analysis technology now allow these acoustic and temporal parameters to be identified and extracted effectively. Vocal biometry, or automatic speech analysis, would therefore be an ideal tool for assessing cognitive deficits among older people at risk of developing AD, because it allows verbal planning, sequencing, and performance to be recorded in real time [11]. The procedure is informed by source-filter theory. According to this theory, speech is the outcome of an airflow (the source) modulated by the vocal tract (the filter). The speech signal can therefore be analyzed as the mathematical convolution of a source function and a filter function; via the Fourier transform, this convolution becomes the product of the source and filter spectra [23].
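The source-filter idea can be illustrated numerically. The Python/NumPy sketch below is a toy model and not part of the VAD-AD software. It assumes an idealized impulse-train source at an arbitrary f0 of 120 Hz and a single damped resonance standing in for the vocal-tract filter (all values are hypothetical). It then checks that convolving source and filter in time matches multiplying their spectra.

```python
import numpy as np

FS = 16_000  # sampling rate in Hz (assumed for illustration)

def glottal_source(f0_hz: float, n_samples: int, fs: int = FS) -> np.ndarray:
    """Idealized source: an impulse train at the fundamental frequency f0."""
    source = np.zeros(n_samples)
    period = int(round(fs / f0_hz))
    source[::period] = 1.0
    return source

def resonator(freq_hz: float, bandwidth_hz: float, n_taps: int = 512,
              fs: int = FS) -> np.ndarray:
    """Idealized filter: impulse response of one damped resonance (a single 'formant')."""
    t = np.arange(n_taps) / fs
    return np.exp(-np.pi * bandwidth_hz * t) * np.sin(2 * np.pi * freq_hz * t)

src = glottal_source(f0_hz=120.0, n_samples=2048)
flt = resonator(freq_hz=700.0, bandwidth_hz=80.0)

# Time domain: speech = source convolved with filter (linear convolution).
speech = np.convolve(src, flt)

# Frequency domain: padded to len(src) + len(flt) - 1, FFT(speech) = FFT(source) * FFT(filter).
n_fft = len(src) + len(flt) - 1
lhs = np.fft.rfft(speech, n_fft)
rhs = np.fft.rfft(src, n_fft) * np.fft.rfft(flt, n_fft)
print(np.allclose(lhs, rhs))  # True
```

The zero padding matters. Without it, the FFT product corresponds to circular rather than linear convolution, and the two sides would not match.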
Automatic speech analysis has already been used in research on neurological disorders in children [24], Parkinson’s disease [25], and brain damage [26]. In AD, automatic speech analysis has been especially ground-breaking. Considerable attention has been devoted to detailing the characteristics of speech impairment in people with AD (see reviews in [15, 22]).
Speech in AD has been characterized by changes in various temporal and acoustic voice parameters, such as:
- alterations in the variability of the second and third formants (sdF2 and sdF3) in the speech spectrum [12, 27];
- attenuation of the speech signal and shorter vowel duration [16, 28];
- a greater number of voice breaks and hesitations [12, 29];
- more pauses, or a lower articulation rate [19].

Other important speech variables for the early detection of the first signs of cognitive impairment are the mean duration of pauses and syllables, phonation time, and the mean fundamental frequency [18, 30]. The value of speech analysis lies above all in its high discriminatory capacity. According to some studies [22], it distinguishes individuals with AD from those without dementia with more than 91.2% accuracy. The resulting classification algorithm describes a voice with the following profile: interruptions and breaks; monotony, with few variations and shades of intensity over time; little control over high frequencies, producing a high-pitched sound without the hoarseness typical of older people; and little stability in its fundamental frequency. Besides these parameters, we also consider another original way of analyzing speech in dementia: the study of alterations in speech rhythm [18]. AD patients show a higher variability of syllable interval duration (ΔS value). This rhythmic difference can distinguish AD patients from older adults without dementia with an accuracy of 87% (ROC curve; specificity 81.7%, sensitivity 82.2%).
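To make the rhythm measures concrete, the sketch below computes ΔS as the standard deviation of syllable interval durations. For comparison, it also computes the normalized Pairwise Variability Index (nPVI), which enters the discriminant function described in the Methods. The sketch makes several assumptions:
- the interval durations have already been segmented (segmentation is not shown);
- ΔS uses the sample standard deviation, whereas the original scripts may differ;
- the duration values are invented for illustration.

```python
from statistics import stdev

def delta_s(durations_ms: list[float]) -> float:
    """ΔS: standard deviation of syllable interval durations (sample SD assumed)."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals")
    return stdev(durations_ms)

def npvi(durations_ms: list[float]) -> float:
    """nPVI: mean normalized difference between successive intervals, times 100."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations_ms, durations_ms[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Hypothetical syllable interval durations (ms) for two speakers.
regular = [200.0, 210.0, 195.0, 205.0, 200.0]
irregular = [150.0, 320.0, 180.0, 400.0, 140.0]
print(delta_s(regular) < delta_s(irregular))  # True
```

ΔS captures overall spread regardless of order. nPVI is sensitive to contrasts between adjacent intervals and is normalized for speech rate.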
These findings suggest that such prediction functions could be automated and used to estimate the probability of onset of the early stages of dementia. To this end, we re-analyzed all the samples available from prior studies, using processing scripts that include both acoustic parameters and speech-rhythm parameters. Combining both types of variables should yield more effective and simpler discriminant indices.
The aim of this study was therefore to design, test, and validate an original and effective prototype for automatic speech analysis. The prototype uses classification algorithms based on features related to early AD to estimate the likelihood of developing the disease. To achieve this, the prototype had to be based on experimental data reported in scientific journals. It also had to be non-invasive, portable, reliable, and easy to use by any non-expert socio-health professional.
The development of an automatic analysis technology as an early detector of AD has numerous benefits [31]. It allows early access to pharmacological and cognitive treatments that delay the onset of the disease and defer the need for specialized institutional care [32]. It may also lead to the detection and treatment of other problems and comorbidities, reducing the financial and personal costs of AD [33]. These include functional problems (spatial-temporal orientation) and behavioral problems (depression, anxiety, sleep disorders). Finally, it may improve patients’ quality of life, their access to social services, and their future outlook.
METHODS
Participants
The development of the VAD-AD prototype is based on experimental studies of older people. Their recordings are held in the speech sample database compiled by our research team from native speakers of European Spanish. In prior studies, these samples yielded the classification algorithms (discriminant functions and/or ROC curves) used to classify participants as AD or non-AD, as described earlier.
A total of 145 recordings were used to derive and fit the prototype’s decision algorithms (see Table 1 for group characteristics). The inclusion criteria were as follows:
- age between 65 and 95;
- more than six years of primary education, to ensure full literacy;
- preserved reading ability;
- a score below seven on the Goldberg depression test.

Participants not meeting these criteria were excluded. Further exclusion criteria were a medical record of drug or alcohol consumption, pharmacological treatment affecting cognitive functions, and visual or auditory deficits. Anyone who had previously suffered a head injury or psychosis was also excluded.
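The inclusion and exclusion criteria can be read as a simple screening rule. The sketch below is hypothetical: the field names and the `is_eligible` helper are not part of the study’s materials. It encodes only the thresholds stated above: age 65 to 95, more than six years of education, preserved reading ability, a Goldberg score below seven, and none of the listed exclusion conditions.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical field names mirroring the criteria stated in the text.
    age: int
    years_of_education: float
    can_read: bool
    goldberg_depression_score: int
    drug_or_alcohol_history: bool = False
    cognition_affecting_medication: bool = False
    visual_or_auditory_deficit: bool = False
    head_injury_or_psychosis: bool = False

def is_eligible(c: Candidate) -> bool:
    """True if all inclusion criteria hold and no exclusion criterion applies."""
    inclusion = (
        65 <= c.age <= 95
        and c.years_of_education > 6
        and c.can_read
        and c.goldberg_depression_score < 7
    )
    exclusion = (
        c.drug_or_alcohol_history
        or c.cognition_affecting_medication
        or c.visual_or_auditory_deficit
        or c.head_injury_or_psychosis
    )
    return inclusion and not exclusion

print(is_eligible(Candidate(age=72, years_of_education=8, can_read=True,
                            goldberg_depression_score=3)))  # True
```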
Group characteristics
Non-pathological participants were recruited from a primary care unit and classified as asymptomatic on the basis of an assessment by our research group. Ninety-eight participants formed the asymptomatic non-pathological senescence (NPS) group. The early AD group comprised 47 residents of the State Reference Centre for Alzheimer’s, who had previously been diagnosed within the Spanish National Health System. Their AD diagnosis was later confirmed by the Centre’s medical and neurological service at GDS stages 3 (mild cognitive impairment) and 4 (moderate cognitive impairment), following the NINCDS-ADRDA criteria [34].
Material and procedure
The VAD-AD prototype’s automated process involves the following steps:
1) recording the signal with a microphone;
2) digitizing the signal with a converter;
3) analyzing and processing the digital signal with phonetic analysis software that captures the physical dimensions of the sound wave and analyzes its parameters: articulatory rhythm, voice intensity in loudness, voice intensity in amplitude, emission time, speech rhythm in terms of pauses, accents, or voice breaks, variations in the frequencies of the sound signal, and timbre or formant structure;
4) presenting the results as numerical parameters that can be interpreted to identify specific alterations in speech fluency, acoustics, and prosody caused by changes in cognitive processing [31].
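In the prototype, Praat carries out step 3. As a rough, self-contained illustration of two of the listed parameters, emission (phonation) time and pauses, the sketch below computes frame-wise intensity in dBFS and applies a fixed silence threshold. The 25 ms frame, 10 ms hop, and −40 dBFS threshold are assumptions chosen for illustration, not values taken from the VAD-AD scripts. A fixed threshold is also far cruder than Praat’s voicing analysis.

```python
import numpy as np

def frame_intensity_db(signal: np.ndarray, fs: int, frame_ms: float = 25.0,
                       hop_ms: float = 10.0) -> np.ndarray:
    """RMS level of each analysis frame, in dB relative to full scale (dBFS)."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    starts = range(0, len(signal) - frame + 1, hop)
    rms = np.array([np.sqrt(np.mean(signal[s:s + frame] ** 2)) for s in starts])
    return 20 * np.log10(np.maximum(rms, 1e-10))  # floor avoids log10(0)

def phonation_and_pauses(signal: np.ndarray, fs: int, silence_db: float = -40.0,
                         hop_ms: float = 10.0) -> tuple[float, int]:
    """Approximate phonation time (s) and pause count using a fixed energy threshold."""
    db = frame_intensity_db(signal, fs, hop_ms=hop_ms)
    voiced = db > silence_db
    phonation_s = float(voiced.sum()) * hop_ms / 1000
    # Each voiced-to-silent transition counts as one pause (trailing silence included).
    pauses = int(np.sum(voiced[:-1] & ~voiced[1:]))
    return phonation_s, pauses

# Usage: 1 s tone, 0.5 s silence, 1 s tone, sampled at 16 kHz.
fs = 16_000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
sig = np.concatenate([tone, np.zeros(fs // 2), tone])
print(phonation_and_pauses(sig, fs))  # (2.0, 1)
```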
The analyzer prototype consists of several independent modules built from commercially available electronic components. These modules operate sequentially toward a single goal: using a recording of a physical sound signal to estimate the probability that the person being assessed will develop AD. The automatic analyzer includes the three key functions required for this purpose:
- recording the person’s speech;
- spectrographic analysis of the recorded voice;
- automatic calculation of the percentage probability of developing AD, using the appropriate discriminant algorithm.

The VAD-AD analyzer therefore simplifies, speeds up, and automates the diagnosis of this dementia. As described below, these automatic calculations required discarding some of the variables obtained in the experimental studies. They also required designing algorithms with variables that do not need manual recoding. In sum, the VAD-AD device is intuitive (i.e., it requires no training before use), straightforward, reliable, and quick, so data series can be gathered systematically and automatically.
Components of the VAD-AD
All components are powered by a 5 V, 3000 mA micro-USB AC-DC mains adaptor. The single-board computer (SBC) is a Raspberry Pi 3 Model B (1.2 GHz 64-bit quad-core ARM CPU) with 1 GB of LPDDR2 RAM and USB 2.0 ports, running a Linux operating system. Such boards have a proven track record in the creation of small devices with substantial computing and storage capabilities. A single circuit board holds a microprocessor, primary memory (RAM), standard input and output ports, and an internet connection. It also provides programmable general-purpose input/output (GPIO) pins for adding other hardware. Audio output is provided through a 3.5-mm jack and HDMI. Video output is provided through an RCA connector (PAL and NTSC) and HDMI (rev 1.3 and 1.4), with a display serial interface (DSI) for the LCD panel (Fig. 1). The screen and all the other components are mounted inside a 3D-modelled casing.

Fig. 1. Prototype for the Voice Analysis Diagnosis of Alzheimer’s Disease (VAD-AD).
The voice recording system uses a unidirectional lavalier (lapel clip-on) microphone, model Sony ECM-CS10.CE7, which provides the quality needed for spectrographic analysis. It is connected via an L-shaped 3.5-mm mini-jack plug. Its specifications are:
- signal-to-noise ratio: 38 dB;
- output impedance: 1.9 kΩ ±30%;
- maximum input sound pressure level: 110 dB SPL;
- harmonic distortion: 3% at 1000 Hz;
- frequency response: 50 to 15,000 Hz.
Recording is controlled through a user interface on a 3.5-inch Touch Screen TFT LCD (A) with 320 × 480 resolution. The patient’s data are entered with a standard keyboard. The fields include name, age, sex, and years of education, as well as the score on the Goldberg Depression Test [35]. Once these data have been entered, recording can begin, after which the speech analysis is conducted. For the recording, the patient reads aloud the first paragraph (in Spanish) of the novel Don Quixote, by Miguel de Cervantes, which takes between 25 and 45 seconds of phonation. A Creative Sound Blaster Play sound card prepares the recording in the format required by the speech analysis software. The sound data are checked and digitized using Audacity.
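The paper does not say whether the prototype checks the length of the reading task. A hypothetical safeguard could reject recordings outside the stated 25 to 45 s window before analysis. The sketch below uses Python’s standard `wave` module; the helper names and the 16 kHz, 16-bit test file are illustrative assumptions.

```python
import os
import tempfile
import wave

MIN_S, MAX_S = 25.0, 45.0  # reading-time window stated in the text

def recording_duration_s(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames divided by frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def duration_ok(duration_s: float, lo: float = MIN_S, hi: float = MAX_S) -> bool:
    """Hypothetical gate: accept only recordings within the expected window."""
    return lo <= duration_s <= hi

# Usage: write a 30-second silent mono 16-bit file and check it.
path = os.path.join(tempfile.mkdtemp(), "reading.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)         # 16-bit samples
    wf.setframerate(16_000)
    wf.writeframes(b"\x00\x00" * 16_000 * 30)
print(recording_duration_s(path), duration_ok(recording_duration_s(path)))  # 30.0 True
```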
The prototype runs the speech analysis automatically using Praat 6.0.19 for Linux [36], which computes the parameters required by the classification algorithms. Loading the analysis processes into the prototype involved two steps:
1) reformulating the experimentally derived algorithms so that the prototype can perform all calculations automatically (replacing variables obtained by combining two measures with simpler variables);
2) jointly using two kinds of predictor variables: acoustic and temporal parameters from sound analysis [22], and parameters describing the velocity of articulatory movements [18].
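For readers reproducing this kind of pipeline, Praat can be driven non-interactively from its command line with `praat --run <script> <arguments>`. The sketch below makes several assumptions:
- a hypothetical Praat script, here called `vad_ad_features.praat`, takes a WAV path and prints the measured parameters with `writeInfoLine`;
- the prototype’s actual scripts are not described in this paper;
- the binary name `praat` may differ on a given system.

```python
import shutil
import subprocess

def praat_command(script: str, wav_path: str, praat_bin: str = "praat") -> list[str]:
    """Command line for running a Praat script in batch mode on one recording."""
    return [praat_bin, "--run", script, wav_path]

def run_praat(script: str, wav_path: str, praat_bin: str = "praat") -> str:
    """Run the script and return its printed output (Info output goes to stdout)."""
    if shutil.which(praat_bin) is None:
        raise FileNotFoundError(f"{praat_bin!r} not found on PATH")
    result = subprocess.run(
        praat_command(script, wav_path, praat_bin),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical script name; only the command is built here, nothing is executed.
print(praat_command("vad_ad_features.praat", "reading.wav"))
```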
The following steps were taken to validate the new model for predicting AD:
1) The mean values of the voice parameters were compared between the asymptomatic and early AD samples. Table 2 shows the results and indicates the parameters with significant differences (see Supplementary Table 1 for a glossary of acoustic terms).
Table 2. Comparison of the voice parameters between older people undergoing normal aging and those diagnosed with AD: t-tests for independent samples, means of the two groups, and lower and upper confidence limits.
*p < 0.01; **p < 0.001.
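Step 1 relies on independent-samples t-tests. A minimal standard-library sketch of the pooled-variance (Student) version follows. The study does not state whether equal variances were assumed, so this is one plausible variant. P-values require the t distribution, which the standard library lacks. If SciPy is available, `scipy.stats.ttest_ind` returns the statistic and p-value, and `scipy.stats.t.ppf` gives critical values for the confidence interval.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Student's t for independent samples with pooled variance.

    Returns (t statistic, degrees of freedom, standard error of mean(a) - mean(b)).
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2, se

def mean_difference_ci(a: list[float], b: list[float], t_crit: float) -> tuple[float, float]:
    """Confidence interval for mean(a) - mean(b); t_crit is the critical t for the df."""
    _, _, se = pooled_t_test(a, b)
    diff = mean(a) - mean(b)
    return diff - t_crit * se, diff + t_crit * se

# Usage with toy data (not study data).
t, df, se = pooled_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```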
2) The variables with significant differences, together with participant age, were entered as predictor (independent) variables in a stepwise discriminant analysis. Group (non-pathological versus AD) was the dependent variable. This analysis yielded a highly significant discriminant function (see Table 3). Its statistics were: eigenvalue 1.615, 100% of the variance explained, canonical correlation 0.786, Wilks’ lambda 0.382, and chi-square 132.65 (p < 0.001).
Table 3. Coefficients of the discriminant function
The function is parsimonious in that it includes only the variables that make a significant contribution to predicting the dependent variable. These variables were:
- age;
- the mean of the voice’s minimum amplitude in each utterance (Amplit_Min);
- the mean of the maximum amplitude difference in each utterance (Amplit_Diferenc_Max_Mean);
- the location of the mean frequency within the frequency interval (Asymmetry);
- the standard deviation of the first formant (sdF1);
- the bandwidth of the third formant (F3_B3);
- the pitch trajectory (sum of absolute intervals) within syllabic nuclei, divided by duration, in ST/s (Traj_Intra);
- the normalized Pairwise Variability Index (nPVI);
- the standard deviation of the harmonics-to-noise ratio (sdHNR);
- the harmonics-to-noise ratio extracted from the Acoustic Voice Quality Index (AVQI_HNR).

Next, standardized coefficients were obtained for each voice dimension that contributes significantly to the prediction (see Table 3).
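For a two-group discriminant analysis, the summary statistics in step 2 are linked by standard identities. With a single eigenvalue λ, Wilks’ lambda is 1/(1 + λ) and the canonical correlation is √(λ/(1 + λ)). The sketch below checks the reported values. The chi-square uses Bartlett’s approximation, −(N − 1 − (p + g)/2)·ln Λ, with p predictors and g groups. The text does not name the approximation, so this is an assumption on our part.

```python
from math import log, sqrt

def two_group_lda_summary(eigenvalue: float, n_cases: int, n_predictors: int,
                          n_groups: int = 2) -> dict:
    """Summary statistics implied by the single eigenvalue of a two-group LDA."""
    wilks = 1 / (1 + eigenvalue)
    canonical_r = sqrt(eigenvalue / (1 + eigenvalue))
    # Bartlett's chi-square approximation for testing Wilks' lambda.
    chi2 = -(n_cases - 1 - (n_predictors + n_groups) / 2) * log(wilks)
    df = n_predictors * (n_groups - 1)
    return {"wilks_lambda": wilks, "canonical_r": canonical_r, "chi2": chi2, "df": df}

# N = 145 recordings and the ten predictors listed in the text.
stats = two_group_lda_summary(eigenvalue=1.615, n_cases=145, n_predictors=10)
print({k: round(v, 3) for k, v in stats.items()})
```

With these inputs, the sketch gives Wilks’ lambda ≈ 0.382, canonical correlation ≈ 0.786, chi-square ≈ 132.65, and 10 degrees of freedom, which agrees with the values reported above.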
3) Praat provides raw scores for each recording, so these were standardized before being entered into the discriminant function. To standardize a value, the mean is subtracted from the raw value and the difference is divided by the standard deviation. Table 4 shows the means and standard deviations used.
Table 4. Mean values and standard deviations used to convert raw scores into standardized scores
These standardized scores and coefficients are then used to construct the discriminant function that yields the final score (PF, its Spanish abbreviation):
PF = (0.6216 * z_Amplit_Dif_max_med) + (0.7018 * z_Amplit_min_dB) + (0.4609 * z_Age) + (0.3763 * z_sdF1) + (0.5055 * z_nPVI) − (0.4368 * z_sdHNR) − (0.4782 * z_AVQI_NHR_dB) − (0.3857 * z_Asimetria) + (0.2892 * z_TrajIntraZ) + (0.1709 * z_F3_B3)
The final score is compared with the group centroids (each group’s reference value): −0.874 for the NPS group and 1.822 for older people with AD. These values characterize people with NPS and people with early-stage AD, respectively. The fit of each voice to the prototypical voice of each group can also be assessed (see Fig. 2, which shows spectrograms of a voice from each group).
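The final-score computation can be written out directly. The sketch below uses the coefficients and centroids given above. The raw-score means and standard deviations of Table 4 are not reproduced here, so `z_score` takes them as arguments. Assigning a case to the nearest centroid is our simplifying assumption. It corresponds to equal prior probabilities, whereas statistical packages may use priors proportional to group size, which would shift the cut-off.

```python
# Coefficients and centroids as reported in the text; names kept as in the PF formula.
COEFFICIENTS = {
    "z_Amplit_Dif_max_med": 0.6216,
    "z_Amplit_min_dB": 0.7018,
    "z_Age": 0.4609,
    "z_sdF1": 0.3763,
    "z_nPVI": 0.5055,
    "z_sdHNR": -0.4368,
    "z_AVQI_NHR_dB": -0.4782,
    "z_Asimetria": -0.3857,
    "z_TrajIntraZ": 0.2892,
    "z_F3_B3": 0.1709,
}
CENTROIDS = {"NPS": -0.874, "AD": 1.822}

def z_score(raw: float, mean: float, sd: float) -> float:
    """Standardize a raw Praat measurement using a reference mean and SD (Table 4)."""
    return (raw - mean) / sd

def final_score(z: dict[str, float]) -> float:
    """PF: weighted sum of the ten standardized predictors."""
    missing = COEFFICIENTS.keys() - z.keys()
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return sum(weight * z[name] for name, weight in COEFFICIENTS.items())

def nearest_centroid(pf: float) -> str:
    """Assign the group whose centroid is closest to PF (implies equal priors)."""
    return min(CENTROIDS, key=lambda group: abs(pf - CENTROIDS[group]))

# Usage with hypothetical z-scores: an "average" profile (all zeros) falls nearer NPS.
average = {name: 0.0 for name in COEFFICIENTS}
print(final_score(average), nearest_centroid(final_score(average)))  # 0.0 NPS
```

Under equal priors, the implied cut-off is the midpoint between the centroids, (−0.874 + 1.822)/2 = 0.474.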

Fig. 2. Waveform (upper panel) and broadband spectrogram (middle panel) of the sentence “En un lugar de la Mancha” (“In a village of La Mancha”). The spectrogram shows the spectral energy of the sound over time, and the speaker’s pitch contour is superimposed below. (A) A cognitively normal 84-year-old man; (B) an 80-year-old man with Alzheimer’s disease. Panel B shows flat, low-intensity, monotonous speech without inflection. Its spectral fine structure is also highly degraded: noise between harmonics, substitution of harmonics by noise, graphical irregularity, and a reduced series of harmonics. The result is a marked loss of speech clarity.
4) Applying this function to the same participants shows the degree of agreement between each individual’s diagnosis (with or without AD) and the diagnosis predicted by the function. This provides an assessment of the validity of the proposed model. Overall, 92.4% of participants were correctly classified (97% in the normality group). In cross-validation, these variables correctly classified 91% of cases (again 97% in the normality group). In addition, the procedure gives the probability that a given final score belongs to the NPS or AD group. Table 5 shows the probability (as a percentage) that each final score corresponds to “Normality”. Its complement should be read as the “Probability of developing AD”. The scale ranges from “Normality highly probable” to “Dementia highly probable”.
Table 5. Probability comparison scale
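Table 5 is not reproduced here. Under standard linear discriminant analysis assumptions, however, the conversion of a final score into group probabilities can be sketched. The assumptions are that discriminant scores are normally distributed within each group, with unit pooled within-group variance, and centred on the centroids −0.874 (NPS) and 1.822 (AD). The `posterior` helper and the choice of priors are assumptions, so the output need not match Table 5 exactly.

```python
from math import exp
from typing import Optional

CENTROIDS = {"NPS": -0.874, "AD": 1.822}

def posterior(pf: float, priors: Optional[dict] = None) -> dict:
    """Group membership probabilities for a final score PF.

    Assumes unit within-group variance on the discriminant axis, as in standard LDA.
    """
    if priors is None:
        priors = {group: 1 / len(CENTROIDS) for group in CENTROIDS}  # equal priors
    weights = {g: priors[g] * exp(-0.5 * (pf - c) ** 2) for g, c in CENTROIDS.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def ad_probability_percent(pf: float, priors: Optional[dict] = None) -> float:
    """'Probability of developing AD' as the complement of 'Normality', in percent."""
    return 100 * (1 - posterior(pf, priors)["NPS"])

size_priors = {"NPS": 98 / 145, "AD": 47 / 145}  # priors proportional to group sizes
for pf in (-0.874, 0.474, 1.822):
    print(pf, round(ad_probability_percent(pf), 1),
          round(ad_probability_percent(pf, size_priors), 1))
```

With equal priors, a score at the midpoint between the centroids (0.474) gives 50% for each group. Size-proportional priors shift the scale toward “Normality”.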
Internal data are stored on a 32 GB microSDHC memory card, and a summary of the assessment can be printed using a Mini Thermal Receipt Printer Starter Pack. The data can be downloaded to a Kingston Digital 8GB USB 3.0 DataTraveler 50 flash drive. The prototype has now progressed to stage 3 of the clinical validation process.
DISCUSSION
Our experimental work in recent years has led to an original technique with unique features for measuring changes in cognitive-linguistic skills in any disorder that affects speech processes. The automatic voice analyzer is an innovative, reliable, non-invasive, and economical response to current societal needs, as the prevalence of AD is steadily increasing. One of the most important features of VAD-AD is its ability to analyze temporal and acoustic voice markers automatically to flag the onset of AD in its preclinical stages. These markers have been reported not only by our group [12] but also, independently, by other research groups [22, 37], which supports their diagnostic potential for AD.
As a diagnostic tool, the VAD-AD is a small, light, portable device that can be used almost anywhere. It is well suited to routine assessment procedures in clinical and health facilities. One of its main advantages is that it is non-invasive, unlike PET imaging or lumbar puncture for cerebrospinal fluid analysis [38]. Moreover, the test requires no specific training on the part of the evaluator, so any professional who wishes to assess a patient can do so. The device can therefore be deployed, at an affordable cost, by any center or facility specializing in the treatment and diagnosis of pathological senescence.
This study confirms the predictive power of VAD-AD, in line with other European research groups working in the same field [17]. However, speech and voice alterations do not always have a clear etiology. More research is therefore needed to understand the predictive value of the variables in question more precisely, and possibly to discover other variables not considered here. We are currently designing a speech and language test that will allow us to include more variables associated with the language pathology caused by neurodegeneration. We are aware of the limitations of the language test used in the VAD-AD prototype and of the need to revise it. The revision should include spontaneous speech production and should ensure the phonetic representativeness of the samples analyzed.
Finally, the early diagnostic assessment of dementia should be accompanied by the battery of neuropsychological tests that professionals routinely use. An unfavorable voice-analysis result, together with the neuropsychological assessment, should lead to the referral of the potential patient for a full neurological evaluation.
Footnotes
ACKNOWLEDGMENTS
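This research project was partially or fully sponsored by the University of Salamanca Foundation with FEDER funds (Fondos FEDER) and by the Junta of Castile and León. Project: “Desarrollo del prototipo de la aplicación informática (APP) del analizador biométrico de la voz para la detección precoz de la enfermedad de Alzheimer (DAV-EA)” [Development of the prototype software application (APP) of the biometric voice analyzer for the early detection of Alzheimer’s disease (DAV-EA)]. Reference: PC_TCUE1517_F2_014.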
This research was carried out with the support of the State Reference Care Centre for Persons with Alzheimer’s Disease and other Dementias (CRE Alzheimer Salamanca; IMSERSO), Spain.
