Diagnosis Framework for Probable Alzheimer’s Disease and Mild Cognitive Impairment Based on Multi-Dimensional Emotion Features

Abstract

Background:

Emotion and cognition are intercorrelated. Impaired emotion is common in populations with Alzheimer’s disease (AD) and mild cognitive impairment (MCI), and it shows promise as a basis for early detection.

Objective:

We aim to develop a novel automatic classification tool based on emotion features and machine learning.

Methods:

Older adults aged 60 years or over were recruited from residents of long-term care facilities and the community. Participants included healthy control participants with normal cognition (HC, n = 26), patients with MCI (n = 23), and patients with probable AD (n = 30). Participants watched emotional film clips while multi-dimensional emotion data were collected, including mental features from the Self-Assessment Manikin (SAM), physiological features from electrodermal activity (EDA), and facial expressions. Emotional features were extracted from EDA using continuous decomposition analysis and from facial expressions using EmoNet. A bidirectional long short-term memory (Bi-LSTM) network was used to train the classification model. A hybrid fusion strategy was used, combining early feature fusion and late decision fusion. Data from all 79 participants entered the deep learning analysis.

Results:

With multiple emotion features combined, the model’s AUC was highest for HC versus probable AD (AUC = 0.92), intermediate for MCI versus probable AD (AUC = 0.88), and lowest for HC versus MCI (AUC = 0.82).

Conclusions:

By fusing multiple emotion features, our method demonstrated excellent predictive power to differentiate HC, MCI, and AD. The proposed model provides a cost-effective and automated method that can assist in distinguishing probable AD and MCI from normal aging.

Keywords

Alzheimer’s disease, cognitive impairment, emotion, machine learning

INTRODUCTION

Global aging is a major worldwide trend. By 2050, the world’s population aged 60 years or older will reach 2.1 billion [1]. Because aging is the primary risk factor for neurodegenerative disease, the number of people with Alzheimer’s disease (AD) and mild cognitive impairment (MCI) is expected to increase substantially worldwide [2, 3]. Although patients can benefit from pharmacological and non-pharmacological approaches, managing AD and MCI is very challenging [4]. Recognizing warning signs and diagnosing MCI and AD early are therefore of crucial importance and remain a public priority. In clinical practice, a diagnosis of AD depends on highly skilled clinicians to conduct a systematic examination that includes an inquiry of patient history, an objective neurological assessment of cognition, and a structural MRI scan [5]. The AD detection procedure is time-consuming and relies mainly on the clinician’s expertise, resulting in a sensitivity ranging between 70.9% and 87.3% and a specificity between 44.3% and 70.8% [6]. During the last decade, significant advances in research have been made toward detecting AD pathology using cerebrospinal fluid (CSF) biomarkers, positron emission tomography (PET) amyloid, and tau imaging [7]. However, these methodologies are invasive, costly, and inconvenient. It is highly desirable to develop a cost-effective, efficient, and non-invasive MCI/AD assessment procedure that supports large-scale screening in community settings and accelerates the search for effective treatments.

Emotion is a multi-dimensional construct that is reflected by physiological features (e.g., electrodermal activity, EDA), mental features (e.g., subjective feelings), and outwardly presenting motor activity (e.g., facial expression) [8]. Disturbed emotion perception and social cognition have been increasingly recognized in AD [9–11] and MCI [12]. Facial expression is a promising emotional fingerprint for underlying AD pathophysiology [13], and it has been widely investigated in the literature [14–17]. Advanced machine learning paradigms, together with facial expression and motion analysis, offer new ways to differentiate individuals with AD from healthy aging individuals. A major challenge at this stage, however, is to understand the dynamic emotional deficits in individuals with AD and how these deficits can be combined into one prediction model. Emotion processing involves at least two processes: the identification of the emotional significance of stimuli and the production of an affective state in response to the stimuli [18]. We therefore propose a novel paradigm that presents movie scenes with intense emotional content to the participants and collects multidimensional data reflecting both the identification and the production processes. This paradigm has several advantages. First, we used films, which are more ecologically valid stimuli, as the mood induction procedure, which could facilitate the evocation of psychophysiological, cognitive, and motor responses to emotion [19]. A recent study investigated dimensional and discrete emotional reactivity in individuals with AD and proposed film clips as a research tool in dementia [20]. Second, we utilized both subjective questionnaire data and objective arousal data to index emotion identification. Most previous studies used photographs portraying the six basic emotions and required participants to select from a multiple-choice list. This method cannot reveal the individual’s subjective experience. In addition, EDA is a promising marker of emotion recognition [21], but few studies have used EDA during emotional induction in individuals with AD. Third, we employed facial expression analysis technology to decode facial expressions as the product of emotions. Based on an analysis of facial dynamics, a recent study demonstrated the classification of apathetic patients and non-apathetic individuals [22].

In recent decades, deep learning has become a main approach for predicting AD diagnosis. Deep learning approaches have wide applications in the biomedical community and demonstrate high classification accuracy across a broad spectrum of diseases, such as the classification of cognitive status using neuroimaging data [23] and electroencephalogram data [24]. Machine learning has recently also been applied to facial expressions and EDA data to detect cognitive status. For instance, a deep learning-based model of facial emotion expression was developed to detect cognitive impairment, and the results were stable independent of sex, race, age, education level, mood, and eye movements [25]. Another study applied machine and deep learning to facial expressions to predict cognitive states and cognitive skills [26]. A further study measured the reliability of facial expressions in predicting cognitive performance across a diverse set of cognitive tasks [27]. Meanwhile, machine learning models of EDA have been used to classify cognitive tasks [28] and to predict cognitive level [29] as well as cognitive impairments [30]. However, machine learning based on single-modal EDA or facial expression may not yield satisfactory results for differentiating individuals with cognitive impairments [31]. It is challenging to classify individuals with cognitive impairment based solely on a single modality.

The ability to perceive others’ emotions and to express one’s own emotions is critical for social functions and daily activities. Dysfunction of this ability can lead to interpersonal difficulties [32, 33], a poor quality of life [34], a reduced ability to live independently [35], and caregiver burden [36]. In addition, neuropsychiatric and behavioral symptoms, such as depression and anxiety, are common in patients with AD [37] and have been linked to impaired emotion processing [35]. Together, these findings demonstrate the practical significance of emotional functions in individuals with AD. The primary objective of the study is to develop a novel machine learning framework to classify individuals with healthy aging, MCI, and AD by combining subjective and objective measures of emotion recognition with facial expression measures of emotion production.

METHODS

Participants

Older adults aged 60 years or older were recruited from residents of long-term care facilities and the community. Participants included healthy control individuals with normal cognition (HC), patients with MCI, and patients with probable AD. A diagnosis of probable AD was made on the basis of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria [38] and the Mini-Mental State Examination (MMSE) criteria [39]. A diagnosis of MCI was operationalized according to Petersen’s revised diagnostic criteria for MCI [40]: (a) subjectively reported memory loss; (b) preserved daily living functions, with a score on the activities of daily living (ADL) scale of less than 16 [41]; (c) absence of dementia as defined by the criteria in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.) (DSM-5) [42] and the Montreal Cognitive Assessment (MoCA) criteria [43]. The HC participants were older adults who (a) had normal scores on the MMSE and MoCA; (b) had no memory or cognitive complaints; and (c) had no history of diagnosed AD or MCI. Both MMSE and MoCA cutoff scores were determined with consideration of the education levels of participants. All participants were native Chinese speakers. Exclusion criteria were: (a) a diagnosis of neurological (other than AD) or psychiatric disease according to the DSM-5; (b) severe visual or hearing impairment that may affect participation in the assessments. This study was carried out in accordance with the recommendations of the Declaration of Helsinki, with written informed consent obtained from all participants. The protocol was approved by the Ethics Committee of the School of Medical Science at Jinan University (JNUKY-2022-049).

Emotion assessments

We used six films to elicit a neutral state and five basic emotions: sadness, disgust, fear, anger, and happiness. The film clips were selected from the standardized Chinese Affective Video System (CAVS), which was validated in a sample of Chinese adults [44]. The film stimuli were presented in a counterbalanced order on a 14” LED computer screen (1920 × 1080 resolution). The participants watched the films at a distance of 40 cm and heard the sound through a pair of soundproof headphones. The experiment lasted approximately 40 min and was administered using BioTrace+ software (Mind Media BV, Netherlands). The experiment began with a resting period (3 min) to establish the baseline of physiological measurements. To allow the participant to return to their baseline emotional state, the film clips were separated by a 60-s resting period during which the participant closed their eyes and relaxed.

During the experiment, the subjective responses were assessed immediately after each film clip, and the physiological responses were measured continuously. The subjective emotional response was assessed using the Self-Assessment Manikin (SAM) [45]. The participant completed a 9-point Likert-type scale to indicate the level of arousal and valence. Graphic figures were used to represent different emotional states to make the scale easy for participants to understand. The physiological emotional response was assessed by EDA on a Nexus-10 system with BioTrace+ software (Mind Media BV, Netherlands). Two flat 10-mm Ag/AgCl dry electrodes were fixed at the phalanges of the ring and middle fingers of the non-dominant hand to collect the EDA data. Based on the protocol on skin conductance measurement [46], we collected EDA data of the subject’s non-dominant hand to reduce the noise induced by skin conductance level and sweat gland activity. Facial expressions were captured with an HD camera (1980 × 1080 resolution) while the participant was watching the film clips. The overall experiment protocol is shown in Fig. 1.

Fig. 1

Overall experiment protocol: (A) baseline measurement; (B) viewing of emotional film clips; (C) subjective measurement for mental features; (D) rest period.

Data collection

Data were collected in a quiet and comfortable environment. Recordings with incomplete coverage of the face during capture or significant EDA noise due to hand movement were excluded from analysis. A total of 30 probable AD patients, 23 MCI patients, and 26 HC individuals were included in our analysis.

Development of the model

Features extraction

To extract facial emotion features, the two-dimensional continuous model of valence and arousal [47] was used to represent facial emotions (Fig. 2). The advantage of this expression definition method is that it can distinguish subtle differences between different expressions with the help of continuous values, thereby helping computers to better understand human expressions.

Fig. 2

Two-dimensional continuous valence and arousal features.

EmoNet is a deep neural network framework developed by Samsung AI Research Institute and Imperial College London (https://github.com/face-analysis/emonet). EmoNet is an expression recognition model based on deep learning, which can perform emotion classification on face images and accurately recognize the emotions expressed by facial expressions, such as happiness, sadness, and anger. The underlying architecture adopts a convolutional neural network (CNN), which automatically extracts image features, thereby capturing the details and characteristics of human facial expressions before performing emotion classification. EmoNet is a neural network developed from a database of 2,185 videos to be classified by using color, spatial power spectrum, and the presence of objects and faces in images. The EmoNet model performed well on the concordance correlation coefficient (CCC) and Pearson correlation coefficient (PCC) evaluation indicators in the three data sets AFEW-VA, AffectNet, and SEWA [48]. Importantly, EmoNet has been widely used in the emotion literature [49–51]. We used the pre-trained EmoNet model to extract two-dimensional valence-arousal continuous features from the collected videos frame by frame: $\begin{matrix} [[v_{f1}, a_{f1}], [v_{f2}, a_{f2}], \ldots, [v_{fn}, a_{fn}]] \end{matrix}$ (1)

Here, v_fn and a_fn represent the valence value and the arousal value, respectively, of the nth video frame (Fig. 3).

Fig. 3

Two-dimensional valence-arousal continuous features in a time series.

From the preprocessing result in Eq. (1), we extracted a four-dimensional representation of the two-dimensional continuous emotional feature trajectory between each pair of adjacent frames in a video: $\begin{matrix} E_{i} = [v_{i}, a_{i}, \Delta v_{i}, \Delta a_{i}] \end{matrix}$ (2)

where Δv_i = v_{i+1} - v_i and Δa_i = a_{i+1} - a_i are the differences in valence and arousal, respectively, between two adjacent frames and represent the intensity of emotional change. Through the conversion in Eq. (2), the two-dimensional valence-arousal continuous features [[v_f1, a_f1], [v_f2, a_f2],..., [v_fn, a_fn]] are converted to [E₁, E₂, …, E_n].
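The conversion in Eq. (2) can be illustrated with a minimal sketch in plain Python. The function name and the sample values are hypothetical and are not taken from the published analysis code. The sketch also assumes that the final frame, which has no successor, is dropped, so n frames give n - 1 vectors; the text does not state how the last frame is handled.

```python
from typing import List, Tuple


def to_trajectory_features(va: List[Tuple[float, float]]) -> List[List[float]]:
    """Convert per-frame (valence, arousal) pairs into [v, a, dv, da] vectors."""
    features = []
    # Assumption: the last frame has no successor and is therefore dropped.
    for (v, a), (v_next, a_next) in zip(va, va[1:]):
        features.append([v, a, v_next - v, a_next - a])
    return features


# Hypothetical valence-arousal values for four consecutive frames.
va_sequence = [(0.10, 0.20), (0.15, 0.25), (0.05, 0.40), (0.05, 0.35)]
for e in to_trajectory_features(va_sequence):
    print([round(x, 2) for x in e])
```

Each output row holds the valence and arousal of a frame followed by their change toward the next frame.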

For EDA data, we performed continuous decomposition analysis (CDA) and standard trough-to-peak (TTP) analysis on the raw skin conductance signal using the Ledalab toolbox. Ledalab is open-source, Matlab-based software for the analysis of skin conductance data and has been widely used in research. CDA decomposes the raw signal into continuous phasic and tonic activity. The method extracts the phasic (driver) information underlying EDA and aims at retrieving the signal characteristics of the underlying sudomotor nerve activity (SNA). This is useful for unbiased scoring of staged and stressful activities and for obtaining the event-related activation. TTP analysis proceeds as follows. First, skin conductance responses (SCRs), the data segments corresponding to physiological responses, are detected. Second, the trough and the peak are identified. The trough is the lowest point in skin conductance within the SCR and represents the baseline skin conductance level when the individual is at rest. The peak is the highest point in skin conductance within the SCR and represents the highest skin conductance level reached during the response. Finally, the amplitude of the SCR is calculated as the difference between the peak and the trough, often expressed as “Amplitude = Peak – Trough”.
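The trough-to-peak steps can be sketched as follows. This is a simplified illustration in plain Python, not Ledalab's implementation: it treats every local minimum followed by a rise as a candidate response, and the `min_amplitude` threshold and the sample signal are hypothetical.

```python
from typing import List, Tuple


def trough_to_peak(signal: List[float],
                   min_amplitude: float = 0.01) -> List[Tuple[int, int, float]]:
    """Return (trough_index, peak_index, amplitude) for each rise in the signal."""
    responses = []
    i, n = 0, len(signal)
    while i < n - 1:
        # Walk down (or along a plateau) to the local minimum: the trough.
        while i < n - 1 and signal[i + 1] <= signal[i]:
            i += 1
        trough = i
        # Walk up to the following local maximum: the peak.
        while i < n - 1 and signal[i + 1] > signal[i]:
            i += 1
        peak = i
        amplitude = signal[peak] - signal[trough]  # Amplitude = Peak - Trough
        if peak > trough and amplitude >= min_amplitude:
            responses.append((trough, peak, amplitude))
    return responses


# Hypothetical skin conductance samples (arbitrary units).
samples = [2.0, 1.0, 1.5, 3.0, 2.5, 2.0, 2.25, 2.0]
print(trough_to_peak(samples, min_amplitude=0.5))
```

Raising `min_amplitude` discards small fluctuations that are more likely noise than genuine responses; a suitable value depends on the recording setup.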

Recurrent neural network (RNN)

RNNs are well suited to variable-length sequence data because they can exploit the temporal and semantic information in the data. In the current study, we used a bidirectional long short-term memory (Bi-LSTM) network [52] to train the classification model. LSTM was designed to address the exploding and vanishing gradient problems commonly observed in standard recurrent neural networks by regulating the gradient through a cell state. Its effectiveness has been widely demonstrated in deep learning applications such as time series prediction. We first processed the EDA recordings and facial video clips through recognition, feature extraction and fusion, and PCA dimensionality reduction, and then fed the preprocessed information into the RNN model. Data augmentation for the multimodal datasets included adding noise and random window sampling. We adopted a two-layer Bi-LSTM network to improve generalization performance. The number of neurons in each layer was set to 100 and the dropout was set to 0.8. The Adam optimizer was used, with a batch size of 256 and an initial learning rate of 0.001 [53].

Multimodal fusion

Multimodal fusion aims to improve the accuracy of data analysis by leveraging the complementary information between different modalities. Multimodal fusion strategies mainly include data fusion, feature fusion, decision fusion, and hybrid fusion. Among these, feature fusion merges multiple independent data sets into a single feature vector that serves as the input to the machine learning classifier. Decision fusion merges the output decisions of classifiers that were each trained on a single modality. Hybrid fusion can combine the advantages of both early feature fusion and late decision fusion. Therefore, this study adopted a hybrid fusion strategy to investigate the performance and practicality of multimodal cognitive dysfunction recognition.
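The difference between feature fusion and decision fusion can be shown with a minimal sketch. It assumes concatenation for early fusion and unweighted averaging of class probabilities for late fusion; these are common choices used here for illustration only and are not necessarily the operators used in this study. All names and values are hypothetical.

```python
from typing import Dict, List


def early_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Feature-level fusion: concatenate modality vectors in a fixed order."""
    fused: List[float] = []
    for name in sorted(features):  # fixed order keeps the layout reproducible
        fused.extend(features[name])
    return fused


def late_fusion(probabilities: Dict[str, List[float]]) -> List[float]:
    """Decision-level fusion: average class probabilities across modalities."""
    n_classes = len(next(iter(probabilities.values())))
    n_models = len(probabilities)
    return [sum(p[c] for p in probabilities.values()) / n_models
            for c in range(n_classes)]


# Hypothetical per-modality feature vectors for one participant.
print(early_fusion({"face": [0.2, 0.7], "eda": [1.3], "sam": [5.0, 6.0]}))
# Hypothetical per-modality class probabilities [P(HC), P(AD)].
print(late_fusion({"face": [0.5, 0.5], "eda": [1.0, 0.0]}))
```

A hybrid scheme applies the first step within or across modalities and the second step to the outputs of the resulting classifiers.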

Overall architecture

The overall framework of the hybrid fusion method for cognitive dysfunction recognition, which is based on a multi-head attention mechanism, is shown in Fig. 4. The framework has three stages. The first stage is mono-modal feature extraction and fusion. The facial expression module took facial expression videos from the six induced scenarios and fed them into a pre-trained EmoNet model to extract facial features, including categorical emotions, predicted valence and arousal values, and facial landmarks. These facial features were combined into a composite vector using an “add” parallel fusion strategy. The composite vector was then passed through PCA and Bi-LSTM to extract well-fused mono-modal features. The EDA module followed a similar process but used a fully connected layer as the input layer instead of the EmoNet model. The SAM module took the subjective emotional assessments and passed them through two Conv1D layers to extract features from the data. The second stage is multi-modal feature fusion. The features extracted in the first stage were fed into a fully connected layer. This layer cross-connected feature information from the different branches and fed it into the multi-head attention module, which allowed the model to organize and use the features extracted during the first stage effectively. The third stage is classification of the fused multi-modal features. After the mono-modal features were aggregated by the multi-head attention mechanism, the result was fed into a softmax classifier to produce the final output.

Fig. 4

Overall framework of hybrid fusion.

Data augmentation

For the facial expression data, we used the Dlib tool to detect the face in each frame of the videos. The cropped face area was resized to (256, 256). Given the valence-arousal two-dimensional emotion representation, we used upsampling to obtain more labeled data and to bring data of different scales to a common format for machine learning. “Upsampling” here refers to a method used to increase the amount of data available for training machine learning models. Six video clips were presented, and the duration of each clip ranged from 1 min 25 s to 3 min 32 s. The clips were standardized as follows. The sampling interval m is determined by both the length of the sequence VA_L and the first dimension of the RNN input tensor, Seq_len: $\begin{matrix} m = {VA}_{L} / {Seq}_{len} \end{matrix}$ (3)
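As a worked illustration of Eq. (3), the sketch below computes the sampling interval for the shortest clip (1 min 25 s = 85 s) and the frame indices that would be retained. The frame rate of 30 fps is an assumption made for the example and is not reported in the text, and the floor-based index rule is one plausible reading of how the interval is applied.

```python
from typing import List


def sampling_interval(va_length: int, seq_len: int) -> float:
    """Interval between retained frames: sequence length / RNN input length."""
    return va_length / seq_len


def resample_indices(va_length: int, seq_len: int) -> List[int]:
    """Frame indices obtained by stepping through the sequence at the interval."""
    m = sampling_interval(va_length, seq_len)
    # Assumption: positions are floored to the nearest earlier frame.
    return [min(int(k * m), va_length - 1) for k in range(seq_len)]


fps = 30                      # assumed frame rate, not given in the text
n_frames = 85 * fps           # shortest clip: 1 min 25 s
seq_len = 300                 # first dimension of the RNN input tensor
print(sampling_interval(n_frames, seq_len))
print(resample_indices(n_frames, seq_len)[:5])
```

Clips of different durations therefore yield sequences of the same length, at the cost of a different temporal spacing per clip.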

In our experiment, Seq_len was set to 300, and the length of the two-dimensional emotional representation sequence VA_L equals the total number of video frames N_frame, which is determined by the total duration of the video T_video and the frame rate fps, as N_frame = T_video * fps. Based on cross-correlation experiments using a sliding window on the facial feature sequences, we found that windows with higher cross-correlation coefficients mostly occurred at the later stage of the time series. We therefore selected the second half of each sequence as the effective sequence for data analysis. High cross-correlation between different sequences or data signals is important because it indicates a strong linear relationship or similarity between them. It suggests that these data segments share consistent patterns or features across different individuals or conditions, making them more effective in capturing the relevant emotional responses of the participants. Moreover, the patterns identified in this segment are less affected by individual differences or noise, thus enhancing the reliability of emotional responses [54, 55]. Because the training data set is small and the network overfits easily, we used data augmentation for the multimodal datasets, including adding noise and random window sampling. We augmented the original data from each subject to generate 10 multimodal data entries by introducing noise and employing random window sampling. We divided the dataset into a training set and a testing set to ensure that there was no information leakage when training the model on the augmented dataset. Adding noise means adding a small amount of random Gaussian noise to each value to generate a new sequence without affecting the overall properties and label information of the sequence, which helps prevent the model from overfitting. 
Gaussian noise is statistical noise that follows a Gaussian (normal) distribution; here it has a mean of zero and a symmetric, bell-shaped probability density function. In data and signal processing, Gaussian noise is used to represent random variations and uncertainties in measurements. Random window sampling defines a fixed-length window at the beginning of the original data sequence. Within this window of 3 to 5 s, a random position is selected, and data points are sampled from that position at a specified sampling frequency (300 frames per video data sample) to create a new sequence. This process can be repeated to generate multiple training samples from the original data of a subject. We adopted a two-layer Bi-LSTM network to improve generalization performance. The number of neurons in each layer was set to 100 and the dropout was set to 0.8. The Adam optimizer [56] was used, with a batch size of 256 and an initial learning rate of 0.001. Because the dataset contains a limited number of samples, we evaluated performance using k-fold cross-validation, in which a small part of the samples constitutes the testing set and most of the samples are used to train the model. The dataset D was first randomly divided into k equally sized, mutually exclusive subsets. In each iteration, k-1 subsets served as the training set and the remaining subset served as the test set. The training-to-testing ratio was 9:1, giving k = 10. Augmented sequences from the same subject were present exclusively in the training set; including them in the test set would cause information leakage. 
As a result, the model’s generalization performance in real-world scenarios was preserved. All models were implemented using the Keras library with TensorFlow as the backend and run on an NVIDIA Titan X GPU (12 GB).
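The augmentation and the subject-wise split described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the published pipeline: sequences are reduced to one dimension, and the noise level `sigma`, the offset limit `max_offset`, the subject identifiers, and the toy data are all hypothetical. The key point is that subjects are assigned to folds before augmentation, so no augmented copy of a test subject reaches the training set.

```python
import random
from typing import Iterable, List


def add_gaussian_noise(seq: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to every value of the sequence."""
    return [x + rng.gauss(0.0, sigma) for x in seq]


def random_window(seq: List[float], length: int, max_offset: int,
                  rng: random.Random) -> List[float]:
    """Cut a fixed-length window starting at a random early position."""
    if len(seq) < length:
        raise ValueError("sequence shorter than the requested window")
    start = rng.randint(0, min(max_offset, len(seq) - length))
    return seq[start:start + length]


def augment(seq: List[float], n_copies: int, length: int, max_offset: int,
            sigma: float, rng: random.Random) -> List[List[float]]:
    """Generate n_copies noisy, randomly windowed versions of one sequence."""
    return [add_gaussian_noise(random_window(seq, length, max_offset, rng), sigma, rng)
            for _ in range(n_copies)]


def subject_folds(subject_ids: Iterable[str], k: int, rng: random.Random) -> List[List[str]]:
    """Shuffle subjects and deal them into k mutually exclusive folds."""
    ids = list(subject_ids)
    rng.shuffle(ids)
    return [ids[i::k] for i in range(k)]


rng = random.Random(0)
# Toy data: 20 hypothetical subjects with one 400-sample sequence each.
data = {f"S{i:02d}": [rng.random() for _ in range(400)] for i in range(20)}
folds = subject_folds(data.keys(), k=10, rng=rng)

test_ids = set(folds[0])  # one fold is held out, the other nine train the model
train = [(sid, s) for sid in data if sid not in test_ids
         for s in augment(data[sid], n_copies=10, length=300,
                          max_offset=90, sigma=0.01, rng=rng)]
test = [(sid, data[sid][:300]) for sid in sorted(test_ids)]
print(len(train), len(test))
```

Repeating the last step with each fold in turn as the held-out set gives the ten cross-validation runs.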

Transparency and openness

Because our data include participants’ faces, the material has not been made accessible in order to protect their privacy. The analytic plan was not preregistered. The analysis code is available at https://github.com/funkylun/emotion_machine_learning. We calculated the statistical power for a one-way ANOVA using G*Power (v. 3.1.9.6). For an effect size f = 0.4, α = 0.05, and 1-β = 0.80, the required total sample size was 66 (non-centrality parameter λ = 10.56, critical F = 3.14, actual power = 0.82). Referring to previous studies of facial machine learning [22] and allowing for 10% attrition, we determined the minimum total sample size to be 73.
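Two of the reported figures can be checked by hand. The short sketch below assumes the standard relation λ = f² × N for the non-centrality parameter of a one-way ANOVA and rounds the attrition-adjusted sample size up to the next whole participant; the variable names are illustrative.

```python
import math

effect_size_f = 0.4   # effect size f used in the power analysis
total_n = 66          # total sample size returned by the power analysis
attrition = 0.10      # allowance for lost participants

# Non-centrality parameter of the F test: lambda = f^2 * N.
noncentrality = effect_size_f ** 2 * total_n
# Inflate the sample size for attrition and round up.
minimum_n = math.ceil(total_n * (1 + attrition))

print(round(noncentrality, 2), minimum_n)
```

Both values agree with those stated in the text (λ = 10.56 and a minimum sample of 73).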

RESULTS

The demographic and neuropsychological characteristics are shown in Table 1. Descriptive statistics (mean and standard deviation) were computed for participants’ sociodemographic information and neuropsychological test results. Analysis of variance (ANOVA) and t-tests were used to examine group differences in continuous variables (age, MMSE and MoCA scores), and the chi-square test was used to examine group differences in gender and education level. Between-group differences were not significant for age, gender, and education level (p > 0.05) and were significant for the neuropsychological assessments (p < 0.05).

Table 1

Demographic and neuropsychological scores of the participants

Variable	HC (n = 26)	MCI (n = 23)	AD (n = 30)	p ¹
Age (mean±SD)	75.88±7.15	76.61±3.06	78.67±4.05	0.395
Gender
Female	20	6	14	0.123
Male	6	17	16
Education level (y)
≤6	11	9	8	0.725
7–12	11	11	18
≥13	4	3	4
MMSE (mean±SD)	27.08±2.54	25.30±4.55	13.37±4.05	<0.001
MoCA (mean±SD)	24.19±2.15	18.91±3.49	–	<0.001

We investigated three binary classification tasks: (I) HC versus probable AD, (II) MCI versus probable AD, and (III) HC versus MCI. The classification results are shown in Table 2. Performance is reported as mean±standard deviation (STD) over the 10 runs for four metrics. Compared with the single-feature methods, the multimodal fusion method improved performance. Among the three classification tasks, classification between HC and probable AD performed best (Accuracy (ACC): 88.3±5.1%; Specificity (SPE): 89.3±3.0%; Sensitivity (SEN): 86.1±5.1%; F1-score: 86.8±4.4%), followed by classification between MCI and probable AD (ACC: 87.3±4.8%; SPE: 88.3±4.3%; SEN: 85.3±4.3%; F1-score: 85.7±5.3%) and classification between HC and MCI (ACC: 76.6±6.4%; SPE: 80.3±4.4%; SEN: 75.6±4.4%; F1-score: 76.2±3.4%). The ROC results are shown in Fig. 5. Among the three tasks, the highest AUC was obtained for HC versus probable AD (AUC = 0.92), followed by MCI versus probable AD (AUC = 0.88) and HC versus MCI (AUC = 0.82).
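For reference, the four metrics and the AUC can be computed from labels and predictions as in the sketch below. It uses the standard definitions; the function names and the toy labels are hypothetical, and the code assumes that both classes are present in the evaluated fold.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),           # true positive rate
        "SPE": tn / (tn + fp),           # true negative rate
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical labels, predictions, and scores.
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

Averaging each metric over the ten cross-validation runs gives a mean±STD summary of the kind reported here.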

Table 2

Classification performance of different features (%, mean±STD over 10 runs)

Feature	ACC			SPE			SEN			F1-score
	AD versus HC	AD versus MCI	MCI versus HC	AD versus HC	AD versus MCI	MCI versus HC	AD versus HC	AD versus MCI	MCI versus HC	AD versus HC	AD versus MCI	MCI versus HC
Facial expression	76.2±3.2	72.5±3.5	66.1±3.7	79.2±2.9	74.1±3.5	69.1±3.7	75.1±3.2	70.5±3.5	64.1±3.7	75.4±3.2	71.3±3.1	64.8±4.2
SAM	77.0±4.2	70.5±5.4	66.5±5.3	80.0±3.2	75.5±5.4	68.5±5.2	76.0±4.6	68.6±2.4	64.0±3.3	76.6±4.4	69.4±4.4	64.7±4.3
EDA	66.3±4.3	58.5±3.5	52.5±4.5	69.3±5.1	62.3±3.5	54.5±4.8	64.3±4.3	56.2±3.5	50.4±4.5	64.8±5.3	56.8±3.7	51.1±3.7
Facial expression+EDA	81.2±3.5	73.5±4.5	69.1±4.7	83.2±3.8	76.6±4.5	71.1±4.7	80.1±3.2	71.1±4.3	68.1±4.5	81.4±3.6	71.8±3.2	68.9±5.3
Facial expression+SAM	84.0±4.1	76.5±3.4	67.5±2.3	87.0±3.6	79.5±3.4	68.9±2.8	82.0±4.1	74.8±4.4	66.7±4.3	82.6±4.71	75.3±4.4	67.4±4.3
EDA+SAM	79.3±4.4	64.5±3.9	56.5±5.5	82.3±4.2	67.2±3.9	58.1±3.5	77.2±4.5	63.3±3.3	55.5±5.0	77.9±3.5	63.9±5.1	56.2±4.2
ALL	88.3±5.1	87.3±4.8	76.6±6.4	89.3±3.0	88.3±4.3	80.3±4.4	86.1±5.1	85.3±4.3	75.6±4.4	86.8±4.4	85.7±5.3	76.2±3.4

¹Determined by ANOVA or chi-square test.

Fig. 5

ROC curve in classification tasks.

DISCUSSION

Our model demonstrates strong predictive power to differentiate between older adults with normal cognition, patients with MCI, and patients with probable AD. By using multidimensional emotion variables, the system is a useful supplement to cognitive assessment among older adults. Importantly, the new automatic system is convenient and cost-effective for large-scale screening for cognitive impairment in community settings, which gives it clinical relevance and potential value for public health.

Emotion is intercorrelated with cognition [57]. It is well established that individuals with cognitive impairments demonstrate deficits in emotion perception [58]. For instance, AD patients had difficulties in identifying or recognizing negative emotions [20]. Furthermore, accumulating evidence has indicated that cognitive deficits are associated with less specific facial expressions, suggesting that individuals with AD may have difficulties in expressing emotions [16]. These emotional deficits may be attributed to the atrophy and neuropathological changes in the amygdala and hippocampus that occur in older adults with cognitive impairment [59]. In addition, reduced episodic memory in AD patients could further hinder their ability to respond to emotional stimuli [60]. Deficits in emotion processing could occur as early as the MCI stage [61], although several studies have suggested that individuals with MCI and healthy older adults perform comparably on emotion recognition tasks [62]. Overall, the evidence on emotional deficits in MCI populations remains insufficient.

We used multi-dimensional data, including subjective questionnaire data, electrodermal activity, and facial expressions, to differentiate between individuals with AD, individuals with MCI, and healthy individuals. Each of these methods has strengths and weaknesses, making it challenging to accurately predict cognitive status using any single method. Subjective emotional questionnaires collect specific information about an individual’s emotional experiences, reflecting their subjective emotional feelings [63]. However, AD patients often have cognitive impairments that affect their reading, comprehension, and language abilities, making it difficult for them to accurately express their emotions [16]. Additionally, AD patients may have memory biases that lead to inconsistencies between questionnaire results and their actual emotional experiences. Emotion and attention are interleaved: by enhancing attention, individuals can increase their ability to detect emotional stimuli. Skin conductance responses can detect emotional physiological information objectively, even if emotional target stimuli are not perceived [64]. Physiological signals have the advantage of providing accurate and objective emotional physiological information and can record on-line emotional physiological responses during stimulus presentation [65], which can also minimize the influence of recall bias and memory errors. Facial expressions reflect an individual’s inner feelings, emotions, motivations, and needs [66]. Recognizing facial expressions involves perceiving visual configurations and evaluating emotional values. Taken together, the three methods (subjective rating, skin conductance, and facial expressions) complement each other and provide a whole picture of emotion.

The MCI versus HC classification showed the lowest SPE, SEN, and AUC values of the three classification tasks. Several reasons may account for this. First, emotion and cognition are highly correlated [67]. A systematic review concluded that emotional processing ability deteriorates further as neurodegenerative disease progresses [68]. In other words, emotion impairment is more pronounced in AD than in MCI [69]. Second, the findings on emotion impairment in MCI are inconsistent [68]. For example, patients with multiple-domain amnestic MCI demonstrated impairments in emotion recognition, whereas patients with single-domain amnestic MCI did not [70]. The current study did not differentiate between subtypes of MCI. Finally, we did not find a significant difference in skin conductance response or questionnaire evaluation between MCI individuals and HC, which is in agreement with the poorer classification results obtained by machine learning.
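For readers less familiar with the three metrics compared above, the following minimal sketch shows how SEN, SPE, and AUC can be computed for one binary task. The labels and scores are hypothetical toy values, not study data, and the 0.5 decision threshold and the function names are assumptions for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (SEN, SPE) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = MCI, 0 = HC.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.65]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sen, spe = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(sen, spe, area)  # 0.75 0.5 0.8125
```

SEN and SPE depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three values need not move together.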

A convenient screening instrument for cognitive impairment in community-dwelling older adults would benefit public health. Currently, evaluation of probable AD and MCI requires a cognitive examination, primarily based on face-to-face interviews with patients or their caregivers. Two commonly used screening tools are the MMSE and the MoCA, which rely on experienced clinicians or neurologists and take 10 to 20 min to administer. Interpretation of these two instruments must take into account a variety of factors, such as culture, language, and education level [39, 71]. Recent advances in technology show promise for improving the sensitivity and specificity of AD identification. For instance, PET can show abnormalities suggestive of AD with a sensitivity of 91% and a specificity of 85% [72, 73]. A recent study used electroencephalogram (EEG) metrics to detect preclinical AD and found a non-linear relationship between amyloid burden and EEG metrics [24]. Although these methods can reliably detect AD/MCI, they are expensive and require a hospital visit. A more automated screening method for AD/MCI is needed. For instance, a recent study presented an automatic method that detected apathy in dementia by analyzing facial emotion and motion [22]. We observed excellent predictive power from emotion assessment, achieving 92.5% AUC for HC versus probable AD, 88.9% AUC for MCI versus probable AD, and 82.7% AUC for HC versus MCI. This performance was higher than that of a natural language processing approach using voice recordings in a recent study [74]. Importantly, the proposed method, which collects multimodal emotional data from community-dwelling individuals, appears to be a more convenient and cost-effective way to screen for AD and MCI.

The current study is not without limitations. First, the study only recruited individuals with probable AD and MCI. Different neuropsychiatric disorders have different emotion features. For example, older adults with depression tend to perceive neutral or ambiguous facial expressions as sad and have a reduced ability to recognize all basic emotions except sadness [75]. Post-stroke mood and emotional disturbances are common and have diverse manifestations [76]. The practical application of this method therefore requires further investigation in other neuropsychiatric disorders. Second, although the MMSE and MoCA are standard measures for determining the cognitive status of older adults, the diagnoses of AD and MCI were not confirmed by biomarker or brain imaging evidence. Collecting biomarkers, such as CSF and PET imaging, is expensive and may carry certain risks and side effects. In addition, the participants were recruited among residents of long-term care facilities and the community, where diagnostic biomarkers were not available. Future studies should use PET imaging, CSF, and other objective indicators to confirm the diagnosis of AD and MCI. Finally, because of the limited sample size, the results could not be extended to a multi-class classification that differentiates HC, MCI, and probable AD simultaneously. Future work should investigate the multi-class classification of cognitive disorders in a larger sample.

CONCLUSION

Our method demonstrated excellent predictive power through the fusion of multiple emotion features. This study provides a cost-effective, automated method that can help detect probable AD and MCI accurately.

ACKNOWLEDGMENTS

We thank all the participants who enthusiastically participated in the research.

FUNDING

This work was funded by research grants from the National Key R&D Program of China (2020YFC2005802) and the National Natural Science Foundation of China (82172530).

CONFLICT OF INTEREST

The authors have no competing interests to declare.

DATA AVAILABILITY

The data supporting the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

The supplementary material is available in the electronic version of this article.

References

1. WHO, Ageing and health, Last updated October 2022, Accessed on October 2022.

2. Alzheimer’s Association (2019) Alzheimer’s disease facts and figures. Alzheimers Dement 15, 321–387.

3. Hou, Dan, Babbar, Wei, Hasselbalch, Croteau, Bohr (2019) Ageing as a risk factor for neurodegenerative disease. Nat Rev Neurol 15, 565–581.

4. Knopman, Amieva, Petersen, Chételat, Holtzman, Hyman, Nixon, Jones (2021) Alzheimer disease. Nat Rev Dis Prim 7, 1–21.

5. McKhann, Knopman, Chertkow, Hyman, Jack Jr, Kawas, Klunk, Koroshetz, Manly, Mayeux (2011) The diagnosis of dementia due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 7, 263–269.

6. Beach, Monsell, Phillips, Kukull (2012) Accuracy of the clinical diagnosis of Alzheimer disease at National Institute on Aging Alzheimer Disease Centers, 2005–2010. J Neuropathol Exp Neurol 71, 266–273.

7. Blennow, Zetterberg (2018) Biomarkers for Alzheimer’s disease: Current status and prospects for the future. J Intern Med 284, 643–663.

8. Jungilligens, Paredes-Echeverri, Popkirov, Barrett, Perez (2022) A new science of emotion: Implications for functional neurological disorder. Brain 145, 2648–2663.

9. Freedman, Binns, Black, Murphy, Stuss (2013) Theory of mind and recognition of facial emotion in dementia: Challenge to current concepts. Alzheimer Dis Assoc Disord 27, 56–61.

10. Heilman, Nadeau (2022) Emotional and neuropsychiatric disorders associated with Alzheimer’s disease. Neurotherapeutics 19, 99–116.

11. Klein-Koerkamp, Beaudoin, Baciu, Hot (2012) Emotional decoding abilities in Alzheimer’s disease: A meta-analysis. J Alzheimers Dis 32, 109–125.

12. Spoletini, Marra, Di Iulio, Gianni, Sancesario, Giubilei, Trequattrini, Bria, Caltagirone, Spalletta (2008) Facial emotion recognition deficit in amnestic mild cognitive impairment and Alzheimer disease. Am J Geriatr Psychiatry 16, 389–398.

13. Gola, Shany-Ur, Pressman, Sulman, Galeana, Paulsen, Nguyen, Wu, Adhimoolam, Poorzand (2017) A neural network underlying intentional emotional facial expression in neurodegenerative disease. Neuroimage Clin 14, 672–678.

14. Burton, Kaszniak (2006) Emotional experience and facial expression in Alzheimer’s disease. Aging Neuropsychol Cogn 13, 636–651.

15. Chen K-H, Lwi, Hua, Haase, Miller, Levenson (2017) Increased subjective experience of non-target emotions in patients with frontotemporal dementia and Alzheimer’s disease. Curr Opin Behav Sci 15, 77–84.

16. Henry, Rendell, Scicluna, Jackson, Phillips (2009) Emotion experience, expression, and regulation in Alzheimer’s disease. Psychol Aging 24, 252.

17. Mograbi, Brown, Salas, Morris (2012) Emotional reactivity and awareness of task performance in Alzheimer’s disease. Neuropsychologia 50, 2075–2084.

18. Phillips, Drevets, Rauch, Lane (2003) Neurobiology of emotion perception I: The neural basis of normal emotion perception. Biol Psychiatry 54, 504–514.

19. Fernández-Aguilar, Ricarte, Ros, Latorre (2018) Emotional differences in young and older adults: Films as mood induction procedure. Front Psychol 9, 1110.

20. Fernández-Aguilar, Lora, Satorres, Ros, Melendez, Latorre (2021) Dimensional and discrete emotional reactivity in Alzheimer’s disease: Film clips as a research tool in dementia. J Alzheimers Dis 82, 349–360.

21. Boucsein (2012) Electrodermal activity, Springer Science & Business Media.

22. Happy, Dantcheva, Das, Zeghari, Robert, Bremond (2019) Characterizing the state of apathy with facial expression and motion analysis. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), IEEE, pp. 1–8.

23. Qiu, Joshi, Miller, Xue, Zhou, Karjadi, Chang, Joshi, Dwyer, Zhu (2020) Development and validation of an interpretable deep learning framework for Alzheimer’s disease classification. Brain 143, 1920–1933.

24. Gaubert, Raimondo, Houot, Corsi, Naccache, Sitt, Hermann, Oudiette, Gagliardi, Habert, Dubois, De Vico Fallani, Bakardjian, Epelbaum (2019) EEG evidence of compensatory mechanisms in preclinical Alzheimer’s disease. Brain 142, 2096–2112.

25. Jiang, Seyedi, Haque, Pongos, Vickers, Manzanares, Lah, Levey, Clifford (2022) Automated analysis of facial emotions in subjects with cognitive impairment. PLoS One 17, e0262527.

26. Kerdawy, Halaby, Hassan, Maher, Fayed, Shawky, Badawi (2020) The automatic detection of cognition using EEG and facial expressions. Sensors (Basel) 20, 3516.

27. Sharma, Niforatos, Giannakos, Kostakos (2020) Assessing cognitive performance using physiological and facial features: Generalizing across contexts. Proc ACM Interactive, Mobile, Wearable Ubiquitous Technol 4, 1–41.

28. Posada-Quintero, Bolkhovsky (2019) Machine learning models for the identification of cognitive tasks using autonomic reactions from heart rate variability and electrodermal activity. Behav Sci (Basel) 9, 45.

29. Rahma, Putra, Rahmatillah, Putri YSKA, Fajriaty, Ain, Chai (2022) Electrodermal activity for measuring cognitive and emotional stress level. J Med Signals Sens 12, 155.

30. Patient, Ghali, Kolivand, Hurst, John (2021) Application of virtual reality and electrodermal activity for the detection of cognitive impairments. In 2021 14th International Conference on Developments in eSystems Engineering (DeSE), IEEE, pp. 156–161.

31. , Li, Zhang, Wu, Zhao, Qiang (2023) Synergy through integration of digital cognitive tests and wearable devices for mild cognitive impairment screening. Front Hum Neurosci 17, 1183457.

32. De la Torre-Luque, Viera-Campos, Bilderbeck, Carreras, Vivancos, Diaz-Caneja, Aghajani, Saris IMJ, Raslescu, Malik (2022) Relationships between social withdrawal and facial emotion recognition in neuropsychiatric disorders. Prog Neuro-Psychopharmacology Biol Psychiatry 113, 110463.

33. Sollberger, Neuhaus, Ketelle, Stanley, Beckman, Growdon, Jang, Miller, Rankin (2011) Interpersonal traits change as a function of disease type and severity in degenerative brain diseases. J Neurol Neurosurg Psychiatry 82, 732–739.

34. Hasson-Ohayon, Mashiach-Eizenberg, Arnon-Ribenfeld, Kravetz, Roe (2017) Neuro-cognition and social cognition elements of social functioning and social quality of life. Psychiatry Res 258, 538–543.

35. Desmarais, Lanctôt, Masellis, Black, Herrmann (2018) Social inappropriateness in neurodegenerative disorders. Int Psychogeriatr 30, 197–207.

36. Martinez, Multani, Anor, Misquitta, Tang-Wai, Keren, Fox, Lang, Marras, Tartaglia (2018) Emotion detection deficits and decreased empathy in patients with Alzheimer’s disease and Parkinson’s disease affect caregiver mood and burden. Front Aging Neurosci 10, 120.

37. Lyketsos, Carrillo, Ryan, Khachaturian, Trzepacz, Amatniek, Cedarbaum, Brashear, Miller (2011) Neuropsychiatric symptoms in Alzheimer’s disease. Alzheimers Dement 7, 532–539.

38. McKhann, Drachman, Folstein, Katzman, Price, Stadlan (1984) Clinical diagnosis of Alzheimer’s disease: Report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology 34, 939–944.

39. , Jia, Yang (2016) Mini-mental state examination in elderly Chinese: A population-based normative study. J Alzheimers Dis 53, 487–496.

40. Winblad, Palmer, Kivipelto, Jelic, Fratiglioni, Wahlund, Nordberg, Bäckman, Albert, Almkvist, Arai, Basun, Blennow, De Leon, Decarli, Erkinjuntti, Giacobini, Graff, Hardy, Jack, Jorm, Ritchie, Van Duijn, Visser, Petersen (2004) Mild cognitive impairment – Beyond controversies, towards a consensus: Report of the International Working Group on Mild Cognitive Impairment. J Intern Med 256, 240–246.

41. Graf (2008) The Lawton instrumental activities of daily living scale. AJN Am J Nurs 108, 52–62.

42. American Psychiatric Association (2013) Diagnostic and statistical manual of mental disorders: DSM-5, American Psychiatric Association, Washington, DC.

43. Nasreddine, Phillips, Bédirian, Charbonneau, Whitehead V, Collin, Cummings, Chertkow (2005) The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. J Am Geriatr Soc 53, 695–699.

44. Gilman, Shaheen, Nylocks, Halachoff, Chapman, Flynn, Matt, Coifman (2017) A film set for the elicitation of emotion in research: A comprehensive catalog derived from four decades of investigation. Behav Res Methods 49, 2061–2082.

45. Bradley, Lang (1994) Measuring emotion: The self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry 25, 49–59.

46. Lykken, Venables (1971) Direct measurement of skin conductance: A proposal for standardization. Psychophysiology 8, 656–672.

47. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (2017) Attention is all you need. arXiv, https://doi.org/10.48550/arXiv.1706.03762 [Preprint]. Posted June 12, 2017, last revised August 2, 2023.

48. Toisoul, Kossaifi, Bulat, Tzimiropoulos, Pantic (2021) Estimation of continuous valence and arousal levels from faces in naturalistic conditions. Nat Mach Intell 3, 42–50.

49. Kahou, Bouthillier, Lamblin, Gulcehre, Michalski, Konda, Jean, Froumenty, Dauphin, Boulanger-Lewandowski (2016) Emonets: Multimodal deep learning approaches for emotion recognition in video. J Multimodal User Interfaces 10, 99–111.

50. Xia, Zhang, Li, Chen, Min, Han (2022) Dynamic viewing pattern analysis: Towards large-scale screening of children with ASD in remote areas. IEEE Trans Biomed Eng 70, 1622–1633.

51. Olier, Spadavecchia (2022) Stereotypes, disproportions, and power asymmetries in the visual portrayal of migrants in ten countries: An interdisciplinary AI-based approach. Humanit Soc Sci Commun 9, 410.

52. Hochreiter, Schmidhuber (1997) Long short-term memory. Neural Comput 9, 1735–1780.

53. Kingma, Ba (2015) Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), San Diego, California, p. 6.

54. Zebende (2011) DCCA cross-correlation coefficient: Quantifying level of cross-correlation. Phys A Stat Mech its Appl 390, 614–618.

55. Podobnik, Horvatic, Petersen, Stanley (2009) Cross-correlations between volume change and price change. Proc Natl Acad Sci U S A 106, 22079–22084.

56. Kingma, Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

57. Okon-Singer, Hendler, Pessoa, Shackman (2015) The neurobiology of emotion–cognition interactions: Fundamental questions and strategies for future research. Front Hum Neurosci 9, 58.

58. Kumfor, Sapey-Triomphe, Leyton, Burrell, Hodges, Piguet (2014) Degradation of emotion processing ability in corticobasal syndrome and Alzheimer’s disease. Brain 137, 3061–3072.

59. Lehericy, Baulac, Chiras, Pierot, Martin, Pillon, Deweer, Dubois, Marsault (1994) Amygdalohippocampal MR volume measurements in the early stages of Alzheimer disease. Am J Neuroradiol 15, 929–937.

60. Fairfield, Colangelo, Mammarella, Di Domenico, Cornoldi (2017) Affective false memories in dementia of Alzheimer’s type. Psychiatry Res 249, 9–15.

61. Cárdenas, Blanca, Carvajal, Rubio, Pedraza (2021) Emotional processing in healthy ageing, mild cognitive impairment, and Alzheimer’s disease. Int J Environ Res Public Health 18, 2770.

62. Sheardova, Laczó, Vyhnalek, Andel, Mokrisova, Vlcek, Amlerova, Hort (2014) Famous landmark identification in amnestic mild cognitive impairment and Alzheimer’s disease. PLoS One 9, e105623.

63. Lang (1980) Behavioral treatment and bio-behavioral assessment: Computer applications. Technol Ment Heal Care Deliv Syst 119–137.

64. Dolan (2002) Emotion, cognition, and behavior. Science 298, 1191–1194.

65. Boucsein W (2012) Electrodermal activity, Springer Science & Business Media.

66. Ekman (1992) Are there basic emotions? Psychol Rev 99, 550–553.

67. Luzzi, Piccirilli, Provinciali (2007) Perception of emotions on happy/sad chimeric faces in Alzheimer disease: Relationship with cognitive functions. Alzheimer Dis Assoc Disord 21, 130–135.

68. Torres Mendonça De Melo Fádel, Santos De Carvalho, Belfort Almeida Dos Santos, Dourado MCN (2019) Facial expression recognition in Alzheimer’s disease: A systematic review. J Clin Exp Neuropsychol 41, 192–203.

69. Weiss, Kohler, Vonbank, Stadelmann, Kemmler, Hinterhuber, Marksteiner (2008) Impairment in emotion recognition abilities in patients with mild cognitive impairment, early and moderate Alzheimer disease compared with healthy comparison subjects. Am J Geriatr Psychiatry 16, 974–980.

70. McCade, Savage, Guastella, Lewis SJG, Naismith (2013) Emotion recognition deficits exist in mild cognitive impairment, but only in the amnestic subtype. Psychol Aging 28, 840.

71. Chen, Xu, Chu, Ding, Liang, Nasreddine, Dong, Hong, Zhao, Guo (2016) Validation of the Chinese version of Montreal cognitive assessment basic for screening mild cognitive impairment. J Am Geriatr Soc 64, e285–e290.

72. Ben Bouallègue, Mariano-Goulart, Payoux, Alzheimer’s Disease Neuroimaging Initiative (ADNI) (2017) Comparison of CSF markers and semi-quantitative amyloid PET in Alzheimer’s disease diagnosis and in cognitive impairment prognosis using the ADNI-2 database. Alzheimers Res Ther 9, 32.

73. Perani, Cerami, Caminiti, Santangelo, Coppi, Ferrari, Pinto, Passerini, Falini, Iannaccone (2016) Cross-validation of biomarkers for the early differential diagnosis and prognosis of dementia in a clinical setting. Eur J Nucl Med Mol Imaging 43, 499–508.

74. Amini, Hao, Zhang, Song, Gupta, Karjadi, Kolachalama, Au, Paschalidis (2023) Automated detection of mild cognitive impairment and dementia from voice recordings: A natural language processing approach. Alzheimers Dement 19, 946–955.

75. Dalili, Penton-Voak, Harmer, Munafò (2015) Meta-analysis of emotion recognition deficits in major depressive disorder. Psychol Med 45, 1135–1144.

76. Kim (2016) Post-stroke mood and emotional disturbances: Pharmacological therapy based on mechanisms. J Stroke 18, 244.