Abstract
BACKGROUND:
Alzheimer’s Disease (AD) is normally assessed in clinical settings using neuropsychological tests and medical procedures such as neuroimaging techniques: Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) among others. The latter procedures are expensive and unavailable in most nations, so early diagnosis of AD does not occur, which heavily increases the subsequent treatment costs for the patients.
AIMS:
This research aims to evaluate cognitive features related to dementia progression based on neuropsychological test data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We utilise data from two neuropsychological tests, the Clinical Dementia Rating Scale Sum of Boxes (CDR-SOB) and the Mini-Mental State Examination (MMSE), to assess the advancement of AD.
METHODS:
To achieve the aim, we develop a data process called Neuropsychological Feature Assessment via Feature Selection (NFAFS) to identify impactful features using Information Gain (IG) and Pearson Correlation to assess class-feature and feature-feature correlations. We then model a minimal subset of neuropsychological features using machine learning techniques to derive classification models.
RESULTS AND IMPLICATIONS:
The results show that the key cognitive features of the MMSE are Time Orientation, Recall and Complex Attention, since they correlate with the progression class and are ranked highly by the feature selection techniques. For the CDR-SOB, aside from the memory feature, it was difficult to identify other specific features that are signs of dementia progression. Clinicians can use these features in a digital knowledge base to pay more attention to specific cognitive deficits related to Recall, Orientation and Complex Attention during clinical dementia evaluations, in order to detect possible signs of disease progression early.
Keywords
Introduction
Worldwide, approximately 50 million people have dementia, with 10 million new cases diagnosed every year (World Health Organization, 2020). The most common form of dementia, accounting for 60–80% of cases, is Alzheimer’s Disease (AD) [5], with approximately 200,000 Americans under 65 affected [3]. AD affects memory, thinking and behaviour, and its symptoms are sometimes mistaken for stress or aging. AD, a progressive condition that evolves over time, is classified in three stages: Early, Middle and Late [75]. In its early stages, memory loss is mild, but with late-stage AD, individuals lose the ability to carry on a conversation and respond to their environment [59]. A common early symptom of AD is difficulty remembering newly learned information due to changes in the part of the brain that affects learning [4]. As AD progresses, symptoms become increasingly severe, including disorientation, mood and behaviour changes, confusion about events, time and place, unfounded suspicions about family, friends and caregivers, more serious memory loss, and difficulty speaking, swallowing and walking [52].
Neuropsychological evaluation using cognitive/functional tests such as the Alzheimer’s Disease Assessment Scale – Cognitive Subscale [ADAS-Cog] [46], Clinical Dementia Rating Scale [CDR] [39], Functional Activities Questionnaire [FAQ] [55], MMSE [27] and others has been used to diagnose Mild Cognitive Impairment (MCI) and dementia. These tests mainly evaluate different cognitive and memory-related areas by providing the patients with questions and activities within clinical or, in some cases, non-clinical environments. These tests are effective in the early screening of dementia; however, determining which neuropsychological features or activities within these tests signal dementia progression is a challenging task [66, 62, 11].
One of the promising approaches for screening of AD is the use of Artificial Intelligence (AI), particularly Machine Learning (ML). ML techniques extract hidden patterns from historical data using search methods, and these patterns can aid diagnosticians in AD screening tasks such as determining the progression stage of patients [72, 10, 26, 74]. Recent research works show that using ML techniques can improve the screening and diagnosis of AD, e.g., Hinrichs et al. [36], So et al. [65], Samper-González et al. [63], Grassi et al. [30], Das et al. [22], Thabtah et al. [67], Thabtah et al. [68, 69], and AlShboul et al. [1]. However, primary issues that recent studies have not addressed thoroughly are which cognitive features can be symptoms of dementia progression, and the primary cognitive areas that these features belong to according to the Diagnostic and Statistical Manual (DSM-5) [6]. To fill this gap, this research investigates cognitive features that impact the progression of AD in dissimilar neuropsychological assessments (MMSE, CDR-SOB) using large real data observations from the ADNI data project [47, 2]. We would like to answer the research question below:
How can we determine cognitive features from the MMSE and CDR-SOB tests that are signs of the dementia progression using feature selection techniques?
We propose an experimental methodology in a data process called Neuropsychological Feature Assessment via Feature Selection (NFAFS) to assess different indicators in neuropsychological assessments from real data given in ADNI (ADNI-Merge, MMSE-sheet, CDR-SOB-sheet) using feature selection. The proposed methodology first models the data to represent the AD progression problem (creating the progression class label), balances the data with respect to the class label, processes the data observations, and, more importantly, evaluates each neuropsychological test’s features using feature selection techniques. We model a new class variable that we call Diagnostic Change (DX_change), which holds information on the progression of dementia at any stage. More details on NFAFS are given in Section 3.
The paper is structured as follows: Section 2 highlights the background on the neuropsychological tests used for AD screening and critically analyses relevant research works. In Section 3, we present the NFAFS and discuss the data used and features. Section 4 is devoted to experimental analysis, and finally, the conclusion and future works are given in Section 5.
Literature review
Background on dementia and cognitive assessments
The current standard for the diagnosis of dementia conditions is the DSM-5 framework, in which six cognitive domains are associated with dementia: Complex Attention, Executive Function, Learning and Memory, Language, Perceptual-Motor and Social Cognition [6]. Major neurocognitive disorder (Major ND) necessitates that the patient exhibits a substantial decline over time in at least one of the six cognitive areas of the DSM-5. Minor neurocognitive disorder (Minor ND) requires a moderate decline in cognitive areas over time and that the patient’s independence during everyday activities is not affected. Both major ND and minor ND also require that the cognitive declines are not observed only when the patient is delirious, and that the cognitive deficits are not better explained by another mental disorder.
Normally, when AD diagnosis is utilised for research purposes, rigorous diagnosis criteria are needed. In the ADNI project [2], AD subjects have been diagnosed as having probable AD according to the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer’s Disease and Related Disorders Association (ADRDA) criteria, known as NINCDS-ADRDA [45]. This involves an MMSE score between 20 and 26 and an original CDR score of 0.5 or 1.0. The NINCDS-ADRDA uses eight cognitive domains: Attention, Problem Solving, Orientation, Functional Abilities, Constructive Abilities, Memory, Language, and Perceptual Skills. While a definite AD diagnosis is possible through a biopsy or autopsy, a probable diagnosis can be given when cognitive impairments are present in two or more domains and have been shown to decline progressively, the age of onset is between 40 and 90 years, and no other diseases are present that could account for the symptoms [45].
The DSM-5 classifies MCI as a mild cognitive decline that does not yet deprive individuals of an independent lifestyle or the ability to perform complex daily activities such as driving a car. Major cognitive impairment is diagnosed when there is interference with daily living and there are no other culpable disorders [7]. While the DSM-5 does not name any specific diagnostic tests, there are a number of means that clinicians can use to provide a diagnosis of possible or probable AD. Since an AD diagnosis is reliant on the absence of other disorders, MRI and PET scans are useful for screening out other possible diagnoses, while neuropsychological tests such as MMSE, CDR, and CDR-SOB are important tools for measuring impairments in the cognitive domains. Common neuropsychological tests that have been developed and used over the years by clinicians for dementia screening are shown in Table 1. The scope of this research is limited to CDR-SOB and MMSE.
MMSE is a commonly used test for early screening of symptoms related to dementia [27]. The test consists of multiple questions over eleven sections where patients score either a 1 or 0. The maximum score is 30 points and a screening result is given based on where the patient scores on a table: a score of 20–24 suggests mild dementia, 13–20 suggests moderate dementia, and less than 12 indicates severe dementia.
Common neuropsychological tests
The CDR-SOB is a clinical assessment for ranking dementia severity [9]. This test is performed by a clinician trained in administering the test and interpreting its results. In the CDR assessment, the patient is evaluated in six areas: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. Scoring is 0: None, 0.5: Questionable, 1: Mild, 2: Moderate, 3: Severe. The Sum of Boxes scoring assessment (CDR-SOB) generates scores between 0 and 18 by adding up the CDR domain scores. Recommended cut-off scores are: 0.5–4: Questionable; 4.5–9: Mild; 9.5–15.5: Moderate; 16–18: Severe [77]. In this study we used CDR-SOB data from the ADNI study.
One of the earliest research works to apply ML techniques for the prediction of AD is that by Datta et al. [23]. Looking to improve accuracy over BOMC and FAQ, they used data collected from 578 patients from the University of California’s Alzheimer’s Disease Research Centre and created three datasets using BOMC, FAQ, and a combination of BOMC and FAQ. Features were selected manually from the BOMC and FAQ answers as well as age, sex, job, and education level. Of the ML techniques used, the authors found that they were able to achieve the best classification accuracy (88.4%) using the Naïve Bayes classifier.
Devanand et al. [24] conducted research with 148 MCI patients over three years to predict the progression of MCI to AD. The authors used cognitive and other clinical tests in their dataset. Using backward and stepwise logistic regression [12], they achieved 90% specificity with the five features identified: informant report of functioning from FAQ, olfactory identification from the University of Pennsylvania Smell Identification Test [UPSIT] [25], immediate recall (verbal memory) from the Selective Reminding Test [SRT] [14], and MRI hippocampal volume as well as MRI entorhinal cortex volume. However, they also found that combining age, MMSE, SRT immediate recall, FAQ, and UPSIT provided 81.3% sensitivity.
Using data from 1,489 patients in the ADNI-1 dataset, Zhu et al. [79] aimed to predict changes in MMSE scores over 24 months. Using manually selected features (age, gender, education, MMSE evaluation, and APOE genotype), the authors developed their own prediction method, COMPASS (Computational Model to Predict the Development of Alzheimer’s Disease Spectrum), and compared this with linear regression [29], RBF [Radial Basis Function] Network [13], SMOreg [sequential minimal optimization – regression] [64], decision tree [56] and Gaussian Process Regression (Rasmussen, 2006). Their method outperformed these, achieving 80% accuracy for the top 20% of predictions, but derived poor accuracy for MCI classification.
So et al. [65] investigated the problem of predicting dementia using ML with feature selection on two common datasets: MMSE-KC and CERAD-K [42]. The authors used Chi-squared testing [31] and Information Gain (IG) [56] techniques and 13 features. The top five features selected by both techniques were similar: Orientation to Place, Orientation to Time, Three-stage Commands, Recall, and Attention. The authors also used multiple dissimilar classifiers to process the two datasets, such as Support Vector Machine (SVM) [19], Naive Bayes, Multilayer Perceptron (MLP) [51] and others, and the results showed that the models derived by MLP and SVM predict dementia accurately, at least on the two datasets considered.
Vinutha et al. [74] replaced missing values using an ML-based imputation method on the neuropsychological scores and the FAQ from the NACC database (National Alzheimer’s Coordinating Center, n.d) to improve the accuracy of classification. The authors conducted the analysis using two patient visits from the datasets, visit 1 and visit 4, because approximately 50% of the progression from MCI to AD is concentrated in the 24–36 months after the first visit. The authors used the Genetic Algorithm (GA) [76] and Logistic Regression to process the datasets after balancing the data with the SMOTE [17] method. The results showed that paying bills, remembering appointments, and meal preparation in visit 1 and tax-paying records, remembering appointments, and paying attention in visit 4 are the main progression features.
Hsiao et al. [38] analysed longitudinal data related to the FAQ, focusing on MCI subjects and considering two possible scenarios: stable or progressive. Their previous studies suggest that Instrumental Activities of Daily Living (IADLs) [43] may be helpful in distinguishing MCI from dementia, but are not adequate to clarify the mapping between longitudinal changes in IADLs and conversion of MCI to dementia. The authors assessed IADL performance using the FAQ and these assessments were split into 10 different categories according to their criteria. The authors focused on the recent longitudinal intervals. Changes in FAQ scores were calculated between the last two visits for stable MCI and between the point of a diagnosis of dementia conditions and two immediately preceding visits for progressive MCI. These two groups were compared employing unpaired
Niyas et al. [50] suggested a feature selection algorithm based on a fusion of the Fisher Score and a greedy search heuristic. The authors applied SVM and k-NN techniques for dementia screening and used the ADNI-TADPOLE (1737 patients) and ABIL (1000, Australian cohort) datasets. The authors selected cross-sectional data at the baseline stage in the experiment. The features selected by the algorithm were evaluated using Leave One Out Cross Validation (LOOCV) and stratified 10-fold cross-validation. The suggested algorithm provided good sensitivity (84%) and specificity (82.5%). Based on their experiment, CDRSB, MMSE, ADAS-13, and AV45 from ADNI-TADPOLE were ranked as highly important features to discriminate normal vs mild cognitive Alzheimer cases. In the ABIL dataset, the features with high importance are: LDELTOTAL (which assesses logical memory), CDGLOBAL (Clinical Dementia Rating Global), and MMSCORE. However, the authors noted that the classifiers’ performance improved when combining highly important features with features of low importance. Furthermore, the researchers highlighted the potential role of cognitive tests in discriminating the MCI and AD cases. Examples of these tests are the Clinical Dementia Rating Global test, Memory Immediate Recall test (LIMMTOTAL), Partial Score of Logical Memory (LDELTOTAL), and MMSCORE, which are promising features in classifying MCI vs AD.
Thabtah et al. [68, 69] assessed cognitive and functional features for two common medical questionnaires to identify how well machine learning technology can detect AD advancement. The authors used data related to the ADAS-13 and FAQ dementia assessments from the ADNI data project, and a few machine learning algorithms with feature selection. The results pinpointed that cognitive features are more strongly associated with the disease progression.
Recently, AlShboul et al. [1] evaluated the score of the CDR-SOB assessment and how well it correlates with the diagnosis of AD. The authors used several machine learning algorithms to process the ADNI-Merge dataset, focusing on the baseline-demographic data with the CDR-SOB score. The results indicated competitive performance when using machine learning algorithms in detecting MCI, Dementia or cognitively normal (CN) individuals. However, there was some variation in the evaluation metrics used, showing that some machine learning algorithms are better suited to dementia screening problems based on the baseline-demographic data.
Table 2 illustrates the summary of related works.
Methodology
Data, features, and methods
Figure 1 illustrates the methodology used in this research to consider the research problems raised earlier. In NFAFS, three datasets from the ADNI data repository, ADNI-Merge, MMSE data sheets (MMSE-feature), and CDR-SOB data sheets (CDR-SOB-feature), are processed using dissimilar feature selection techniques to identify key neuropsychological indicators. Each of the cognitive test’s data, i.e. CDR-SOB, MMSE, was integrated with ADNI-Merge to capture the diagnostic class using RID and visit code attributes. Some of the visits for a given patient (RID) were ignored when not appearing in the cognitive test data sheets. Since the problem under consideration is to assess the progression of dementia at any stage, this necessitates the creation of a new class variable for each dataset that we call DX_change. This class variable is created to record any progression from CN to MCI or MCI to dementia (AD) using the following rules:
1. The data instances in each dataset are sorted based on the patient ID (RID) and visit date.
2. For any two consecutive visits for the same patient (RID), if the original diagnostic class (DX) changes progressively (from CN to MCI or MCI to AD) then we record ‘1’ in DX_change. However, if DX changes in a regressive manner, the DX_change is assigned
3. Whenever all visits of the current RID are considered, the DX_change of the first visit for the next patient is automatically assigned 0 and we repeat steps 1–2 on the remaining visits for that patient until all of its visits are processed.
The above steps are applied on the level of data observations (sorted visits for a patient) and for each test dataset.
Summary of related works
Methodology Followed (NIAFE).
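The labelling rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the record layout (dictionaries with `RID`, `date` and `DX` keys), the ordinal encoding of the diagnoses, and the function name are hypothetical. In particular, the value assigned to a regressive change is not recoverable from the text, so the sketch uses a placeholder marker (`-1`), and it treats any move to a later stage (including a direct CN to AD jump) as progression.

```python
from itertools import groupby
from operator import itemgetter

# Ordinal encoding of the diagnostic class (assumed labels).
STAGE = {"CN": 0, "MCI": 1, "AD": 2}

# Placeholder for a regressive change; the value used in the study
# is not stated in the text, so -1 is an assumption.
REGRESSION = -1


def label_dx_change(visits):
    """Return a copy of the visits, sorted and labelled with DX_change."""
    # Step 1: sort by patient ID and visit date (ISO dates sort as text).
    ordered = sorted(visits, key=itemgetter("RID", "date"))
    labelled = []
    for _, patient_visits in groupby(ordered, key=itemgetter("RID")):
        previous = None  # Step 3: the first visit of each patient gets 0.
        for visit in patient_visits:
            current = STAGE[visit["DX"]]
            if previous is None or current == previous:
                change = 0
            elif current > previous:
                change = 1  # Step 2: progressive change.
            else:
                change = REGRESSION
            labelled.append({**visit, "DX_change": change})
            previous = current
    return labelled


example = [
    {"RID": 2, "date": "2010-01-01", "DX": "MCI"},
    {"RID": 1, "date": "2011-01-01", "DX": "MCI"},
    {"RID": 1, "date": "2010-01-01", "DX": "CN"},
    {"RID": 2, "date": "2011-01-01", "DX": "CN"},
    {"RID": 1, "date": "2012-01-01", "DX": "MCI"},
]
labels = [v["DX_change"] for v in label_dx_change(example)]
```

Visits flagged with the placeholder marker could then be filtered out before feature selection, if that matches the intended preprocessing.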
Once each test’s data are integrated with ADNI-Merge, data instances that are linked with missing class, i.e. DX, are removed; we also exclude instances with DX_change
They adopt different mathematical models in the way they compute feature relevance. They have been utilised previously in medical screening applications, e.g. Remeseiro & Bolon-Canedo [60]; Liu et al. [44]; Zhang et al. [78]. They are highly efficient in the way they produce the feature relevance (time taken). They are available on different ML platforms such as Waikato Environment for Knowledge Analysis (WEKA) [28], RapidMiner (RapidMiner, n.d), R (R Foundation, n.d), and Python (Python Software Foundation, n.d).
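The considered feature selection techniques evaluate the associations between the progression class and the cognitive features in pairs. For instance, PC measures the linear relationship between two features and identifies the strength and sign of the relationship based on a computed correlation coefficient, which in its standard form is
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad (1)$$
where $x_i$ and $y_i$ are the values of the two features for the $i$-th data observation, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of data observations.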
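As a rough illustration of how PC can be computed for feature-feature comparisons, the following sketch implements the standard Pearson coefficient in plain Python and builds a correlation matrix from a list of feature columns. The function names and the convention of returning 0.0 for a constant column are assumptions made for this example; the study itself ran PC in WEKA.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # assumed convention: a constant column has no correlation
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a list of feature columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical scores for three features over four observations.
matrix = correlation_matrix([[5, 4, 2, 1], [3, 3, 1, 0], [1, 0, 1, 0]])
```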
IG measures the amount of information each available feature embraces to distinguish between the class variable values [56]. It has been used successfully in constructing decision tree classification systems by ML algorithms as a way to evaluate data split in reference to the class variable in the dataset. In doing so, IG relies on uncertainty reduction metrics such as Shannon entropy to estimate the worthiness of the data split using each feature in the training dataset by evaluating the information gained before the data split and after the data split. IG mathematical notations are shown below:
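$$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v) \quad (2)$$
where $S$ is the training dataset, $A$ is a feature, $Values(A)$ is the set of values that $A$ takes, $S_v$ is the subset of $S$ in which $A$ has the value $v$, and $H$ is the Shannon entropy,
$$H(S) = -\sum_{c} p_c \log_2 p_c \quad (3)$$
where $p_c$ is the proportion of data observations in $S$ that belong to class $c$.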
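To make the entropy-based computation concrete, the sketch below evaluates the standard information gain of a discrete feature with respect to a class variable. It is a minimal illustration under the textbook definition (class entropy minus the weighted entropy after splitting on the feature); the function names and the toy data are hypothetical and do not come from the ADNI datasets.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the class minus the weighted entropy after the split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# Toy example: one feature separates the class perfectly, the other not at all.
dx_change = [0, 0, 1, 1]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]
gain_informative = information_gain(informative, dx_change)
gain_uninformative = information_gain(uninformative, dx_change)
```

A feature whose values split the observations into class-pure groups receives the full class entropy as its gain, while a feature that leaves the class mix unchanged receives a gain of zero.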
mRMR utilises the maximum relevancy of the available features in the training dataset to the class variable and the minimum redundancy among the features themselves [54]. In other words, this technique tries to not only capture the relationship between the features and the class as a conventional filtering technique, but also minimises the similarities among these features. mRMR employs the F-Statistics metric to compute the features-class relevancy, and PC to compute any redundancy among the features using the below equations based on [54].
Where,
MI
In CFS, the feature goodness is identified without calculating weight/score per feature and only relevant features are selected using mutual information metrics.
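The relevance/redundancy trade-off behind mRMR can be illustrated with a small greedy selection routine. The sketch assumes the quotient form of the criterion (relevance divided by mean redundancy with the already selected features), which is one common variant and not necessarily the one used in the study. The relevance scores and pairwise redundancies are passed in precomputed, and all names and numbers below are hypothetical.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedily pick k features with high relevance and low redundancy.

    relevance:  dict mapping feature name -> relevance to the class
    redundancy: dict mapping frozenset({a, b}) -> absolute correlation
    """
    remaining = set(relevance)
    selected = []

    def score(candidate):
        if not selected:
            return relevance[candidate]
        mean_redundancy = sum(
            redundancy[frozenset((candidate, chosen))] for chosen in selected
        ) / len(selected)
        # Quotient form; the small floor avoids division by zero.
        return relevance[candidate] / max(mean_redundancy, 1e-12)

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scores: 'orient' is relevant but largely redundant with 'recall'.
relevance = {"recall": 0.9, "orient": 0.8, "copy": 0.3}
redundancy = {
    frozenset(("recall", "orient")): 0.9,
    frozenset(("recall", "copy")): 0.1,
    frozenset(("orient", "copy")): 0.2,
}
top_two = mrmr_select(relevance, redundancy, 2)
```

In this toy setting the second pick is the weakly relevant but non-redundant feature, which shows how such a criterion can rank features differently from a pure relevance filter.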
In the feature selection phase of NFAFS, features that are a) highly correlated with the class variable and b) dissimilar in reference to the neurodegenerative domains they belong to in the DSM-5, are identified. This will provide diagnosticians with valuable knowledge of which neuropsychological features are more crucial in the screening process of AD progression. Since each of the feature selection techniques may provide a different range or scale when calculating the feature relevance, we normalised the scores to between 0 and 1 for the results of both PC and IG. In Section 4, a demonstration of the feature assessment results will be provided, illustrating how each feature subset was chosen. In brief, we analysed the results produced by the feature selection techniques in terms of common features, feature rank, and non-overlapping features with reference to the DSM-5 cognitive domains, among others.
Data used in this research has been obtained from ADNI.
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org.
The experiment is based on three datasets related to ADNI: ADNI-Merge, MMSE-feature, and CDR-SOB-feature. The latter two datasets contain specific cognitive features (questions/activities) that are related to the MMSE and CDR-SOB cognitive tests, and were recorded on any particular visit by a patient/control who had participated in the ADNI project. For example, there are 30 attributes related to the original MMSE sheet in the ADNI data repository, many assessing memory, learning, communication, etc. ADNI-Merge is a comprehensive dataset that contains data from ADNI 1/2/GO clinical data, demographics and numeric summaries of pathological indicators of AD. We revised the datasets by merging ADNI-Merge with each cognitive test’s data so we can access the original diagnostic class that will be used to create the progression class variable. The revised datasets (MMSE-feature, CDR-SOB-feature) include the cognitive features, the diagnostic class (DX) and the new modelled class variable (DX_change).
Table 3A shows the datasets’ characteristics; for instance, the MMSE-feature dataset contains many features (questions, answers, scores, total_score, demographics, etc). Table 3B displays the data observations after we modelled and preprocessed the datasets. The processed dataset contains the cognitive features plus the DX_change class variable; all other attributes such as demographics, questions, total_score, etc were excluded. In addition, to simplify the process of feature selection, some of the features within the datasets have been merged when they belong to the same cognitive domain within the same cognitive test. We call this process ‘data modelling and mapping’. For the MMSE-feature dataset, the first column in Table 4 shows the derived columns after data modelling and the features of the original MMSE cognitive test to which they are mapped. For example, the scores of the first five features, which are related to orientation in the MMSE cognitive test, have been merged into one feature that we call ‘Q1 OrientoTime’. This new feature has a score between 0 and 5. Similarly, the scores of features 6–10 in the MMSE test have been merged to form a new feature that we call ‘Q2 OrientoPlace’. Overall, the revised MMSE-feature dataset consists of 10 cognitive features and the class (DX_change). All other features have been excluded.
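The ‘data modelling and mapping’ step can be pictured as summing item-level scores into domain-level features. The sketch below covers only the two mappings spelled out in the text (items 1–5 into ‘Q1 OrientoTime’ and items 6–10 into ‘Q2 OrientoPlace’); the remaining mappings are given in Table 4 and are not reproduced here. The function name and the assumption that items arrive as a list of 0/1 scores in questionnaire order are illustrative only.

```python
# Item numbers (1-based) that are merged into each derived feature.
# Only the two groups described in the text are listed.
ITEM_GROUPS = {
    "Q1 OrientoTime": range(1, 6),    # items 1-5, merged score 0-5
    "Q2 OrientoPlace": range(6, 11),  # items 6-10, merged score 0-5
}


def merge_mmse_items(item_scores):
    """Sum the 0/1 item scores of each group into one derived feature."""
    return {
        name: sum(item_scores[item - 1] for item in items)
        for name, items in ITEM_GROUPS.items()
    }


# Hypothetical visit: 30 item scores in questionnaire order.
visit_items = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1] + [0] * 20
merged = merge_mmse_items(visit_items)
```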
For the CDR-SOB-feature dataset, we validated the total scores per data observation based on Table 4B by mapping the recorded value for each CDR-SOB feature in ADNI to its corresponding value within the original CDR test. This was necessary because the recorded feature values in ADNI for the CDR-SOB-feature dataset do not match the corresponding values within the original CDR cognitive test. Participants with a CDR-SOB value of ‘0’ are mapped to the ‘CN’ class, those with scores between 0.5 and 4.0 are considered to have ‘MCI’, and those with any score above 4 are considered ‘demented’. We have not considered the dementia level in the data representation prior to training the ML algorithms.
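The score validation and class mapping described above can be sketched as follows. The sketch assumes that the six box scores have already been converted to the original CDR scale (0, 0.5, 1, 2, 3); the conversion from the values recorded in ADNI is defined in Table 4B and is not reproduced here. Function names and class label strings are hypothetical.

```python
VALID_BOX_SCORES = {0, 0.5, 1, 2, 3}  # original CDR scale per domain


def cdr_sob_total(box_scores):
    """Validate six CDR domain scores and return their sum (0 to 18)."""
    if len(box_scores) != 6:
        raise ValueError("CDR-SOB expects six domain scores")
    for score in box_scores:
        if score not in VALID_BOX_SCORES:
            raise ValueError(f"unexpected CDR box score: {score}")
    return float(sum(box_scores))


def cdr_sob_class(total):
    """Map a CDR-SOB total to the class used in the text."""
    if total == 0:
        return "CN"
    if total <= 4.0:
        return "MCI"
    return "Dementia"


# Hypothetical participant: memory, orientation, judgment, community
# affairs, home and hobbies, personal care.
total = cdr_sob_total([0.5, 0.5, 1, 0, 0, 0])
label = cdr_sob_class(total)
```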
The modelled MMSE-features data
CDR-SOB features validation
Settings
We experimented on three datasets related to the ADNI project: ADNI-Merge, MMSE-feature and CDR-SOB-feature. We integrated ADNI-Merge with each cognitive test’s dataset to ensure we capture the DX class variable, as discussed in Section 3. The experiments were conducted using RapidMiner Studio version 9.8. RapidMiner is an open-source software package that provides solutions for data preparation, ML, text mining and predictive analytics [37]. We also used WEKA (Waikato Environment for Knowledge Analysis) to run the IG and PC feature assessment experiments. WEKA is an open-source ML tool that was developed at the University of Waikato, New Zealand.
All experiments have been conducted using the ten-fold cross-validation technique. Cross-validation is mainly applied in ML studies to estimate predictive validity and to perform model selection; it is a widespread technique due to its simplicity and universality [8]. The procedure is also known as k-fold cross-validation, with one parameter ‘k’ that represents the number of groups resulting from a given dataset’s split. The machine used to perform the experiments is an Intel(R) Core (TM) i5 4210U CPU @ 1.7GHZ 2.4GHz with 8 GB RAM, running a Windows 7 64-bit operating system.
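For readers unfamiliar with the procedure, the following sketch shows how k-fold cross-validation partitions a dataset into k groups, each used once for testing while the rest are used for training. It is a plain, unshuffled and unstratified split written for illustration; the tools used in the experiments (RapidMiner, WEKA) provide their own implementations, which may shuffle or stratify the data. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(25, k=10))
```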
Results analysis
MMSE features
Table 5 depicts the features along with their computed weights, which have been derived by the feature selection techniques against the MMSE-feature dataset. We normalised the scores per feature for easier comparison, because each feature selection technique provides different score ranges. The normalised scores are obtained per Eq. (5). Min-max normalisation is used in the project using the following equation:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (5)$$
where $x$ is the score that a technique assigns to a feature, and $x_{min}$ and $x_{max}$ are the lowest and highest scores that the technique assigns across the features.
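Min-max normalisation rescales each score as (score − minimum) / (maximum − minimum), so that the lowest score of a technique becomes 0 and the highest becomes 1. A minimal sketch is given below; the function name is hypothetical, and returning zeros when all scores are equal is an assumed convention for the degenerate case.

```python
def min_max_normalise(scores):
    """Rescale a list of scores to the range 0 to 1."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumed convention: identical scores carry no ranking information.
        return [0.0 for _ in scores]
    return [(score - lowest) / (highest - lowest) for score in scores]


# Hypothetical raw scores from one feature selection technique.
normalised = min_max_normalise([2.0, 4.0, 6.0])
```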
No normalisation was done for the mRMR results since this technique only pinpoints whether the feature is relevant or not without calculating the weight of the feature; the mRMR adopted in this research employed Correlation-based Feature Selection (CFS) [33].
In Table 5, the top ranked features are ‘Q5 DeVerbalRecall’ and ‘Q1 OrientoTime’, which relate to the memory and learning cognitive domains with respect to the DSM-5 neurodegenerative criteria for dementia. The findings are also consistent with the NINCDS-ADRDA criteria for AD, in which memory worsening is considered a sign in the diagnosis of probable and possible AD. The ‘Q1 OrientoTime’ feature in the dataset comprises five sub-questions, all related to recent memory (recall), which involve asking the participant about the date, year, month, season, and day; these have been combined into one feature to simplify data modelling. In contrast, ‘Q5 DeVerbalRecall’ involves repeating the names of three objects.
Since ‘Q5 DeVerbalRecall’ and ‘Q1 OrientoTime’ cognitive features overlap in terms of the cognitive area they cover, we therefore assess their similarity using a correlation matrix as shown in Fig. 2. The matrix was modelled using PC on all available features in the dataset excluding the class variable. Based on Fig. 2, ‘Q5 DeVerbalRecall’ and ‘Q1 OrientoTime’ have good correlation with each other with computed correlation above 0.41, signalling their similarity as they measure similar cognitive domains.
Another important feature that appears among the top ranked results of both the IG and PC feature selection techniques is ‘Q4 Attention’, which is linked with complex attention and memory. However, this feature was not chosen by the mRMR method coupled with CFS. For example, the 3-stage command feature covers a similar cognitive domain, namely complex attention, to that of the attention feature. Also, there is some overlap, albeit low, with the remaining features in terms of the cognitive domain. Further, the ranking results of the feature selection techniques show that the ‘Q8 3StageCmd’ and ‘Q10 Writing’ cognitive features have been chosen by the three techniques, having ranking positions of (4, 5) and (5, 6) by the PC and IG, respectively. The ‘Q8 3StageCmd’ and ‘Q10 Writing’ features, as per Han et al. [34], require more complex cognitive functions that are related to Language, Complex Attention, Learning & Memory, and, according to [73], Executive Function. When these are considered with the top ranked cognitive features (‘Q5 DeVerbalRecall’ and ‘Q1 OrientoTime’), more cognitive domains related to dementia conditions can be covered in the DSM-5 framework including.
The feature selection results against the MMSE data showed that ‘Copying’ has little association with dementia progression, at least when using the dataset and the feature selection techniques considered, since it was not chosen by mRMR with CFS and was also ranked last by IG and PC. This indicates that constructional and visuoperceptual skills are not as important as memory and complex attention when assessing individuals using the MMSE, at least on the data and techniques we consider.
Feature selection results on the MMSE data
This can be explained because the memory is the first cognitive feature affected in the early stages of AD [34]. In addition, attention and memory are closely related in detecting prodromal dementia and should be assessed first by the medical practitioner [41].
The feature-feature results using the PC are depicted in Fig. 3. Further, the feature-class (feature ranking) results of CDR-SOB are displayed in Table 6. The feature-feature results show high similarity among features since the computed correlations are high. For instance, the correlation between memory and orientation is close to 79%, and the correlation between memory and judgment is 75%. The feature-class correlations are also high, which makes it difficult to distinguish between features of the CDR-SOB test, at least when using the ADNI data repository. One principal reason for such results is the fact that the CDR-SOB test was used originally to assign the diagnostic class in the ADNI project; therefore the results of the feature selection techniques could favour its features, as shown in Table 6, and it will be difficult to differentiate between these features in a clinical cognitive assessment. Nevertheless, memory dominates as the main factor that correlates with AD in all feature selection techniques when assessing the CDR-SOB-feature dataset, which is consistent with the results obtained against the MMSE-feature dataset.
For the CDR-SOB features, and as expected, the memory feature was ranked first by all the feature selection techniques. This is consistent with the results of other researchers and reflects the important role of memory in defining AD [16, 18]. The judgment & problem solving feature was ranked second by both IG and mRMR with CFS. The personal care feature was ranked last by correlation and IG; however, it was ranked third by mRMR with CFS. This can be explained by the overlap of the personal care feature with other CDR-SOB features selected by the two techniques (IG and correlation), such as Orientation [53, 34].
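To show how a redundancy-aware method can order features differently from a pure relevance ranking, the sketch below implements a simplified greedy "relevance minus redundancy" selection in Python. It is an assumption-laden illustration: it uses absolute Pearson correlation for both relevance and redundancy, whereas mRMR is commonly defined with mutual information, and it is not the mRMR with CFS configuration used in this study. All feature names and values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / (spread_x * spread_y)

def mrmr_order(columns, target):
    """Greedy ordering: at each step pick the feature with the highest
    relevance to the target minus mean redundancy with features already picked."""
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    remaining = list(columns)
    selected = []
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data for eight subjects (not ADNI data).
progression = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "memory":        [0, 0, 0, 1, 1, 2, 2, 3],
    "orientation":   [0, 0, 1, 1, 1, 2, 2, 2],  # strongly overlaps with memory
    "personal_care": [0, 1, 0, 0, 1, 0, 1, 1],  # weaker but less redundant
}

by_relevance = sorted(toy, key=lambda n: abs(pearson(toy[n], progression)), reverse=True)
selected = mrmr_order(toy, progression)
print("relevance only:", by_relevance)
print("relevance minus redundancy:", selected)
```

On this toy data the personal-care-like feature has the lowest relevance yet moves ahead of the orientation-like feature once redundancy with the already selected memory feature is penalised, which is the general mechanism by which a redundancy-aware method can promote a feature that univariate rankings place last.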
Feature selection results on the CDR-SOB data
Correlation matrix computed for the features of MMSE data.
Correlation matrix computed for the features of CDR-SOB data.
Based on the results presented earlier, features related to the Memory cognitive domain were the strongest discriminators for dementia progression according to the IG, PC and mRMR results of both tests (MMSE and CDR-SOB). This is in agreement with previous research that recommended including Memory as well as Complex Attention in the assessment of dementia progression [16, 18, 34]. The second group of features is related to Orientation. In addition, Executive Function has been associated with cognitive decline in many research studies and found to be a good indicator of dementia advancement [32].
The MMSE does not assess ‘Executive Function’ explicitly; however, it can be argued that some features suggested by the mRMR with CFS method, such as writing [21], may cover this cognitive area. As per Tinklenberg [71], writing is related to the language domain. However, [34] found that ‘writing’ can be associated with tasks such as making a phone call, shopping and laundry. This is consistent with the suggestion in [21] that writing can be associated with cognitive domains such as linguistic, visuospatial and executive function. Another MMSE item that can be an indicator of different domains is the ‘3-stage command’, which is used to measure the language skills cognitive domain. However, [34] found that it can be related to the working memory domain and can be associated with activities such as toileting and dressing. In spite of its simplicity, the ‘3-stage command’ may indicate the ability to manage several simple cognitive domains, such as language, processing speed and planning, in order to execute the provided instructions. The mapping between the activities of cognitive tests and cognitive areas requires more in-depth investigation to highlight the overlap, which is outside the scope of this research.
Dementia is a group of conditions, of which AD accounts for over 60% of cases. Ten million individuals, particularly the elderly, are diagnosed with dementia every year globally, and millions of dollars are spent to diagnose and manage the disease, imposing huge pressures on governments’ limited healthcare resources. The process of diagnosing dementia by a specialised clinician primarily requires assessing cognitive factors that measure the patient’s decline in multiple cognitive areas within a neuropsychological test, in addition to the medical history of the patient and his/her family. The problem becomes more difficult when the clinician attempts to identify the disease progression within a specified timeframe, since dementia sublevels and dementia precursors may overlap in their signs of cognitive decline. Hence, it is essential to develop ways to identify impactful cognitive features of prodromal dementia that clinicians can exploit within digital information sheets during clinical assessments. In this research, we investigated the problem of identifying cognitive features that are symptoms of dementia progression using real data from two neuropsychological assessments (MMSE, CDR-SOB) within the ADNI data repository. We proposed an experimental data process, NFAFS, that consists of feature selection techniques in which both the correlations between the features and the progression class variable and the correlations between pairs of features were investigated.
In the experimental evaluation, real data observations of ADNI cases and controls who had undertaken the MMSE and CDR-SOB tests were processed using different feature selection techniques, including IG, PC and mRMR with CFS. The results derived from the MMSE feature data showed that the Time Orientation and Recall features, although correlated with each other, have a high influence on the progression class variable, being ranked as the top two features in the PC and IG feature selection results. This result, although limited, suggests that these features are signs of change during disease progression. Moreover, the results demonstrated that Complex Attention is also a cognitive domain that clinicians need to pay more attention to during the clinical assessment, since the MMSE’s attention feature showed correlation with the progression class variable and was apparent in the results of all the feature selection techniques considered. In contrast, features related to perceptual motor skills, such as the MMSE’s copying shapes, had low rankings and were not selected by mRMR with CFS.
The results derived from the CDR-SOB features reveal that it is challenging to differentiate among the features, which makes it difficult to identify those that are symptoms of disease advancement. However, the CDR-SOB feature related to memory was the strongest sign of AD progression, which supports previous research findings.
In the near future, we will employ the cognitive features alongside biological markers to design and implement a deep learning-based AI algorithm for building classification systems for AD progression. In addition, we would like to determine whether the relevant cognitive features change with the level of dementia.
Footnotes
Funding
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
