Abstract
Background
Neurodegenerative diseases such as Alzheimer's disease (AD) cause a progressive decline in both cognitive function and motor skills, creating a critical need for accurate early diagnosis. However, current diagnostic approaches rely primarily on timely clinical magnetic resonance imaging (MRI) scans, which impedes their wide application to potential patients. Leveraging handwriting as a diagnostic tool offers significant potential for identifying AD in its early stages.
Objective
This study aims to develop an efficient, rapid, and accurate method for early diagnosis of AD by utilizing handwriting analysis, a promising avenue due to its association with compromised motor skills in neurodegenerative diseases.
Methods
We propose a novel methodology that leverages self-attention mechanisms for the early diagnosis of AD. Our approach integrates data from 25 distinct handwriting tasks available in the DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset.
Results
The Self-Attention model achieved an accuracy of 94.3% and an F1-score of 94.5%, outperforming other state-of-the-art models, including traditional machine learning and deep learning approaches. Specifically, the Self-Attention model surpassed the previous best model, a convolutional neural network, by approximately 4 percentage points in both accuracy and F1-score. Additionally, the model demonstrated high precision (94.7%), sensitivity (94.5%), and specificity (94.1%), indicating high reliability and excellent identification of true positive and true negative cases, which is crucial in medical diagnostics.
Conclusions
Handwriting analysis, powered by self-attention mechanisms, offers significant potential as a diagnostic tool for identifying AD in its early stages, providing an effective alternative to traditional MRI-based diagnosis.
Introduction
Alzheimer's disease (AD) stands as one of the most devastating neurological disorders, with no known cure and symptoms that progressively impair daily functioning. 1 AD and other forms of dementia affect around 50 million people worldwide. With life expectancy on the rise, the prevalence of neurodegenerative disorders is expected to surge in the coming years. By 2050, the number of individuals affected by AD and related dementias could reach 152 million. 2 The substantial benefits of early detection make it imperative to diagnose the condition in its nascent stages. AD is characterized by a relentless progression of neurodegeneration, which leads to a gradual erosion of cognitive abilities and motor skills. 3 This gradual decline presents a complex challenge for detection and monitoring, as the onset of symptoms can be subtle and vary significantly among individuals. In particular, neuroimaging-derived models have been applied successfully to explore AD mechanisms in our previous works. 4–9 Unlike other medical conditions, the diagnosis of AD cannot rely solely on traditional imaging techniques such as clinical magnetic resonance imaging (MRI) scans. While MRI scans are invaluable for assessing structural changes in the brain, 10 they may not capture the earliest signs of functional decline. Thus, the complexity of AD necessitates a multifaceted approach to diagnosis, which includes not only advanced imaging techniques but also neuropsychological assessments and the analysis of motor skills. 11
Handwriting analysis offers a unique intersection between neuropsychological assessment and the analysis of motor skills, making it a valuable tool for the early diagnosis of AD. As AD progresses, it impacts both cognitive functions and motor abilities that are critical to the act of writing. Neuropsychological assessments reveal deficits in cognitive domains such as memory and executive functioning, which can affect the planning and execution of writing. Concurrently, the analysis of motor skills captures the deterioration in fine motor control, coordination, and muscle movement, all of which are integral to handwriting. Handwriting analysis synthesizes these aspects by providing a practical, observable measure of how cognitive and motor impairments manifest in everyday activities. 12 Handwriting assessment is not only non-invasive but also cost-effective, which enables its application on a large scale for screening and early detection.
In this paper, we propose a novel approach for early AD diagnosis by employing transformer-based models equipped with self-attention mechanisms. This approach utilizes data from the 25 handwriting tasks of the DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset. 13 More details on the dataset can be found in the Dataset overview. By integrating and analyzing the rich and varied data provided by these tasks, our model leverages self-attention mechanisms to detect subtle and complex patterns in handwriting that are indicative of early-stage AD. This detailed analysis enhances the accuracy of AD detection, illustrating the efficacy of this approach in diagnosing AD.
Our contributions in this research are three-fold. First, we present an innovative methodology that inherently integrates features from various handwriting tasks through the use of self-attention mechanisms. Second, we validate the efficacy of our model by achieving superior results compared to existing methods, thereby pushing the boundaries of early AD detection using handwriting analysis. Third, our comprehensive analysis of the tasks offers valuable insights for future empirical studies with local patients, enhancing the potential for real-world applications.
Related work
The early diagnosis of AD through handwriting analysis has garnered considerable attention in recent research, reflecting the growing need for alternative diagnostic methods that can overcome the limitations of traditional imaging techniques. This section surveys significant contributions that lay the groundwork for our proposed methodology.
Kahindo et al. were pioneers in analyzing the nuances of online handwritten cursive loops to characterize early AD. 14 They implemented a Bayesian framework that exclusively utilized handwriting velocity as a diagnostic indicator, demonstrating robust performance in early AD detection. Ghaderyan et al. introduced a sophisticated algorithm that extracts and analyzes kinematic features from handwriting samples, focusing on dynamic aspects such as speed, pressure, and rhythm. 15 By applying advanced machine learning techniques, the study demonstrated significant potential in differentiating between healthy individuals and those with early AD, achieving high accuracy and reliability. Gregorio et al. analyzed various handwriting tasks such as graphic, copy and reverse copy, memory, and dictation tasks. 16 They showed that effective selection of these tasks can enhance performance while reducing the number of activities. Cilia et al. reviewed various handwriting analysis techniques and highlighted their potential in supporting the diagnosis of AD. 17
In the era of deep learning, Mwamsojo et al. explored Bidirectional Long Short-Term Memory (BiLSTM) networks, demonstrating that BiLSTM achieved higher accuracy compared to standard convolutional neural networks (CNNs). 18 Expanding on technological applications, Dao et al. innovated by introducing a one-dimensional convolutional neural network (1D CNN) as a baseline model for AD diagnosis. 19 They also developed a synthetic data generator, DoppelGANger, to augment the training dataset, significantly enhancing the model's diagnostic capabilities through richer, more varied data inputs. The introduction of the DARWIN dataset by Cilia et al. marked a critical advancement in the field. 13 As the largest publicly available dataset for AD handwriting analysis, it comprises data from 89 AD patients and 85 healthy individuals engaged in 25 distinct handwriting tasks. Cilia et al. further enhanced diagnostic accuracy by integrating feature sets across these tasks and implementing a majority vote decision rule among outputs from 25 task-specific classifiers, showcasing a novel method for data amalgamation. Erdogmus et al. contributed a cost-effective and rapid diagnostic solution that employs CNN models to transform one-dimensional handwriting features into a two-dimensional format. 20 This approach leveraged the spatial correlations within handwriting data, achieving high accuracy and speed in early-stage AD detection. Addressing the challenges of data sparsity and high-dimensional feature spaces, Ngnamsie et al. developed a methodological framework to pinpoint essential features, effectively bypassing the curse of dimensionality. 21 This strategy allowed for more focused and computationally efficient analyses. Further specializing in the type of handwriting analyzed, Cilia et al. conducted an exhaustive study involving six handwriting tasks that included regular words, non-regular words, and non-words. 22 They utilized a range of classifiers, such as random forest, decision tree, support vector machine, and multilayer perceptron, combined with feature selection techniques to refine the diagnostic process for AD.
These studies highlight the significant potential of handwriting analysis for the early detection of AD. They offer a detailed context for our research, showcasing the progression and enhanced complexity of methods within this developing field. However, the extensive architecture of the Transformer model, as described by Vaswani et al., 23 has limited its application in early AD diagnosis tasks due to its size and computational demands. In response, we propose a simplified self-attention based method tailored specifically for this application, which has demonstrated its effectiveness through extensive experimental validation.
Dataset overview
In this paper, we employ the DARWIN dataset, widely recognized as one of the largest and most popular benchmark datasets available. 13 This dataset is composed of handwriting data from 174 participants, including 89 individuals diagnosed with AD and 85 healthy controls. Each participant completed a total of 25 distinct handwriting tasks, which are categorized into four groups: graphic tasks, copy tasks, memory tasks, and dictation tasks. Each category is designed to test different aspects of handwriting and cognitive function.
This diverse set of features provides comprehensive quantitative data on handwriting behavior, useful for various analyses including motor control, neurological health, and personal identification.
Methods
We introduce a novel method illustrated in Figure 1, comprising three blocks with skip-connected self-attention modules and linear layers. This architecture aims to extract robust semantic feature representations from the features obtained from different handwriting tasks utilizing self-attention mechanisms.

Figure 1. The proposed method's architecture. The input is formed by concatenating features from all 25 handwriting tasks, each represented by an 18-dimensional 1D vector. The method comprises three blocks, each with a self-attention module (gradient green) and a linear layer (dark blue), followed by a final linear layer (pink) that yields a binary output: “Patient” (red) or “Health” (green).
The method targets the early diagnosis of AD through the analysis of handwriting features extracted from the DARWIN dataset. The model incorporates three structured blocks, each equipped with a self-attention mechanism and a linear layer, and concludes with a final linear layer for binary classification.
Our proposed method is designed to handle feature inputs of varying lengths and is not limited to 25 handwriting tasks. Accordingly, in the sections Results for individual tasks and Results excluding individual task, we conduct experiments to assess individual handwriting tasks and to evaluate the impact of excluding one handwriting task at a time, respectively.
Handwriting feature (yellow)
The input handwriting feature consists of 25 handwriting task vectors. Each handwriting task vector encompasses 18 characteristics, each quantified as a float value ranging from 0 to 1. This structure forms a feature matrix where the input is a sequence comprising these 25 feature vectors, spread across 18 feature channels.
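As a minimal sketch of this input layout, the snippet below reshapes one participant's flat feature vector into a 25 × 18 matrix. It assumes a task-major ordering of the flat vector (the first 18 values belong to task 1, and so on) and uses min-max scaling as one possible way to bring each characteristic into the range 0 to 1; neither detail is specified in the text, and the function names are illustrative.

```python
from typing import List, Sequence

NUM_TASKS = 25          # handwriting tasks per participant
FEATURES_PER_TASK = 18  # characteristics recorded for each task


def to_task_matrix(flat: Sequence[float]) -> List[List[float]]:
    """Reshape a flat vector of 25 * 18 values into 25 rows of 18 features.

    Assumption: values are stored task by task (task-major order).
    """
    expected = NUM_TASKS * FEATURES_PER_TASK
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [
        list(flat[t * FEATURES_PER_TASK:(t + 1) * FEATURES_PER_TASK])
        for t in range(NUM_TASKS)
    ]


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Scale one characteristic across participants into [0, 1].

    Assumption: min-max scaling; a constant column maps to all zeros.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

Each row of the resulting matrix then plays the role of one element of the input sequence, with the 18 characteristics as its channels.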
Self-attention and linear processing (blocks 1 to 3)
Each of the three blocks incorporates a self-attention module and a linear transformation module, connected by skip connections. The structural repetition across the blocks ensures a deep and thorough analysis of handwriting features, enhancing the model's sensitivity to handwriting features. Blocks 2 and 3 share the same architecture as Block 1. Therefore, to avoid redundancy, they are not depicted in Figure 1.
Self-attention module (gradient green)
The self-attention module within each block of the network plays a crucial role in enhancing its capability to selectively focus on the most relevant features for the task of early AD diagnosis using handwriting data. This module operates using three primary elements: Query (Q), Key (K), and Value (V). The Query represents the current feature set under analysis, while the Key components consist of all potential comparable elements or feature sets. Each Key is associated with a Value that holds the actual feature data from the input.
In the self-attention mechanism, a score is computed to determine how much attention each part of the input data should receive relative to the current Query. This score assesses the compatibility between the Query and each Key, helping to gauge their relevance. The normalization of these scores, typically through a softmax function, enables the network to apply a weighted sum to the Values. The output, therefore, is a composition of these Values, adjusted according to their computed relevance, highlighting the features most indicative of AD.
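The computation described above can be sketched in plain Python as single-head scaled dot-product attention. This is an illustration under stated assumptions rather than the implementation used in our experiments: the scaling by the square root of the key dimension follows the standard Transformer formulation, the projection matrices `w_q`, `w_k`, and `w_v` are hypothetical parameters, and dropout and multiple heads are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head self-attention over a sequence x of shape (tasks x dim)."""
    # Project the same input into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    # Compatibility score between every query and every key, scaled by sqrt(d_k).
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    # Normalize each row of scores so the attention weights sum to 1.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted sum of the value vectors.
    return matmul(weights, v)
```

With 25 task vectors as input, the attention weights form a 25 × 25 matrix, so every task representation can draw on every other task.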
Furthermore, the architecture includes skip connections that link the input of each block directly to its output, crucial for preserving important information throughout the network layers. These connections are instrumental in facilitating a smooth gradient flow during training, which is vital for mitigating the risk of information loss and combating the vanishing gradient problem. This structure ensures more stable and effective updates to the model parameters.
Overall, the self-attention module's ability to dynamically adjust the network's focus on informative features significantly enhances its sensitivity and robustness in diagnosing AD. By identifying and emphasizing the most crucial features in handwriting data, the module improves the network's diagnostic accuracy, making it a potent component of the model's design.
Linear module (dark blue)
Following the self-attention process, a linear module transforms the attended features into higher-level representations. This transformation also sets the feature dimension passed from one block to the next.
Final linear layer (gradient pink)
After the sequential processing through the three blocks, the refined sequence of features is concatenated into a unified feature representation and passed to a final linear layer. This layer projects the processed data into a binary classification output, distinguishing between ‘Patient’ and ‘Health’. This classification step is critical for translating the extracted handwriting features into a diagnostic decision.
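The final step can be illustrated with a short sketch that flattens the sequence, applies one linear projection, and maps the result to a label. It uses the sigmoid activation and the 0.5 threshold reported in the implementation details; the `weights` and `bias` arguments stand in for learned parameters and are hypothetical, and the handling of a value exactly equal to the threshold is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def flatten(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-task feature vectors into one long vector."""
    return [value for vector in sequence for value in vector]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(
    sequence: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Project the flattened features to one probability and threshold it."""
    features = flatten(sequence)
    if len(features) != len(weights):
        raise ValueError("weight vector must match the flattened feature length")
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    probability = sigmoid(logit)
    # Assumption: a probability exactly at the threshold is labeled 'Health'.
    label = "Patient" if probability > threshold else "Health"
    return label, probability
```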
Results
Implementation details
In our experiments, we employed a learning rate of 6 × 10⁻⁴ and incorporated an early stopping mechanism that terminates training if no improvement in validation loss is observed over 20 epochs. The model's architecture features three blocks with feature sizes set at 18, 64, and 64, respectively, facilitating the gradual refinement of handwriting feature representations critical for detecting subtle differences associated with AD. Additionally, we applied a dropout rate of 0.1 within the self-attention module to improve generalization and prevent overfitting. The network ends with a final linear module equipped with a sigmoid activation function for binary classification, so that the output is close to 1 for ‘Patient’ and close to 0 for ‘Health’. For the final classification decision, we apply a threshold of 0.5: values above this threshold are classified as ‘Patient’ and values below as ‘Health’. The model was trained on a workstation with an AMD Ryzen 9 7900X CPU, an NVIDIA RTX 3090 GPU, and 64 GB of RAM, running Ubuntu 20.04 LTS with Python 3.11, CUDA 12.1, and PyTorch 2.3.0. With this hardware and software setup, training a single epoch takes 0.058 s. Training typically requires about 80 epochs to reach the best validation loss, followed by the 20 epochs of early stopping patience, for a total of about 100 epochs. The average total training duration is therefore about 5.8 s. Our proposed model has 0.24 million parameters; stored as float32, its size amounts to 0.96 MB.
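The training budget quoted above can be checked with a small sketch. The early stopping rule below is a generic patience-based criterion (stop once the validation loss has not improved for 20 consecutive epochs) and is an assumption about the exact rule; the helper names are illustrative, and 1 MB is taken as 10⁶ bytes.

```python
from typing import Iterable, Tuple


def run_early_stopping(val_losses: Iterable[float], patience: int = 20) -> Tuple[int, int]:
    """Return (best_epoch, epochs_run) for a stream of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, epochs_run


def training_seconds(epochs: int, seconds_per_epoch: float = 0.058) -> float:
    """Total wall-clock training time for the given number of epochs."""
    return epochs * seconds_per_epoch


def model_size_mb(num_parameters: float, bytes_per_parameter: int = 4) -> float:
    """Storage size in MB (10**6 bytes); float32 uses 4 bytes per parameter."""
    return num_parameters * bytes_per_parameter / 1e6


# Illustrative loss curve: improves for 80 epochs, then stays flat.
losses = [float(80 - i) for i in range(80)] + [5.0] * 50
best_epoch, epochs_run = run_early_stopping(losses)
total_seconds = training_seconds(epochs_run)
size_mb = model_size_mb(0.24e6)
```

Under these assumptions, a best epoch of 80 plus a patience of 20 gives 100 epochs, which at 0.058 s per epoch matches the reported 5.8 s, and 0.24 million float32 parameters correspond to 0.96 MB.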
Evaluation metrics
To ensure a fair comparison, we adopt the metrics from the original work 13 and the current state-of-the-art work 20 for our experiments. The reported results of our proposed method are derived using 5-fold cross-validation.
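For reference, the five reported metrics can be computed from the confusion matrix as in the sketch below, with ‘Patient’ encoded as 1 and ‘Health’ as 0. The fold construction is a deliberately simple, unshuffled split shown only to illustrate 5-fold partitioning of the 174 participants; whether the folds were shuffled or stratified is not stated here, so that part is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, specificity, and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall on the 'Patient' class
    specificity = ratio(tn, tn + fp)  # recall on the 'Health' class
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds
```

In a cross-validated setting, the metrics would be computed on each held-out fold and then averaged across the five folds.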
Results for individual tasks
In this section, we aim to ascertain which tasks are most influential by systematically evaluating the performance obtained from the features of each individual task among the 25 tasks. Our motivation for focusing on individual tasks is to identify those that contribute disproportionately to overall effectiveness, allowing for targeted improvements in our methodology. This approach helps us understand the unique impact of each task and refine our model's performance accordingly. The tasks are detailed in the original dataset paper. 13 From the results presented in Table 1, two tasks stand out due to their superior performance among the 25 evaluated. Task 6, which involves copying the letters ‘l’, ‘m’, and ‘p’, and Task 17, which requires copying six words (regular, non-regular, non-words) into the appropriate boxes, both demonstrate significant effectiveness. These tasks likely provide valuable insights due to their ability to assess fine motor skills and cognitive processing speed, both of which are crucial in the early detection of cognitive decline associated with AD.
Table 1. Results for individual tasks. Text in bold indicates the highest score.
Results excluding individual task
In this section, we analyze the impact of excluding individual tasks from a combined evaluation of 25 tasks on the overall diagnostic performance for AD. This approach is motivated by the need to identify which tasks are critical and which might be redundant, thereby optimizing the task set for enhanced diagnostic accuracy and efficiency. By systematically removing each task and assessing the resultant changes in performance, we aim to refine our understanding of each task's unique contribution to the diagnostic process. Table 2 presents the results when each task is omitted one at a time, and the remaining tasks are analyzed collectively. Key observations can be made from the results, particularly when Task 1 (Signature Drawing), Task 10 (Copy the word “foglio”), and Task 20 (Write a simple sentence under dictation) are excluded. Despite their exclusion, the performance metrics such as accuracy, precision, sensitivity, specificity, and F1-score remain relatively high.
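Because the model accepts input sequences of varying length, both the single-task and the leave-one-task-out experiments amount to selecting rows of the task matrix. The sketch below illustrates this under the assumption that each participant is stored as a list of 25 task vectors and that tasks are numbered from 1 as in the dataset description; the function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _check_task(matrix: Sequence[Sequence[float]], task: int) -> None:
    """Validate a 1-based task number against the number of rows."""
    if not 1 <= task <= len(matrix):
        raise ValueError(f"task must be between 1 and {len(matrix)}, got {task}")


def select_task(matrix: Sequence[Sequence[float]], task: int) -> Matrix:
    """Keep only one task: a sequence of length 1 (single-task experiment)."""
    _check_task(matrix, task)
    return [list(matrix[task - 1])]


def exclude_task(matrix: Sequence[Sequence[float]], task: int) -> Matrix:
    """Drop one task: a sequence of length n - 1 (leave-one-task-out experiment)."""
    _check_task(matrix, task)
    return [list(row) for number, row in enumerate(matrix, start=1) if number != task]
```

Repeating either selection for task numbers 1 to 25 and retraining each time yields one row of results per task.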
Table 2. Results from the combined 25 tasks when excluding one task at a time. Text in bold indicates the highest score.
Comparison of SOTA
Table 3 presents a comparative analysis of various state-of-the-art methods against our proposed Self-Attention (Self-Attn) method, with metrics including accuracy, precision, sensitivity, specificity, and F1 score. From the table we can observe the advanced performance of the Self-Attn method, highlighting its efficacy and potential in the early diagnosis of AD.
Table 3. Comparison with the state of the art using all 25 handwriting tasks. Text in bold represents the highest score, while underlined text signifies the second-highest score.
The Self-Attention model achieves an accuracy of 94.3% and an F1 score of 94.5%, outperforming other SOTA models from both traditional machine learning and deep learning approaches. Notably, our method outperforms the previous best, the CNN, by approximately 4 percentage points in both accuracy and F1 score, which were previously recorded at 90.4% each. Its precision (94.7%), sensitivity (94.5%), and specificity (94.1%) suggest that it not only predicts with high reliability but also identifies both true positive and true negative cases well, a crucial attribute in medical diagnostics. The one exception is sensitivity, where our method ranks second: the LGBM method achieves the highest value at 95.8%. Even so, the overall effectiveness of our approach underscores the utility of the self-attention mechanism in diagnosing AD.
This gain in performance can likely be attributed to the architecture of the Self-Attention model, whose stacked attention layers allow it to capture intricate patterns and dependencies in the data more effectively than traditional models. This is particularly advantageous in medical diagnostics, where nuanced variations in data can be pivotal.
Interestingly, traditional machine learning models like KNN and SVM outperform modern deep learning architectures such as MobileNetV2 and VGG19. This may seem surprising given the prevalent use of deep learning for its ability to manage large datasets and complex patterns. However, KNN and SVM excel when the training features are highly indicative of outcomes, such as clear Alzheimer's markers in the data. In contrast, CNNs like MobileNetV2 and VGG19 might weaken the influence of key features due to their focus on extracting complex feature hierarchies, and without proper tuning and sufficient data, their performance can lag.
Overall, the data from Table 3 not only highlights the effectiveness of the Self-Attn model but also marks a significant step forward in the application of deep learning techniques in medical diagnostics. This suggests a promising avenue for future research and application, potentially leading to more accurate and timely diagnostics in clinical settings.
Discussion
AD is a severe neurodegenerative disease often characterized by physical deterioration and cognitive decline. The course of AD is complex, with insidious onset and progressive development. Thus, a better understanding of early AD can help us provide effective care and treatment through early intervention. Although neuroimaging-based early AD diagnosis, such as MRI, serves as a robust modality for delineating structural and morphological alterations, it requires complex analyses and specialist resources. It is worth noting that both cognitive functions and motor abilities are closely related to the act of writing. Handwriting analysis can therefore deliver additional robust evidence for clinical application and complement existing capabilities for diagnosing early AD.
In this study, we introduced an innovative self-attention mechanism-based method for early AD diagnosis through handwriting analysis. This approach is pioneering in its application, utilizing handwriting as a practical diagnostic tool that reflects the motor skill impairments typical of neurodegenerative diseases. Our findings, derived from a comprehensive evaluation using the DARWIN dataset, which encompasses 25 distinct handwriting tasks, confirm our initial hypothesis that a self-attention mechanism can enhance diagnostic accuracy. Indeed, our method reaches or even surpasses state-of-the-art performance across multiple metrics including accuracy, precision, specificity, and F1 score, although it achieved the second-best performance in terms of sensitivity.
These results are significant as they validate the effectiveness of handwriting analysis in the early detection of AD, supporting the notion that motor skill deterioration is an effective biomarker for early-stage neurodegenerative disease. While our results align with the broader research that underscores the potential of machine learning in medical diagnostics, they introduce a novel aspect by successfully applying self-attention mechanisms in this context.
However, the dataset used in this study consists solely of Italian participants, which may limit the generalizability of the findings. To validate these results further and ensure the method's robustness across various demographic and ethnic groups, additional studies involving more diverse populations are necessary.
Moreover, further research could explore integrating this handwriting-based approach with other diagnostic tools for a more holistic evaluation of AD, potentially increasing the diagnostic accuracy and effectiveness further.
In summary, our study contributes to the growing field of AI in healthcare by demonstrating that advanced machine learning techniques can significantly enhance early AD diagnosis, potentially leading to better patient outcomes during the critical initial stages of the disease. This method not only confirms the viability of handwriting analysis as a diagnostic tool but also opens new avenues for research into its application for other neurodegenerative conditions.
Footnotes
Acknowledgments
The authors have no acknowledgments to report.
Author contributions
Lei Kang (Conceptualization; Funding acquisition; Methodology; Software; Visualization; Writing – original draft); Xiaolei Zhang (Formal analysis; Investigation; Validation; Writing – review & editing); Jitian Guan (Data curation; Investigation; Resources); Kai Huang (Data curation; Resources); Renhua Wu (Funding acquisition; Investigation; Project administration; Supervision; Writing – review & editing).
Funding
This work has been partially supported by grants 62206163 and 82020108016 from the National Natural Science Foundation of China and by grant STKJ2023076 from the Science and Technology Major Project of Guangdong Province.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The data and code used for the analysis are available from the corresponding authors upon request.
