Abstract
Background:
Alzheimer’s disease (AD) is a neurodegenerative disease that drastically affects brain cells. Early detection of this disease can reduce the brain cell damage rate and improve the prognosis of the patient to a great extent. The patients affected with AD tend to depend on their children and relatives for their daily chores.
Objective:
This research study utilizes the latest technologies of artificial intelligence and computation power to aid the medical industry. The study aims at early detection of AD to enable doctors to treat patients with the appropriate medication in the early stages of the disease condition.
Methods:
In this research study, convolutional neural networks, an advanced deep learning technique, are adopted to classify AD patients with their MRI images. Deep learning models with customized architecture are precise in the early detection of diseases with images retrieved by neuroimaging techniques.
Results:
The convolutional neural network model classifies the patients as diagnosed with AD or cognitively normal. Standard metrics are used to evaluate the model performance and compare it with the state-of-the-art methodologies. The experimental study of the proposed model shows promising results with an accuracy of 97%, precision of 94%, recall rate of 94%, and f1-score of 94%.
Conclusion:
This study leverages powerful technologies like deep learning to aid medical practitioners in diagnosing AD. It is crucial to detect AD early to control and slow down the rate at which the disease progresses.
INTRODUCTION
The availability of voluminous neuroimaging data and efficient classification algorithms has motivated researchers to develop artificial intelligence (AI) systems for the early detection of Alzheimer’s disease (AD). Early diagnosis of AD allows earlier therapeutic intervention and improved health care for the patient. Significant changes in the patient’s brain regions aid in detecting AD. The synapse, the connection area between cells in the brain, is where neurotransmitters are released; it is the channel for information flow between the sender and receiver neurons. Amyloid-β and tau tangles are deposited around the synapse. The amyloid-β causes cell death, disrupting the communication between neurons. Tau tangles in the synapse area block the nutrient supply to the neurons. The death of brain cells results in the shrinkage of the whole brain volume. Cell death also causes changes in hippocampus shape, cortical thickness, and cerebral regions [1]. Death of brain cells leads to the onset of AD. AD affects brain function progressively over time. AD patients in the initial stage forget events and conversations, and in later stages have difficulty in planning and decision making and develop language and speech problems. In the final stages, they lose their way even in familiar places [2]. In severe cases, AD patients show symptoms such as weight loss, seizures, skin infections, and increased sleeping [3]. AD is challenging to diagnose as there is no single test for the disease. It is diagnosed with clinical, imaging, and cognitive tests on patients. Detecting AD at an early stage can help patients plan their future and treatments. Early detection also reduces the costs of medical and long-term care [4]. Recent developments in AI have enabled automated diagnosis of AD with deep learning and machine learning algorithms. Deep learning techniques can diagnose AD automatically and precisely with little human intervention.
Early detection of AD and timely treatment delay the advancement of the disease [5]. This motivated the authors to develop an AI model for early AD detection. The standard dataset used by researchers is the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, in which the images are labeled as non-demented, very mildly demented, mildly demented, and moderately demented [6].
The advent of the deep learning paradigm has facilitated the extraction of high-level features from medical images such as magnetic resonance imaging (MRI), positron emission tomography, and computed tomography [7], followed by the classification of images as demented or non-demented. Deep learning techniques give high accuracy for image classification problems. In recent years, deep learning has advanced the state-of-the-art classification accuracies in the fields of acoustic signal processing [8], medical image processing, speech emotion recognition [9], natural language processing [10], and pattern recognition [11]. The challenging task for traditional machine learning (ML) algorithms is acquiring a labeled dataset. The brute force method for classification compares the features of the image with the target image, and its performance deteriorates when images are distorted, disfigured, or noisy. Deep learning techniques overcome this limitation. Deep learning algorithms are apt for large and high-dimensional datasets. The past few decades have witnessed deep neural networks for classifying images, video, and speech signals. The sophisticated architecture of the deep learning model ensures effective discriminative feature extraction to improve model performance [12] and allows the model to handle noisy data. Unlike conventional ML algorithms, the deep learning technique does not require human intervention or a separate preprocessing phase to extract image features [13]. The proposed study develops an AI system that adopts the convolutional neural network (CNN) as the deep learning technique for classification. The CNN matches the similarity of images by degree rather than by exact matching of pixels. CNN has gained the attention of researchers due to its superior power and performance over conventional machine learning algorithms.
This research study aims to develop an automated tool for detecting AD with high accuracy and a low false negative rate. The proposed methodology is a novel methodology for AD detection. The preprocessing by Contrast Limited Adaptive Histogram Equalization (CLAHE), and the data augmentation followed by classification with a customized CNN model with apt parameters contribute to the novelty of the method. This study has customized CNN with deeper layers that avoid vanishing gradient issues. Hence there is considerable improvement in the performance of the model. Improved performance results in the reduction of false negatives in classification. False negative diagnosis of the disease can lead to delayed treatment and deterioration of the patient’s condition.
MATERIALS AND METHODS
Data source
The well-cleaned dataset, along with the efficient learning algorithms, has led to the design of accurate models for classification. In this study, the MRI images from the patients are subject to a deep learning methodology for classifying the patients as demented or cognitively normal. The MRI images were sourced from the Kaggle dataset [14]. The MRI images are of size 224×224. Conversion of the images to a suitable size allows them to fit into the proposed CNN model. The dataset has four classes: very mild demented, mild demented, moderate demented, and non-demented.
Artificial neural network

Figure 1. Proposed CNN-based AI system for AD detection.
For decades, computer vision has been dominated by the artificial neural network-based CNN for image classification. CNN has been an area of interest to radiology researchers in the fields of classification, object detection, and image reconstruction. CNN finds applications in areas like computer vision, face recognition, speech recognition, gesture recognition, and image classification. CNN is a deep learning technique used by data scientists for the effective classification of images [15]. The advantage of a pre-trained CNN is transfer learning: the model can generate an output based on prior knowledge of data. The architectures of the different models vary in their internal layers, and CNNs can be categorized based on the characteristics of these layers. AlexNet, GoogleNet, and the Visual Geometry Group (VGG) network have sequentially connected shallow layers. GoogleNet aims to achieve accuracy at a low computation cost. It performs feature extraction by splitting, transforming, and merging functions. GoogleNet uses convolution kernels of much smaller size that regulate the computation, and a global average pooling (GAP) layer replaces the fully connected (FC) layer at the end of the network. While the advantage of GoogleNet lies in achieving high accuracy at low computation cost, its disadvantage is a heterogeneous topology representation that leads to information loss [23]. VGG proved that smaller filters of size 3 by 3 perform better feature extraction than large filters. VGG is one of the efficient image classification techniques, but it has a high computation cost. ResNet and DenseNet have deeper layers than GoogleNet. ResNet introduced the concept of cross-connections between layers, which prevents vanishing gradient issues, but the weights being isolated is a limitation. DenseNet exhibits cross-layer depth-wise convolutions. DenseNet differentiates the preserved features from the added ones by concatenating the previous layer’s features.
In the CNN with deeper layers, the vanishing gradient problem in sparse layers is overcome; hence, this model has improved performance. CNN has convolution, pooling, and fully connected layers [24]. Every layer transforms its input into a more informative representation that is useful for classifying the images.
AI system for AD detection with deep learning technique
Figure 1 shows the architecture of the AI system for AD detection. CLAHE enhances the contrast of the MRI images. The contrast-enhanced MRI images undergo data augmentation, which helps the model generalize. The data augmentation phase is essential in training the model with varied data samples that improve the model’s performance [16]. The convolution layer extracts the salient features of the image by sliding kernels over it. Pooling reduces the feature map dimension to limit the parameters that the model learns. The fully connected layer performs the image classification based on the extracted features. The next step is CNN regularization to avoid overfitting the model [17]. Adaptive moment estimation (Adam), a replacement optimization algorithm for stochastic gradient descent, is known for its modest memory requirements and good optimization. The optimizer tunes the attributes of the neural network, such as its weights. The loss function quantifies the errors in the model.
Preprocessing
CLAHE is one of the most common contrast enhancement methods to preprocess MRI images. CLAHE enhances the inherent structural information of the objects in the image. Improving the image’s contrast enhances the visibility of the minute objects in the image. CLAHE is an extension of adaptive histogram equalization (AHE) where the pixels are enhanced based on the histogram of the neighborhood pixels [18]. In CLAHE, setting a clipping limit to the images can eliminate over-enhancement problems in AHE. This clip limit also prevents the over-enhancement of noise and shadowing effects. The MRI images subjected to CLAHE show contrast enhancement without noise enhancement. Resizing the images to 224×224 pixels is performed after CLAHE. The pixel value is changed proportionally to the importance of neighboring pixels.
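The clip-and-redistribute step that distinguishes CLAHE from plain AHE can be illustrated with a minimal sketch for a single tile. The function name `clipped_equalize_tile`, the clip limit of 4, and the flat-list tile representation are assumptions made for illustration only; a full CLAHE implementation additionally interpolates between the mappings of neighboring tiles, which is omitted here.

```python
def clipped_equalize_tile(pixels, clip_limit=4, levels=256):
    """Contrast-limited equalization of ONE tile (illustrative sketch).

    pixels: flat list of integer intensities in [0, levels).
    clip_limit: hypothetical maximum count allowed per histogram bin.
    """
    # Histogram of the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess uniformly over all bins.
    share = excess / levels
    hist = [h + share for h in hist]

    # Cumulative distribution mapped onto the full intensity range.
    total = float(len(pixels))
    mapping, running = [], 0.0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))

    return [mapping[p] for p in pixels]
```

Because the dominant bin is clipped before the cumulative distribution is built, a large uniform region cannot stretch the mapping arbitrarily, which is how the clip limit restrains noise amplification. In practice a library routine such as OpenCV's `cv2.createCLAHE` would normally be used instead of hand-written code.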
Data augmentation
Data augmentation is a preprocessing technique that reduces overfitting by extending the dataset with transformed copies of its own images [19]. Deep learning models are prone to overfitting because they combine low bias with high variance. An overfitted model performs well on training data and poorly on test data. Data augmentation also helps resolve imbalanced data issues in AI models. The technique aims at modifying the data to improve model generalization. It increases the size of the dataset by cropping, zooming, re-scaling, vertical flipping, and horizontal flipping, and it makes the model robust by exposing it to varied input images. Classification preceded by data augmentation improves the model accuracy to a great extent.
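As a small illustration of the flipping transformations, the sketch below enlarges a dataset threefold by adding a horizontally and a vertically flipped copy of each image. Images are represented as nested lists of pixel values, and the function names are hypothetical; the study's actual augmentation pipeline is not specified at this level of detail.

```python
def horizontal_flip(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]


def augment(images, labels):
    """Return the originals plus both flipped variants, labels preserved."""
    aug_images, aug_labels = [], []
    for image, label in zip(images, labels):
        for variant in (image, horizontal_flip(image), vertical_flip(image)):
            aug_images.append(variant)
            aug_labels.append(label)
    return aug_images, aug_labels
```

Flipping leaves the class label unchanged, so each transformed image can reuse the label of its source image.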
Convolution layer
The convolution layer is the prime component of CNN. The kernels of the convolution layer extract features from the MRI images by filtering. The convolution kernel for image manipulation is a two-dimensional matrix of size n × n. The kernel is initialized with random values, and each value is called a kernel weight. The kernel weights are multiplied by the pixels in an n × n section of the input image and summed up to get a single output value. The convolution kernel slides in steps, horizontally and vertically, until it has processed all the pixels in the 2D image. The summed products of the kernel weights and the image pixel intensities create the output feature map. The kernel size defines the convolution layer, and a combination of multiple kernels forms a filter. If the kernel is 2D, then the filter is three-dimensional. The convolution applied to the input image can also take the form of a transposed convolution, separable convolution, or dilated convolution [20]. The input tensor is an array of pixels stored as a two-dimensional matrix. The element-wise product of the kernel and the input tensor patch is computed and summed to set the output value in the respective position of the output tensor. The output tensor is the feature map. Different kernel sizes can extract different features. The feature extraction depends on the hyperparameters of convolution, such as the size and the number of kernels. The kernel size can be 5×5 or 3×3. Equation 1 represents the convolution process for an image I with the kernel k.
Without padding, the center of the kernel cannot be aligned with the border pixels of the image, so the input image is zero-padded. Zero padding adds zeros on all sides of the image so that the kernel fits at the borders and the output keeps the desired dimensions. The kernel and the input image convolve to generate activation maps. While convolving, the dot product is taken between the kernel and an image patch of the same size. The patch selection moves across the entire image in steps of a fixed distance called the stride [21]. The calculated dot product is divided by the sum of the values in the kernel matrix. Sharing the kernels across image positions introduces weight sharing. Weight sharing allows the kernel to capture features wherever they occur in the image and to learn the spatial hierarchy of features. It also increases the effectiveness of the model by reducing the number of parameters to optimize. The hyperparameters are the size of the kernel, the number of kernels, the zero padding, and the stride. The convolution layer learns the deterministic features of the data. In this study, the convolution layer extracts the features from the MRI images of AD patients.
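The sliding-kernel operation with zero padding and stride described above can be sketched in plain Python. The function names and the nested-list image representation are illustrative assumptions. As in most deep learning frameworks, the kernel is applied without flipping (strictly speaking a cross-correlation), and the optional division by the kernel sum is left out.

```python
def zero_pad(image, pad):
    """Surround a 2D image (list of rows) with `pad` rows/columns of zeros."""
    width = len(image[0]) + 2 * pad
    rows = [[0.0] * pad + [float(v) for v in row] + [0.0] * pad for row in image]
    border = [[0.0] * width for _ in range(pad)]
    return border + rows + [r[:] for r in border]


def conv2d(image, kernel, stride=1, pad=0):
    """Slide an n x n kernel over the padded image and sum the products."""
    padded = zero_pad(image, pad)
    n = len(kernel)
    out_h = (len(padded) - n) // stride + 1
    out_w = (len(padded[0]) - n) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    acc += kernel[i][j] * padded[r * stride + i][c * stride + j]
            row.append(acc)
        feature_map.append(row)
    return feature_map
```

With a 3×3 kernel, `pad=1`, and `stride=1`, the feature map has the same height and width as the input, which is the usual reason for choosing that amount of padding.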
Pooling layer
The pooling layers optimize the feature map dimensions. They perform subsampling of feature maps. The complexity of the data reduces with the reduction of feature map dimensions. The pooling reduces the number of parameters learned by fully connected layers. This optimization step enhances the classification capability of the CNN model [22]. The pooling layer slides a 2D filter over each channel of the feature map and summarizes the features underneath the filter region. If $n_h$, $n_w$, and $n_c$ are the height, width, and number of channels of the feature map, then for a pooling filter of size $f$ and stride $s$ the output of the pooling layer has dimensions $(\lfloor (n_h - f)/s \rfloor + 1) \times (\lfloor (n_w - f)/s \rfloor + 1) \times n_c$. The different pooling forms include maximum pooling, minimum pooling, average pooling, and the average of squares pooling.
$F_i$ is the component feature extracted from the $i$th patch, and $n$ is the number of overlapping features. Maximum or minimum pooling is recommended for images where the region of interest is localized. The average pooling operation is considered when the region of interest is distributed. The pooling layer uses the ensemble method of pooling. The type of pooling has an impact on backpropagation from the output for parameter updating. With the max or min pooling, only a few patches contribute to the gradient [23].
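The pooling variants can be compared with a short sketch that applies a square window to a single-channel feature map. The function name `pool2d` and the `mode` strings are hypothetical; the default window of 2×2 with stride 2 mirrors the pool size reported in the experimental design.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarize each size x size patch of a 2D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            patch = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(patch))
            elif mode == "min":
                row.append(min(patch))
            elif mode == "average":
                row.append(sum(patch) / len(patch))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(row)
    return pooled
```

Max and min pooling pass on one value per patch, so only that element receives a gradient during backpropagation, whereas average pooling spreads the gradient over every element of the patch.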
Dense fully connected layer
The final stage is a fully connected ANN that performs the image classification based on the extracted features. The fully connected layers receive the low- and mid-level features of the medical images and generate a high-level abstraction from them. The output of the pooling layer, flattened into a vector, is the fully connected layer’s input. The fully connected layer is the conventional multilayer perceptron in which each neuron is connected to every neuron in the preceding layer. The final fully connected layer typically has as many output nodes as there are classes [24]. The fully connected layer classifies the MRI images as cognitively normal or demented.
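A fully connected layer followed by a softmax output can be sketched as below. The weights, biases, and two-class setup in the usage note are made-up illustrative values, not parameters of the trained model.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores into class probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax(dense([1.0, 2.0], [[0.5, -0.5], [0.1, 0.2]], [0.0, 0.1]))` returns two probabilities, one per class, and the predicted class is the index of the larger one.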
CNN regularization
There is a pressing need for CNN regularization due to the issue of overfitting. Overfitting can be handled by several generalization methods such as dropout, drop weights, data augmentation, and batch normalization. In this study, we adopt batch normalization for CNN regularization. In each activation layer, batch normalization reduces the internal covariate shift. The MRI images come from different sources, and the shift in the layer input distributions is primarily due to continuous weight updating during training. Introducing a batch normalization layer adds computation to each training step, but it is advantageous: it mitigates the vanishing gradient problem caused by poor weight initialization and drastically reduces the network convergence period. In a nutshell, the reduction of overfitting improves the model performance [25].
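The normalization step itself is simple and can be sketched for one feature across a mini-batch. The scale `gamma`, the shift `beta`, and the constant `eps` follow the standard batch normalization formulation; the default values are illustrative, and the running statistics used at inference time are omitted.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]
```

After this step the activations of the batch have approximately zero mean and unit variance, so later layers see inputs on a stable scale regardless of how the earlier weights have drifted.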
Loss function
The loss function evaluates the model performance. In the output layer, it measures the prediction error of the classification model on the training data. This error is the gap between the predicted and the actual results. The two arguments of the loss function are the predicted output and the actual output. The loss function chosen for the proposed system is the cross-entropy function. The log loss is calculated with the formula in Equation 6.
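For reference, the standard categorical cross-entropy for a single sample with a one-hot label vector $y$ and predicted probabilities $p$ is $-\sum_c y_c \log p_c$. The sketch below implements this textbook form under the assumption that it corresponds to the loss used here; the small constant `eps` is added only to avoid taking the logarithm of zero.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))
```

The loss is zero when the model assigns probability 1 to the correct class and grows as that probability falls, which is why confident wrong predictions are penalized heavily.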
Optimizer
The optimizer in any neural network adjusts the attributes of the network, such as the weights, to improve the model’s prediction accuracy. The initial weights of the neurons are set with a particular strategy that varies for different optimizers. The network is updated over the training epochs, and training should stop at the optimum epoch that minimizes the error. The learning rate is the hyperparameter that controls the step size of each update. In this study, the optimizer chosen is Adam, which has the advantages of low memory requirements and low computation cost. Adam computes individual adaptive learning rates for the parameters from estimates of the first and second moments of the gradients. Adam leverages vital features of RMSProp as well as AdaGrad [27]. As in RMSProp, the learning rate is scaled by a running average of the squared gradient.
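A single Adam update for one scalar parameter can be sketched as follows. The decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults and are assumptions here; only the learning rate of 0.002 is taken from the experimental design of this study.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (squared gradient)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

A typical loop initializes `m` and `v` to zero and calls `adam_step` once per iteration with the current gradient. On the first step the update has a magnitude close to the learning rate regardless of the gradient scale, which illustrates the per-parameter adaptive step size.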
Figure 2 represents the architecture of the DenseNet layer for the proposed AD detection model. The preprocessed MRI images are input to the convolution layer, and extracted features are given to the pooling layer for optimization of the feature map and finally to the fully connected layer to be classified as demented or non-demented.

Figure 2. Architecture of DenseNet layer for proposed AD detection model.
Performance evaluation
Studying the performance of state-of-the-art methodologies is essential for comparing model performances. The standard metrics for performance evaluation of any classifier model are accuracy, precision, recall rate, area under the curve (AUC), and f1 score [28]. Accuracy is the ratio of the number of correct classifications to the total number of classified samples. The number of correct classifications is the sum of the true positive (TP) count and the true negative (TN) count.
Precision is another metric to evaluate the model performance. Precision is the proportion of samples classified as positive that are actually positive, that is, TP / (TP + FP), where FP is the false positive count.
Accuracy rates the model’s performance based on both positive and negative predictions, whereas precision rates the performance based on positive predictions only. Recall rate is the proportion of actual positives that the model correctly detects, that is, TP / (TP + FN), where FN is the false negative count.
Improving the precision typically reduces the recall rate, and vice versa. The receiver operating characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at varying classification thresholds. The AUC is the area under the classifier’s ROC curve. AUC depicts how well the model can separate the classes. With an AUC value of 1, the model classifies all the test samples in the dataset correctly.
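The threshold-based metrics follow directly from the four confusion-matrix counts, as the sketch below shows. The function name and the counts in the usage note are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and f1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With made-up counts of 8 true positives, 2 false positives, 9 true negatives, and 1 false negative, the accuracy is 17/20 and the precision is 8/10, while the recall is 8/9: fewer false negatives raise recall, and fewer false positives raise precision.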
Experimental design
In the proposed study, the Kaggle dataset provides the MRI images. The CNN architecture is selected in this study as it minimizes computational complexity and enhances classification accuracy. There are eight 2D convolution layers. The padding is left at the default value defined for the Sequential model. The first convolution layer has 16 filters. The second and third layers have 32 convolution filters. The fourth and fifth layers have 64 convolution filters. The sixth, seventh, and eighth layers have 128 convolution filters. Four 2D max pooling layers are used. The pool size is 2×2 and the stride value is 2. The activation function used here is the Rectified Linear Unit (ReLU), a piece-wise linear function. Because the feature maps have spatial dimensions, they are flattened into a vector before the dense layers. The proposed model has four dense layers; each dense layer has 32 hidden units, and the final dense layer has 12 units. In the final layer, SoftMax as an activation function improves the accuracy [22]. The loss function is categorical_crossentropy. The optimizer used is Adam, and the optimizer learning rate is 0.002. The proposed deep learning model for AD diagnosis was implemented in the open-source language Python 3 (https://www.python.org) with the deep learning library TensorFlow (https://www.tensorflow.org). All experiments were performed on a 64-bit Windows 10 operating system with 4 GB RAM, a 500 GB SSD, and an Intel i5 10th Gen chip. The proposed system was implemented using Google Colab resources, and the model was trained on the Google Colab GPU.
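To see how the feature map shrinks through such a network, the sketch below traces the spatial size for a 224×224 input. Several details are assumptions rather than facts from the study: 3×3 kernels with stride 1 and no padding (the Keras `Conv2D` default, `"valid"`), and the placement of the four pooling layers after the first, third, fifth, and eighth convolution layers, which the text does not specify.

```python
def conv_output_size(n, kernel=3, stride=1, pad=0):
    """Spatial size after a convolution."""
    return (n + 2 * pad - kernel) // stride + 1


def pool_output_size(n, size=2, stride=2):
    """Spatial size after a pooling layer."""
    return (n - size) // stride + 1


def trace_shapes(input_size=224):
    """List (layer, height, width, channels) for the assumed layer order."""
    filters = [16, 32, 32, 64, 64, 128, 128, 128]  # from the experimental design
    pool_after = {1, 3, 5, 8}  # ASSUMPTION: pooling positions are not given
    size = input_size
    shapes = []
    for index, channels in enumerate(filters, start=1):
        size = conv_output_size(size)
        shapes.append((f"conv{index}", size, size, channels))
        if index in pool_after:
            size = pool_output_size(size)
            shapes.append((f"pool_after_conv{index}", size, size, channels))
    return shapes
```

Under these assumptions the last pooling layer outputs a 9×9×128 feature map, which would be flattened into a vector of 10,368 values before the dense layers. A different pooling placement or `"same"` padding would change these numbers.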
RESULTS
The images are preprocessed in this proposed AI model by CLAHE and augmentation techniques. In medical images, contrast enhancement is crucial for improving the visibility of minute structures without amplifying noise. The image clarity directly impacts the accuracy of the model. The data augmentation step improves classification accuracy since the flipped images create a generalized dataset that allows the deep learning algorithms to identify features of augmented images. The augmentation techniques used are horizontal flip, vertical flip, zoom, resize, and greyscale conversion. Pooling layers in the proposed model reduce the unwanted features and detect important patterns that differentiate between demented and non-demented images. The convolution filter size is maintained as 3×3 to reduce the computational complexity. The model was designed with eight convolution layers, the configuration that showed the highest accuracy. The customized CNN model has contributed to the improved performance. The activation function adds nonlinear factors to eliminate redundant data while preserving features. This feature mapping is excellent for solving complex nonlinear problems [29]. The model efficiency improves with ReLU as an activation function. ReLU is used as an activation function over Sigmoid as it reduces the vanishing gradient issue. For positive inputs, the ReLU gradient is constant, which leads to a faster learning process. With the Sigmoid activation function, the gradient becomes very small as the input magnitude grows, and the activations form a dense representation, whereas ReLU produces a sparse representation, which is preferred. The entire methodology works by minimizing the loss through the backpropagation technique. In the training phase, the number of epochs given is 23. The standard metrics to evaluate the model’s performance include accuracy, precision, recall rate, and f1-score.
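The difference between the two activation functions with respect to vanishing gradients can be made concrete with a short sketch of their derivatives. The function names are illustrative.

```python
import math


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; it peaks at 0.25 and decays toward 0."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The sigmoid derivative never exceeds 0.25, so a gradient backpropagated through eight sigmoid layers is scaled by at most 0.25 to the power of 8, roughly 1.5e-5, whereas a chain of active ReLU units passes the gradient through unchanged.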
The model performance on the specified dataset for the demented and non-demented train samples exhibits an accuracy of 98%, precision of 94%, recall rate of 94%, and f1-score of 94%. The accuracy of the model is higher than that of the state-of-the-art methodologies. The model shows an improved precision rate because the number of false positives is small; the number of false positives drastically affects the model’s precision rate. The number of false negatives is low; hence the recall rate is high. The area under the curve is 0.97, which indicates that the model performance is outstanding.

Figure 3. Accuracy versus Epochs.
DISCUSSION
The proposed system for AD detection exhibits improved model performance compared to the existing methodology. Figures 3 to 6 showcase the efficiency of the proposed method. The graphs visualize the training and testing results for various metrics against the epochs. Figure 3 exhibits the history of accuracy for train and validation data for epochs from 0 to 10. This graph indicates a sharp increase in the accuracy in the first two epochs from 0.94 to 0.98, after which the accuracy remains constant. The flattened curve indicates that the accuracy is consistent for the remaining epochs and that few epochs are required to train the model. In Figure 3, the flattened validation curve is an upper bound to the training curve and demonstrates that there is no overfitting of the model. The gap between the training and validation curves reflects the degree of overfitting: the smaller the gap, the less the overfitting of the model [30]. Figure 4, representing the loss history, shows a decline from 6 to 1 over the first two epochs. It indicates that the training loss drops rapidly in the first two epochs and remains flat for the remaining epochs. The loss of the validation set is constant, emphasizing that the model generalizes to new data patterns. Figure 5 shows the history of the AUC for each epoch. The AUC for the training dataset is close to 1 for the epochs above 2. This indicates that the proposed model has a high classification accuracy for validation data. Figure 6 represents the precision rate for each epoch, varying from 0.88 to 0.94. The precision rises sharply up to the second epoch, after which it is constant. Figure 7 shows the history of the f1-score, varying from 0.86 to 0.94 for different epochs. The graphs discussed above show that the model’s performance improves remarkably within the first few epochs.

Figure 4. History of loss versus Epochs.

Figure 5. Area under curve versus Epochs.

Figure 6. Precision versus Epochs.

Figure 7. History of f1-score versus Epochs.
Table 1 compares the performance of state-of-the-art methods with the proposed methodology. The proposed methodology shows improved accuracy and recall rate compared to the existing methods. The model exhibits an accuracy of 98%, higher than that of the ResNet model. The precision is 94% and the recall rate is 94%, much higher than those of the ResNet classifier model of Odusami [31]. The model’s f1-score is higher than that of the other methodologies considered in this study. The high performance of the model is the strength of this study. It is not recommended to evaluate any classification model’s performance solely on the accuracy metric. The other metrics, such as precision and f1-score, are vital in model evaluation. In this study, all the standard evaluation metrics are experimentally computed and tabulated. Precision reflects the rate of false positives in a model’s classification. The high precision rate indicates that the false positives are few. A false-positive diagnosis can cause distress for patients and their families as patients undergo unnecessary tests and treatments. The recall rate is high in the proposed model compared to the existing state-of-the-art methods. The high recall rate indicates that the false negatives are relatively few. A false-negative diagnosis leads to delayed treatment of the patient. A patient diagnosed with AD at an early stage has enough time to plan their future and has a higher chance of slowing down the progress of the disease [32]. These points make the proposed methodology superior to the state-of-the-art methods. A further advantage of the proposed methodology is that the dataset includes only the MRI images of the patients. It does not include cognitive tests, and so it avoids the adverse effects of differences in their interpretation. Cognitive tests, whether computerized or not, have pros and cons [33].

MRI images of non-demented, very mild demented, moderate demented and mild demented patients in the dataset.
Table 1. Comparison of proposed methodology with existing methods
a. Methodology is the classifier or deep neural network technique adopted by the authors in the research study for classification of the test dataset. b. Classification category is the set of labelled classes in the dataset. The deep neural network techniques classify the dataset into the classes specified in the classification category. c. The research studies listed in the table above are compared on accuracy, precision, and recall rate.
CONCLUSIONS
In recent years, several computer-aided diagnosis (CADx) systems were developed based on hand-crafted feature extraction techniques followed by conventional ML classifiers. Adopting deep learning techniques over conventional ML classifiers has considerably improved the performance of AI models. The conventional ML classifiers require segmentation and feature extraction of the brain region to classify AD-affected images. This study exhibits better performance than the existing models because the CNN model is customized carefully. Data augmentation plays a significant role in enlarging and generalizing the dataset, and a voluminous dataset is what every deep learning model demands to give higher accuracy. The novelty of this study is preprocessing with CLAHE and data augmentation prior to classification by a customized CNN with unique parameters. The deep learning technique is a boon to clinical radiologists in disease diagnosis. Radiologists can rely on AI models as a second opinion to diagnose the patient’s disease condition. The AI model is built with deep learning algorithms whose hidden layers provide the abstraction needed to select deterministic features for classifying the images. As every coin has two sides, the AI deep learning model has its pros and cons. Future research work will compare the various deep learning techniques on the benchmark dataset.
Footnotes
ACKNOWLEDGMENTS
We sincerely thank Mr. Armaan Ziyad of the Yara International School, Riyadh, for programming the deep learning model and the data visualization for this research paper.
FUNDING
This study is supported via funding from Prince Sattam bin Abdulaziz University, project number PSAU/2023/R/1444.
CONFLICT OF INTEREST
The authors have no conflict of interest to report regarding the present study.
