Abstract
Background:
The Pentagon Drawing Test (PDT) is a common assessment for visuospatial function. Evaluating the PDT by artificial intelligence can improve efficiency and reliability in the big data era. This study aimed to develop a deep learning (DL) framework for automatic scoring of the PDT based on image data.
Methods:
A total of 823 PDT photos were retrospectively collected and preprocessed into standardized black-and-white square images. Stratified fivefold cross-validation was applied for training and testing. Two strategies based on convolutional neural networks were compared. The first strategy performed an image classification task using supervised transfer learning. The second strategy used an object detection model to recognize the geometric shapes in the figure, followed by a predetermined algorithm that scored the drawing based on the classes and positions of those shapes.
Results:
On average, the first framework achieved 62% accuracy, 62% recall, 65% precision, 63% specificity, and an area under the receiver operating characteristic curve of 0.72. The second framework substantially outperformed it, with averages of 94%, 95%, 93%, 93%, and 0.95, respectively.
Conclusion:
An image-based DL framework built on object detection may be clinically applicable for automatic scoring of the PDT with high efficiency and reliability. With a limited sample size, transfer learning should be used with caution if the new images differ markedly from the original training data. Partitioning the problem-solving workflow into multiple simple tasks should facilitate model selection, improve performance, and make the logic of the DL framework more comprehensible.
INTRODUCTION
The Pentagon Drawing Test (PDT) is a subtest of the Mini-Mental State Examination (MMSE), one of the most widely used screening tools for cognitive impairment in older adults [1]. Examinees are asked to copy two intersecting pentagons whose overlap forms a quadrilateral. The PDT is a quick test of an examinee’s visuospatial function, which is often impaired in neurodegenerative diseases such as Parkinson’s disease and Alzheimer’s disease [2, 3]. Studies have demonstrated that the test is effective in differentiating various types of cognitive deficits and in predicting global cognitive dysfunction [4–10]. The PDT is typically scored in a binary fashion, with 1 point for a correct reproduction of the entire figure and 0 for an unsuccessful attempt.
Scoring of the PDT is performed manually by specialists or trained technicians. While this is easy for a limited number of cases, it becomes increasingly challenging and time-consuming with large volumes of data, such as in community-based screening or individual lifelogs. In addition, determining whether the scoring criteria are met is in fact a subjective judgment, which may be affected by subtle uncertainties in the figure, such as ambiguous shapes, slightly curved edges, and asymmetric pentagons. As with other subjective assessments, inter- and intra-rater reliability can be affected by numerous factors, such as the rater’s internal standards, experience, attention, and mistakes [11–13]. Automatic scoring of the PDT by artificial intelligence (AI) may overcome both limitations, given AI’s efficiency and consistency relative to humans. Moreover, it may reduce costs and facilitate remote and electronic administration of the PDT and MMSE, in line with the trend toward digital health during the pandemic and beyond.
Deep learning (DL) has emerged as a promising AI technique that has markedly advanced data-driven research in healthcare [14, 15]. DL-based algorithms differ from traditional machine learning approaches in their ability to learn complex representations directly from raw data [16]. Convolutional neural networks (CNNs), in particular, are the most widely used DL architectures in many fields, especially computer vision. They have been shown to match or even outperform human experts in numerous image-based tasks in medicine, such as anomaly detection, diagnosis, and outcome prediction [17–20]. Ideally, these models are trained on tens of thousands or even millions of images to achieve expert-level performance and sufficient generalizability [17, 22]. However, datasets of this size are rare, and acquiring them can be prohibitively expensive or even impossible for most researchers.
When sample sizes are limited, supervised transfer learning is currently the standard DL approach. Scoring the PDT in a binary way can be framed as an image classification task, a classic DL problem. Images are assigned labels that represent the scores (e.g., pass or fail) given by experts. A CNN model pretrained on another task is then refit with these image-label pairs so that it can leverage previously learned patterns for the new task. The major advantage of this approach is that learning starts from basic knowledge gained on related tasks rather than from scratch; training from scratch is not ideal in DL when data are limited. However, recent studies have shown that this approach may not always be effective, especially on images that differ from the original training data (e.g., transfer learning from natural images to medical images) [23–25]. An alternative approach is to score the figure based on the shapes and positions of the geometric components identified in it. This strategy converts the research question into an object detection and localization task, which can also be accomplished by a CNN-based model.
The goal of this study was to develop a DL framework for automatic scoring of the PDT. To determine the optimal strategy, two CNN-based approaches were designed and compared: 1) the classic strategy, which addresses an image classification task using supervised transfer learning; and 2) an alternative approach based on object detection and localization (Fig. 1). Each approach was trained and tested on approximately 800 images using fivefold cross-validation. Overall, the results of this study should not only indicate the feasibility of developing a DL-based framework with limited data but also provide insight into each strategy for performing computer vision tasks in medicine.

Fig. 1. Overview of the study design, showing the flowchart of each DL framework.
MATERIALS AND METHODS
Data collection
This study was conducted in full accordance with Good Clinical Practice and the Declaration of Helsinki. Ethics approval and informed consent were waived by the local Institutional Review Boards owing to the retrospective study design and the use of non-personally identifiable data. Data were anonymously collected from two independent organizations that provide mental health assessments in China. All PDTs had previously been administered on paper as part of a geriatric assessment of community-dwelling adults aged over 50 years. The drawings were photographed with mobile phone cameras and saved as JPG files with a resolution of at least 1500 pixels in each dimension. The first dataset consisted of 399 PDT images obtained from residents of Foshan, Guangdong Province (age 64.9 ± 8.7 years). The second dataset contained 424 images from individuals in Wuhan, Hubei Province (age 64.4 ± 8.7 years).
Data preprocessing
Conversion to standardized images
All image files were transferred to a desktop computer for preprocessing. To reduce image noise and resize images to standard dimensions, color photos were transformed into black-and-white square images that contained the entire PDT figure. Specifically, each color photo was first converted from red, green, and blue (RGB) channels into a single grayscale channel (Fig. 2a), and its intensity histogram was generated (Fig. 2b). Transformation into a black-and-white image was accomplished using a binarization method, in which the threshold was empirically determined according to the following formula,

Fig. 2. Steps for generating standardized images. (a) An original color photo was first converted into a grayscale image. A region of interest (ROI) contained the drawn sketch, along with various noise such as unwanted artifacts and environmental shadows. (b) A histogram showing the distribution of pixel intensities was obtained. Darker pixels have lower intensity values and brighter pixels higher values. The ROI and the noise were reflected in the smaller peak and the background in the larger peak. A threshold was automatically set by formula (1) to separate these two major components. (c) A binary image was generated from the threshold: all pixels with intensities lower than the threshold were set to 0 and the rest to 1. The ROI was cropped using a rectangular window chosen to capture the entire figure. (d) The cropped image was expanded into a square without any deformation of the drawn figure.
where $\eta_1$, $\eta_2$, and $1 - \eta_1 - \eta_2$ are the proportional weights, and $I_{xy}$ ($1 \le x \le L_x$, $1 \le y \le L_y$) is the pixel intensity across the image (Fig. 2c). Finally, the drawn figure was cropped using a rectangular window ($l_x \times l_y$) and expanded into a square ($d \times d$) image (Fig. 2d), given by
Image preprocessing was conducted in Python [26].
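The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on three stated assumptions: because formula (1) is not reproduced here, the sketch swaps in Otsu's method, a standard histogram-based threshold for bimodal histograms such as the one in Fig. 2b, in place of the weighted threshold; it crops to the bounding box of the ink automatically, whereas the study cropped manually; and it assumes the square side is $d = \max(l_x, l_y)$. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Gray = List[List[int]]

def to_grayscale(rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> Gray:
    """Collapse RGB pixels to one 8-bit channel using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]

def histogram(gray: Gray) -> List[int]:
    """Number of pixels at each intensity level 0..255."""
    counts = [0] * 256
    for row in gray:
        for v in row:
            counts[v] += 1
    return counts

def otsu_threshold(hist: List[int]) -> int:
    """Otsu's method (a stand-in for formula (1)): choose t maximizing the
    between-class variance of the groups {I < t} and {I >= t}."""
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(1, 256):
        w0 += hist[t - 1]                  # pixels with intensity below t
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(gray: Gray, t: int) -> Gray:
    """As in Fig. 2c: intensities below t become 0 (ink), all others 1 (background)."""
    return [[0 if v < t else 1 for v in row] for row in gray]

def crop_to_ink(binary: Gray) -> Gray:
    """Tight l_x-by-l_y window around the ink pixels (ASSUMPTION: automatic crop)."""
    rows = [y for y, row in enumerate(binary) if 0 in row]
    cols = [x for x in range(len(binary[0])) if any(row[x] == 0 for row in binary)]
    return [row[min(cols):max(cols) + 1] for row in binary[min(rows):max(rows) + 1]]

def pad_to_square(img: Gray, fill: int = 1) -> Gray:
    """Center the crop on a d-by-d white canvas without resampling the drawing.
    ASSUMPTION: d = max(l_x, l_y)."""
    l_y, l_x = len(img), len(img[0])
    d = max(l_x, l_y)
    top, left = (d - l_y) // 2, (d - l_x) // 2
    canvas = [[fill] * d for _ in range(d)]
    for y, row in enumerate(img):
        canvas[top + y][left:left + l_x] = row
    return canvas

def preprocess(rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> Gray:
    gray = to_grayscale(rgb)
    t = otsu_threshold(histogram(gray))
    return pad_to_square(crop_to_ink(binarize(gray, t)))

# Toy 5 x 7 "photo": white paper with a few dark ink pixels (invented values).
W, K = (230, 230, 230), (20, 20, 20)
photo = [[W] * 7 for _ in range(5)]
for y, x in [(1, 2), (1, 3), (1, 4), (2, 2)]:
    photo[y][x] = K
print(preprocess(photo))  # [[0, 0, 0], [0, 1, 1], [1, 1, 1]]
```

In practice, an image library would handle file I/O and color conversion (Pillow's `Image.convert("L")` uses the same luma weights, and OpenCV's `cv2.threshold` offers an Otsu flag); nested lists simply keep the sketch dependency-free.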
Assignment of labels
Each PDT was scored by 3 referees who were experienced in cognitive assessments. Each expert independently reviewed these images and assigned scores of 0 (fail) or 1 (pass) based on the criteria mentioned earlier. Any discrepancy was discussed post hoc until consensus was reached. These scores represented the ground truth labels applied for training and testing the DL models.
Split of datasets
Models were trained and tested using fivefold cross-validation. Specifically, both datasets were merged and randomly split into an 80% training set (n = 658) and a 20% test set (n = 165). This process was repeated five times, each time yielding a completely distinct test set (Table 1). Data were split in a stratified fashion to ensure that the class distribution was consistent between training and test sets. Both models were trained and tested on the same image collections to allow a fair comparison of performance. During training, a random 20% of the training set (n = 132) was used to validate the models.
Table 1. Summary of labels and train/test split applied in the stratified fivefold cross-validation process.
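A stratified fivefold split with an inner validation hold-out, as described above, might look like the following. This is a sketch under assumptions: the round-robin dealing within each class, the seeds, and the unstratified 20% validation draw are illustrative choices, and the function names are hypothetical. With 823 images, dealing indices round-robin gives fold sizes of 164 or 165, and 20% of the remaining 658 or 659 training images rounds to 132 validation images.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Partition sample indices into k disjoint folds with near-identical class mixes."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for idx in members:            # deal shuffled indices round-robin across folds
            folds[position % k].append(idx)
            position += 1              # continue across classes so fold sizes differ by at most 1
    return folds

def cross_validation_splits(labels: Sequence[int], k: int = 5, val_fraction: float = 0.2,
                            seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists; every fold serves as the test set once."""
    folds = stratified_kfold(labels, k, seed)
    rng = random.Random(seed + 1)
    for i, test in enumerate(folds):
        pool = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rng.shuffle(pool)
        n_val = round(val_fraction * len(pool))   # random (unstratified) validation draw
        yield pool[n_val:], pool[:n_val], test

# Invented class counts, used only to exercise the functions.
labels = [0] * 500 + [1] * 323
for train, val, test in cross_validation_splits(labels):
    print(len(train), len(val), len(test))
# prints 526 132 165 for the first three folds and 527 132 164 for the last two
```

scikit-learn's `StratifiedKFold` provides the same kind of partitioning in practice; the hand-written version only makes the mechanics explicit.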
Strategies and models
Strategy 1: Image classification using supervised transfer learning
The first strategy applied supervised transfer learning to directly capture the association between images and their labels. The algorithm was based on a pretrained open-source CNN model (Inception-V3, Google LLC, Mountain View, CA, USA). Inception-V3 is a classic image recognition model pretrained on over a million natural images and is commonly used as a base model for transfer learning in image classification tasks [21, 27]. It has been reported to attain 78% top-1 and 94% top-5 accuracy on the ImageNet dataset, outperforming the majority of existing models [28]. It also has a relatively small network size (92 megabytes) and a short average processing time (6.9 ms per inference step). Owing to these advantages, the model has been increasingly applied in medical research with strong results [17, 29–31]. The original Inception-V3 model consists of both symmetric and asymmetric convolutional blocks, with a total of 314 layers and approximately 24 million parameters [32]. In this study, all base layers of the CNN model were preserved, and only the top dense layer was replaced with a custom layer that generated a 2-class output.
Training was performed in Python [26] using TensorFlow (Google LLC, Mountain View, CA, USA). Data augmentation, including random rotation, flipping, shearing, and zooming, was applied to the training images to increase data variability. The images were then resized to 299 × 299 pixels using bilinear interpolation. The model was optimized with the Adam optimizer [33] at a learning rate of 0.0001. Its performance was monitored by accuracy and categorical cross-entropy loss at each epoch. To prevent overfitting, training was terminated if the validation loss failed to improve for 15 consecutive epochs. Initially, only the parameters in the top fully connected layers were trainable. After these parameters were well trained, the top 2 blocks, consisting of the last 65 layers (i.e., layers 250 to 314), were further fine-tuned with a lower learning rate of 0.00001 to yield the final model.
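The two-phase schedule and the early-stopping rule above can be expressed independently of any framework. The sketch below is an illustration only: it assumes 1-indexed layer numbers as in the text and a patience of 15 epochs. In TensorFlow/Keras, the same behaviour would normally come from setting each layer's `trainable` attribute and from `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=15)`; the names `trainable_base_layers`, `PatienceStopper`, and `run_phase` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

N_LAYERS = 314                      # total layers reported for Inception-V3 in the text
FIRST_FINE_TUNED = 250              # layers 250..314 (1-indexed) form the top 2 blocks
LEARNING_RATE = {1: 1e-4, 2: 1e-5}  # phase -> learning rate used with Adam

def trainable_base_layers(phase: int) -> List[int]:
    """Phase 1 trains only the new 2-class head (no base layers);
    phase 2 also unfreezes the top 65 base layers."""
    return [] if phase == 1 else list(range(FIRST_FINE_TUNED, N_LAYERS + 1))

@dataclass
class PatienceStopper:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""
    patience: int = 15
    best_loss: float = float("inf")
    best_epoch: Optional[int] = None
    epochs_without_improvement: int = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

def run_phase(val_losses: Sequence[float], patience: int = 15) -> Tuple[int, Optional[int]]:
    """Replay per-epoch validation losses; return (stop_epoch, best_epoch)."""
    stopper = PatienceStopper(patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.update(epoch, loss):
            return epoch, stopper.best_epoch
    return len(val_losses), stopper.best_epoch

print(len(trainable_base_layers(2)))             # 65
print(run_phase([1.0, 0.8, 0.7] + [0.75] * 20))  # (18, 3)
```

Keeping the weights from the best epoch (epoch 3 in the toy run) rather than the last one is a common companion choice; in Keras it corresponds to `restore_best_weights=True`.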
Strategy 2: Object detection and localization followed by set analysis
In the second strategy, the PDT was scored by identifying the number and positions of pentagons and quadrilaterals in an image. Specifically, a PDT was scored as correct only if both of the following requirements were met: 1) there were exactly two pentagons and one quadrilateral; and 2) the two pentagons intersected, with an overlapping area whose position was consistent with that of the quadrilateral.
An open-source object detection model (You Only Look Once, or YOLO) was applied to this task. YOLO is a lightweight, fast object detection framework based on a CNN architecture. Traditional two-stage models (e.g., region-based CNNs) identify objects through an initial region proposal stage followed by a classification stage; YOLO instead combines both steps into a single-shot process by dividing images into grid cells [34]. Because each image passes through the model only once, YOLO can be up to 180 times faster [35]. Moreover, recent studies have shown that YOLO achieves comparable or even greater precision than traditional models [36, 37]. It has therefore attracted increasing attention in both academic and commercial settings, especially for real-time object detection and localization [38–42]. In this study, the medium-size variant of the latest YOLO model (YOLOv5m), which contains 21.4 million parameters and has a mean processing time of 2.7 ms, was employed with a custom 2-class output [43].
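The grid idea can be illustrated by computing which cell is responsible for an object's center. This sketch assumes YOLOv5's default 640-pixel input and its three output strides (8, 16, and 32 pixels), and it applies the original YOLO rule that the cell containing the box center predicts the object. YOLOv5's actual target assignment also uses neighbouring cells, which is omitted here; the function name is hypothetical.

```python
from typing import Dict, Tuple

def responsible_cells(cx: float, cy: float, img_size: int = 640,
                      strides: Tuple[int, ...] = (8, 16, 32)) -> Dict[int, Tuple[int, int, int]]:
    """Map a box center (pixels) to (grid_size, row, col) at each output stride."""
    cells = {}
    for s in strides:
        g = img_size // s                # the feature map is a g x g grid
        row = min(int(cy // s), g - 1)   # clamp centers lying on the far edge
        col = min(int(cx // s), g - 1)
        cells[s] = (g, row, col)
    return cells

# Example: a pentagon centered at (300, 100) in a 640 x 640 input.
print(responsible_cells(300, 100))
# {8: (80, 12, 37), 16: (40, 6, 18), 32: (20, 3, 9)}
```

Coarser strides give fewer, larger cells, which suits larger objects; because every cell is predicted in the same forward pass, detection needs only one pass through the network.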
To train this model, each pentagon and quadrilateral in the training images was first annotated with a bounding box. The spatial information and label of each box were recorded in a separate file generated by LabelImg [44]. These images and annotations were used to train the YOLO model from scratch. Training was performed in PyTorch, a Python library for machine learning [45]. Training ran for 100 epochs and was monitored by the losses, precision, and recall at each epoch. The final model was the one with the minimum validation loss during training.
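LabelImg can save boxes in several formats, and the text does not state which one was used. Assuming the YOLO text format (one line per box: class index, then box center x, center y, width, and height, each normalized by the image dimensions), the conversion from pixel corners looks like this. The class-index mapping and function names are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
CLASS_IDS: Dict[str, int] = {"pentagon": 0, "quadrilateral": 1}  # hypothetical mapping

def to_yolo_line(label: str, box: Box, img_w: int, img_h: int) -> str:
    """Pixel corners -> 'class x_center y_center width height', normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

def from_yolo_line(line: str, img_w: int, img_h: int) -> Tuple[int, Box]:
    """Inverse conversion, e.g. for drawing predicted boxes back onto the image."""
    cls, xc, yc, w, h = line.split()
    xc_px, w_px = float(xc) * img_w, float(w) * img_w
    yc_px, h_px = float(yc) * img_h, float(h) * img_h
    return int(cls), (xc_px - w_px / 2, yc_px - h_px / 2, xc_px + w_px / 2, yc_px + h_px / 2)

line = to_yolo_line("pentagon", (100, 200, 300, 400), img_w=1000, img_h=800)
print(line)  # 0 0.200000 0.375000 0.200000 0.250000
```

Normalized coordinates keep the labels valid when images are resized to the network's input size, which is why the format stores fractions rather than pixels.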
The trained YOLO model generated results regarding the class and coordinates of each geometric shape detected in an image. This information was used by the DL framework to score the PDT according to the criteria mentioned earlier. Specifically, the probabilistic score of a PDT image was calculated as
where $p_1$, $p_2$, and $q$ represent the confidence levels of the two pentagons and the quadrilateral, respectively. The conditional probability function $\eta$ is described in Fig. 3. This score represented the likelihood that the image would be scored as a “pass” by the framework.

Fig. 3. Flowchart of the binary conditional probability generated by the function η used in Equation (3).
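The set-analysis rule can be sketched as follows. This is an interpretation, not the authors' implementation. Equation (3) and the flowchart in Fig. 3 are not reproduced here, so the sketch assumes that the score is the binary indicator $\eta$ multiplied by $p_1 p_2 q$, and it reads "an overlapping area whose position was consistent with that of the quadrilateral" as the quadrilateral's box center lying inside the intersection of the two pentagon boxes. The optional per-class confidence cut-offs illustrate the threshold tuning suggested in the Discussion. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

@dataclass
class Detection:
    label: str   # "pentagon" or "quadrilateral" (the detector's two classes)
    conf: float  # detector confidence in [0, 1]
    box: Box

def overlap(a: Box, b: Box) -> Optional[Box]:
    """Intersection rectangle of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def eta(pentagons: List[Detection], quads: List[Detection]) -> int:
    """Binary rule: exactly two pentagons and one quadrilateral, the pentagons
    overlap, and (ASSUMPTION) the quadrilateral's center lies in that overlap."""
    if len(pentagons) != 2 or len(quads) != 1:
        return 0
    region = overlap(pentagons[0].box, pentagons[1].box)
    if region is None:
        return 0
    qb = quads[0].box
    cx, cy = (qb[0] + qb[2]) / 2, (qb[1] + qb[3]) / 2
    return int(region[0] <= cx <= region[2] and region[1] <= cy <= region[3])

def pdt_score(dets: List[Detection], min_conf: Optional[Dict[str, float]] = None) -> float:
    """ASSUMED form of the score: eta * p1 * p2 * q, after optional per-class cut-offs."""
    min_conf = min_conf or {}
    kept = [d for d in dets if d.conf >= min_conf.get(d.label, 0.0)]
    pents = [d for d in kept if d.label == "pentagon"]
    quads = [d for d in kept if d.label == "quadrilateral"]
    if eta(pents, quads) == 0:
        return 0.0
    return pents[0].conf * pents[1].conf * quads[0].conf

# Invented detections: two overlapping pentagons and a quadrilateral in the overlap.
dets = [
    Detection("pentagon", 0.90, (0, 0, 60, 60)),
    Detection("pentagon", 0.64, (40, 20, 100, 80)),
    Detection("quadrilateral", 0.95, (40, 20, 60, 60)),
]
print(round(pdt_score(dets), 4))                    # 0.5472
print(pdt_score(dets, min_conf={"pentagon": 0.7}))  # 0.0: the 0.64 pentagon is dropped
```

Separating shape recognition (the detector) from the explicit rule in `eta` is what makes each decision inspectable: a failed score can be traced to a missing shape, a wrong count, or a misplaced overlap.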
All DL models were trained and tested on a workstation with a graphics processing unit (GeForce RTX 2080Ti, NVIDIA, Santa Clara, CA, USA) and the associated computing toolkits (CUDA v11.3 and cuDNN v8.2). Coding was performed in Python 3.8.
Statistics
Descriptive statistics were applied where appropriate. The overall predictive performance of each DL model was evaluated by the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUROC) was calculated from the predicted probabilities. The optimal cutoff threshold was determined as the point on the curve with the minimal distance to the upper left corner, a criterion closely related to, but not always identical with, maximizing the sum of sensitivity and specificity. For each model, this threshold was obtained on the validation set during training and applied to the test set. The numbers of PDTs correctly and incorrectly scored by each framework were displayed in a confusion matrix. The performance metrics for each DL framework included accuracy, recall, precision, and specificity. Results were averaged over the five cross-validation folds and expressed as mean ± standard deviation. All statistical analyses were performed using Python [26] and Excel (Microsoft Corporation, Redmond, WA).
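The ROC analysis and metrics described above can be sketched in plain Python. The sketch assumes that a drawing is predicted as "pass" when its score is at or above the cut-off, computes the AUROC with the trapezoidal rule, and selects the cut-off closest to the top-left corner; in practice, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the curve and its area. The function names are hypothetical, and the toy labels and scores are invented for illustration.

```python
from math import hypot
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (threshold, FPR, TPR)

def roc_points(y_true: Sequence[int], y_score: Sequence[float]) -> List[Point]:
    """One ROC point per distinct score, predicting 'pass' (1) when score >= threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points: List[Point] = [(float("inf"), 0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def auroc(points: List[Point]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]))

def closest_to_corner(points: List[Point]) -> float:
    """Cut-off whose (FPR, TPR) lies nearest the ideal corner (0, 1)."""
    return min(points[1:], key=lambda p: hypot(p[1], 1.0 - p[2]))[0]

def confusion_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    pred = [1 if s >= cutoff else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn),        # sensitivity
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }

# Toy validation data (invented): 1 = pass, 0 = fail.
val_y = [1, 1, 1, 0, 1, 0, 0]
val_s = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3]
pts = roc_points(val_y, val_s)
cutoff = closest_to_corner(pts)   # chosen on validation data, then reused on the test fold
print(round(auroc(pts), 4), cutoff)  # 0.9167 0.7
```

Choosing the cut-off on the validation set and only then applying it to the test fold, as the text describes, keeps the test metrics free of threshold tuning on the test data itself.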
RESULTS
Image preprocessing
All binary images were generated according to the protocol, with the figure edges clearly visible and noise sufficiently removed, as confirmed by manual inspection of every generated image. Conversion to standardized black-and-white images reduced the average image size 51-fold without losing the essential graphic features of the drawings.
Model performance in scoring the PDTs
With fivefold cross-validation, the first DL framework achieved a mean accuracy of 62.4%, recall of 61.9%, precision of 65.3%, and specificity of 63.1% (Table 2). In contrast, these metrics were substantially higher for the second DL framework, averaging 93.8%, 94.6%, 93.0%, and 93.0%, respectively. The second framework also showed lower standard deviations for these metrics, indicating more consistent performance across test folds. Its superior predictive performance was further reflected by an AUROC of 0.954, compared with 0.725 for the first framework (Fig. 4). These results suggest that the strategy based on object detection and localization considerably outperformed the classic approach in this task.
Table 2. Summary of model performance. All metrics were averaged over five folds of cross-validation and are reported as mean ± standard deviation.

Fig. 4. ROC curves of both models in scoring the PDTs. Each lighter-colored curve represents the outcome of one of the five cross-validation folds. The dotted diagonal line indicates a random classifier.
DISCUSSION
Although several recent DL studies have addressed various types of cognitive assessment [46–48], little effort has yet been devoted to the PDT. To the best of our knowledge, this article is one of only two papers applying DL to automatic scoring of the PDT, and the first based on image data alone. In recent work, Park et al. developed a mobile phone application based on U-Net to score the digital PDT [49]. Their model is designed for a more sophisticated scale consisting of 4 scoring items with a total of 11 points, including the number of angles, distance between the pentagons, contour integrity, and presence of tremor. The model requires mobile sensor data, including spatial coordinates, timestamps, and touch events. However, these data may not only increase the computational workload and framework size but also limit the practicality of the model. In particular, no score can be produced without the required sensor inputs, which precludes use of the model on a paper PDT, the more common format, especially in developing countries. The image-based protocol proposed in the current study therefore allows broader applicability. It may be used on an electronic PDT, where only a digital image output is required, as well as on a traditional paper PDT simply by photographing the drawn figure (e.g., with a mobile phone camera). Binary scoring of the PDT is also much more common in clinical practice. In addition, processing image-only data can potentially improve efficiency by reducing the complexity of the entire framework.
This study also sought to compare two different strategies for developing the DL framework. The classic strategy, in which the research question is treated as an image classification task and solved by supervised transfer learning, is a standard DL technique for analyzing image data in medicine. Under this approach, images (e.g., x-ray, computed tomography) are assigned ground truth labels that represent desired output values, such as expert consensus, results of gold-standard diagnostic tests, or observed outcomes. These image-label pairs are then used to train a CNN model that maps the image inputs to the label outputs. In particular, a CNN model that has been pretrained on a larger dataset for a similar task (e.g., classifying non-medical images) is retrained to “transfer” the previously learned knowledge to the new task. Researchers favor this strategy because it is methodologically simple and requires only limited manual intervention during training (i.e., providing labels). It can also exploit well-trained open-source models on a small custom dataset, which is typical of proprietary medical image data. However, there are intrinsic differences between natural and medical images. For example, medical images are often not RGB color images, and their interpretation often relies on small, local graphical variations. Indeed, recent studies have suggested that transfer learning may not always be effective for medical images [23–25]. In the current study, the performance of this approach was substandard on monochrome images resembling typical medical imaging data. This finding further illustrates the limitations of the classic strategy and is likely attributable to differences in image characteristics and in the nature of the tasks between the two settings.
Specifically, the pretrained CNN model classifies natural images (e.g., dogs, flowers, apples) by a number of distinctive graphical features of the objects, such as color, size, contour, shape, and texture. However, most of these features become indistinguishable between the two classes in the current study, where all images are visually similar. Classifying an image as “passed” or “failed” can therefore be difficult given the limited graphical information available. Meanwhile, a logical approach to scoring the PDT consists of two steps: identifying the correct geometric shapes and determining their spatial relations. The classic strategy in effect combines both steps into a single image classification task, which is complex and may require big data for satisfactory performance. Both disadvantages could explain the poor performance of the classic strategy in this study, which had only a small to moderate sample size. Altogether, this result suggests that transfer learning from non-medical images should be used with caution on medical images, especially when only limited data are available.
This study attempted to overcome the limitations of the classic strategy by partitioning the task into the sub-steps described earlier. Specifically, the PDT was scored by identifying and localizing every pentagon and quadrilateral in the image, followed by a predetermined algorithm approximating set analysis. This strategy reformulated the DL task so that it could be assigned to an object detection model suited to it. Object detection is a computer vision task that involves localizing and classifying the objects in an image. This approach is also based on a CNN model, which maps graphical features to outputs comprising both class labels and spatial coordinates. In this study, the object detection model was trained to recognize only pentagons and quadrilaterals. Compared with classifying the entire figure as “pass” or “fail” as in the classic strategy, this task is less complicated and therefore more likely to be achievable with limited distinctive graphical information and a limited sample size. Because the scoring rules are specified explicitly, this strategy should also generalize more robustly to drawings that differ from the training images. Owing to these advantages, the strategy of object detection and set analysis substantially outperformed the classic strategy in scoring the PDT on the test set, indicating its superiority for this task. A mean accuracy of 94% and an AUROC of 0.95 suggest that the framework could be clinically applicable for automatic scoring of the PDT.

Fig. 5. Examples of test-set drawings and the scores given by each model. Each example shows a pair of identical drawings that reflects the difference in the decision-making process between the two strategies. The score on the left is predicted by the first framework and that on the right by the second. Predicted scores that match the ground truth labels are marked in green, and those that do not are marked in red. The numbers above the boxes show the confidence levels of the geometric shapes detected by the second model.
Inspecting the images and their predicted scores provides insight into the logic of each strategy and allows a more intuitive comparison between them. Examples of test-set images are shown in Fig. 5. Both frameworks accurately scored figures that represented relatively simple tasks with evident patterns for decision-making (Fig. 5a, b). Meanwhile, the object detection approach correctly scored most of the challenging figures on which the classic strategy failed (Fig. 5c–e). These cases typically present either subtle defects relative to a successful copy of the whole figure (e.g., a pentagon-shaped intersection) or less common but allowable variations from the original figure (e.g., two pentagons with a different relative position or different sizes). The object detection approach appears more accurate on these figures because it properly recognizes the subtle differences and generalizes its decision-making rules to these situations. Moreover, the boxes and labels shown on the image make it possible for humans to interpret and inspect each score given by the model. Finally, although the object detection approach did not score all images in agreement with the referees’ consensus, the misjudged cases typically show ambiguous details. In Fig. 5f, for instance, the model recognized the polygon on the right as a pentagon, leading to a “pass” score opposite to the referees’ decision. However, because such details are typically associated with low confidence scores from the model (e.g., 0.64), a potential improvement would be to fine-tune the confidence threshold of each class so that ambiguous pentagons or quadrilaterals are filtered out before the final decision is made.
The second approach also represents an attempt to open the black box associated with DL. Deep neural networks, including CNNs, are often criticized for their lack of transparency [50]. This is especially true for a DL model trained with the classic strategy, because the entire decision-making process is embedded in a single neural network. The multilayer nonlinear network structure may not only make the model’s logic incomprehensible but also produce untraceable predictions that are subject to bias or errors. This lack of transparency is a major obstacle to the widespread application of DL in healthcare. Recent efforts have therefore tackled the black box issue by splitting the decision-making process of a DL framework into a workflow of distinct steps [17, 20]. This strategy may simplify the task at each step and optimize model selection for each task. It may also allow inspection of the outputs of each model, thereby facilitating debugging and improvement of model performance. In the current study, the second DL framework was designed in a similar fashion, separating the recognition of geometric shapes from the determination of their spatial relations. This strategy in fact resembles the reasoning process of humans in scoring the PDT and may suggest a direction for developing understandable AI in medicine.
Image preprocessing was performed in this study to convert the color photos into standardized black-and-white images. Although this step may seem unnecessary because both CNN models (i.e., Inception-V3 and YOLOv5) can handle RGB images, it is in fact beneficial to the DL framework. First, binarization with a dynamically determined threshold effectively reduces noise while preserving the drawn figure. In this way, graphical information irrelevant to the task is largely removed and the critical features are presented in a standard fashion, both of which may simplify the computer vision task and enhance model performance. Second, it reduces the image size 40- to 60-fold, thereby saving memory and improving training efficiency. Standardizing the images may also be practically advantageous, as the current framework trained on paper PDTs may generalize directly to digital PDTs, whose images are typically presented in the same style. Future studies may quantify the benefits of this method.
Several limitations of this study should be noted. First, only a small to moderate sample size was obtained, and data from two independent institutions were combined instead of being used as separate internal and external sets. Although cross-validation was employed to alleviate this limitation and to obtain less biased estimates of model performance, the generalizability of each model should ideally be evaluated on larger datasets from a greater variety of sources. Second, despite an effective preprocessing protocol, some artifacts remained in the standardized images, such as printed test instructions, written notes, and shadows. These may have negatively affected model performance to some extent. Third, the current study protocol was designed only for a binary scoring scale. Several more sophisticated scales exist for interpreting the PDT, with possible total scores ranging from 6 to 15 points [4, 49, 51–53]. These scales focus on more details of the drawn figure, such as the symmetry of both pentagons and the straightness of lines. Although they have been reported to be effective for several neurological conditions, these scales are less frequently used for general purposes. Moreover, developing a DL model for any of these sophisticated scales based only on image data is a more challenging task that would probably require larger datasets and more complex frameworks; it was therefore not pursued in the current study. Last, a few steps required manual intervention, including assigning ground truth labels, annotating objects for the second framework, and cropping the drawn figures from the original photos. However, these tasks are likely required only during model development.
In particular, manual cropping of the drawings was necessary in this study because numerous artifacts appeared at random in these paper PDTs, which had not originally been administered or stored for the purpose of this study (e.g., printed test instructions next to the drawings, physician notes overlapping the figures). For future model deployment, these artifacts may simply be avoided or minimized by a few requirements (e.g., the figure should be drawn on a clean sheet or with sufficient surrounding blank space). In this regard, developing a mobile or desktop application for automatic scoring of the PDT is feasible. Despite these limitations, efforts were made in this study to ensure a rigorous protocol and the validity of outcomes, including standardizing images, splitting the dataset, and comparing two frameworks on the same data. The strong performance of the object detection strategy also suggests a clinically applicable DL framework and a potential direction for understandable AI. Future work should aim to establish a fully automatic smartphone application based on the current framework, combine multiple models for prediction, apply this method to other scoring scales or drawing tests (e.g., the clock drawing test and the Rey–Osterrieth complex figure test), and link these assessments to clinical outcomes. The ultimate goal is to enable cost-effective, precise, and reliable AI for high-quality healthcare in the era of big data.
Summary
Developing a DL framework for automatic scoring of the PDT was explored in this study using two distinct CNN-based approaches. The classic strategy of applying supervised transfer learning to image classification was substantially outperformed by an alternative approach based on object detection and localization. The superior outcomes of the second approach suggest a clinically applicable DL framework and provide a direction for developing understandable AI in medicine.
Footnotes
ACKNOWLEDGMENTS
Y.L. conceptualized and designed the study, analyzed the data, performed computer programming and statistical analyses, assessed the AI models, prepared the figures, and was a major contributor to writing, editing, and revising the manuscript; J.G. retrieved the data, performed computer programming, prepared the figures, and wrote, edited, and revised the manuscript; P.Y. provided data resources, retrieved the data, and edited the manuscript. All authors have reviewed, discussed, and approved the manuscript.
We thank the staff of Guangdong Yunjian Intelligent Technology Co., Ltd. for their time in assisting with data collection. We also acknowledge the neurologists from Guangzhou First People’s Hospital for their valuable opinions in interpreting the test results.
Jiajie Guo received support from the National Natural Science Foundation of China (Grant number 51875221).
