Abstract
Handwriting problems, also known as dysgraphia, are defined as a disorder or difficulty in producing written language that is associated with the mechanics of writing. The prevalence of handwriting problems among elementary students ranges from 10 to 34%. Because they impair educational performance, handwriting problems cause low self-confidence and disappointment in students. In this research, a pen-tablet was used to sample children’s handwriting, which provided online handwriting features such as kinematic and temporal features, wrist and hand angles, and pen pressure on the surface. The digitizer could also capture online handwriting features while the pen was not in contact with the surface; such features are called in-air features. The purpose of this study was to propose a method for diagnosing dysgraphia and to evaluate the impact of in-air features on the diagnosis of this disorder. A rich dataset (OHF-1) of online handwriting features of dysgraphic and non-dysgraphic students was prepared. After a large set of features was extracted and a feature selection method was chosen, three machine learning methods, i.e. SVM, Random Forest, and AdaBoost, were compared. With the SVM method, an accuracy of 85.7% in diagnosing dysgraphia was achieved when both in-air and on-surface features were included, whereas purely in-air or purely on-surface features yielded accuracies of 80.9% and 71.4%, respectively. Our findings showed that in-air features carry a considerable amount of information related to the diagnosis of dysgraphia. Consequently, they might serve as a significant part of the dysgraphia diagnosis.
Introduction
Handwriting problems in children have been documented alongside various developmental disabilities. Moreover, abnormal patterns of handwriting have been reported in a wide variety of mental and neurological disorders, e.g., depression, OCD, Parkinson’s disease, and schizophrenia, as well as in developmental coordination disorder. Knowing the factors associated with poor handwriting performance helps to guide correction and reconstruction strategies. Researchers believe that the components related to handwriting performance are motor planning, eye-hand coordination, visual-motor coordination, and hand functioning [1].
Children with impaired handwriting produce illegible text, which teachers often grade inappropriately. Handwriting problems commonly cause students to suffer from poor reading skills and to spend too much time on their homework. Handwriting problems become apparent when a child's writing is difficult to read or when the child writes too slowly. Consequently, the quality and quantity of handwriting can be interpreted in terms of readability and speed, respectively [1].
The occurrence of handwriting problems among elementary students varies from 10 to 34%. Such handwriting problems may have serious consequences for an individual's self-image and success in school [2]. The importance of handwriting in school performance is well documented. Poor writing skills directly affect educational performance, giving rise to low self-confidence and low self-regard and leading to considerable frustration and disappointment in the child, the parents, and the teacher. Poor handwriting skills can also increase the time these children need for writing tasks such as homework compared with their peers. Sometimes, students cannot read their own messy handwriting, which leads to inadequate note-taking, disorganized homework, and rejection by their peers in teamwork, which is quite common in classroom learning. In the classroom, writing demands more concentration and effort from these children, which may reduce the attention they can give to spelling, grammar, and lesson content and, in turn, may further jeopardize their academic achievement [3].
Researchers have found that handwriting problems may have severe consequences for the student’s overall success, well-being, attitudes, and emotional behavior [4, 5]. These findings illustrate the significance of identifying handwriting problems as soon as possible so that preventive and corrective measures can be taken [6].
Since the traditional diagnosis of dysgraphia is time-consuming, many children with the disorder are currently not diagnosed and therefore receive no treatment. Because an automated test is fast, more students can be diagnosed and treated. Students with milder degrees of the disorder, whose condition is not empirically diagnosable, can also be identified using the automated test. Moreover, an automatic diagnosis system gives better control over the treatment period and allows the treatment results to be compared with those from earlier sessions. This study attempted to diagnose dysgraphia automatically, which, owing to the high speed of the automated test and the absence of any need for a specialist, helps to screen a broader range of students.
Traditional handwriting studies were limited to observable result-oriented tests. Nonetheless, computers and digitizer-based technology have led to the development of an innovative method for handwriting evaluation. This means that a digital tablet can provide insight into the process of writing during the actual function of writing [1].
In addition to being used in the diagnosis of dysgraphia, online handwriting features have been used to identify other diseases and disorders such as Parkinson’s disease and attention deficit hyperactivity disorder. Drotar et al. [7] quantitatively evaluated Parkinson’s disease using online in-air features. Rosenblum et al. [8] utilized online spatial, temporal, and pressure features to diagnose Parkinson’s disease. The studies in [9–17] each used several features extracted from online pen data to examine the correlation between Parkinson’s disease and handwriting and to support its diagnosis.
In [2], characteristics of dysgraphia were identified with handwriting classification methods that used temporal, spatial, and pressure features (total time for doing the writing task, total time for doing the task in the fifth decile, total time for doing the task in the first decile, fraction of time elapsed on paper, average letter spacing, average pressure, and pressure standard deviation) together with a linear support vector machine. Mekyska et al. [18] used a set of features such as velocity, acceleration, vibration, direction, duration, time of the pen movement in the air, and the angle of the pen, with random forest and linear discriminant analysis methods, to diagnose dysgraphia. Comparing children with and without dysgraphia through an analysis of dynamic handwriting features, Hen-Herbst et al. [19] concluded that children with dysgraphia needed more time to complete their tasks and produced larger stroke lengths and widths than the control group. In addition, in a study of behavioral effects on handwriting performance, individuals were divided into negative, positive, and neutral groups by inducing mental load and were then compared on the basis of their handwriting; the negative group produced narrower lengths and widths of pen strokes than the neutral group, whereas the positive group needed shorter durations than the neutral group [20]. Morello et al. [21] analyzed the correlations between motor parameters in normal and dysgraphic children and reported that the dysgraphic children spent more time on the assigned tasks. These results show that online handwriting features can be effective in diagnosing dysgraphia [22–26].
The purpose of this study was to diagnose dysgraphia and underline the importance of the in-air features of the pen in the diagnosis of this disorder. To do this, 101 students were asked to copy a couple of verses on a pen-tablet. After collecting the raw pen data for each subject, a vast extraction of kinematic, temporal, and spatial features was conducted. The vector data was converted into scalar values with various statistical functions. After the feature selection step by the SVM method, the samples were categorized into three groups depending on the utilization of in-air data, on-surface data, or both types of data. Ultimately, the effective features in each group were displayed, and a number of them were examined.
A new dataset (OHF-1) of online handwriting features of dysgraphic and non-dysgraphic students, with the participation of 101 students and 15 teachers was introduced as well. This dataset included pen-tablet data, handwritten images, and personal information in the form of 505 files for 101 students, which are described in detail in section 2.2.
The contributions of the present work are as follows:
Demonstrating the importance of in-air features in the diagnosis of the dysgraphic disorder
Establishing a system for diagnosing dysgraphia that can assist independently or as a psychiatrist’s assistant
Extracting important correlated features associated with dysgraphia
Collecting a rich database from dysgraphic and non-dysgraphic students that will be available for the future studies
The rest of the paper is organized in the following manner. Section 2 describes how the data was collected, introduces the raw features, and explains how they were extracted and categorized. The results from this work are presented in Section 3. Section 4 describes an attempt to use deep learning to solve the problem. Section 5 discusses the results, comparing them with those from other studies.
Experimental Study
Subjects
A total of 101 students, consisting of 49 dysgraphic students and 52 regular students from the second, third, and fourth grades, participated in the study. They were all right-handed, with 41 students being female and the rest being male. The subjects were from four schools and 15 classes, and both groups matched in age, gender, school, and class. All participants were born in Iran and spoke Persian as their mother tongue.
The HPSQ [27] questionnaire was used to label the participants. It included eleven items completed by a teacher who had been adequately acquainted with the students in the classroom. The teacher gave the student a score between 0 and 4 for each item. A higher score for each question indicated the student’s weakness in writing skills. If the student was scored a total of 14 or higher, they would be considered a dysgraphic child.
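The labeling rule described above can be sketched in a few lines. This is only an illustration of the arithmetic stated in the text (eleven items, each scored 0 to 4, threshold of 14); the function and constant names are hypothetical and are not part of the OHF-1 tooling.

```python
from typing import Sequence

HPSQ_ITEMS = 11       # number of questionnaire items, as stated in the text
HPSQ_THRESHOLD = 14   # total score at or above which a child is labeled dysgraphic


def hpsq_label(item_scores: Sequence[int]) -> bool:
    """Return True when the summed teacher scores mark the student as dysgraphic."""
    if len(item_scores) != HPSQ_ITEMS:
        raise ValueError(f"expected {HPSQ_ITEMS} item scores, got {len(item_scores)}")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each item score must lie between 0 and 4")
    return sum(item_scores) >= HPSQ_THRESHOLD
```

For example, a student who receives a score of 1 on every item totals 11 and is labeled non-dysgraphic, whereas a score of 2 on every item totals 22 and is labeled dysgraphic.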
OHF-1 Dataset
The students were asked to copy a couple of verses written on the board onto paper placed on a digitizer. The online collection of children’s handwriting data requires a digitizer that is highly sensitive to pen movements and that records, in addition to the direction of the pen movements, the hand angle, the amount of wrist rotation, and the pressure applied to the paper. For this purpose, a Wacom Intuos Pro Paper Large digitizer was used, which recorded the following data from the pen approximately every five milliseconds (see Fig. 1):
Stroke number (X, Y, and Z): pen movement coordinates.
Pressure (8,192 pen pressure levels): the amount of pressure that the pen puts on the tablet.
Altitude (30-90 degrees): pen angle to the tablet surface.
Azimuth (0-359 degrees): pen rotation rate.
Time: sampling time.
BTN: a boolean value, 0 for in-air and 1 for on-surface movement.

X, Y, Z, and Altitude/Azimuth information.

Handwriting samples from a typical student and a dysgraphic student demonstrating in-air and on-surface trajectory of the pen.
In addition to recording pen information when the pen is in contact with the digitizer screen (on-surface), this digitizer can record the pen data even when the pen is in the air and has not come in contact with the tablet surface. We call this in-air data. In-air pen data is recorded as long as the pen is not more than 10 mm away from the tablet screen. To obtain the above information from the pen, we developed software that stored the pen data as one record every five milliseconds and saved the records after the student finished writing the verses. The number of records saved for a student therefore depended on how long the writing process took.
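As a minimal sketch of how such records could be separated into in-air and on-surface parts, the snippet below uses the BTN flag described above. The column layout in `COLUMNS` is an assumption made for illustration and is not the actual OHF-1 file format; the helper names are likewise hypothetical.

```python
import numpy as np

# Hypothetical column layout for one pen record (an assumption, not the OHF-1 format).
COLUMNS = ("x", "y", "z", "pressure", "altitude", "azimuth", "time_ms", "btn")
TIME, BTN = COLUMNS.index("time_ms"), COLUMNS.index("btn")


def split_by_contact(records):
    """Split records into (in_air, on_surface) using BTN: 0 = in-air, 1 = on-surface."""
    records = np.asarray(records, dtype=float)
    on_surface = records[:, BTN] == 1
    return records[~on_surface], records[on_surface]


def contact_durations(records):
    """Return (in_air_ms, on_surface_ms).

    Each sampling interval is attributed to the pen state at the start of the interval.
    """
    records = np.asarray(records, dtype=float)
    dt = np.diff(records[:, TIME])
    on_surface = records[:-1, BTN] == 1
    return float(dt[~on_surface].sum()), float(dt[on_surface].sum())
```

Attributing each interval to the state at its start is one possible convention; with a sampling period of about five milliseconds, other conventions change the totals only marginally.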
Our dataset consisted of “pen-tablet raw data”, “On-Surface, In-Air, and Both images”, and “personal information”. We called this dataset OHF-1 (the dataset is available at https://github.com/aminic/OHF-1). Some sample images from this dataset are shown in Fig. 2 and Appendix.
Figure 2 (a) illustrates a handwriting sample from a typical student with an HPSQ score below 14, while Fig. 2 (b) shows a handwriting example from a dysgraphic student with an HPSQ score above 14. More samples from our database are provided in Appendix.
Kinematic handwriting features
Initially, kinematic features were computed for each record of raw pen data; Table 1 describes these features. From the kinematic features, several other features were extracted, e.g. the number of changes in velocity (NCV) and the number of changes in acceleration (NCA), as well as temporal features such as the total amount of time that the pen spent on-surface or in-air. The features extracted at this stage are provided in Table 2. For subsequent processing, vector data such as velocity were converted to scalar values using statistical functions. The statistical functions we used are as follows:
Arithmetic mean, harmonic mean, geometric mean, trimmed means (5, 10, 20, 30, 40, 50), and mode.
Maximum, Minimum, Max Position, Min Position, Max Relative Position, and Min Relative Position.
Skewness, kurtosis, and moments (1, 2, 3, 4, 5, 6).
Median, percentile (1, 5, 95, 99), quartiles (1, 3), and deciles (1, 2, 3, 4, 5, 6, 7, 8, 9).
Range, interquartile range.
Standard deviation, variance, and Shannon entropy.
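The conversion from a vector feature to scalar descriptors can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the velocity definition (Euclidean displacement per unit time), the definition of the number of changes as the count of local extrema, and the reading of the trimmed-mean values as percentages are all assumptions, and only a subset of the listed statistics is shown.

```python
import numpy as np
from scipy import stats


def velocity(x, y, t):
    """Speed between consecutive samples: Euclidean displacement divided by elapsed time."""
    return np.hypot(np.diff(x), np.diff(y)) / np.diff(t)


def number_of_changes(signal):
    """Count local extrema, i.e. sign changes of the first difference (assumed NCV/NCA definition)."""
    direction = np.sign(np.diff(signal))
    direction = direction[direction != 0]          # ignore flat segments
    return int(np.sum(direction[1:] != direction[:-1]))


def summarize(values):
    """Collapse a vector feature into a dictionary of scalar descriptors."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    return {
        "mean": float(np.mean(v)),
        "trimmed_mean_10": float(stats.trim_mean(v, 0.10)),  # 10% cut from each tail
        "median": float(np.median(v)),
        "max": float(np.max(v)),
        "min": float(np.min(v)),
        "max_relative_position": float(np.argmax(v) / max(len(v) - 1, 1)),
        "range": float(np.max(v) - np.min(v)),
        "iqr": float(q3 - q1),
        "std": float(np.std(v, ddof=1)),
        "skewness": float(stats.skew(v)),
        "kurtosis": float(stats.kurtosis(v)),
    }
```

Applying `summarize` to each vector feature (velocity, acceleration, pressure, and so on), separately for in-air and on-surface samples, yields one fixed-length row of scalar features per student.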
Temporal features and post-processing on kinematic features
Finally, to evaluate the impact of the in-air features on dysgraphia diagnosis, all features related to in-air and on-surface data were categorized into two separate groups. The following steps were taken for the three different groups below:
Using merely in-air pen features
Using merely on-surface pen features
Using a combination of in-air and on-surface features
Dimension reduction methods frequently fall into two categories, i.e. feature selection and feature extraction, each of which has its own characteristics. On the one hand, feature extraction methods achieve dimension reduction by combining the original features. Thus, they can build a set of new features, which are usually more compact and more distinctive. On the other hand, feature selection reduces the dimensions by eliminating irrelevant and duplicate features. Feature selection is very useful for applications in which the original features are important for model interpretation and knowledge extraction, since the main characteristics of the dataset are preserved during this process. Because this study sought to identify the most important features associated with dysgraphia so that they can be used in future studies, feature selection was employed to reduce the dimensions. Feature selection methods have become an integral part of the learning process for dealing with high-dimensional data. Proper feature selection can improve an inductive learner in various respects, including learning speed, generalization capacity, and simplicity of the extracted model.
To reduce the dimensions, the data were first analyzed using the Mann-Whitney U test with the significance level set at P < 0.05. Features with P values equal to or greater than 0.05 in the Mann-Whitney test were identified as irrelevant and eliminated. Afterward, redundant features were addressed. Reducing data redundancy lowers model complexity and speeds up the training phase.
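The relevance filter described above can be sketched with SciPy's `mannwhitneyu`. The function name `mann_whitney_filter`, the two-sided alternative, and the 0/1 label encoding are assumptions made for illustration; the text only specifies the test and the 0.05 threshold.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def mann_whitney_filter(X, y, alpha=0.05):
    """Return the column indices whose values differ between the two groups (p < alpha).

    X: (n_samples, n_features) array of scalar features.
    y: binary labels, assumed here to be 1 for dysgraphic and 0 for non-dysgraphic.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    kept = []
    for j in range(X.shape[1]):
        group_a, group_b = X[y == 1, j], X[y == 0, j]
        p_value = mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue
        if p_value < alpha:          # features with p >= alpha are treated as irrelevant
            kept.append(j)
    return kept
```

Note that testing thousands of features at a fixed threshold will let some irrelevant features through by chance; the later correlation and elimination steps act on whatever survives this filter.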
Subsequently, the most important features were acquired based on the Pearson and Spearman correlations. Table 3 lists the most important features when the data exclusively includes in-air pen features. Table 4 shows the most important features when the data has only on-surface pen features. The most important features for the case in which the data consists of a combination of all in-air and on-surface pen features are listed in Table 5. Ultimately, to attain the best subset of features, the recursive feature elimination (RFE) [28] method was used, which recursively eliminates the features and builds the model with the remaining ones.
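A rough sketch of the two steps above (correlation ranking followed by recursive feature elimination) is given below. How the Pearson and Spearman coefficients were combined is not stated in the text, so taking the larger absolute value is an assumption, as is the use of a linear-kernel SVM as the RFE estimator (RFE needs an estimator that exposes feature weights). The function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def rank_by_correlation(X, y, top_k=10):
    """Rank features by the larger of |Pearson r| and |Spearman rho| with the label."""
    scores = []
    for j in range(X.shape[1]):
        r_pearson, _ = pearsonr(X[:, j], y)
        r_spearman, _ = spearmanr(X[:, j], y)
        scores.append(max(abs(r_pearson), abs(r_spearman)))
    order = np.argsort(scores)[::-1]               # highest score first
    return order[:top_k].tolist()


def rfe_subset(X, y, n_features=5):
    """Recursively drop the weakest feature until n_features remain."""
    selector = RFE(SVC(kernel="linear"), n_features_to_select=n_features, step=1)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_).tolist()
```

In practice the features would normally be standardized before the linear SVM is fitted, since RFE compares weight magnitudes across features.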
The ten features that are highly correlated with the target feature (in-Air)
The ten features that are highly correlated with the target feature (On-Surface)
The ten features that are highly correlated with the target feature (All)
Our purpose was to build a model for diagnosing dysgraphia among students. To this end, three machine learning methods, i.e. SVM, Random Forest, and AdaBoost, were used and compared. The implementation was written in the Python programming language with the scikit-learn library [29].
The fundamental notion of an SVM classifier is to calculate a maximum-margin hyperplane that separates two classes of data. To learn nonlinear decision functions, the dataset is implicitly mapped by a kernel function into a higher-dimensional space in which the separating hyperplane is located. New samples are then classified according to the side of the hyperplane on which they fall. The radial basis function (RBF) [30] kernel was utilized, which may be defined as
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
In the above equation, gamma ($\gamma$) is the parameter that controls the width of the RBF function. The penalty parameter $C$ and the kernel parameter $\gamma$ were optimized using a grid search over the Cartesian product of the two sets $C \in \{2^{-8}, 2^{-7}, \ldots, 2^{7}, 2^{8}\}$ and $\gamma \in \{2^{-12}, 2^{-11}, \ldots, 2^{11}, 2^{12}\}$.
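The grid search over C and gamma can be sketched with scikit-learn as follows. The grids follow the powers of two given in the text; the standardization step, the number of folds, and the helper names are assumptions added for illustration and are not confirmed by the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rbf_kernel_value(x_i, x_j, gamma):
    """RBF kernel between two vectors: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def build_rbf_grid_search(cv=5):
    """Grid search over C in 2^-8..2^8 and gamma in 2^-12..2^12 for an RBF SVM."""
    param_grid = {
        "svc__C": [2.0 ** k for k in range(-8, 9)],
        "svc__gamma": [2.0 ** k for k in range(-12, 13)],
    }
    # Standardizing the features first is an assumption; it is common practice for RBF SVMs.
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
```

After calling `fit(X, y)` on the returned object, `best_params_` holds the selected pair of C and gamma and `best_score_` the corresponding cross-validated accuracy.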
The random forest algorithm [31] is a classification method consisting of a large number of separate decision trees that act as an ensemble. Instead of relying on a single decision tree, a random forest collects a prediction from every tree and takes the class with the majority of votes as the final output. Using a larger number of trees generally makes the prediction more stable and reduces the risk of overfitting compared with a single tree.
The AdaBoost algorithm [32] is an iterative procedure that attempts to build a strong classifier by combining a large number of weak classifiers. Starting with an unweighted training sample, it builds a classifier, e.g. a classification tree, to generate the class labels. If a training data point is misclassified, its weight is increased (it is boosted). A second classifier is then constructed using the new weights. Again, the weights of the misclassified training data points are increased, and the procedure is repeated.
Cross-validation is a useful technique for assessing the performance of machine learning models. It estimates how well a model developed on the available data may generalize to an independent dataset. The k-fold cross-validation is one of the most common types of cross-validation and is widely used in machine learning: the primary dataset is divided into k equal subsets, each of which is called a fold. Here, the classifier was validated with the leave-one-out cross-validation method, which is a type of k-fold cross-validation in which k is equal to N, i.e., the number of samples in the dataset.
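A minimal sketch of comparing the three classifiers under leave-one-out cross-validation is shown below. The hyperparameters (default RBF SVM, 25 trees, 25 boosting rounds) and the function names are placeholders chosen for illustration; they are not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def loo_accuracy(model, X, y):
    """Leave-one-out accuracy: each sample is the test set exactly once (k = N)."""
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy")
    return float(scores.mean())


def compare_classifiers(X, y, seed=0):
    """Return the leave-one-out accuracy of the three classifier families."""
    models = {
        "SVM": SVC(kernel="rbf"),
        "Random Forest": RandomForestClassifier(n_estimators=25, random_state=seed),
        "AdaBoost": AdaBoostClassifier(n_estimators=25, random_state=seed),
    }
    return {name: loo_accuracy(model, X, y) for name, model in models.items()}
```

With N samples, leave-one-out fits each model N times, which is affordable for a dataset of about one hundred students but grows linearly with the dataset size.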
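The performance of the trained classifier was measured by the accuracy, sensitivity, precision, and F1-score over the test samples, which are defined as follows [33], where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
The SVM classification criteria for diagnosing dysgraphia using in-air pen features, on-surface pen features, and a combination of both are provided in Table 6. The confusion matrix of the classifiers for all features is reported in Table 7. The ROC curves are shown in Fig. 3, and the loss function curves of the SVM classifier are illustrated in Fig. 4.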
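The four metrics named above can be computed directly from confusion-matrix counts, as in the following sketch. The function name is hypothetical, and the counts in the usage note are made-up values for illustration, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision, and F1-score from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is never predicted or present.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)      # also called recall
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }
```

For instance, with the illustrative counts TP = 8, TN = 9, FP = 1, and FN = 2, the function gives an accuracy of 0.85 and a sensitivity of 0.8.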
Performance of classifiers
Confusion Matrix of classifiers for All Features

ROC curves.

LOSS curves.
To validate the scalability of the proposed method, the model was first tested with 10 samples. The test was then repeated with 20, 30, and so on up to 101 samples to obtain the accuracy and the time spent in each step. The accuracy of the SVM classifier with different sample sizes is illustrated in Fig. 5, and Fig. 6 shows the time spent in each step. In addition, to validate the robustness of the proposed method, noise was added to the dataset, and the proposed method remained stable despite the presence of noise.
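The scalability experiment can be approximated by the sketch below, which evaluates nested subsets of increasing size and records accuracy and elapsed time. How the subsets were drawn in the study is not stated; the class-balanced ordering used here, the default RBF SVM, and the function names are assumptions.

```python
import time

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def stratified_order(y, rng):
    """Order sample indices so that every prefix keeps roughly the overall class proportions."""
    y = np.asarray(y)
    keys = np.empty(len(y))
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        keys[members] = (np.arange(len(members)) + 0.5) / len(members)
    return np.argsort(keys, kind="stable")


def scalability_curve(X, y, sizes, seed=0):
    """Return (n_samples, leave-one-out accuracy, seconds) for each subset size."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    order = stratified_order(y, np.random.default_rng(seed))
    results = []
    for n in sizes:
        idx = order[:n]
        start = time.perf_counter()
        scores = cross_val_score(SVC(kernel="rbf"), X[idx], y[idx], cv=LeaveOneOut())
        elapsed = time.perf_counter() - start
        results.append((int(n), float(scores.mean()), elapsed))
    return results
```

Calling `scalability_curve(X, y, sizes=list(range(10, 101, 10)) + [101])` would reproduce the sequence of sample sizes described in the text.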

The accuracies with different sample sizes.

The required times to train with different sample size.
An attempt was also made to solve the problem with deep learning, as described below. However, owing to the small number of training samples, acceptable accuracy was not obtained. In addition, a deep learning model works as a black box and therefore does not help much in recognizing the handwriting features that affect dysgraphia diagnosis. Achieving better results would require collecting much more data, which was not possible under the existing conditions; given the amount of available data, the SVM method performed better.
In the deep learning approach, each of the pen features collected every 5 milliseconds was first treated as a time series, and LSTM and CNN methods were used for the automatic classification of dysgraphia. In the CNN method, each time series was first transformed into an image of fixed dimensions through normalization in the time dimension. Each pen signal was converted into a separate image, and all signals were also combined into one single image.
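One way to turn a variable-length pen signal into a fixed-size image is sketched below. The text does not describe the actual encoding, so the linear resampling onto normalized time, the binary rasterization, and the image dimensions are all assumptions made purely for illustration.

```python
import numpy as np


def resample_signal(signal, width):
    """Linearly interpolate a signal onto `width` equally spaced points of normalized time [0, 1]."""
    signal = np.asarray(signal, dtype=float)
    source_time = np.linspace(0.0, 1.0, num=len(signal))
    target_time = np.linspace(0.0, 1.0, num=width)
    return np.interp(target_time, source_time, signal)


def signal_to_image(signal, width=64, height=32):
    """Rasterize one signal as a binary image with one lit pixel per column."""
    resampled = resample_signal(signal, width)
    low, high = resampled.min(), resampled.max()
    scaled = np.zeros(width) if high == low else (resampled - low) / (high - low)
    rows = np.round((1.0 - scaled) * (height - 1)).astype(int)   # row 0 is the top of the image
    image = np.zeros((height, width), dtype=np.uint8)
    image[rows, np.arange(width)] = 1
    return image
```

Because every recording is mapped onto the same normalized time axis, students who wrote for different durations produce images of identical shape, at the cost of discarding absolute timing.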
Discussion
A total of 101 students (49 students with dysgraphia and 52 without dysgraphia) were asked to use an ink light pen to copy a couple of verses written on the board onto A4 paper placed on a digitizer. After collecting the raw pen data for each subject, a vast extraction of kinematic, temporal, and spatial features was conducted. The vector values were converted into scalar values with various statistical functions. After the feature selection step using the SVM method, the samples were categorized into three groups depending on the utilization of in-air data, on-surface data, or both types of data. Using in-air data alone, an accuracy of 80.9% was achieved; using on-surface data alone, an accuracy of 71.4% was achieved; using both datasets simultaneously, an accuracy of 85.7% was achieved.
Our results suggest that psychiatrists could use handwriting evaluation as a complementary method for diagnosing dysgraphia. We believe that the prediction accuracy can be improved by using new handwriting features, recruiting more subjects, collecting more diverse tasks from students, and further tuning the machine learning techniques. Here, it was demonstrated that online handwriting features could reveal the effect of dysgraphia on handwriting and accordingly could be helpful for diagnosis. It should be noted that a standard questionnaire was used to differentiate between the dysgraphic and non-dysgraphic groups based on the scores that teachers gave the students, so factors such as teacher experience and personal judgment could also have influenced the labels.
The advantage of using online handwriting features to diagnose disorders and diseases is that access to handwriting signals in a clinic or at home is relatively easy [10]. The dataset was collected with a pen-tablet connected to a laptop at school. A commercially available tablet was used without any hardware changes. The software environment is simple and user-friendly, which makes it possible to use it at home, and this is an important advantage over other approaches.
Our findings showed that in-air features carry considerable information regarding the diagnosis of dysgraphia. Consequently, they may serve as a significant part of dysgraphia diagnosis. The kinematic features, azimuth, altitude, and the number of changes in acceleration were also identified as the most effective features related to dysgraphia. Contrary to our expectations, although a digitizer with high sensitivity to pressure was used, pressure did not play a significant role in our work.
Two important in-air features were curve length and VNCV. Figure 7 shows that children with dysgraphia hold the pen for a longer time in the air when writing. The results are consistent with previous results [2]; it also appears that children with dysgraphia need more time to decide to write words. Because of higher values of the VNCV (in-air) feature for children with dysgraphia (Fig. 8), it seems that the control and stability among children suffering from dysgraphia are less than those among non-dysgraphic children.

Curve Length (in-air) feature.

VNCV (in-air) feature.
Figure 9 illustrates the importance of the azimuth feature when writing on the surface. It shows that the hand and wrist rotation in dysgraphic children differs from that in non-dysgraphic children, which may be due to improper early training in how to hold a pen. The second important on-surface feature was the duration feature (Fig. 10), which again shows the importance of time and temporal features concerning dysgraphia.

Azimuth (on-surface) feature.

Duration (on-surface) feature.
Appendix
In this appendix, a few samples are provided that were located in the boundary region and that the system was unable to diagnose accurately (Figs. 11, 12, 13, and 14).
