Abstract
Handwritten signatures are widely used in biometric systems for the identification and authentication of individuals. Owing to intra-class and inter-class variability, signature verification remains one of the most challenging problems in biometric technology. Furthermore, because an offline handwritten signature is static, it can be forged by a skilled person. Therefore, this paper develops a deep learning-based method for online signature verification using a convolutional neural network (CNN). Convolutional kernels of different sizes, namely 1×1, 3×3 and 5×5, are used to extract discriminative features at multiple scales. The features of the initial and middle layers of the CNN are combined to create more powerful features. Up-sampling with bilinear interpolation is used so that the features of convolutional layers with different spatial dimensions can be added. Both addition and concatenation are used to aggregate the convolutional features. A transposed convolution is applied to increase the depth of a convolutional layer before it is added to a layer of greater depth. Finally, the concatenated features are passed to the fully connected layers for high-level feature extraction and classification. To evaluate the proposed method, an Android application was developed and used to create a custom database of 985 online signatures collected from 197 users. The problem of inadequate training data for online signature verification is addressed through data augmentation. The experimental results show that the deep aggregated convolutional feature representation method achieves an accuracy of 99.32% on the custom online signature database.
Introduction
Biometric verification is becoming a more common means of authentication owing to the availability of cheaper yet efficient biometric recognition technology. Human biometrics are generally divided into two broad categories, physiological biometrics and behavioural biometrics, and the signature is a behavioural biometric [1]. Signatures can further be divided into two types: offline (static) signatures and online (dynamic) signatures. Offline signatures are usually drawn on paper, for example on bank cheques, legal agreements, and documents. For verification, the offline signature is first scanned or photographed and then given as input to the computer. The offline signature contains information about the overall shape of the signature, but no information about exactly how the signature was drawn. Hence, an offline signature can be compromised by an expert who produces a similar-looking signature. An offline signature is just an image of the signature, as shown in Fig. 1(a). In contrast, an online signature is made on the touch pad or touchscreen of a smart device such as a mobile phone, tablet or laptop, and is given directly as input to the computer for verification. The online signature also contains information about the overall shape of the signature. Additionally, it contains information about the speed or velocity of the signature, the number and sequence of strokes, and the pressure applied while drawing. From these additional attributes, several features related to human behaviour may be extracted. These features help the classifier to distinguish different signatures from each other more accurately.
In contrast to the offline signature, each online signature collected for this paper is stored as a file (usually a text file) that contains the x-coordinates, y-coordinates, pressure, date and time, as illustrated in Fig. 1(b). Generally, two kinds of features are obtained from signatures: local features and global features [2]. With global features, the same set of features is extracted from every signature: regardless of its size, each signature is treated in the same way by the feature extraction algorithm and yields the same number of features. With local features, a different number of features is obtained from each signature, and that number usually depends on the size of the signature.

Fig. 1. Examples of offline and online signatures in the new handwritten signature dataset. (a) Offline signature images. (b) Online signature files.
After feature extraction, another important task of an online signature verification system is template matching or classification. Several researchers have calculated the distance between two signatures to measure their similarity, mostly using the Euclidean distance [3], the Manhattan distance [4] and Dynamic Time Warping (DTW) [5, 6] to match templates. Other matching and classification algorithms commonly used for online signatures include the Support Vector Machine (SVM) [7] and the Gaussian Mixture Model (GMM) [8]. Apart from conventional machine learning algorithms, the research community is also contributing increasingly to signature verification using deep learning.
Nowadays, the increased use of mobile devices has encouraged the practice of signing legal documents and agreements with online signatures [4]. Furthermore, online signature verification has various applications, including verifying persons in an online transaction system, filling in income tax applications or tender forms, signing legal documents or agreements, and sending and receiving encrypted emails [9]. In view of the increased use of online signatures, it has become essential to make online signature verification systems more reliable and accurate.
Owing to the robust feature extraction capability of convolutional neural networks and the effectiveness of concatenated or fused features in several research domains [10–14], this paper proposes a deep learning-based aggregated convolutional feature representation network (DACFRN) for online signature verification. The network combines the features of the online signature from the initial (low-level) layers and the middle (mid-level) layers and concatenates them to build a more powerful feature vector. The network uses both addition and concatenation to combine the features. Where a convolutional layer has insufficient depth, a transposed convolutional layer is used to increase it, while for unequal spatial dimensions, up-sampling with bilinear interpolation is applied. The concatenated features are passed to the fully connected layers, which extract class-specific features and perform classification. The DACFRN is trained and evaluated on a custom database of online signatures collected from 197 users.
Mobile devices are now in very common use, and it is essential to provide robust, fast, and lightweight online signature systems for them. To this end, Napa et al. proposed an online signature recognition system for mobile devices [4]. They used the MCYT-100 and SUISIG datasets to train and test their system, and also collected their own dataset of online signatures using mobile phones. Their system represented the online signatures in terms of histograms, from which the x-coordinate, y-coordinate, and pressure data were extracted. They used the Manhattan distance to match the signatures and achieved a lowest equal error rate (EER) of 1.15%. Apart from mobile devices, hand gesture-based signatures are also gaining the attention of the research community.
Compared with conventional machine learning, deep learning is receiving more and more attention from the research community in general [15–18] and in the field of biometric recognition in particular [19, 20]. For this reason, the focus of research in online signature recognition has recently been shifting towards deep learning methods. In recent work, Hefny and Moustafa [21] proposed a CNN-based deep learning architecture for online signature recognition, using Legendre polynomial coefficients as the features that uniquely identify a signature. They performed experiments on 2266 online signatures from the SigComp2011 dataset and obtained an EER of 0.49%. Similarly, Park et al. [22] developed a CNN-based model that recognizes online signatures by calculating a forgery-sensitive loss. They used a Long Short-Term Memory (LSTM) network to extract the different strokes from a signature. They performed experiments on the SUISIG dataset and achieved an EER of 1.80%.
Mobile devices can readily make use of online signature verification, and there is a strong need for lightweight online signature recognition systems that can run on them. To this end, the authors of [23] proposed a lightweight CNN model to identify online signatures robustly. They used three different datasets: the SVC 2004 dataset, MCYT-100 DB1 and the SUISIG-Visual corpus. They used depth-wise separable convolutional layers and achieved an EER of 13.42% using only one sample per subject for training. Similarly, to reduce the amount of training data, Manabu [3] demonstrated training with a single sample. In that work, the Euclidean barycenter-based DTW barycenter averaging (EB-DBA) method was proposed for online signature classification using a single template. A method based on averaging time series was used to extract features from the online signatures of the MCYT-100 dataset. Classification was performed using EB-DBA and an EER of 1.34% was achieved. Researchers have also used different combinations and variations of CNNs for online signature recognition. Abigail et al. [24] used CNNs followed by a Recurrent Neural Network (RNN) to classify online signatures. Owing to the presence of highly abstract filters, their model was able to recognize the whole signature. They used the SigComp 2009 and SVC 2004 datasets for various experiments and achieved a testing accuracy of up to 97.05%. Other researchers have also demonstrated the power of RNNs. For example, the authors of [6] used RNNs for online signature recognition with three publicly available datasets, Mobisig, MCYT-100 and e-BioSign. They trained their network on all three datasets, used both DTW and Recurrent Adaptive Networks (RANs) for training, and achieved lowest EERs of 0.23% and 1.81% for random forgery and skilled forgery, respectively.
In the same way, the authors of [25] proposed a deep network based on RNNs. Their model maximized the difference between genuine signatures and skilled forgeries, trying to push that difference above a threshold value, and then classified the signatures using the DTW distance. They evaluated the model on the SVC-2004 online signature dataset and achieved an EER of 2.37%. However, their system has some limitations, such as the need for high computational resources and a larger amount of data. Additionally, it requires forged signatures for proper training and cannot work properly when trained with genuine signatures only.
As the online signature contains time signals or time information, several researchers have used DTW for online signature recognition [3, 25]. Likewise, Abhishek et al. [7] used features of the warping path for online signature verification. They proposed fusing normalized DTW scores with scores based on the warping path, and tested their method on two datasets, MCYT-100 and SVC-2004. They used a Support Vector Machine (SVM) for classification and achieved a lowest EER of 2.76%. Similarly, Xiaomeng Wu et al. [5] optimized Siamese networks to learn time series effectively. To enable the deep learning model to learn the time series, they built DTW blocks on top of the Siamese network. They used the MCYT-100 dataset for various experiments and reported an EER of 2.4% when using 50% of the data for training and 50% for testing, while the lowest EER reported was 1% when using 90% of the data for training and 10% for testing.
Online signature features
This research extracts the global features from online signatures. The features have been extracted from the attributes that are stored in an online signature file. Some of the global features extracted in this research are discussed as follows:
Signature height: The signature’s height is the difference between the largest and smallest values of the y-coordinate. It is calculated as $H = y_{\max} - y_{\min}$.
Signature width: The signature’s width is the difference between the largest and smallest values of the x-coordinate. It is calculated as $W = x_{\max} - x_{\min}$.
Width-to-height ratio of signature: The signature’s width-to-height ratio is calculated to reduce the effect of size differences caused by the use of different input devices.
Minimum and Maximum values of x-coordinate: The minimum and the maximum values of x-coordinate are extracted as the separate features from the signature.
Minimum and Maximum values of y-coordinate: The minimum and the maximum values of y-coordinate are also extracted as the separate features from the signature.
Signature duration: The total time taken to make the signature is obtained by subtracting the starting time of the signature from its ending time.
Total pen-ups: The total number of times the pen is lifted from the surface of the device while making the signature is recorded as a separate feature.
Total duration for pen-ups: The total time for which the pen was not touching the surface of the device during the signature is extracted as the feature named total duration for pen-ups.
Maximum pressure: From all the pressure values, which lie between 0 and 1, the maximum value is extracted.
Average pressure: Like the maximum pressure, the average pressure is calculated from all the pressure values. The average pressure is also a fractional value between 0 and 1.
In addition to the above features, the minimum and maximum values of the width and height of the signature, the difference between the initial and ending points in the x and y directions, the velocity in the x and y directions, and the total pen-downs were also calculated.
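The global features listed above can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function name `global_features` and the boolean `pen_down` flag are hypothetical, since the text says only that each file stores x-coordinates, y-coordinates, pressure, date and time, and the exact way pen-ups are detected is not described.

```python
import numpy as np

def global_features(x, y, p, t, pen_down):
    """Compute a few global features from one online signature.

    x, y, p, t: 1-D sequences of coordinates, pressure in [0, 1] and timestamps.
    pen_down:   boolean sequence, True while the pen touches the surface
                (hypothetical field; the real file layout may differ).
    """
    x, y, p, t = (np.asarray(a, dtype=float) for a in (x, y, p, t))
    pen_down = np.asarray(pen_down, dtype=bool)

    height = y.max() - y.min()
    width = x.max() - x.min()
    # A pen-up starts wherever a touching sample is followed by a lifted one.
    pen_ups = int(np.sum(pen_down[:-1] & ~pen_down[1:]))
    # Time in the air: sum the sampling intervals that start with the pen lifted.
    pen_up_time = float(np.sum(np.diff(t)[~pen_down[:-1]]))

    return {
        "height": height,
        "width": width,
        "width_to_height": width / height if height > 0 else 0.0,
        "x_min": x.min(), "x_max": x.max(),
        "y_min": y.min(), "y_max": y.max(),
        "duration": t[-1] - t[0],
        "pen_ups": pen_ups,
        "pen_up_duration": pen_up_time,
        "max_pressure": p.max(),
        "avg_pressure": p.mean(),
    }
```

Because every signature yields the same keys regardless of its length, the result can be flattened into a fixed-length vector, which is the property that distinguishes global from local features.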
The general framework of the DACFRN for online signature verification is illustrated in Fig. 2. The framework consists of three components: (1) convolutional and pooling layers, (2) deep feature aggregation, and (3) concatenation and classification layers.

Fig. 2. Framework of the deep aggregated convolutional feature representation network developed for online signature verification. “C1 - C6” represent the convolutional layers, “32–256” represent the number of filters in each convolutional layer, “P1” and “P2” represent the max pooling layers, “T1” and “T2” represent the transposed convolutional layers, “UP” represents the up-sampling process, “F1” and “F2” represent the fully connected layers and “D1” and “D2” represent the dropout layers.
The dimensions of the input signature were fixed to 1×21. However, the CNN model takes a 3-dimensional input, so the input signature was reshaped to 1×21×1, where 1, 21 and 1 represent the width, height, and depth of the signature, respectively. The convolutional and pooling part of the network uses six convolutional layers and two max pooling layers. The convolutional layers are configured, in the form {number of filters: kernel size}, as {32: 1×1}, {32: 1×1}, {64: 3×3}, {64: 3×3}, {128: 5×5} and {256: 5×5}. To obtain convolutional features at multiple scales, the first two convolutional layers use a 1×1 kernel, the middle two layers use a 3×3 kernel and the last two layers use a 5×5 kernel. This helps to recognize online signatures drawn on device surfaces with different resolutions. To down-sample the height of the signature, the two max pooling layers use a 1×2 pooling window, which reduces the height of the signature while keeping the width unchanged. Each convolutional layer is followed by a ReLU activation layer. The padding parameter is set to ‘same’ with a stride of 1, which keeps the spatial dimensions of each convolutional layer’s output the same as those of its input.
The deep feature aggregation uses up-sampling and transposed convolutional layers to increase, respectively, the spatial dimensions of the signature and the depth of the convolutional layers. The spatial dimensions of the signature after max pooling “P2” are reduced to 1×5×128, while before the “P2” pooling layer the dimensions of the signature are 1×10×64. To add the features of “C5” and “C6”, the depth of “C5” is first increased to 256 using a transposed convolutional layer and an addition operation is then performed. To add the features of “C4” to the resulting sum, the sum is up-sampled to increase its spatial dimensions. The depth of the “C4” layer is 64, hence a transposed convolutional layer is applied to increase it to 256. Finally, the features of “C4” and the up-sampled sum are added together. These aggregated features and the low-level features after the “P1” pooling layer are concatenated and passed to the fully connected layers.
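The effect of ‘same’ padding and 1×2 pooling on the feature-map dimensions can be traced with a few lines of plain Python. This is a sketch rather than the actual model: the position of “P1” after “C2” and of “P2” after “C4” is an assumption inferred from the description and the figure legend, and the helper names are hypothetical.

```python
def same_conv(shape, filters):
    """'same' padding with stride 1 keeps width and height; depth becomes `filters`."""
    w, h, _ = shape
    return (w, h, filters)

def max_pool_1x2(shape):
    """A 1x2 window with stride 2 and no padding halves the height (rounded down)."""
    w, h, d = shape
    return (w, h // 2, d)

# (layer name, number of filters) with None marking a pooling layer.
# The pooling positions are assumed, not stated explicitly in the text.
layers = [("C1", 32), ("C2", 32), ("P1", None), ("C3", 64),
          ("C4", 64), ("P2", None), ("C5", 128), ("C6", 256)]

shape = (1, 21, 1)  # width, height, depth of the reshaped input
trace = {}
for name, filters in layers:
    shape = max_pool_1x2(shape) if filters is None else same_conv(shape, filters)
    trace[name] = shape

for name, dims in trace.items():
    print(name, dims)
```

Under these assumptions the height goes from 21 to 10 after the first pooling layer and to 5 after the second, while the depth follows the filter counts of the convolutional layers.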
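Because the feature maps here have a width of 1, bilinear up-sampling reduces to linear interpolation along the height. The NumPy sketch below illustrates that step only; the function name `upsample_height_linear` is hypothetical, and the half-pixel sampling convention with edge clamping is an assumption that may differ from the behaviour of the up-sampling layer used in the actual implementation.

```python
import numpy as np

def upsample_height_linear(feat, factor=2):
    """Linearly interpolate a (1, H, D) feature map to (1, H * factor, D)."""
    _, h, _ = feat.shape
    # Map each output sample centre back to input coordinates (half-pixel convention).
    src = (np.arange(h * factor) + 0.5) / factor - 0.5
    src = np.clip(src, 0, h - 1)            # clamp at the borders
    lo = np.floor(src).astype(int)          # lower neighbour index
    hi = np.minimum(lo + 1, h - 1)          # upper neighbour index
    frac = (src - lo)[None, :, None]        # interpolation weight per output row
    return feat[:, lo, :] * (1 - frac) + feat[:, hi, :] * frac

# A 1x5x256 map becomes 1x10x256, matching the spatial size of a 1x10 layer.
up = upsample_height_linear(np.ones((1, 5, 256)))
print(up.shape)
```

Up-sampling changes only the spatial size; the depth mismatch is handled separately by the transposed convolutional layers.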
The deep feature aggregation uses an element-wise addition operation to add two feature vectors. The addition operation does not introduce additional trainable parameters into the network. Two feature vectors can be added only if they have the same dimensions, i.e., W1 = W2, H1 = H2 and D1 = D2, where W, H and D are the width, height and depth of the feature map. The addition operation is performed as:
$$z_x = a_x + b_x$$
where $a_x$ and $b_x$ are the feature vectors of two layers at different locations in the network, $z_x$ denotes the resulting feature vector, and “+” is the addition operation.
In the concatenation operation, the feature vectors of two or more layers are stacked together to make a single feature vector. The feature vectors are stacked either horizontally or vertically. To perform the concatenation operation, all feature vectors must have the same spatial dimensions, i.e., W1 = W2 and H1 = H2. However, the depths of the feature vectors can differ. The concatenation operation is performed as:
$$z_x = a_x \mid b_x$$
where $a_x$ and $b_x$ are the feature vectors of two layers at different locations in the network, $z_x$ denotes the resulting feature vector, and “|” is the concatenation operation.
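The different shape requirements of the two aggregation operations can be checked with a small NumPy sketch. The function names are hypothetical, and the example shapes (a 1×10×256 aggregated map and a 1×10×32 low-level map) are assumptions derived from the layer description rather than values reported by the authors.

```python
import numpy as np

def add_features(a, b):
    """Element-wise addition: width, height and depth must all match."""
    if a.shape != b.shape:
        raise ValueError(f"cannot add shapes {a.shape} and {b.shape}")
    return a + b

def concat_features(a, b):
    """Depth-wise concatenation: only width and height must match."""
    if a.shape[:2] != b.shape[:2]:
        raise ValueError(f"cannot concatenate shapes {a.shape} and {b.shape}")
    return np.concatenate([a, b], axis=-1)

aggregated = add_features(np.ones((1, 10, 256)), np.ones((1, 10, 256)))
fused = concat_features(aggregated, np.zeros((1, 10, 32)))
print(aggregated.shape, fused.shape)
```

Addition keeps the depth unchanged, whereas concatenation sums the depths of its inputs, which is why the depths must be equalized before adding but not before concatenating.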
The classification part of the network uses two fully connected layers with 512 and 256 neurons, respectively, each followed by a ReLU activation function, and a classification layer followed by a Softmax function. Each fully connected layer except the classification layer is followed by a dropout layer with a rate of 0.2.
The DACFRN is implemented in Python with the Keras and TensorFlow deep learning libraries. The implementation was run on an Intel Core i5-8300H with 16 GB of random-access memory and a 4 GB NVIDIA GeForce GPU.
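The forward pass of such a classification head can be sketched in NumPy as follows. This is an illustration with random weights, not the trained network: the input length of 2880 (a flattened 1×10×288 map) and the 197 output classes (one per enrolled user) are assumptions, and the function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout(a, rate, rng, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

def head(features, params, rng, training=False):
    """Two ReLU layers with dropout, then a softmax classification layer."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = dropout(relu(features @ w1 + b1), 0.2, rng, training)
    h = dropout(relu(h @ w2 + b2), 0.2, rng, training)
    return softmax(h @ w3 + b3)

rng = np.random.default_rng(0)
sizes = [2880, 512, 256, 197]               # assumed input size and class count
params = [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
probs = head(rng.normal(size=(4, 2880)), params, rng)
print(probs.shape)
```

Dropout is active only during training; at inference the layers pass their activations through unchanged, so each row of `probs` is a deterministic probability distribution over the classes.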
Network training and testing
The DACFRN is trained using Stochastic Gradient Descent (SGD) with a momentum value of 0.9, a learning rate of 0.005 and a weight decay of 0.00025. The network was trained for 200 epochs.
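A single parameter update with these hyperparameters can be written out in NumPy as a sketch. It assumes the classical momentum formulation and treats weight decay as an L2 term folded into the gradient; the text does not say how weight decay was applied in the actual implementation, and the function name is hypothetical.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.005, momentum=0.9, weight_decay=0.00025):
    """One SGD update with momentum and L2-style weight decay (assumed form)."""
    grad = grad + weight_decay * w               # decay pulls weights towards zero
    velocity = momentum * velocity - lr * grad   # accumulate a running direction
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, np.array([0.5, 0.0]), v)
print(w, v)
```

With a momentum of 0.9, each step reuses 90% of the previous update direction, which smooths the trajectory compared with plain SGD at the same learning rate.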
The performance of the DACFRN has been tested on the custom developed dataset of online handwritten signatures. The description of the dataset is given in Section 5.3 and the results obtained are presented in Section 5.4.
Online Signature Dataset
To train and test the DACFRN, a custom dataset of online signatures was developed, in which 985 signatures were collected from 197 users. To collect the data, a user-friendly Android application was developed. Each user made five online signatures on the touch-sensitive surface of a Samsung Galaxy Note 3 with a stylus pen. An example of an online signature drawn on the Samsung Galaxy Note 3 is illustrated in Fig. 3. The dataset was divided into training and testing sets with a ratio of 80:20. As deep learning algorithms require more training data, the number of training samples was increased using augmentation techniques, which have been widely used in classification problems [26]. The x and y coordinates and the time values of each signature in the training set were slightly changed to create 4 more copies of each signature. Hence, the number of training samples for each user was increased to 15.
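One way such an augmentation could be implemented is sketched below with NumPy. The text says only that the coordinates and time values were "slightly changed", so the Gaussian jitter, its magnitudes (`coord_sigma`, `time_sigma`) and the function name `augment_signature` are all assumptions made for illustration.

```python
import numpy as np

def augment_signature(x, y, t, copies=4, coord_sigma=1.0, time_sigma=1.0, seed=0):
    """Create perturbed copies of one signature by jittering x, y and time."""
    rng = np.random.default_rng(seed)
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    out = []
    for _ in range(copies):
        xj = x + rng.normal(0.0, coord_sigma, x.shape)
        yj = y + rng.normal(0.0, coord_sigma, y.shape)
        # Jitter the gaps between samples rather than the timestamps themselves,
        # so that time never runs backwards in an augmented copy.
        gaps = np.maximum(np.diff(t) + rng.normal(0.0, time_sigma, t.size - 1), 0.0)
        tj = t[0] + np.concatenate(([0.0], np.cumsum(gaps)))
        out.append((xj, yj, tj))
    return out

copies = augment_signature([0, 1, 2], [0, 1, 0], [0, 10, 20])
print(len(copies))
```

Augmentation is applied to the training set only, so the test signatures remain unmodified recordings.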

Fig. 3. An example of an online signature drawn on the Samsung Galaxy Note 3.
The recognition accuracy obtained with the DACFRN model is shown in Table 1. The recognition accuracy has been measured using standard evaluation metrics, namely precision, recall, F-score, overall accuracy, and EER. As indicated in Table 1, the precision, recall and F-score obtained with the DACFRN method are each 99%. The overall accuracy and the EER of the DACFRN model were obtained by repeating the experiments 10 times, and the results achieved are 99.32% and 0.99%, respectively.
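For reference, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The NumPy sketch below approximates it by sweeping a threshold over similarity scores; how scores were derived from the classifier in this work is not described, so the score inputs and the function name `equal_error_rate` are assumptions.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A higher score is taken to mean "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for th in thresholds:
        frr = np.mean(genuine < th)      # genuine signatures rejected
        far = np.mean(impostor >= th)    # impostor signatures accepted
        if abs(far - frr) < best_gap:    # keep the threshold where the rates are closest
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer)
```

With a finite set of scores the two error rates rarely cross exactly, so the sketch reports the mean of the two rates at the threshold where they are closest.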
Table 1. Precision, Recall, F-Score and Overall Accuracy of the DACFRN on the custom developed online signature database
Relatively few research works have been reported on online signature verification because of the shortage of available online signature datasets. The performance of the DACFRN is therefore compared with state-of-the-art models developed for online signature verification using either traditional or deep learning techniques. Table 2 shows the EER of the DACFRN model alongside those of the other models. As shown in Table 2, the DACFRN performs favourably compared with most of the methods developed for online signature verification.
Table 2. Comparison of DACFRN method with state-of-the-art online signature methods
In this paper, a deep aggregated convolutional feature representation network for online signature verification has been presented. The network aggregates the low-level and mid-level features of the convolutional layers using addition and concatenation. To add two feature vectors of convolutional layers, up-sampling with bilinear interpolation was used to increase the spatial dimensions of the smaller convolutional layers, while a transposed convolutional layer was used to increase the depth of the convolutional layers. Finally, the aggregated features and the low-level features were concatenated to create more powerful features. The DACFRN was evaluated on a custom dataset of online signatures collected from 197 users and achieved an overall accuracy of 99.32% and an EER of 0.99%. In this paper, only five signature samples were collected from each user; in future, more samples will be collected. Both online and offline signatures of the same user will be collected to make the signature recognition system more secure and authentic. Furthermore, the method will be evaluated on publicly available datasets of online signatures.
Acknowledgments
The authors are thankful to the University of Sindh, Jamshoro for providing the research opportunity and to the participants who contributed to the collection of the online signature database for this research. The authors would like to thank NVIDIA Corporation for providing a Quadro RTX6000 GPU for the research.
