Abstract
Nowadays, people are increasingly connected to the internet through their mobile devices, and they handle critical and sensitive data across many applications. These applications provide security via user authentication. Authentication by passwords is a reliable and efficient access control procedure, but it is not sufficient on its own; additional procedures are needed to enhance the security of these applications. Keystroke dynamics (KSD) is one of the common behavioral-based systems. KSD rhythm uses combinations of timing and non-timing features that are extracted and processed from several devices. This work presents a novel authentication approach based on two factors: password and KSD. It also presents an extensive comparative analysis of KSD-based authentication systems, and proposes a prototype keyboard that collects timing and non-timing information from keystrokes. The proposed approach therefore uses timing features together with several non-timing features, which play a significant role in improving the performance measures of KSD behavioral authentication systems. Several experiments were conducted and show an acceptable level of performance for KSD as a second authentication factor. The approach was tested using multiple classifiers. With the Random Forest classifier, the approach reached a 0% error rate with 100% classification accuracy.
Keywords
Introduction
Nowadays, most of the sensitive or personal data transferred through handheld devices such as mobile phones and tablets has become a main target for hacking and misuse. Activities that involve such sensitive data must be highly secured [1]. This need has motivated researchers to contribute solutions. User authentication using passwords is a common authentication mechanism [16]. However, a password alone is not sufficient, since it can be compromised. Combining passwords with behavioral-based mechanisms strengthens the security level of authentication systems, because an attacker has to capture the user's typing behavior along with the password itself to gain access [4,33]. Keystroke dynamics (KSD) is one of the well-known behavioral authentication mechanisms; it relies on capturing and utilizing the human characteristics and measurements of typing rhythm. KSD rests on the biometric assumption that different people type in unique and distinguishable ways. Embedding the typing behavior in authentication systems will therefore strengthen their security level [4,58,69].
It has been demonstrated that combining the password with the KSD mechanism increases authentication power. KSD rhythm consists of combinations of timing features (such as the down time and up time of each key pressed or released) and non-timing features (such as the pressure, size, and XY position of each key pressed or released). These features can be extracted from any handheld device to be processed and used, and combining them remarkably enhances the performance of authentication systems. Based on this combination, a user authentication system for handheld devices is proposed. In addition, a virtual keyboard has been developed and installed on an Android device to collect timing and non-timing features. The proposed system has been implemented and tested on the CMU benchmark dataset of KSD and on our collected dataset, named the UJ-KSD dataset (The University of Jordan). An extensive comparative analysis has also been conducted between authentication systems based on KSD. The Carnegie Mellon School of Computer Science (CMU) benchmark dataset of KSD was designed to target personal computers only, and its typed text is described by timing features only. It would be useful to have a benchmark for touch screen devices. The lack of benchmark datasets for handheld devices encouraged us to build a benchmark dataset that can be compared with the CMU dataset.
The effect of combining non-timing features with timing features was clearly distinguishable. Several experiments have been conducted, and the obtained results revealed that the password and KSD behavioral system provides an acceptable level of authentication performance. The system was tested using multiple neural network and machine learning classifiers, such as Random Forest (RF), Multi-layer Perceptron (MLP), Naïve Bayes, and Sequential Minimal Optimization. When the RF classifier was used, the system reached a 0% error rate with 100% classification accuracy on the training dataset, and achieved 98% accuracy with an equal error rate of only 2% on the testing dataset. These results outperform those of similar systems reported in the literature. This behavioral-based system was found to be a cost-effective authentication mechanism, because no additional specific tools (software or hardware) are required to implement it [15,20]. In conclusion, combining the behavioral-based mechanism with a password is a solution that will strengthen authentication mechanisms, especially in mobile applications on handheld smart devices [30,33]. The main motivations of this work are system usability over the internet, strengthening the security level of authentication systems, and the lack of benchmark datasets for handheld devices.
In this paper, neural networks and machine learning classification are used as the main techniques. Multiple classifiers are used in order to obtain the lowest error for the authentication system. The rest of this paper is structured as follows: Section 2 presents background on authentication and KSD. Section 3 describes behavioral authentication in detail. The public datasets and performance measures are covered in Sections 4 and 5. The methodology and techniques are presented in Section 6. Section 7 highlights the experiments and results. Finally, Section 8 provides conclusions and future work.
Background Information
Securing a system is achieved by maintaining its security requirements. An early definition of computer security was given by NIST [1,30]. This definition rests on preserving the CIA triad: Confidentiality, Integrity, and Availability. Although the CIA triad covers most security objectives, additional concepts are needed to present the whole security picture; authenticity and accountability are the most common ones. Authenticity is the property of being a genuine user of the system who can be verified and trusted; that is, verifying that users are who they claim to be. Accountability is the security goal that requires the actions of an entity to be traced uniquely to that entity. Accountability helps system auditors report and investigate security incidents [4,33].
Authentication is achieved through something the user knows, such as passwords; something the user has, such as credit cards or Automated Teller Machine (ATM) cards; or something the user is, such as a fingerprint or iris. More recently, a new mechanism for user authentication is based on user behavior, such as typing rhythm [1,5,6]. The most commonly used mechanism is based on user knowledge, such as passwords and security questions: if you know the secret information for an account, you are assumed to be its owner [16,41,44]. The advantage of this type of mechanism is that nothing additional needs to be installed. The only requirement is a database that stores the needed secrets, either the passwords themselves or their hashed values. The main problems with this type of authentication are that the secrets can be forgotten, guessed, or stolen [46,51].
Many people use very simple, easily guessed passwords, usually combinations related to themselves, such as their birthdays or their children's names, which can be cracked. They also tend to reuse the same password across applications and websites. Some of these websites may not be secured and may transfer information in an unencrypted format, which can easily be intercepted and recovered [15,16,51]. Another major problem with passwords arises when registering on a website as a new user: the process sometimes requires a security question to be used later for restoring a lost account. Users tend to give dummy or incorrect answers. They are then exposed to social engineering threats, or they cannot recall the required piece of information when it has to be entered in the application form or website [8,23,24].
The second type of authentication mechanism is the physiological biometric type, which depends on attributes of the human body itself, such as fingerprint, retina, and iris [44,52]. This mechanism is commonly used in banks to identify users. It requires the user to enroll attributes that can be scanned and digitally stored. Its advantage is a lower probability of being stolen or hacked compared with mechanisms such as passwords. Its main problem is the additional cost of installing the special tools or hardware needed to scan the human attributes [31,53,54].
The third mechanism is the behavioral biometric type. Biometrics in general identify a living individual from physiological or behavioral attributes [5,15,58]. The behavioral mechanism depends on user behavior, such as typing on keyboards, clicking and movement patterns of a computer mouse, or tapping patterns on the touch screens of mobiles and tablets. One of these behavioral mechanisms is typing rhythm, which is called keystroke dynamics (KSD). KSD is one of the well-known behavioral authentication mechanisms that rely on capturing and utilizing the human characteristics and measurements of typing rhythm [14,57]. KSD rests on the biometric assumption that different people type in unique and distinguishable ways. Embedding the typing behavior in authentication systems will therefore strengthen their security level [15,65].
KSD is considered the best mechanism in the behavioral-based authentication family in terms of cost effectiveness and performance. It is the cheapest type, since no additional tools need to be added or installed; it relies on the hardware or software already present in the keyboard or keypad of the device in use [4,51,55]. It also provides remarkable performance in terms of metrics such as accuracy and error rates [15,20,58].
In conclusion, authentication can be based on user knowledge such as passwords, on objects and tokens such as cards, on physiological biometric attributes such as fingerprint or iris, or on behavioral attributes such as keystroke dynamics. A password alone is not a sufficient mechanism for authenticating users remotely, while authentication using objects or physiological biometrics is not applicable to internet applications and websites [5,6]. However, the password can be combined with keystroke dynamics (KSD), and such a combination constitutes a promising solution. In this work, neural networks and machine learning classification are used as the main techniques. Multiple classifiers are used in order to obtain the lowest error rates for the authentication system. The proposed approach uses timing and several non-timing features. These features play a significant role in improving the performance measures of KSD behavioral authentication systems when compared with other studies in the literature.
Behavioural Authentication
KSD rhythms are described using several combinations of features. These combinations are extracted and processed from the Graphical User Interfaces (GUIs) of authentication systems. These GUIs can be opened on personal computer (PC) screens or through a virtual keyboard on handheld devices. Many features can be extracted, including timing and non-timing features [4,20,26].

KSD User authentication features.
Figure 1 describes the major features. Timing features include down time, up time, and duration time.
The basic features in typing behavior are the timing features [19,40,64]. Timing features can be recorded on a mobile device using a timer that communicates with the operating system through event handlers, which fire when a key is pressed or released. The timing parameters fall into two types: press time and release time for each key. Press time is the time at which the key is pressed down (D), and release time is the time at which the key is released up (U) [12]. These timing features are extracted by capturing the time of each event for a sequence of n characters, where n is the total number of characters in the sequence. Let us define the K, D, and U sets as follows.

Keystroke dynamic Timing features.
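As an illustration of how timing features can be derived from the press (D) and release (U) times, the following minimal Python sketch computes three commonly used quantities: hold time, down-down latency, and up-down latency. The function name, the dictionary keys, and the sample timestamps are assumptions made for this example; they are not taken from the keyboard application described in this paper.

```python
from typing import Dict, List


def timing_features(down: List[float], up: List[float]) -> Dict[str, List[float]]:
    """Derive timing features from press/release timestamps.

    down[i] and up[i] are the press (D) and release (U) times, in
    milliseconds, of the i-th key in a typed sequence of n characters.
    """
    if len(down) != len(up):
        raise ValueError("down and up must have the same length")
    n = len(down)
    # Hold (duration) time: how long each key stays pressed.
    hold = [up[i] - down[i] for i in range(n)]
    # Down-down latency: press of key i to press of key i+1.
    down_down = [down[i + 1] - down[i] for i in range(n - 1)]
    # Up-down latency: release of key i to press of key i+1.
    # It can be negative when the next key is pressed before the release.
    up_down = [down[i + 1] - up[i] for i in range(n - 1)]
    return {"hold": hold, "down_down": down_down, "up_down": up_down}


# Hypothetical timestamps for a three-character sequence.
features = timing_features(down=[0, 180, 410], up=[95, 260, 520])
```

A sequence of n characters yields n hold times and n - 1 values for each latency, so the feature vector grows linearly with the password length.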
Other features, such as finger pressure [7,62], position on the key, distance [67], typing speed [37], and the size of the key surface touched by the finger [7,39,62], are non-timing features. These attributes change while keys are pressed and can be captured from the interface to identify users. For every pressed key, we can extract features similar to those summarized in Table 2.
The use of KSD has been proposed in the literature for more than three decades [44]. Analyses of KSD have shown significant performance. For example, the authors in [44] reported zero percent error in authentication systems using timing information only. The datasets in these studies were collected, and their features derived, from classical keyboards on personal computers [5,21,47].
Authors in this area have contributed several unique datasets [27,31]. Introducing a reference dataset proved difficult. The existence of such datasets is valuable because it makes it possible to compare performance across studies, and these datasets have been used to evaluate research performance results [33]. The first public benchmark dataset for KSD was proposed by the Carnegie Mellon School of Computer Science in 2009. It is known as the CMU Keystroke Dynamics dataset, and it is dedicated to timing information only, collected on personal computers with regular keyboards [11,41].
The performance of a KSD-based system is usually measured in terms of accuracy measures and well-known error rates. The False Rejection Rate (FRR) represents the probability that a genuine user is rejected by the authentication system. It is calculated as the number of rejected genuine attempts as a percentage of the total number of genuine attempts. In statistics, it is a Type I error [15,18]. The False Acceptance Rate (FAR) represents the probability that an impostor (hacker) is accepted by the KSD system. It is calculated as the number of accepted impostor attempts as a percentage of the total number of impostor attempts; in statistics, it is a Type II error [19]. The Equal Error Rate (EER) is another common error measure of KSD systems. It is the error rate at the point on the error rate curves where FAR and FRR are equal. The lower the EER, the better the performance of the system [4,25].
Table 3 summarizes recent research in this area. The studies are classified by the features extracted, the performance measures used, and the use of handheld devices running the Android operating system [17,53,60]. Most of the recent studies are based on Machine Learning (ML) and Neural Network (NN) models. These models are adaptive and can be continuously retrained, which affects the performance of KSD systems. This is one of the reasons most authors chose NN models in their work. An NN model is adaptive to change; it is also flexible enough to incorporate new features that may be extracted from handheld devices in the future. This flexibility and adaptability make NN models more scalable than classical statistical models. As more features are added to mobile devices, the need for dynamic models that adapt to change and can be modified in the future is a major issue in KSD system design [38,45,49].
Recent studies of KSD authentication systems for Android mobiles platform
Moreover, most recent studies target handheld devices rather than personal computers, because most sensitive data now passes through handheld devices such as mobile phones, where it is exposed to hacking, theft, and misuse. Activities that involve such sensitive data must be highly secured. PCs are used less than mobile devices because of the mobility of handheld devices and their ease of use whenever and wherever the user is [35,36].
This section summarizes the methodology for developing the proposed Keystroke Dynamic (KSD) authentication system based on timing and non-timing features, using NN classifiers. These features are collected and extracted using a virtual software keyboard installed on the target handheld device.

Framework of this research.
Furthermore, these features can be examined using several methods, such as, statistical methods, Neural Network (NN) methods or using combined approaches (Hybrid approaches) [40,42].
The main goal of this research is to implement KSD so that it is applicable in an authentication system. Figure 3 describes the conceptual framework for this research. The suggested KSD-based authentication system should be secure, highly reliable, easy to use, and have a very small error rate. An attacker has to know the user's typing behavior in addition to the password itself to be authenticated as the correct user [66,68,70].
The implementation of this KSD authentication on handheld devices is based on a two phase authentication model: Enrollment and Authentication. In the first phase, data is collected from the user and it is stored in a database to identify the user later. The second phase is the real “
All enrollments in authentication systems follow a procedure that consists of four steps [16,62,63]:

General KSD Authentication system model.
In the suggested model, this phase is reduced to the identification of the user, as defined by Eq. (1). It is a function (f) applied to the collected typing-behavior features.

General KSD Authentication system model – Enrollment phase.
The second phase of real authentication process in the behavioral authentication system includes three main steps [62]. The flow of this phase is described in Fig. 6 as follows.

General KSD Authentication system model – Authentication phase.
These steps mainly fulfilled the verification function
Thus the authentication of the KSD is reduced to Authentication of User, defined as the function
Authentication of User = f3(typing behavior features), Where
The composition of the
Description of KSD authentication model events status
There are two main event types in the KSD system model, with four possible status values. In two cases the system succeeds in authenticating users, whoever the user is (case 1 and case 4 in Table 5). In the other two cases the system fails to authenticate users (case 2 and case 3 in Table 5). The status of the KSD model can be summarized as follows.
When the system fails to authenticate users (accepts hackers or rejects genuine users).
When the system succeeds in authenticating users (accepts genuine users or rejects hackers). It is represented by the green color in Fig. 7.
Based on these two main statuses, we can derive evaluation metrics to assess the performance of the KSD model.
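The four event statuses can be expressed as a small decision table. The sketch below is illustrative only: the function names and status labels are assumptions chosen for this example, and no mapping to the case numbers of the table is implied.

```python
def event_status(is_genuine: bool, accepted: bool) -> str:
    """Label one authentication attempt with one of the four event statuses."""
    if is_genuine and accepted:
        return "true_accept"    # success: genuine user accepted
    if not is_genuine and accepted:
        return "false_accept"   # failure: hacker accepted
    if is_genuine and not accepted:
        return "false_reject"   # failure: genuine user rejected
    return "true_reject"        # success: hacker rejected


def system_succeeded(is_genuine: bool, accepted: bool) -> bool:
    """The system succeeds exactly when its decision matches the user's identity."""
    return is_genuine == accepted
```

Counting how often each label occurs over a set of attempts gives the raw numbers from which acceptance and rejection rates are computed.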

Flow chart of analyzing events status.
The implementation of the KSD system proceeds in three steps: data collection and feature extraction; data analysis and selection of classification methods; and performance evaluation of the KSD system.
Many features can be extracted from a mobile device's keyboard, but it is impractical to extract every possible feature. Therefore, it was essential to decide which features were needed for the authentication task. Virtual typing means typing on a touch-based keyboard and entering the credentials in the designated text boxes [51,62]. We required users to use the newly developed virtual keyboard instead of the standard software keyboard, because mobile vendors do not allow developers to read all typing features directly from the built-in keyboard. The keyboard interface is shown in Fig. 8 and Fig. 9.
Since the keyboard is implemented in software, users were required to select the developed keyboard, and data collection was done via a Java application installed on an Android-based handheld device. While the user typed, the virtual keyboard recorded every keystroke event.

1st screen interface for KSD application.

2nd screen interface for KSD application.
The captured data contain time in millisecond, pressure, size, and location (
The chosen mobile platform was Android OS. Data collection was performed on a Sony Xperia Tablet Z running Android 5.1.1 or above. Each user typed on the interface multiple times. For each character typed, multiple features were sensed and stored on the device storage as a text file. A sample of these files is shown in Fig. 10.

Sample of raw data that was collected by the virtual keyboard application in Android devices.
The features were captured and stored on the handheld device by calling platform functions. On Android, the motion classes provide functions that let the virtual keyboard collect input data for each key pressed or released [28,52]. Table 5 describes the main Android functions used in this work to develop the virtual keyboard.
Android System calls events
Based on the literature [2,25,62], the common methods of data analysis and dataset building for classification are empirical methods with different techniques. They are mainly classified into: statistical with static text [7,64], statistical with dynamic text [2], neural network with static text, neural network with dynamic text [9,37], hybrid with static text, and hybrid with dynamic text [62].
The proposed model in this work was built on NN and on multiple classifiers, with which the typing-behavior authentication samples were trained and tested. Multiple experiments were conducted to train and test the suggested model using the available user samples. Training and testing were done using the WEKA toolkit. The model of the KSD system is shown in Fig. 11.

Authentication in KSD systems model.
The KSD system designed on the suggested authentication model should be adaptive. It was constructed using well-known neural network and machine learning algorithms, such as Multilayer Perceptron (MLP), decision tree (DT), and Random Forest (RF). The core function of the NN model is the decision-making module. This module is adaptive and is continuously adjusted to improve the performance of the KSD model. It optimizes the accuracy of the system using a dynamic threshold. The threshold is determined at the beginning of model setup and can then be changed dynamically according to the required values of the system performance metrics. This was one of the reasons for choosing NN models in this work. The NN model is adaptive to change and flexible enough to incorporate new features that may be extracted from handheld devices in the future, which makes it scalable. As more features are added to mobile devices, we need a dynamic model that adapts to change and can be modified in the future. Figure 12 describes a simple design of the NN model with a dynamic threshold.

Decision making process in NN authentication system model using the dynamic threshold.
The decision making function
Decision making of identification of user = f4(Predicted_ID, Stored_ID), where
This formula took the Predicted_ID, Stored_ID as input parameters, while the output was an event of accept or reject as a final result. These two values were calculated based on Eqs. (1) and (2). The Predicted_ID was calculated by using
Decision Trees (DT) and Random Forest (RF)
Decision Trees (DT) and Random Forest (RF) provide a simple way of learning a function that maps input data to a desired output. The input data can be a mix of categorical or numeric variables, while the output represents the classification result. A DT is a tree that can be represented by a Directed Acyclic Graph (DAG) consisting of [32,34,48]:
Square boxes: nodes in the DAG that represent decisions.
Circular boxes: nodes in the DAG that represent random transitions.
Edges or branches: possible paths from one node to another in the DAG, usually labeled with binary values (yes/no, true/false).
Figure 13 describes the architecture of these trees. A tree consists of a root node and child nodes; the arcs or arrows represent decision paths that can be taken when the algorithm executes. The specific type of DT used for ML contains no random transitions. To use a tree for classification, we pass in a row of data (a set of features) and start at the top-level node, called the root. Then, at each node along the path, we take a decision based on the sample's values at that node [50].

Decision Tree classifier architecture: a DT is a tree that can be represented by a Directed Acyclic Graph (DAG) consisting of nodes and edges.
RF is the generalized form of the RT. The forest consists of many trees. Each tree produces a classification decision for one data sample. If one or a few features are very strong predictors of the class, they will be selected in many of the trees in the forest. Different parameters control the flow of the algorithm; the two main ones are the entropy and the information gain. Entropy is a value that represents how homogeneous the samples of the dataset are. The RT and RF algorithms use the entropy to calculate the homogeneity of the samples (if the sample is completely homogeneous, the entropy is zero, and if the sample is equally divided between two classes, the entropy is 1). The dataset is split on different attributes from the root to the child nodes, and the entropy of each branch is calculated. Each time a split is made, the entropy after the split is subtracted from the entropy before the split; the result is called the Information Gain. The calculation of the entropy using the frequency table of one attribute is given in Eq. (5), where E denotes the entropy of one attribute/feature H. The value
Building the DT or RF is all about finding the attribute that returns the highest information gain at each path or branch (i.e., the most homogeneous branches). This gain is given in Eq. (7). Gain(V,Z) is the gain obtained by splitting the tree on feature V followed by feature Z. Entropy(H) is the entropy of the branch taken when feature H is used for the split at the root node, while Entropy(V,Z) is described in Eq. (6).
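The entropy and information gain calculations can be sketched in a few lines of Python. This is a minimal illustration using the standard Shannon entropy with base-2 logarithms and a categorical split; the function names and the toy labels are assumptions for this example. Continuous features such as time or pressure would first have to be divided into ranges, a step that is not shown here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels: Sequence[Hashable],
                     feature_values: Sequence[Hashable]) -> float:
    """Entropy before a split minus the weighted entropy after it."""
    total = len(labels)
    groups: dict = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    after: float = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Hypothetical toy data: four samples from two users.
user_ids: List[str] = ["user1", "user1", "user2", "user2"]
pressure = ["low", "low", "high", "high"]   # separates the users perfectly
size = ["small", "large", "small", "large"]  # carries no information
```

In this toy data, splitting on the pressure values yields a gain equal to the full entropy of the labels, while splitting on the size values yields no gain, so a tree builder would place pressure at the root.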
We have four main features (Time, Size, Pressure, and Position coordinates) in each sample, which are used to predict the fifth feature, the user ID. The first step is to determine which attribute is placed in the root node at the top of the tree: Time, Pressure, Size, etc. The entropy of each category of the predicted feature is then calculated. Assume that the first node in the DT will be the

Continuing to split the internal node into further branches. The P1 value of the Pressure feature helps in splitting into further branches.
For the provided example, suppose that the tree in Fig. 14a yields the class ID of User1 as its classification decision. The same step can be carried out in multiple trees (a forest of trees). Voting for a final result requires building multiple trees with different combinations of feature splits; the final decision is then the majority vote over their results. See Fig. 14b, which shows the RF architecture and the whole classification procedure.

RF classifier architecture, where the left-hand tree produces Class (Identification) A and all other trees produce Class B. In this case, the random forest votes for class ID = B.
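The voting step of the forest can be illustrated with a short sketch. It assumes each tree has already produced a class label; the function name is hypothetical, and ties are resolved in favor of the label seen first, which is a choice made for this example rather than a documented property of any particular RF implementation.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Return the class label predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    # most_common(1) returns the label with the highest count.
    return Counter(tree_predictions).most_common(1)[0][0]


# One tree votes for class A and the remaining trees vote for class B.
decision = majority_vote(["A", "B", "B", "B"])
```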
Multilayer Perceptron (MLP) is a feed-forward Neural Network (NN) model that maps sets of input variables onto a set of output decisions. An MLP consists of a network built from multiple layers of nodes. It is commonly used to learn a variety of nonlinear decision problems on data including images, voice, and KSD data. The main features of an MLP are its layers and weights [17,56,72].
Layers
The simplest MLP architecture consists of at least three layers: an input layer, one or more hidden layers, and an output layer. Each node is called a neuron, and a layer may consist of a single neuron or of multiple neurons.
Weights
Weight values are set initially, and are then learned from the sample values that flow from the input layer to the output layer. An MLP network usually uses several parameters to configure the NN, such as:
Learning rate: it ensures that the weight values converge to the right decision without producing large error values in the output decision (ranges between 0 and 1; default value = 0.3).
Momentum rate: this value is applied to the weight change in the network to achieve faster training (ranges between 0 and 1; default value = 0.2).
Number of nodes or neurons: the number of nodes or neurons in each layer. The notation
Equations (8) and (9) describe the relationships between these parameters [17,56,72].
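To make the roles of the learning rate and the momentum rate concrete, the following sketch applies the conventional gradient-descent-with-momentum update to a single weight. This is the textbook rule and is offered as an assumption; it is not claimed to be the exact form of Eqs. (8) and (9). The function name and the sample gradient are hypothetical, while the default values 0.3 and 0.2 are the ones quoted above.

```python
from typing import Tuple


def update_weight(weight: float,
                  gradient: float,
                  previous_delta: float,
                  learning_rate: float = 0.3,
                  momentum: float = 0.2) -> Tuple[float, float]:
    """Apply one momentum update and return (new_weight, delta).

    delta = -learning_rate * gradient + momentum * previous_delta
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


# Two consecutive updates with the same (hypothetical) error gradient.
w1, d1 = update_weight(weight=0.5, gradient=0.1, previous_delta=0.0)
w2, d2 = update_weight(weight=w1, gradient=0.1, previous_delta=d1)
```

The second step moves the weight further than the first because the momentum term carries over a fraction of the previous change, which is how momentum speeds up training.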
The flow of the MLP starts by applying the inputs to the neurons as a linear combination of attributes and their weights, as in Eq. (10), where M is the output vector of the neurons, D is the input vector of features, and W is the weight vector. The M vector is then passed to the activation function, which is described in Eqs. (10) and (11) [56,72].
Figure 15 describes the sequence of these equations when mapping a set of inputs onto a set of outputs in each neuron in the network.
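The computation inside one neuron can be sketched as follows. The sketch assumes a sigmoid activation and an optional bias term; both are assumptions for illustration, since the activation function and bias are not specified in the text above. The names D, W, and M follow the notation used for the input vector, the weight vector, and the linear combination.

```python
import math
from typing import Sequence


def sigmoid(m: float) -> float:
    """Logistic activation; squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-m))


def neuron_output(D: Sequence[float], W: Sequence[float], bias: float = 0.0) -> float:
    """Linear combination M = sum(W_i * D_i) + bias, followed by the activation."""
    if len(D) != len(W):
        raise ValueError("D and W must have the same length")
    M = sum(w * d for w, d in zip(W, D)) + bias
    return sigmoid(M)


# Hypothetical feature values and weights whose weighted sum is exactly zero.
y = neuron_output(D=[1.0, 2.0], W=[0.5, -0.25])
```

A full layer repeats this computation once per neuron, each with its own weight vector, and passes the resulting outputs to the next layer as its inputs.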

MLP simple architecture. The vector of [X1 X2 X3
The performance of the KSD-based system was measured in terms of accuracy and error rate measures [4,25]. The False Rejection Rate (FRR), False Acceptance Rate (FAR), and Equal Error Rate (EER) are the common error measures of KSD systems. The EER is the error rate at the point where FAR and FRR are equal on the error rate curves. The lower the EER, the better the performance of the system [4,25]. Figure 16 shows the curves of the two error rates, FRR and FAR, plotted against the sensitivity of the system decision threshold. The best value is taken where the two curves meet.
Two metrics describe the success rate of the KSD model: the True Acceptance Rate (TAR) and the True Rejection Rate (TRR). Two further metrics describe its failure rate: the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). Each metric is calculated with a simple formula. All these metrics are described in Table 6.
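The four rates can be computed from the counts of the four event outcomes. The sketch below uses the conventional definitions, in which the genuine-user rates are normalized by the number of genuine attempts and the hacker rates by the number of hacker attempts; this normalization is stated here as an assumption and may differ from the formulas in the table. The function name and the counts are hypothetical.

```python
from typing import Dict


def evaluation_rates(true_accepts: int, false_accepts: int,
                     true_rejects: int, false_rejects: int) -> Dict[str, float]:
    """Compute TAR, FRR, FAR, and TRR from outcome counts."""
    genuine_attempts = true_accepts + false_rejects
    hacker_attempts = false_accepts + true_rejects
    if genuine_attempts == 0 or hacker_attempts == 0:
        raise ValueError("need at least one genuine and one hacker attempt")
    return {
        "TAR": true_accepts / genuine_attempts,   # genuine users accepted
        "FRR": false_rejects / genuine_attempts,  # genuine users rejected
        "FAR": false_accepts / hacker_attempts,   # hackers accepted
        "TRR": true_rejects / hacker_attempts,    # hackers rejected
    }


# Hypothetical counts: 100 genuine attempts and 100 hacker attempts.
rates = evaluation_rates(true_accepts=98, false_accepts=1,
                         true_rejects=99, false_rejects=2)
```

With these definitions TAR + FRR = 1 and FAR + TRR = 1, so reporting FAR and FRR is enough to recover the other two rates.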
There were many ways to calculate the EER [20,22]. One of the simplest ways to find out the EER could be directly calculated based on Eq. (12).
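The four rates and a simple EER estimate can be sketched from raw decision counts. The rate formulas are the standard definitions and are assumed to match Table 6. Taking the EER as the mean of FAR and FRR is an assumption about the form of Eq. (12), which is not reproduced here; it is consistent with the values reported for Dataset-4 in Experiment 3 (FAR 0%, FRR 2%, EER 1%). The function name and the counts are hypothetical.

```python
def ksd_metrics(true_accepts, false_rejects, true_rejects, false_accepts):
    """Compute TAR, TRR, FAR, FRR and a simple EER estimate from counts."""
    genuine_attempts = true_accepts + false_rejects
    impostor_attempts = true_rejects + false_accepts
    far = false_accepts / impostor_attempts
    frr = false_rejects / genuine_attempts
    return {
        "TAR": true_accepts / genuine_attempts,
        "TRR": true_rejects / impostor_attempts,
        "FAR": far,
        "FRR": frr,
        # Assumed simple estimate: average of the two error rates.
        "EER": (far + frr) / 2,
    }


# Made-up counts chosen to give FAR = 0% and FRR = 2%.
metrics = ksd_metrics(true_accepts=98, false_rejects=2,
                      true_rejects=100, false_accepts=0)
```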
Description of evaluation metrics

Performance metrics used to evaluate the KSD system.
This section describes the conducted experiments, which include building the dataset and training and testing the Neural Network (NN) classifiers based on the suggested KSD model. These experiments demonstrate the implementation of the KSD system and a suggested benchmark dataset. The dataset was constructed over six months and is built on the unique combination of features we have proposed. The experiments show that the behavioral KSD authentication identifiers are measurable and have distinctive characteristics that can be used to label and authenticate users.
Fifty volunteers of different ages, genders, and backgrounds participated in the experiments. Dataset records were collected over a period of about six months. All volunteers were first trained on the same device that was used for the enrollment phase. Training and testing were done in the WEKA toolkit using multiple NN models. After each user had become familiar with the installed keyboard and with typing on the device, the first three trials that the user typed on the device were discarded from the dataset. Users were also able to type freely on the device.
Participants came from different backgrounds: they were university students, engineers, and information technology (IT) sector employees. Dataset collection involved 22 (44%) engineers, 18 (36%) IT-related employees, and 10 (20%) students. The 50 users comprised 25 males and 25 females. Each of the fifty users was associated with his/her own user ID {1, 2, 3, 4, 5
Users were asked to type their names and“
One of datasets or benchmarks is the Carnegie Mellon School of computer science (
The characteristics of the proposed KSD system are demonstrated by the following experiments.
(Implementation: Classification capability of KSD classifier using collected dataset and comparing with a public dataset among multiple classifiers).
The main goal of this experiment is to examine the classification capability of the KSD system using our collected dataset. This was done by testing the KSD system with multiple NN classification algorithms. The dataset details are described in Table 7.
Experiment 1 Setup
Using the setup and requirements of Experiment 1 in Table 7, this experiment was conducted to compare the classification capabilities of different classifiers. It was done in two phases, training and testing, to evaluate the KSD system. The training results on the dataset are provided in Table 8a, which shows the classification results of multiple classifiers using the training set only (taking all samples as the training set for the classification algorithms). The RF classifier and the MLP classifier with a deep learning configuration both achieved the lowest EER of 0%. The other classifiers, NB, MLP, and SMO, followed with EER values of 1.1%, 1.65%, and 9.05%, respectively. These results indicate that RF and MLP with deep learning configurations are the strongest classifiers on the training set, with the highest performance measures.
In the testing phase, the classifiers were trained and tested on the suggested dataset, as presented in Table 8b. This testing was based on cross validation with a percent split of 10% of the dataset samples. Table 9 shows the classification results among multiple classifiers. The RF classifier achieved the lowest EER of 1%, the MLP classifier with a deep learning configuration came next with 3.9%, and the Naïve Bayes and MLP classifiers followed with 4.1% and 8.05%, respectively.
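The hold-out procedure of the testing phase can be sketched as a k-fold split. The sketch assumes that the 10% figure means each fold holds out one tenth of the samples (10 folds); it does not reproduce WEKA's own implementation, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 50 samples split into 10 folds of 5 held-out samples each.
splits = list(k_fold_indices(50))
```

Each sample is held out exactly once, so the reported error rates are averaged over predictions for data that the model did not see during training.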
Experiment 1: results of classifications based on collected dataset among multiple classifiers algorithms (Training only the dataset)
Notice that the RF classifier has the best classification capability across all experiment trials. This reflects the advantages of RF: its predictive performance can compete with the best supervised learning algorithms. It also provides reliable estimates of feature importance through the way it processes features and classifies samples. It needs little data pre-processing: the data do not need to be rescaled or transformed before classification.
Experiment 1: results of classifications based on collected dataset among multiple classifiers algorithms (Training and testing using cross validation with 10% split)
Comparative analysis between the suggested dataset proposed by this work and the public dataset, using the training set results, the numbers are in %
Moreover, a closer comparative analysis can be made between the suggested dataset and the public dataset published by Antal et al. [10]. We used the results provided in Tables 8a and 8b to compare with the prior dataset. This comparison is provided in Table 9, which shows how KSD performance improved with our suggested dataset. Most classifiers achieved better results than on the public dataset. The unique combination of features we chose helped us reach 100% accuracy and 0% EER on the training set, while they reached 92.9% accuracy and 3.55% EER. With the RF classifier, we reached 100% accuracy and 0% EER, whereas on the public dataset the RF classifier gave 92.7% accuracy and 3.65% EER. The NB classifier also gave better results than the other tested classifiers, with 97.8% accuracy and 1.1% EER.
Experiment 2: Setup and details
This experiment examines the contribution of the non-timing features. The dataset file was divided into two sub-files. The details of the dataset split are described in Table 10. The classification results of the different classifiers are provided in detail in Table 11. The roles of the timing and non-timing features in the final dataset are shown in separate rows of the table; bold font and red color highlight the combined result.
This combination has a direct effect on the classification results, improving the performance measures. Again, the RF classifier gave the best balance between accuracy and the time taken to build the model: within a reasonable time, RF scored the highest performance measures. Given the goal of balancing model-building time against the performance measures, RF wins the final comparison.
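The separation of a record into feature groups can be sketched as follows, assuming that the two sub-files correspond to the timing and the non-timing features. The column names (for example `DU_1` or `pressure_1`) are hypothetical; the actual layout is the one given in Table 10.

```python
# Prefixes of the timing features; every other column is treated as non-timing.
TIMING_PREFIXES = ("DU", "DD", "UD", "UU")


def split_features(record):
    """Split one sample into (timing, non_timing) feature dictionaries."""
    timing, non_timing = {}, {}
    for name, value in record.items():
        prefix = name.split("_")[0]
        target = timing if prefix in TIMING_PREFIXES else non_timing
        target[name] = value
    return timing, non_timing


# Made-up sample for the first key of a password.
sample = {"DU_1": 80, "DD_1": 150, "UD_1": 70, "UU_1": 160,
          "pressure_1": 0.42, "size_1": 0.11, "x_1": 312, "y_1": 845}
timing_only, non_timing_only = split_features(sample)
```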
Experiment 2 Results of classifications of multiple classifiers using cross validation 10% using the collected dataset among multiple feature combinations
Note that classifiers such as MLP and deep MLP provided good results in the KSD experiments. However, despite their power on larger and more complex datasets, they are extremely hard to interpret with a large NN setup, and they require a longer time to achieve good results. In contrast, one of the biggest advantages of Decision Trees and Random Forests is the ease of seeing which features or variables contributed to the classification.
This experiment studies in detail the effect of the training factor on the performance measures of the NN classifiers. The dataset was collected over multiple cumulative sessions, in which users were asked to type their passwords, to guarantee the quality of typing. A further goal was to keep users comfortable, since they were asked to type many passwords per session. During the collection period, we did not force users to type when they were busy or had other duties at the time of a typing session. Table 12 describes the details of the dataset files collected for this analysis.
Experiment 3 Setup and details
The dataset is divided into four dataset files. The first dataset contains the typing behavior of 50 users, each of whom typed 10 trials of the predefined password. Dataset 2 includes the typing of 50 users, each of whom typed 20 trials over two sessions. The third and fourth datasets include 30 and 40 trials over three and four sessions, respectively. Using the RF classifier and comparing the performance measures across the four datasets, it was clear that the training factor directly affected performance. The RF classifier achieved the highest performance values when users had the most experience with the virtual keyboard. This is helpful when applying KSD systems as a second authentication factor on handheld devices: users naturally type their passwords hundreds of times in different applications, so system performance should improve as the number of typing trials increases. Dataset-4, with 40 typing trials per user, achieved the highest performance: 98% classification accuracy, 0% FAR, and 2% FRR, with the lowest EER of all dataset files at only 1%. The learning factor thus improved the classification performance of the selected classifiers. The detailed results of repeating this experiment for multiple classifiers are summarized in Table 13.
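The construction of the four cumulative dataset files can be sketched as below, assuming 10 trials per session and trials stored in the order in which they were typed. The function name and the placeholder trial records are hypothetical.

```python
def cumulative_datasets(trials_by_user, trials_per_session=10, sessions=4):
    """Build one dataset per session count, keeping the first N trials per user."""
    datasets = []
    for session_count in range(1, sessions + 1):
        limit = session_count * trials_per_session
        datasets.append({user: trials[:limit]
                         for user, trials in trials_by_user.items()})
    return datasets


# Placeholder data: 50 users with 40 trials each (trial contents are dummies).
all_trials = {user_id: [f"trial-{n}" for n in range(40)]
              for user_id in range(1, 51)}
datasets = cumulative_datasets(all_trials)
sizes = [sum(len(t) for t in d.values()) for d in datasets]
```

Training the same classifier on each of the four files then isolates the effect of user experience, because the only difference between the files is the number of trials per user.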
Experiment 3 Results of using NN classifier using cross validation 10% on multiple datasets
Besides the performance measures, the time taken to build the model was also considered for each dataset. The larger the dataset, the longer each classifier algorithm takes to build its model. This time was also needed to compare the performance of different classifiers on each dataset. The RF classifier gave the best overall performance in terms of high accuracy, reasonably low build time, and the lowest error rate. Some algorithms, such as SMO and NB, took less time to build the model in some cases, but they also recorded higher error rates and lower accuracy. Here, we are looking for a tradeoff between the time taken to build the model and the performance measures. RF outperformed all the compared classifiers on the four dataset files and recorded strong values in both performance measures and time. The summary is shown in Table 14.
Experiment 3 Results of using NN classifier with cross validation 10% on multiple datasets, where accuracy is in % and run time is in seconds
We have summarized recent research results, including our own, in Table 15. The table shows that the dataset developed and designed in this work achieved significant performance. The listed studies were chosen because they share the following criteria: they target Android touch-screen devices, use the standard QWERTY keyboard, extract timing and non-timing features, and use a password template of specific complexity (the chosen character combinations consist of letters, numbers, and special characters).
Looking at the results in Table 15, several points can be highlighted, starting with the features extracted. Timing features were the main approach in all of the listed studies. Timing features affect the performance of KSD systems: the more features used, the higher the accuracy and the lower the error. The authors of [15,18,20] recorded reasonable accuracy and EER. In particular, the authors of [18] scored a significant accuracy (99.35%) using a PIN password in their dataset. They provided promising results using an NN model on a dataset of PIN passwords only (digits 0–9). The authors of [29] also scored a remarkably low error rate (only 0.72%) using a similar PIN password. Such a password is not practical in real applications: most websites now force users to choose a complex password that combines text, numbers, and special characters.
Summarized results for KSD authentication systems
Many authors chose this template in the listed research, such as the authors of [8], who used the same password template (.tie5Roan1) that we chose. Compared with their dataset, we provide more non-timing features as well as timing features. As previously stated, we used the DU, DD, UD, and UU timing features alongside pressure, size, and position (XY coordinates). Our experiments clearly show the role of non-timing features in improving the KSD performance measures. Also, the authors of [62] provided more features in their combinations than the authors of [8]. They used only one timing feature (DU), in contrast to our work. We reached 100% accuracy and 0% EER on the provided dataset in the training phase, and 98% accuracy and 1% EER when testing with cross validation.
Finally, it is notable that the studies that used NN techniques reported higher accuracy and lower error rates. This is why NNs appear frequently in the latest research in the literature. The NN model is adaptive: it is flexible enough to take in new features that may be extracted from handheld devices in the future. This flexibility and adaptability make NN models more scalable. As more features are added to mobile devices, we need a dynamic model that adapts to change and can be modified in the future. Authors such as Alghamdi et al. [2] and Calot et al. [20] presented examples of statistical-model-based approaches, while our research and other NN-based work such as [18] scored higher accuracy and lower EER.
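The four timing features can be derived from key-down and key-up timestamps as sketched below. The definitions used are the usual ones (DU as the hold time of a key, and DD, UD, and UU as the latencies between consecutive keys) and are assumed to match the ones defined earlier in this work; the event format and the timestamps are made up.

```python
def timing_features(events):
    """events: list of (key, down_ms, up_ms) tuples in typing order."""
    rows = []
    for (k1, d1, u1), (k2, d2, u2) in zip(events, events[1:]):
        rows.append({
            "pair": k1 + k2,
            "DU": u1 - d1,  # hold time: down to up of the first key
            "DD": d2 - d1,  # down of the first key to down of the next key
            "UD": d2 - u1,  # up of the first key to down of the next key
            "UU": u2 - u1,  # up of the first key to up of the next key
        })
    return rows


# Made-up timestamps (ms) for the first three characters of a password.
events = [(".", 0, 80), ("t", 150, 240), ("i", 300, 370)]
features = timing_features(events)
```

UD can be negative when the next key is pressed before the previous one is released.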
This study presents an authentication system based on keystroke dynamics (KSD) combined with a password. The behavioral-based system was found to be a cost-effective authentication mechanism, because no additional tools (software or hardware) are required to implement it. Password authentication could also be combined with other authentication mechanisms, such as object/token-based or physiological biometric-based ones, but these are not feasible for remote access or over the internet. Combining passwords with behavioral-based authentication strengthens the security level of authentication systems, because the hacker has to capture the user's typing behavior as well as the password itself to gain access to the systems.
Hence, the password has been combined with KSD typing. This combination has proved to be a promising and effective solution. Based on it, a user authentication system for handheld devices has been proposed. In addition, a virtual keyboard has been developed and installed on an Android device to collect timing and non-timing features. Furthermore, we have proposed the use of a complex password consisting of lowercase letters (a–z), numbers, capital letters (A–Z), and special characters (!@#$%^&* etc.). The proposed system has been implemented and tested on the CMU benchmark KSD dataset and on a newly built one, named the UJ-KSD dataset.
Fifty volunteers from different backgrounds participated in data collection for the experiments. Data entry samples were collected over a period of about six months. All volunteers were first trained on the same device that was used for the enrollment phase. Training and testing were done in the WEKA toolkit for machine learning and neural network purposes.
Several experiments were executed to show that using KSD in authentication systems improves system performance and strengthens the security level. This goal was fulfilled by proposing a KSD model that was tested with multiple NN algorithms. The WEKA toolkit enables researchers to tune their experiments with different methods and multiple classifiers. For example, our approach was tested using the RF, MLP, Bayes Net, and SMO classifiers. When the RF classifier was used, the approach reached a 0% error rate with 100% classification accuracy.
The empirical results that we obtained were better than other results reported in the reviewed literature. In particular, the authors of [18] scored a significant accuracy (99.35%) using a PIN password in their dataset. Other authors, such as [43], obtained a similar accuracy (99%) using PIN-only password combinations, while [29] scored a significant error rate of only 0.72% using the same PIN password. Most websites now force users to choose a complex password that combines text, numbers, and special characters. This was an important reason for using a “Real Password”, which is more practical.
The password chosen for this proposal has a higher level of complexity and is more realistic. It is closer to the password standard in the Authentication Management Standard derived from the NIST Special Publication for the PCI Data Security Standard [1,12,30]. This standard states all the requirements and policies for creating a strong password in critical applications and e-payment through handheld touch-screen devices. We therefore used the same password template (.tie5Roan1) to make our suggested KSD model more widely applicable.
The Carnegie Mellon School of Computer Science (CMU) benchmark KSD dataset was designed to target personal computers only, with text consisting of timing features only. It would be useful to have a benchmark for touch-screen devices. The lack of existing benchmark datasets for handheld devices encouraged us to build a benchmark dataset that can be compared with the CMU dataset.
In the summarized research, the race was to find the best combination of features that describes the user's identity most accurately. The more descriptive the features, the more accurate the KSD system [3,71].
Conclusions
This paper presents an extensive comparative analysis of authentication systems based on KSDs. It proposes a prototype keyboard that was developed to collect timing and non-timing information from KSDs. This work is designed especially for touch screens and handheld mobile devices. The model deployed appropriate methods to investigate a unique combination of behavioral features. The deployed methods are the well-known neural network (NN), machine learning (ML), multilayer perceptron (MLP), and random forest (RF) classification methods.
Recent research has focused on feature combinations using regular keyboards and on specific features such as timing features. Additional features, if added, improve the strength of authentication, as shown in this research. Modern mobile devices provide many new features, such as finger force, which can be measured on touch-screen devices. The physical distance between characters is also important. Moreover, using a complex password consisting of capital letters, special characters, and lowercase letters has some effect on KSD performance.
Proper implementation of KSD might increase the acceptance of online banking and e-payment applications. A KSD model that is secure and reliable could be applied over the internet. It could easily be integrated with other authentication systems as a second authentication factor for remote-access applications, especially financial systems that enable payment over the internet.
The suggested KSD user authentication system is practical for various critical internet applications, including banking and healthcare systems. It is a step forward towards an e-society with appreciable economic advantage.
More features and classifier algorithms can be selected and examined in future work. Classification methods that use multiple learning algorithms, such as NN, DL, MLP, and RF, need to be investigated further in order to optimize performance in terms of metrics such as accuracy and classification error rate.
