Abstract
Society is in constant evolution, which drives the constant production of new knowledge. As a result, citizens are under constant pressure to obtain new qualifications through training and requalification. The need for qualified people has been growing exponentially, which means that the limited resources available for education and training must be used more efficiently. In this paper we focus on the design of the user model: we propose an innovative approach to designing a user model that monitors the user’s biometric behaviour by measuring their level of attention during e-learning activities. In addition, a machine learning categorization model is presented that oversees user activity during the session. We intend to use non-invasive methods in intelligent tutoring systems (ITS), observing the interaction of users during the session. Furthermore, this article highlights the main biometric behavioural variations for each activity and establishes the set of attributes relevant to the development of machine learning classifiers that predict users’ learning preference. The results show that there are still mechanisms that can be explored and improved to better understand the complex relationship between human behaviour, attention and evaluation, which could be used to implement better learning strategies. These results can be decisive in improving ITS in e-learning environments and in predicting user behaviour based on users’ interaction with technological devices.
Introduction
The evolution of computer science made possible its introduction into the field of education, where a large number of computer systems have been developed, for example e-learning systems [15]. However, most e-learning systems were only able to manage and share documentation between tutors and learners. Intelligent Tutoring Systems (ITS) then emerged and have since been an object of study for AI researchers. Several theories have been applied to verify and establish their effectiveness [24].
ITS are learning environments that help learners to master knowledge and skills. ITS implement intelligent algorithms that adapt to users and allow the application of complex principles of learning. An ITS should normally work with only one user, because users differ in many dimensions and the goal is to be sensitive to the idiosyncrasies of individual users. The basic activities of an ITS should incorporate active learning by the user, interactivity, adaptability, and feedback.
The main goal of ITS is to make these technologies adaptable to users, based on their individual characteristics and needs. Thus, the ITS offers the benefits of individual tutoring automatically and autonomously, letting each user progress at their own pace (shown in Fig. 1). By also applying the concept of adaptive learning to ITS, we obtain a powerful learning tool. Learning skills are vital to achieving success, whether at school or at work, and are becoming increasingly important for social communication. One of the real challenges of ITS is to develop fully interactive support [3].
However, ITS are complex computer programs that manage various heterogeneous types of knowledge, so building such a system is not an easy task. The team that builds the ITS must be well equipped to face the various problems related to the construction process. In fact, the resources needed to build an ITS come from various research fields, including artificial intelligence, cognitive science, science education, human-computer interaction and software engineering. This multidisciplinary nature makes the process of building a complete ITS a challenging task, since the authors may have very different views of the system. Some promote pedagogical accuracy (ensuring that tutorial decision-making is based on pedagogical principles), while others focus on the effective diagnosis of student errors (using knowledge structures and appropriate algorithms to interpret students’ decisions correctly) [19].

Multidisciplinary nature of intelligent tutoring systems [24].
Currently, one of the main problems related to learning is the level of attention that the learner dedicates to the execution of the proposed tasks. The level of attention that each user devotes to a given task is increasingly affected by the growing use of the Internet and social networks. These have a high impact on attention, since they offer information that captivates the user and draws attention away from the tasks that really matter. It is crucial to improve the learning process and solve the problems that may occur in environments that use new technologies [8]. Attention is a complex process through which an individual is able to continuously analyze a set of stimuli and, within a sufficiently short period of time, choose one to focus on. Most people can only focus on a very small group of stimuli at a time.
By monitoring the user’s attention, it is possible to improve the effectiveness of the system so that only the material relevant to the user’s progress is provided. Additionally, the system can feed back the results and, based on that information, readjust itself to prevent undesirable situations and improve the performance of users [5].
Another factor to consider is the learning style. The learning style not only specifies how a user learns and enjoys learning but also indicates how the user learns best. In this way, the system can be adapted to the individual user and learning can be improved. When ITS methodologies do not support a specific learning style, the user will find it more difficult to learn and acquire knowledge. When users carry out learning activities using an ITS, it is extremely important that the system receives feedback from the users’ work in order to detect possible learning problems at an early stage and avoid inappropriate teaching methods [10,18].
In this paper we focus on the design of the user model: we propose an innovative approach to designing a user model that monitors the user’s biometric behaviour by measuring their level of attention during e-learning activities. In addition, a machine learning categorization model is presented that oversees user activity during the session. We intend to use non-invasive methods in intelligent tutoring systems, observing the interaction of users during the session.

Problems of current tutors.
Furthermore, this article highlights the main biometric behavioural variations for each activity and, based on the set of relevant attributes, develops machine learning classifiers to predict users’ learning preference.
The results show that there are still mechanisms that can be explored and improved to better understand the complex relationship between human behaviour, attention and evaluation, which could be used to implement better learning strategies. These results can be decisive in improving ITS in e-learning environments and in predicting user behaviour based on users’ interaction with technological devices.
The fast development of ICT over the last decades has benefited all areas of knowledge. According to [17], these technologies were applied in education very late. Nevertheless, from these technologies emerged Virtual Learning Environments (e-learning), in which learners interact as if they were in a real environment.
These environments are combined with other applications that provide intelligent tutoring and are called Intelligent Virtual Environments or ITS. These ITS aim to adapt to the user profile, applying the techniques that best suit each user in order to obtain better learning results. Presently, there are several of these tutors, but they have not fully achieved the desired goals, since they do not consider an important element that affects student learning: the student’s emotional state. Some of these tutors only assess the user’s emotional state at the end of the work sessions, which is not enough to improve learning. There are also other problems that may occur with ITS, which are presented in Fig. 2.
ITS
The typical architecture of existing ITS has the following four basic components: the Expert Model, the Student Model, the Tutor Model and the Interface [1]. Figure 3 presents this architecture.

Generic architecture of ITS [1].
The Domain Model, also known as the Expert Model or Knowledge Model, consists of the concepts, facts, rules and problem-solving strategies of the domain in context. This model serves as a source of specialized knowledge and as a standard for assessing user performance and diagnosing errors [1]. It performs the analysis of data and can also make assumptions about the knowledge of a user, since it observes the actions performed by that user. From these observations, and with access to the knowledge relevant to a given situation, the model makes the most appropriate decision.
The Student Model is an overlap of the Domain Model. This model highlights the cognitive and affective states of the user in association with its evolution as the learning process progresses. As the user works step-by-step in the process of problem-solving, the system engages itself in the model of the tracking process [1]. This model contains the dynamic representation of the user’s emerging knowledge and skills. No ITS can operate without a full understanding of the user.
The Tutor Model is also called the educational or teaching strategy module. The tutoring module is the part of the ITS that designs and regulates instructional interactions with the user. This model accepts information from the Student Model and the Domain Model. It is closely linked to the Student Model and makes use of the knowledge about the user and of its own structure of tutorial objectives to conceive the pedagogical activity to be introduced. It also monitors the user’s progress, creating a profile of strengths and weaknesses in relation to the production rules (the so-called “tracking of knowledge”). The Tutor Model is, therefore, the source and orchestrator of all pedagogical interventions [1].
The interface is the front-end interaction with the ITS. This system integrates all types of information needed to interact with the user through graphics, text, multimedia, video, menus, etc. The interface model is the communication component of the ITS that controls the interaction between the user and the system, and the communication is processed in both directions. The Interface translates between the internal representation of the system and an interface language that is understandable to the user [1].
In the educational field, it is important to have an adaptive and intelligent system to improve learning. Such a system takes into account the information accumulated about an individual or groups of individuals over time. The ITS applies Ambient Intelligence (AmI) techniques to provide more and better support to the users of the educational system [3]. Yet, what is observed in most of these systems is that they are adaptive or intelligent, but not adaptive and intelligent.
Attention is a complex process through which an individual is able to continuously analyze a spectrum of stimuli and, within a sufficiently short period of time, choose one to focus on. Attention is a cognitive process that consists of concentrating on a stimulus or a small group of stimuli while dismissing others [16]. Most individuals can only focus simultaneously on a very small group of stimuli, which implies ignoring other stimuli and perceptible information.
Attention is the first step in the learning process. Learners cannot learn, understand or even remember if they do not listen properly. When the individual does not pay attention, the learning process fails. For almost everyone, it is easy to pay attention to subjects or things that are interesting or stimulating, but it is more difficult to be attentive to boring things [13].
Currently, one of the main problems related to learning is the amount of attention a learner spends when performing a proposed task [16]. The level of attention of each person is increasingly affected by the evolution of the use of the Internet and social networks. These two factors have a high negative impact on attention because they offer other interesting information that causes inattention.
Nowadays, we have to deal constantly with email notifications, social networks, messaging applications, and so on. We live immersed in beeps, vibrations, notifications and flashing icons, which constantly attract our attention and distract us [12]. Even if we return immediately to our task, the fact that we have to consciously evaluate the stimuli and decide that they are not important at the moment has already taken its toll on our brain, causing it to waste resources [7,25].
Learning styles
There are many definitions of the concept of learning style. Some of these definitions are [10,22]: a distinct way in which an individual learns; a learning method; an individual preference or the best way for an individual to think, process information and demonstrate learning; the preferred means by which an individual acquires knowledge and skills; habits, strategies or regular mental behaviors related to learning, particularly the deliberate educational learning that an individual displays.
A learning style is a method that allows an individual to learn better. Different people learn in different ways, each preferring a different learning style. Most people have a mix of learning styles, although some may find that they have one dominant learning style. Others may find that they have different learning styles depending on the circumstances. There are several models developed by various authors that try to represent the way people learn [18].
Learning styles can be defined as the cognitive, affective, and physiological characteristics that serve as relatively stable indicators of how learners perceive, interact with and respond to their learning environments [14].
Learning is improved if the system detects and classifies individuals’ learning preferences, choosing the most appropriate learning methods, and avoiding potential learning problems.
Learning style theory is rather controversial, because some authors think that the existing theories are very similar to one another and have little scientific support. For these authors, the learning style is not a straightforward concept: if a person prefers or feels more comfortable with one learning style today, it does not mean that the same person will not prefer a different learning style tomorrow [10].
Machine learning categorization performance
Machine learning is a set of techniques that find patterns in data, patterns that provide knowledge and enable quick and accurate decision making. These techniques allow the system to be trained and to create a set of patterns from which predictions can be made [28].
There are several algorithms that can be applied, namely Support Vector Machine (SVM), Nearest Neighbors, Naïve Bayes, Neural Networks and Random Forest. SVM is a supervised learning technique for data analysis, pattern recognition and classification based on statistical learning theory [26]. The Nearest Neighbor algorithm is the simplest nonparametric decision procedure: it assigns to the unclassified observation (input sample) the class/category/label of the closest sample (according to a metric) in the training set [6]. Naïve Bayes offers a simple approach with clear semantics for representing, using and learning probabilistic knowledge [28]. In recent years, neural networks have been used in control design techniques, and several solutions have been presented for nonlinear systems [10]. Random Forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [2].
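To make the Nearest Neighbor rule described above concrete, the following is a minimal sketch in plain Python. It assumes Euclidean distance as the metric and uses made-up two-feature samples; the function name `nearest_neighbor`, the feature values and the labels are illustrative assumptions and do not come from the study.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is the metric assumed in this sketch.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical, already scaled samples (e.g. mouse velocity, key down time).
train = [((0.1, 0.2), "image"), ((0.9, 0.8), "text"), ((0.2, 0.1), "image")]
print(nearest_neighbor(train, (0.85, 0.9)))
```

Because the rule compares raw distances, features on very different scales would dominate the result, which is one reason feature scaling matters for this family of classifiers.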
Framework
Based on the state of the art, the idea is to create an ITS that suits each user. In this first phase, the architecture of the ITS student model is presented. For this model, it was necessary to define the following parameters: level of attention, learning style, the user’s pattern of interaction behaviour and their emotional state. Once these parameters are defined, the ITS student model has to adjust and adapt the level of learning difficulty to each user, depending on their parameters. It was necessary to develop a new architecture because of the gap found in the literature review: most systems developed so far are invasive and intrusive. This led to the idea of creating a non-intrusive and non-invasive approach, based on the observation of behavioural changes of an individual or a group in relation to their behavioural and emotional pattern.
The architecture of the developed system (shown in Fig. 4) is divided into three main parts: the lowest level with the devices that generate the data; the intermediate level where the ITS cloud is located; and the highest level, the client system.

System’s framework [27].
At the lower level, a logging application runs in the background, saving the required events. This application generates raw data that describe the user’s interaction with the mouse and keyboard. There are also soft sensors that use available information from other measurements and process parameters to calculate and estimate values from the raw data. The raw data generated are stored locally until they are synchronized with the web server in the cloud at regular intervals, usually every 5 minutes. In this layer, each event is encoded with the required information (i.e., timestamp, coordinates, click type, key pressed, etc.).
The intermediate level is subdivided into four layers: the storage layer, the analytic layer, the classification profile layer, and the emotion classification layer.
In the storage layer, a MongoDB database stores the data received from users when they are synchronized. MongoDB is a cross-platform, document-oriented database that provides high performance, high availability and seamless scalability. Besides being a data storage mechanism, MongoDB also provides native data processing tools, such as Map-Reduce and the aggregation pipeline. Both procedures can operate on a sharded collection (partitioned across multiple machines for horizontal scaling).
In the analytic layer, some processes were developed to prepare the received data for later calculations, such as the removal of small errors. For example, when the backspace key is pressed continuously, the resulting events are treated as an error and removed. The cleaned data are then evaluated according to the metrics presented. In addition, the system receives this information in real time and calculates, at regular intervals, the values of the behavioural biometrics and an estimate of the general level of attention of each user. These tools perform analytical and statistical analysis in real time, which makes them quite powerful. This analysis is useful for ad-hoc queries, pre-aggregated reports, and more. MongoDB provides a large set of aggregation operations that process data records and return computed results, allowing these operations to be used at the data layer to simplify application code and limit resource requirements.
In the classification profile layer, all user indicators are interpreted. Based on the pre-processed data and on the metadata built to support decision making, the system classifies the profile of each user. Once the system holds a sufficiently large set of case studies, it is possible to make accurate classifications. The classifier classifies, in real time, the data received for the different levels of attention, creating the learning profile of each user. With these results, it is possible to obtain a profile of the learning style.
The emotion classification layer holds all the data about the users’ emotions, as well as the metadata built to support decision making. The system classifies the user’s emotional profile and, once it holds a large enough set of data, can classify those emotions precisely. Note that mouse movements and keyboard usage patterns help predict the user’s emotional state.
Finally, the client system is a front end available at the client layer, where users can view the tasks that must be completed. In addition, for ITS administrators, information about the user’s level of attention is displayed at the client layer. The graphical user interface consists of a module that allows the creation of graphs (CHART) and the creation of virtual teams (ROOM or Classes), so that the administrator can intuitively visualize the behaviour of the user.
The feature extraction process begins with the collection of interaction events with electronic devices. This collection is performed by a specifically developed application that is installed on each computer or laptop. The application runs in the background, saving the necessary events without requiring user intervention. It is, therefore, a non-intrusive and non-invasive system [5]. Soft sensors use available information from other measurements and process parameters to calculate and estimate their values.
The data collected by the application installed on the user’s device form a record of the events caused by the user’s interaction with the mouse and keyboard. However, this information by itself is of little use, since it does not allow relevant and interpretable information to be extracted. It is therefore necessary to process the recorded events and transform them into the features under study, so that the information in the event log can be retrieved and analyzed. To process the logged events, an application was used that receives the logged events as input and transforms this data set into the parameters evaluated by the defined metrics, which are the output of the application.
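The backspace clean-up mentioned above is not specified in detail, so the sketch below shows only one possible reading: runs of consecutive backspace presses that reach a threshold are dropped, while shorter runs are kept as ordinary corrections. The function name `drop_backspace_runs`, the key name `"Backspace"` and the threshold `min_run=3` are assumptions made for illustration.

```python
def drop_backspace_runs(keys, min_run=3, backspace="Backspace"):
    """Remove runs of at least `min_run` consecutive backspace presses.

    `keys` is the ordered list of pressed key names. Shorter runs are kept,
    on the assumption that a single correction is ordinary typing.
    """
    cleaned, run = [], []
    for key in keys:
        if key == backspace:
            run.append(key)       # extend the current backspace run
            continue
        if len(run) < min_run:
            cleaned.extend(run)   # short run: keep it
        run = []
        cleaned.append(key)
    if len(run) < min_run:
        cleaned.extend(run)       # flush a short run at the end of the log
    return cleaned

pressed = ["a", "Backspace", "b", "Backspace", "Backspace", "Backspace", "c"]
print(drop_backspace_runs(pressed))
```

In a deployed system this filtering could equally be expressed as a stage in the database's aggregation step; the in-memory version here is only meant to show the rule itself.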
The following events are acquired by the application and sent to the server for processing [20]:
MOV, timestamp, posX, posY: an event describing the movement of the mouse, at a given time, to coordinates (posX, posY) on the screen;
MOUSE DOWN, timestamp, [Left | Right], posX, posY: this event describes the first half of a click (when the mouse button is pressed down), at a given time. It also describes which of the buttons was pressed (left or right) and the position of the mouse at that instant;
MOUSEUP, timestamp, [Left|Right], posX, posY: an event similar to the previous one but describing the second part of the click, when the mouse button is released;
MOUSEWHEEL, timestamp, dif: this event describes a mouse wheel scroll of amount dif, at a given time;
KEY DOWN, timestamp, key: this event identifies a given key from the keyboard being pressed down, at a given time;
KEY UP, timestamp, key: this event describes the release of a given key from the keyboard, at a given time.
The individual logs created by the application for each user are processed to compile information that can characterize the user’s behaviour during their interaction with the computer. This subsection details the features extracted from these interaction event logs. Figure 5 presents the general information of the dataset collected from the user’s behavioural biometrics.
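As an illustration of how such events might be read on the server side, the sketch below parses one event per line. The comma-separated serialisation, the millisecond timestamps and the names `Event` and `parse_event` are assumptions; the source lists the fields of each event but not the wire format.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # normalised event name, e.g. "MOUSEDOWN"
    timestamp: int   # assumed to be milliseconds
    fields: tuple    # remaining fields, kept as strings

def parse_event(line):
    """Parse one comma-separated log line into an Event."""
    parts = [p.strip() for p in line.split(",")]
    # Drop spaces and underscores so "MOUSE DOWN" and "MOUSEDOWN" compare equal.
    kind = parts[0].upper().replace(" ", "").replace("_", "")
    return Event(kind, int(parts[1]), tuple(parts[2:]))

# Hypothetical log lines following the field order listed above.
log = [
    "MOV, 1000, 120, 340",
    "MOUSE DOWN, 1250, Left, 125, 342",
    "MOUSEUP, 1330, Left, 125, 342",
    "KEY DOWN, 2000, a",
]
events = [parse_event(line) for line in log]
print(events[1])
```

A real format would also need an escaping rule for keys such as the comma itself, which this sketch does not handle.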

General view of dataset of behavioural biometrics.
Based on the framework described in Section 3, a set of behavioural features is monitored and preprocessed by the proposed system. These features enable the development of a classification model capable of determining the task being executed by the user, given the influence of the user’s biometric behaviours. For this study, Keystroke Dynamics, Mouse Dynamics and Attention Performance Metrics were selected to this end. To monitor students’ performance, behavioural information was collected from a group of students during high-end tasks in a school environment.
The Mouse Dynamics data-log output analyses the individual’s mouse behaviour and calculates his/her behavioural biometrics. These features aim at quantifying the individual’s mouse performance. Taking the movement of the mouse as an example, one never moves it in a straight line between two points; there is always some degree of curve. The larger the curve, the less efficient the movement is [5]. An example is shown in Fig. 6.
The Keystroke Dynamics data-log output analyses the individual’s keyboard behaviour and calculates his/her behavioural biometrics. These features aim at quantifying the individual’s keyboard performance. Taking the keys pressed as an example, each person writes at a different velocity, and the same person writes at different velocities at different times. Figure 5 shows how the keyboard information in a dataset can be presented.
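One simple way to quantify how far a movement departs from a straight line is the ratio between the straight-line distance and the distance actually travelled. The sketch below is an illustrative measure under that assumption, not necessarily the metric used in [5]; the name `path_efficiency` and the sample paths are hypothetical.

```python
import math

def path_efficiency(points):
    """Ratio of straight-line distance to travelled distance (1.0 = straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # no movement: treat as perfectly efficient
    straight = math.dist(points[0], points[-1])
    return straight / travelled

# Two hypothetical pointer paths between the same end points.
straight_path = [(0, 0), (3, 4), (6, 8)]
curved_path = [(0, 0), (6, 0), (6, 8)]
print(path_efficiency(straight_path), path_efficiency(curved_path))
```

The more the path curves, the longer the travelled distance becomes relative to the straight line, so the ratio falls below 1.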
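Two keyboard features that can be derived from KEY DOWN and KEY UP events are the time each key is held down and the typing rate. The sketch below assumes events arrive as `(kind, timestamp_ms, key)` tuples; the function names and the exact feature definitions are illustrative assumptions rather than the definitions used in the study.

```python
def key_down_times(events):
    """Pair KEY DOWN / KEY UP events and return the hold time of each press (ms)."""
    pressed, durations = {}, []
    for kind, timestamp, key in events:
        if kind == "KEY DOWN":
            pressed.setdefault(key, timestamp)  # ignore auto-repeat downs
        elif kind == "KEY UP" and key in pressed:
            durations.append(timestamp - pressed.pop(key))
    return durations

def typing_rate(events):
    """Key presses per second between the first and last KEY DOWN event."""
    downs = [t for kind, t, _ in events if kind == "KEY DOWN"]
    if len(downs) < 2 or downs[-1] <= downs[0]:
        return 0.0
    return (len(downs) - 1) / ((downs[-1] - downs[0]) / 1000)

# Hypothetical key events with millisecond timestamps.
key_events = [
    ("KEY DOWN", 0, "h"), ("KEY UP", 90, "h"),
    ("KEY DOWN", 200, "i"), ("KEY UP", 320, "i"),
    ("KEY DOWN", 400, "!"), ("KEY UP", 470, "!"),
]
print(key_down_times(key_events), typing_rate(key_events))
```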

Mouse movement example.

Sequence of applications used by a specific student, with the date in which the student switched to other application and the time spent interacting with it.
Attention Performance Metrics analyse the tasks the individual is working on and calculate the time he/she spends on each task. These features aim at quantifying the individual’s work-task performance.
When the student uses an application that does not match any of the known rules for a specific task, the application name is saved so that the teacher can later decide if a new rule should be created for it. An example of the output of this process is depicted in Fig. 7.
Based on the framework described in Section 3, a set of behavioural features is monitored and preprocessed by the proposed system. These features enable the development of a classification model capable of determining the task being executed by the user, given the influence of the user’s biometric behaviours. For this study, keystroke dynamics, mouse dynamics, attention performance metrics, and type of learning style were selected to this end.
Mouse dynamics and keyboard dynamics
As described in Section 3.2, mouse dynamics describe an individual’s behaviour with a computer-based pointing device (e.g. a mouse). Recently, mouse dynamics have been proposed as a behavioural biometric, under the premise that mouse behaviour is relatively unique to each person. An example is presented in Fig. 8, which compares the mouse velocity and the mouse acceleration of two groups of users for the same activity.
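Mouse velocity and acceleration can be estimated from consecutive MOV events by finite differences. The sketch below assumes samples of the form `(timestamp_ms, x, y)` and reports speed in pixels per millisecond; the units, the function names and the sample values are assumptions made for illustration.

```python
import math

def velocities(moves):
    """Speed (pixels/ms) between consecutive MOV samples (timestamp_ms, x, y)."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(moves, moves[1:]):
        dt = t1 - t0
        if dt > 0:  # skip samples with identical timestamps
            out.append((t1, math.hypot(x1 - x0, y1 - y0) / dt))
    return out

def accelerations(moves):
    """Change in speed per ms between consecutive velocity samples."""
    vel = velocities(moves)
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(vel, vel[1:])]

# Hypothetical pointer samples taken every 10 ms.
moves = [(0, 0, 0), (10, 3, 4), (20, 9, 12), (30, 9, 12)]
print(velocities(moves))
print(accelerations(moves))
```

Averaging these per-sample values over a session, or over fixed intervals, gives per-user figures that can then be compared across groups.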
Another way of monitoring the user’s behaviour in human-computer interaction (HCI) is based on keystroke analysis presented in Fig. 9.
Attention performance
In addition to the behavioural features mentioned in the previous subsection, which describe the user’s interaction with the electronic devices, the system also records the applications used by the user. For this, it records the date and time each user moved to another application, registering the user ID, timestamp, and application name. By default, applications that are not considered related to pre-defined tasks count negatively for the quantification of attention. Regarding the level of attention the following features are monitored:
Activity Timer: time between the beginning and the end of the task;
Main App. Total Time Usage: total time spent on the application used to solve the task (e.g. the Adobe Photoshop app);
Main Application Percentage Usage: percentage of use of the application used to solve the task.
To do this we measure the amount of time, in each interval, that the user spends interacting with work-related applications. The algorithm thus needs knowledge about the domain to classify each application as belonging or not to the set of work-related applications. Whenever an application that does not match any of the known rules for the specific domain is found, the application name is saved so that the team manager can later decide if a new rule should or should not be created for it [9].
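The three attention features can be computed directly from the application-switch log. The sketch below assumes a log of `(timestamp_s, app_name)` entries whose first entry marks the start of the task, and it assumes the percentage is taken relative to the Activity Timer; the function name and the application names are illustrative.

```python
def attention_metrics(switches, task_end, main_app):
    """Compute the three attention features from an application-switch log.

    `switches` is an ordered list of (timestamp_s, app_name); the first entry
    marks the start of the task and `task_end` its end.
    Returns (activity_timer, main_app_time, main_app_percentage).
    """
    activity_timer = task_end - switches[0][0]
    # Each application is in use until the next switch (or the end of the task).
    ends = [t for t, _ in switches[1:]] + [task_end]
    main_time = sum(end - start
                    for (start, app), end in zip(switches, ends)
                    if app == main_app)
    percentage = 100 * main_time / activity_timer if activity_timer else 0.0
    return activity_timer, main_time, percentage

# Hypothetical session: 25 minutes, with a 5-minute detour to a browser.
switches = [(0, "Photoshop"), (600, "Browser"), (900, "Photoshop")]
print(attention_metrics(switches, task_end=1500, main_app="Photoshop"))
```

With these sample values the main application accounts for 1200 of the 1500 seconds, that is, 80% of the task time.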

Mouse movement example.

Keyboard movement example.
Researchers in the field of HCI are increasingly interested in analyzing the effects of stress, affective states, fatigue and other factors on the individual’s interaction patterns with technological devices. As with interactions between people, interactions between people and technological devices have two channels: one transmits explicit messages (i.e. the actions we perform on the computer) while the other transmits implicit messages (i.e. how we perform them). As research has been demonstrating, we perform actions differently according to our state. The inclusion of this type of information in the next generation of HCI projects is seen by many experts as the way to produce truly human-aware systems, able to understand and adapt to the state of the user at any moment [21].
The movements of the user’s hand and, by extension, of the computer mouse are directly related to the psychological and emotional condition of the user. More specifically, the way in which the mouse is moved (trajectory, speed, intervals of immobility, direction) can reveal the user’s condition [4,9].
The way a user types indicates his or her state of mind. Pressing the keys quickly can indicate an altered emotional state such as anger or stress. On the other hand, taking too much time can indicate sadness or fatigue. Typing dynamics, which measure the typing rhythms of an individual, have been the subject of considerable research in recent decades, and their use for the recognition of emotions has shown promising results [5,23].
For each student, the stored data are aggregated, summarized and sent in 5-minute intervals. The average value over each interval is used. Figure 10 shows the type of information these features provide. In this figure, the evolution of the attention performance of a certain user during an exam is presented using two of the features: click duration and mouse speed. The click duration decreases until approximately the middle of the exam and then increases to an overall maximum. The mouse speed increases up to approximately the same point in time and then begins to decrease. Both features point to an initial improvement in attention performance (faster clicks and increased mouse speed), followed by degradation. This figure reveals a classic effect of stress: attention tends to improve for some time after the onset of the stressor stimulus (eustress) and then drops. These features allow for an individualized view of how stress affects each user, so that breaks, behaviour, or overall attention performance under stress can be planned for.
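The 5-minute averaging described above can be sketched as follows. The sample format `(timestamp_s, value)`, the function name `aggregate` and the click-duration values are assumptions for illustration; only the 5-minute interval and the use of the average come from the text.

```python
from collections import defaultdict
from statistics import mean

def aggregate(samples, interval_s=300):
    """Average (timestamp_s, value) samples over fixed-length intervals.

    Returns {interval_start_s: mean_value}; 300 s corresponds to the
    5-minute intervals described in the text.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor each timestamp to the start of its interval.
        buckets[timestamp - timestamp % interval_s].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}

# Hypothetical click durations in milliseconds, keyed by seconds since start.
click_durations = [(10, 120), (200, 100), (310, 90), (590, 110), (605, 80)]
print(aggregate(click_durations))
```

Plotting the resulting per-interval means over time yields the kind of curve used to follow one student's click duration or mouse speed during an exam.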

Real-time performance: evolution of one student’s interaction performance during the exam. Top: click duration. Bottom: mouse velocity.
The identification of learning style depends on the type of task performed. In addition, it is also necessary to take into account the level of attention in this task and to make a qualitative evaluation of the responses given, in order to evaluate the performance of the user. For the task results to be as accurate as possible, the data must be stored continuously. The system then generates the data set that will be used by the identification algorithm, runs the algorithm on that data set and analyzes the results. Subsequently, the data is stored in the database. Finally, the data is manipulated and the data set is analyzed in order to identify the learning style [11].
The level of attention of the user, the type of task, and the evaluation of the task will indicate the learning style of a user. An example is presented in Fig. 11.

Data set example.
In order to validate the proposed system, we carried out a study at the High School of Caldas das Taipas, located in the north of Portugal, where assessment activities take place on desktop and laptop computers. In this kind of assessment activity, each student has a computer or laptop on entering the room, equipped with a keyboard, a mouse, and a screen. For this purpose, a group of volunteer students (9 girls and 13 boys) from the last year of the high school vocational course, whose average age was 17.6 years (
The assessment activity begins at the same time for all students, who log in to the pre-defined software using their personal credentials. The experiment took place over four different multimedia lessons, each with a maximum of 100 minutes to complete an assessment task using the Adobe Photoshop application. In each assessment, a different activity style was applied to the same subject: in the first lesson, the proposed task was related to video editing; in the second, to image editing; in the third, to text editing; and in the fourth, to audio editing.
All participants were proficient in the use of computers, and the rooms were equipped with similar computers; each participant was randomly assigned to a computer. Information regarding each assessment’s duration is presented in Table 1.
In addition to the captured biometric features, each case study was labelled with the respective activity (i.e. video, image, text or audio editing). Moreover, because the biometric features are recorded from different soft sensors, the distributions of the features (e.g. mean, median, standard deviation) are on different scales. To solve this problem, it was necessary to apply feature scaling (i.e. normalisation techniques). In this study, the two methods used were min–max normalisation and Z-score normalisation.
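As an illustration of the two scaling methods, the sketch below uses their standard textbook definitions. The target range [0, 1] for min–max scaling and the use of the population standard deviation for the Z-score are assumptions made here, as are the function names and sample values; the configuration used in the study may differ.

```python
from statistics import mean, pstdev


def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to [new_min, new_max] (assumed default: [0, 1])."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def z_score_normalise(values):
    """Centre values on zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, e.g. three mean click durations.
feature = [2.0, 4.0, 6.0]
scaled = min_max_normalise(feature)
standardised = z_score_normalise(feature)
```

Either transformation is applied per feature, so that features measured on very different scales contribute comparably to distance-based or gradient-based classifiers.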
Min–max normalisation technique is a normalisation strategy which linearly scales a feature value to the range
Next, several machine learning categorisation methods were used to predict the student’s activity through the analysis of his/her behaviour in human-computer interaction (HCI). Several classifiers were trained and tested in order to determine the most efficient method to categorise the student’s activity, taking into account the methods most often applied in the scientific literature. The classification methods trained and tested were: Support Vector Machine, Nearest Neighbour, Naive Bayes, Neural Network and Random Forest.
Summary of the characteristics of each assessment activity
As for validation, a split validation method was used to determine the classification performance: 2/3 of the case studies were used for training the classifiers, while the remaining 1/3 was used to test them. Table 2 presents the results for the performance of the classifiers.
From these results, some conclusions can be drawn: (a) among the trained and tested classification methods, Random Forest presents the best overall performance, with 87.5% of cases correctly classified, while Support Vector Machine presents the worst; (b) with the application of feature scaling techniques, the performance of the classifiers improved by between 6.25% and 25%, with the greatest improvement in the neural network classifier; and (c) the performance of the classifiers depends on the quality of the features and on the total number of case studies analysed (i.e. 48 case studies).
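The split validation and the reported percentages can be checked with a short sketch. The random shuffle, the seed and the function name `split_validation` are assumptions; the text does not state whether the split was random or stratified. Under a 2/3 to 1/3 split of 48 case studies, the test set has 16 cases, so 87.5% corresponds to 14 correct classifications and 6.25% to a single test case.

```python
import random
from fractions import Fraction


def split_validation(cases, train_fraction=Fraction(2, 3), seed=0):
    """Shuffle the cases and split them into (train, test) partitions."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # assumed: random, unstratified split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


cases = list(range(48))  # 48 case studies, identified here by index only
train, test = split_validation(cases)

# Exact arithmetic on the reported percentages.
correct_cases = Fraction(875, 1000) * len(test)    # cases behind 87.5% accuracy
one_case_in_percent = Fraction(1, len(test)) * 100  # weight of one test case
```

With so few test cases, each one moves the accuracy by 6.25 percentage points, which is worth keeping in mind when comparing the classifiers.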
As for the number of leaves/features for each decision tree, 9 was the value that presented the best performance. Regarding feature relevance in the model, the Activity Timer (all time) is by far the most important feature for predicting the student’s activity, followed by the Duration Distance Clicks (ddc), Time Between Keys (tbk), Key Down Time (kdt), Distance Point to Line Between Click (dplbc) and Distance Between Click (dbc).
The confusion matrix of the Random Forest model is presented in Table 3, where only the Video activity shows misclassification, in 40% of its cases (i.e. 2/5 cases were misclassified as Audio editing activity).
Comparative analysis of machine learning categorisation performance
Random forest: confusion matrix
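The per-class misclassification rate read from a confusion matrix can be sketched as below. Only the Video row reported in the text (5 cases, 2 of them predicted as Audio) is reproduced; the function names and the nested-dictionary layout are assumptions made for the illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Build matrix[true_label][predicted_label] -> count."""
    matrix = {}
    for true_label, predicted_label in zip(actual, predicted):
        matrix.setdefault(true_label, Counter())[predicted_label] += 1
    return matrix


def misclassification_rate(matrix, label):
    """Share of cases with true class `label` that were predicted as another class."""
    row = matrix[label]
    total = sum(row.values())
    return (total - row[label]) / total


# Video row as reported: 5 test cases, 2 of them classified as audio editing.
actual = ["video"] * 5
predicted = ["video"] * 3 + ["audio"] * 2

matrix = confusion_matrix(actual, predicted)
video_error = misclassification_rate(matrix, "video")
```

Here `video_error` is 2/5, matching the 40% reported for the Video activity.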
Given these conclusions, the Random Forest method was selected to categorise the student’s activity. Additionally, this model was tuned through hyper-parameter optimisation. In other words, to optimise the Random Forest’s classification performance, it was necessary to find the number of leaves/features (between 1 and 11) and the number of trees (between 1 and 500 in this study) that best suit the model and minimise the validation error. For this, an exhaustive grid search was used, in which a model is trained and validated for each combination of parameter values.
In the end, the model with the lowest error rate was selected. Figure 12 presents the results of this process, showing that the average error is minimised when the number of decision trees is 80.

Random forest classifier analysis.
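The exhaustive grid search can be sketched generically as follows. The parameter names `n_features` and `n_trees` and the function `grid_search` are assumptions, and `toy_error` is a hypothetical stand-in, not measured data: it is constructed so that its minimum sits at the values reported in the text (9 features, 80 trees). In practice the evaluation function would train a random forest and return its validation error.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Evaluate every parameter combination and return (best_params, best_error).

    `evaluate` maps a parameter dict to a validation error (lower is better).
    """
    names = list(grid)
    best_params, best_error = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Search space from the text: 1-11 leaves/features and 1-500 trees.
grid = {"n_features": range(1, 12), "n_trees": range(1, 501)}


def toy_error(params):
    # Hypothetical error surface for illustration only (not the study's results).
    return abs(params["n_features"] - 9) / 11 + abs(params["n_trees"] - 80) / 500


best_params, best_error = grid_search(toy_error, grid)
```

The search visits all 11 × 500 = 5500 combinations, which is why exhaustive search is only practical for small grids such as this one.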

Data set example.
To analyze the students’ behaviour throughout the assessment lessons and, more precisely, the evolution of their attention and performance, we obtained the following results. In a survey, 70 percent of the students indicated image as their preferred style and 30 percent indicated video. Based on this result and on the results obtained in the four assessment lessons, presented in Fig. 13, we can conclude that: (1) the time spent in the application defined by the system is highest for the video assessment lesson, which is the preferred style of only 30 percent of the users, whereas in the style preferred by most users, image, the level of attention is 91.36, the lowest average; and (2) mouse velocity was highest for the video and image assessments.
Based on the preparation process mentioned, we now focus on the relevance of the features analyzed for the prediction of the user’s learning method preference. This step is intended, on the one hand, to identify and remove unnecessary, irrelevant and redundant features of the dataset that do not contribute to the prediction of the student’s learning method preference and, on the other hand, to verify which of the relevant features present greater relevance, making it possible to order them by degree of importance.
For this study, filter methods were disregarded, since the lack of direct correlation between a feature and the class (i.e. the student’s learning preference) is not sufficient evidence that the feature is unimportant in conjunction with other features. As such, to select the most relevant features, the Boruta technique (a wrapper method built around the random forest classification algorithm) was applied together with 10-fold cross-validation.
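The 10-fold cross-validation partitioning can be sketched as below; the Boruta selection itself is not reproduced. The use of 48 cases (the number of case studies reported earlier), the contiguous unshuffled folds and the function name `k_fold_indices` are assumptions made for the illustration.

```python
def k_fold_indices(n_cases, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    base, extra = divmod(n_cases, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional case each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_cases))
        yield train, test
        start += size


folds = list(k_fold_indices(48, k=10))
fold_sizes = [len(test) for _, test in folds]
```

Each case appears in exactly one test fold, so every case is used once for validation and nine times for training.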
Conclusions and future work
This work is the first part of an ITS. At this time, we have developed only the student model of the ITS. A non-invasive and non-intrusive approach to an ITS is proposed, based on the biometric analysis of work behaviour during different classes. More specifically, the system monitors and analyzes mouse dynamics, keyboard dynamics, and tasks in order to determine user performance. Several machine learning categorization models are presented in Sections 5 and 6. The Random Forest model presented the best categorization performance, with a success rate of 87.5%, where the erroneously classified cases were concentrated in the video editing activity.
Based on the activities, we can conclude that: (a) for activity times of less than 75 minutes, the most frequently categorized style is text; (b) for activity times over 90 minutes, the most frequently categorized styles are image (60%) and audio (40%); and (c) for activity times between 75 and 90 minutes, the most frequently categorized styles are video (80%) and audio (20%). In addition, mouse dynamics and keyboard interaction dynamics slightly support the machine learning model in determining users’ activities.
As future work, still in the development of the student model, the research will focus on: (a) increasing the number of case studies available for analysis; (b) increasing the number of quality resources that would allow better monitoring of the user’s attention performance; (c) a detailed analysis of the characteristics that influence user performance (e.g. through correlation of users’ final grades with biometric behaviours); and (d) the definition of different user profiles to improve the adaptive learning mechanisms of the platform. Furthermore, it is necessary to develop the other models to complete the ITS.
Acknowledgement
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019.
