Abstract
Accurate automated eye movement classification is a key technique in eye tracking, and its quality is largely influenced by the noise and imprecision of the eye tracker and by the characteristics of the eye movements themselves. In this paper, we propose a novel segmentation and clustering-based identification algorithm (I-SC) to recognize fixations, saccades and smooth pursuits in eye tracking data. The algorithm first uses the velocity feature of the recorded eye data to identify saccade segments, and then uses the standard deviation of the dispersion to divide the remaining data into segments. Finally, for each segment we define the average direct distance feature and adopt clustering by fast search and find of density peaks (CFSFDP) to separate fixations from smooth pursuits. To demonstrate its effectiveness and robustness, the proposed I-SC algorithm is evaluated on an eye tracking dataset recorded from 11 participants with a commercial eye tracker. The experimental results show that the proposed algorithm achieves up to 96.0% accuracy and 87.6% recall, which is considerably better than both the Velocity and Dispersion Threshold Identification (I-VDT) algorithm and the Convolutional Neural Network (CNN) algorithm. With our algorithm, accurate classification can be achieved even when the eye tracker data are noisy and imprecise.
Introduction
With the wide application of eye trackers in areas such as visual behavior analysis, medical diagnosis, and human-computer interaction, eye movement identification has become extremely important in the field of eye tracking. There are six basic types of eye movements: fixations, saccades, smooth pursuits, optokinetic reflexes, vestibulo-ocular reflexes, and vergences [1]. Among them, fixations, saccades, and smooth pursuits are the three most common and have been widely studied [2, 7, 12, 19]. A fixation is the maintenance of the gaze in a constant direction, and fixations have been widely used in human-computer interaction, medical diagnosis (e.g., autism diagnosis), and user attention analysis [3, 4]. A saccade is a quick movement between two fixations and is an important indicator for studying the human brain, user behavior, etc. [26]. Smooth pursuit allows the eyes to closely follow a moving object. Smooth pursuits are increasingly used in human-computer interaction, since it is more natural for users to interact with computers by smooth pursuits than by fixations [5]. In addition, smooth pursuit is vital for diagnosing Parkinson’s disease [6]. Accurate eye movement detection is therefore important. However, due to the noise and imprecision of eye trackers and the characteristics of the different eye movements [2], it is exceedingly difficult to identify smooth pursuits, fixations and saccades reliably.
Many previous studies focused only on distinguishing fixations from saccades, since static stimuli such as pictures and text were more commonly used in early work [7, 8, 9, 10]. With dynamic stimuli such as video clips becoming more popular, researchers are now more interested in the joint identification of fixations, saccades and smooth pursuits [2, 11, 12, 13, 14, 16]. Existing eye movement classification methods can be roughly divided into four categories: threshold-based methods, hybrid methods, probability-based methods, and learning-based methods. The threshold-based methods, which include principal component analysis identification (I-PCA), velocity and velocity threshold identification (I-VVT), velocity and movement pattern identification (I-VMP), and velocity and dispersion threshold identification (I-VDT), usually use two fixed thresholds to distinguish the eye movement types. Berg et al. proposed the I-PCA algorithm, but its performance was not evaluated in detail [18]. Komogortsev et al. compared three threshold-based methods (I-VVT, I-VDT, and I-VMP) and found that I-VDT achieved the best performance on high-speed eye trackers [2]. Since performance varies with the thresholds, these methods require careful threshold configuration. To mitigate the sensitivity of the results to the thresholds, hybrid methods that combine threshold-based and probability-based techniques were proposed. Kasneci et al. used a Bayesian Mixture Model to detect saccades from data points, and then employed I-PCA to identify smooth pursuits [13]. Larsson et al. employed the Rayleigh test for preliminary segmentation and then defined four features to distinguish fixations from smooth pursuits [12, 14]. Furthermore, Larsson et al. used binocular data analysis to improve the accuracy of eye movement classification [19].
In general, the accuracy of both threshold-based and hybrid methods depends on the threshold settings. To avoid this disadvantage, Santini et al. proposed a completely probability-based method, the Bayesian Decision Theory Identification (I-BDT) algorithm [16]. Probability-based methods rely on assumptions, for example that fixations and saccades are equally probable, or that the eye movements obey a Rayleigh distribution; such assumptions are sensitive to tremors and microsaccades. More recently, to achieve one-step classification without thresholds, Hoppe et al. proposed a learning-based method based on convolutional neural networks (CNN) [20]. Zemblys et al. used a random forest to detect fixations, saccades, and post-saccadic oscillations (PSOs), and showed that the method can match the performance of human experts [25]. However, learning-based methods need large amounts of training data, and their identification accuracy is not high for noisy data with a low sampling rate.
In this paper, we propose an efficient algorithm based on segmentation and clustering for classifying three types of eye movements. The proposed algorithm has four steps. First, preprocessing removes four kinds of inaccurate data. Second, the velocity feature of the eye data is used to identify saccade segments. Third, the spatial standard deviation feature is used to divide the remaining data into segments. Finally, a fast clustering algorithm performs automatic classification using the average direct distance feature of each segment. The proposed algorithm takes advantage of the continuity of human eye movements and accounts for the influence of the eye tracker’s system noise and imprecision on the eye data, which improves classification on low-sampling-rate eye trackers. Because the new segment feature is defined from the characteristics of eye movements, the clustering step can distinguish fixations from smooth pursuits without thresholds.
The paper is organized as follows: the proposed algorithm and the evaluation procedures are presented in Section 2. A description of the dataset with eye movement signals is given in Section 3. The results are presented in Section 4, and finally, the algorithm and its potential are discussed in Section 5.
Methods
The raw eye data are classified into three eye movement types both by domain experts and by the I-SC algorithm, as shown in Fig. 1. The expert labels are used as ground truth, against which the performance of the I-SC algorithm is measured. The procedure of the proposed I-SC algorithm, shown in the upper part of Fig. 1, consists of four steps, each of which is described in detail below.
The I-SC algorithm implementation diagram.
In the stage of preprocessing, four kinds of inaccurate data are removed, including blinks, screen outliers, beginning of the experiment and one-sample spikes [21]. The raw data generated by the eye tracker in this experiment contain the sequences of gaze coordinates
Among the collected data, the coordinates of some points lie outside the screen display area. Those points are defined as screen outliers and discarded. At the beginning of each test, the participant needs a short time to get used to the experimental environment, so the data collected at the very beginning are unreliable. In our experiment, the data sampled during the first second of each test are discarded. Moreover, in video-based eye trackers, one-sample spikes appear when the corneal reflection or the pupil cannot be detected correctly in the eye image, which makes the point coordinates change rapidly in an unexpected direction and back again. One-sample spikes can be filtered by the heuristic filtering rule proposed in [24].
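The preprocessing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(t, x, y)` sample layout, and the `min_jump` parameter are hypothetical, and the spike rule shown here (replace a sample that departs from both neighbours in the same direction by the neighbour mean) is a simple stand-in, not necessarily the heuristic rule of [24]. Blink removal is not covered.

```python
def drop_unreliable(samples, width, height, warmup_s=1.0):
    """Keep samples that are on screen and recorded after the warm-up period.

    samples: list of (t, x, y) tuples, t in seconds, x and y in pixels.
    width, height: screen size in pixels (hypothetical parameters).
    """
    if not samples:
        return []
    t0 = samples[0][0]
    return [
        (t, x, y)
        for t, x, y in samples
        if t - t0 >= warmup_s and 0 <= x < width and 0 <= y < height
    ]


def despike(values, min_jump):
    """Replace one-sample spikes by the mean of their two neighbours.

    Assumed stand-in rule: a spike departs from both neighbours in the
    same direction by more than min_jump.
    """
    out = list(values)
    for i in range(1, len(out) - 1):
        d_prev = out[i] - out[i - 1]
        d_next = out[i] - out[i + 1]
        if d_prev * d_next > 0 and min(abs(d_prev), abs(d_next)) > min_jump:
            out[i] = 0.5 * (out[i - 1] + out[i + 1])
    return out
```

In such a sketch, `despike` would be applied to the x and y coordinate sequences separately after `drop_unreliable`.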
Preliminary segmentation for saccades
The purpose of this step is to detect the saccade segments. A saccade is a very rapid eye movement with a typical velocity of 30 to 500 deg/s and a duration of 30 to 80 ms [1]. The velocity feature of point
where
In the preliminary segmentation procedure, we find out all high speed points with the velocity feature greater than 40
After removing the saccade segments, we continue to divide the remaining data into different segments according to the standard deviation of the dispersion feature. The dispersion feature
where
where
The eye movement changes with the dispersion feature and the standard deviation of the dispersion feature. 
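As a rough illustration of the two segmentation features, the following Python sketch computes a point-to-point velocity and a windowed dispersion with its standard deviation. Several details are assumptions rather than the definitions used in this paper: gaze coordinates are taken to be already converted to degrees of visual angle, the threshold of 40 is read as deg/s, the dispersion follows the common (max x - min x) + (max y - min y) form, and the standard deviation is taken over a sliding window of dispersion values. All function names are hypothetical.

```python
import math
import statistics


def point_velocities(samples):
    """Point-to-point speed for (t, x, y) samples; first sample gets 0.

    Assumes t in seconds (strictly increasing) and x, y in degrees,
    so the result is in deg/s.
    """
    v = [0.0]
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        v.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return v


def mark_saccade_points(velocities, threshold=40.0):
    """Flag samples whose velocity exceeds the (assumed deg/s) threshold."""
    return [v > threshold for v in velocities]


def dispersion(window):
    """Assumed dispersion form: (max x - min x) + (max y - min y)."""
    xs = [x for _, x, _ in window]
    ys = [y for _, _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def dispersion_series(samples, window_size):
    """Dispersion of every full sliding window over the samples."""
    return [
        dispersion(samples[i:i + window_size])
        for i in range(len(samples) - window_size + 1)
    ]


def rolling_std(values, window_size):
    """Population standard deviation of every full sliding window."""
    return [
        statistics.pstdev(values[i:i + window_size])
        for i in range(len(values) - window_size + 1)
    ]
```

A segmentation point would then be placed wherever the rolling standard deviation of the dispersion exceeds a chosen threshold; the window sizes and that threshold are tuning parameters not fixed by this sketch.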
Feature of segment
A new feature is defined to describe the segment direction and speed feature, that is, to compute the average velocity vector between the starting and ending positions of the segment. The segment is defined as
where
Feature
In the first step, for each segmental feature
where
For the point with the highest density, we take
In the second step, according to the local density
An example of the decision graph using our dataset.
In the third step, the non-clustered center segments are classified into one cluster center class, and the segment of the cluster whose density is higher than border region
Finally, according to the results of the third step, we can determine whether each segment is a fixation or a smooth pursuit.
The pseudocode of the I-SC algorithm is summarized in Algorithm 1.
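The clustering steps can be illustrated with a minimal Python sketch of density peaks clustering (CFSFDP) on one scalar feature per segment. It is a sketch under assumptions, not the implementation evaluated here: the cutoff kernel with distance `d_c`, the automatic choice of the second centre by the largest product of density and distance (instead of reading it off the decision graph), and the rule that the cluster with the smaller mean feature is the fixation cluster are all assumptions, and the function names are hypothetical.

```python
def density_peaks_two_clusters(features, d_c):
    """Split scalar features (at least two) into clusters 0 and 1."""
    n = len(features)
    dist = [[abs(a - b) for b in features] for a in features]
    # Step 1: local density = number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from dense to sparse; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Step 2: the densest point is one centre; the other centre is the
    # remaining point with the largest rho * delta (assumed heuristic).
    second = max(order[1:], key=lambda i: rho[i] * delta[i])
    labels = [-1] * n
    labels[order[0]], labels[second] = 0, 1
    # Step 3: every other point joins its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


def name_clusters(features, labels):
    """Assumed rule: the cluster with the smaller mean is fixations."""
    means = {}
    for c in (0, 1):
        members = [f for f, l in zip(features, labels) if l == c]
        means[c] = sum(members) / len(members)
    slow = min(means, key=means.get)
    return ["fixation" if l == slow else "smooth pursuit" for l in labels]
```

The quadratic distance matrix is acceptable here because clustering runs on segments, which are far fewer than raw samples.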
Algorithm 1: Segmentation and clustering-based identification algorithm (I-SC)
Input: array of gaze coordinates
Calculate point-to-point velocity feature
Mark all points
Mark the time interval between segmentation points
Initialize temporal window over first points in the remaining eye data;
Calculate dispersion feature of points in window and its standard deviation;
Mark all points
the time interval between segmentation points
Mark these segments as saccade segments;
Calculate
Calculate the
Form a decision map according to
Cluster the remaining segments into two classes
the mean
Mark
Mark
saccades, fixations, and smooth pursuits;
To evaluate the classification performance of the proposed algorithm, we designed the experiment to cover a wide range of induced as well as natural eye movements. Both static and dynamic stimuli were considered. The static stimuli mainly produced fixation points, and the dynamic stimuli generated smooth pursuit points. To attract the attention of the participants, we used a small flashing orange dot with a 10 px radius on a black background as the stimulus. Before the experiment, static stimuli were displayed on the screen as shown in Fig. 4a so that the participants could quickly get used to the experimental environment. There were four forms of stimuli: fixed flashing dots, a straight trail, a circular trail and an Archimedes spiral trail. Most fixation data were produced when users watched the fixed flashing dots. Saccade data were produced when the flashing dot randomly switched to the next fixed point. The straight, circular and Archimedes spiral trails produced smooth pursuit eye movements. The speeds of these trails ranged from 7 to 30
We used a low-cost, low-accuracy Tobii remote eye tracker to record gaze data at 60 Hz [23]. All participants sat comfortably about 65 cm from the screen. The eye tracker allowed participants to move their heads naturally. The stimulus software was developed in C++ with the Tobii SDK and saves gaze data and timestamps. The display was a 19-inch Philips 190V1SB/93 with the resolution of 1440
As shown in Table 1, the experimental data were sampled from 11 participants, four women and seven men, whose ages ranged from 20 to 60. Seven of the participants wore glasses. First, we explained the experimental notes and the stimuli to the participants. Then, participants were calibrated using the 9-point calibration provided by the Tobii software. Finally, the assistant started the experiment software, which automatically guided participants through a sequence of stimuli. The dataset was manually labeled by two experts with eight years of eye-tracking experience (authors PHM and WJN), who tagged the raw data as fixations, saccades and smooth pursuits; these labels were used as ground truth. The average noise level of the dataset was 0.1837
Overview of the participants in the dataset
The stimuli in the experiment.
The performance of our algorithm was compared with three approaches: the I-BDT algorithm, the I-VDT algorithm, and the CNN algorithm. The MATLAB code of the I-BDT algorithm proposed in [16] is available at [17]. The I-VDT algorithm was re-implemented according to the pseudocode in [2]. To adapt to our dataset, the parameters of the I-VDT algorithm were set as follows. The velocity threshold was
The overall architecture of the CNN algorithm, which follows the design in [20], is shown in Fig. 5. The CNN model in Fig. 5 consists of six layers: the input layer, two convolution layers, a max pooling layer, a fully-connected layer, and the output layer. We adopted the gaze coordinates
Figure 6 shows an example of the classification results of fixations, saccades and smooth pursuits with four different algorithms based on the same data obtained from our experiment. The four subfigures in Fig. 6 are the results from the proposed I-SC algorithm, the I-VDT algorithm, the I-BDT algorithm and the CNN algorithm, respectively. In each subfigure, the abscissa is the time and the ordinate is the velocity magnitude in
Settings for intrinsic parameters for the proposed algorithm.
The detailed structure of the CNN algorithm.
It can be roughly concluded that points with high velocity are mostly saccades, and points with little fluctuation in velocity are usually fixations. Moreover, smooth pursuits often have a higher average velocity and a wider fluctuation range than fixations. Comparing the manual and the algorithmic results, we find that the I-SC algorithm achieves the best consistency with the ground truth, although some misclassification happens at the beginning or the end of smooth pursuits and saccades. Compared with the I-SC algorithm, the classification results of the I-VDT algorithm and the CNN algorithm are less consistent with the ground truth. As we can see in Fig. 6b–d, the I-VDT algorithm and the CNN algorithm treat fixation points with a wide fluctuation range as smooth pursuits, and misjudge smooth pursuit points with little fluctuation as fixations. The I-VDT algorithm also detects fewer saccades than the ground truth contains. The I-BDT algorithm identifies all eye movements as smooth pursuits.
An example of fixations, saccades and smooth pursuit movements detected.
The performance is measured with four metrics per movement class, namely recall, precision, accuracy and F1 score.
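For reference, the four metrics can be computed per movement class from sample-wise labels as in the Python sketch below. The one-vs-rest counting (one class treated as positive, the other two as negative) is an assumption about how the per-class values are obtained, and the function name and label strings are hypothetical.

```python
def per_class_metrics(truth, predicted, positive):
    """One-vs-rest accuracy, precision, recall and F1 for one class."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Calling the function once per class (fixation, saccade, smooth pursuit) with the expert labels as `truth` yields one row of metrics for each class.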
Based on the above experimental analysis, the I-SC, I-VDT and CNN algorithms were compared in terms of accuracy, precision, recall, and F1 score. Table 3 shows the classification performance of I-SC, I-VDT and CNN. I-SC performs better than I-VDT on fixations and smooth pursuits. The performance of I-SC on saccades is not as good as its performance on fixations and smooth pursuits. The accuracy, recall and F1 score of I-SC are better than those of I-VDT, especially the recall. However, the precision of I-VDT on saccades is better than that of I-SC, because I-VDT ignores many saccade points, and the saccade points it does detect therefore tend to be correct. As shown in Fig. 6 and Table 3, the performance of the I-SC algorithm is better than that of the other three algorithms.
To analyze the tolerance of our algorithm to noisy data, we added Gaussian white noise at different levels to our dataset. The performance of the I-SC and I-VDT algorithms under different noise levels is shown in Fig. 7. The left subfigure shows the results for fixations, the middle one for saccades, and the right one for smooth pursuits. In each subfigure, stars represent the I-SC algorithm and squares the I-VDT algorithm. The red line represents accuracy, the green line precision, the blue line recall, and the black line the F1 score. The abscissa is the signal-to-noise ratio (SNR). As shown in Fig. 7, the results of the I-SC algorithm on fixations are superior to those of I-VDT. I-VDT is better than I-SC on saccades, although its recall is far lower than that of I-SC. Three indicators (accuracy, recall and F1 score) of I-SC on smooth pursuits are far superior to those of I-VDT. When the SNR is greater than 47, our method is better than I-VDT in recall. From these results, we can see that the I-SC algorithm performs better than the I-VDT algorithm on noisy data.
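Adding white noise at a chosen SNR can be sketched as follows. The sketch assumes that the SNR is expressed in decibels and that the signal power is measured as the mean squared value of the coordinate sequence; neither convention is stated explicitly above, and the function name is hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian noise so that the SNR is about snr_db.

    Assumes SNR in dB: snr_db = 10 * log10(signal_power / noise_power),
    with signal_power taken as the mean squared value of the signal.
    """
    rng = rng or random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

Passing a seeded `random.Random` instance makes the noisy dataset reproducible, which helps when comparing algorithms at the same noise level.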
Classification performance of I-SC, I-VDT and CNN based on our dataset
The performance of the I-SC and I-VDT algorithms with different noise levels.
An algorithm for classifying fixations, saccades and smooth pursuits is developed for low-precision eye trackers. To perform the discrimination, the algorithm takes the continuity and burstiness of eye movements into account: it uses the standard deviation of the dispersion to characterize burstiness and the segmental feature to characterize continuity. The algorithm is evaluated on data recorded with both static and dynamic stimuli, and compared with the I-VDT, CNN and I-BDT algorithms. We believe that the proposed I-SC algorithm is effective for noisy eye data.
According to the performance measures of the I-SC algorithm in Table 3, the classification of fixations and smooth pursuits is better than that of saccades in terms of recall and precision. Saccade misclassification occurs after a significant eye movement, where the data resemble smooth pursuit or fixation data. From the experimental point of view, when participants follow the stimulus from one point to another, their eyes first jump to an area around the destination point and then settle accurately on the point. The gaze data from this second phase cover only short distances, so they are either submerged in the noise or resemble smooth pursuit data. These gaze data are classified as saccade data by the experts. Compared with the I-VDT and CNN algorithms, the proposed I-SC algorithm has a much higher recall, but its precision on saccades is worse than that of I-VDT. This means that I-VDT treats only the high-speed points as saccades and ignores a lot of saccade data. Moreover, both CNN and I-VDT judge fixation points with large noise to be smooth pursuits, and misjudge smooth pursuits with low dispersion as fixations. Unfortunately the I-BDT identifies all data as smooth pursuits, because the feature
It should be noted that the proposed I-SC algorithm has the following limitations. First, saccade detection and the standard deviation of the dispersion feature both need thresholds, which affect the results. In future work, the thresholds could be learned automatically from training data. Second, the segmentation is not ideal at the point where a smooth pursuit changes into a fixation, which might be addressed by finding new features.
Early studies focused on distinguishing fixations from saccades. With dynamic stimuli becoming more popular and the demand for disease diagnosis increasing, the identification of smooth pursuits is becoming more and more important. Current classification methods for different eye movements mainly rely on high-frame-rate, high-accuracy eye trackers. As eye tracker applications become more widespread, achieving accurate classification of natural movements with low-cost, low-accuracy eye trackers is of great significance. The proposed I-SC algorithm can be used in many scenarios, such as cognitive psychology, product availability assessment, attention analysis for commercial advertisements and disease diagnosis.
Conclusions
In this paper, we propose and evaluate the segmentation and clustering-based identification (I-SC) algorithm to recognize fixations, saccades, and smooth pursuits with a consumer Tobii eye tracker. The algorithm takes the continuity and burstiness of eye movements into account. The standard deviation of the dispersion feature reflects the burstiness of eye movements, and the segmental feature indicates their continuity. The clustering algorithm distinguishes fixations from smooth pursuits without thresholds. In our experiments, the proposed algorithm achieved better performance than the I-VDT algorithm and the CNN algorithm, demonstrating its capability to provide meaningful ternary classification.
For future work, we are interested in analyzing additional features for segmentation to further improve the performance as well as evaluating the algorithm with higher-resolution eye trackers. Moreover, an important step to achieve fully automatic eye movement classification is the reliable detection of saccades without thresholds.
Acknowledgments
This research is supported by the National Science Foundation (91338115), National S/T Major Project (2015ZX03002006) and the 111 Project (B08038).
