Identification of fixations, saccades and smooth pursuits based on segmentation and clustering

Abstract

Accurate automated eye movement classification is a key technique in eye tracking, and its performance is strongly influenced by the noise and imprecision of the eye tracker and by the characteristics of the eye movements themselves. In this paper, we propose a novel segmentation and clustering-based identification algorithm (I-SC) to recognize fixations, saccades and smooth pursuits in eye tracking data. The algorithm first uses the velocity feature of the recorded eye data to identify saccade segments, and then uses the standard deviation of the dispersion to divide the remaining data into segments. Finally, for each segment we define the average direct distance feature and adopt clustering by fast search and find of density peaks (CFSFDP) to separate fixations from smooth pursuits. To demonstrate its effectiveness and robustness, the proposed I-SC algorithm is evaluated on an eye tracking dataset recorded from 11 participants with a commercial eye tracker. The experimental results show that the proposed algorithm achieves up to 96.0% accuracy and 87.6% recall, which is considerably better than both the Velocity and Dispersion Threshold Identification (I-VDT) algorithm and the Convolutional Neural Network (CNN) algorithm. With our algorithm, accurate classification can be achieved even with noisy and imprecise data from eye trackers.

Keywords

Eye movement identification, classification, smooth pursuit, clustering

1. Introduction

With the wide application of eye trackers in areas such as visual behavior analysis, medical diagnosis, and human-computer interaction, eye movement identification has become extremely important in the field of eye tracking. There are six basic types of eye movements: fixations, saccades, smooth pursuits, optokinetic reflexes, vestibulo-ocular reflexes, and vergences [1]. Among them, fixations, saccades, and smooth pursuits are the three most common and have been widely studied [2, 7, 12, 19]. A fixation is the maintenance of the gaze in a constant direction, and has been widely used in human-computer interaction, medical diagnosis (e.g., autism diagnosis), and user attention analysis [3, 4]. A saccade is a quick movement between two fixations, and is an important indicator for studying the human brain, user behavior, etc. [26]. Smooth pursuit allows the eyes to closely follow a moving object. Smooth pursuits are also increasingly used in human-computer interaction, since it is more natural for users to interact with computers by smooth pursuits than by fixations [5]. In addition, smooth pursuit is vital for diagnosing Parkinson’s disease [6]. Accurate eye movement detection is therefore both important and necessary. However, due to the noise and imprecision of eye trackers and the characteristics of the different eye movements [2], it is exceedingly difficult to reliably identify smooth pursuits, fixations and saccades.

Many previous studies focused only on distinguishing fixations from saccades, since static stimuli such as pictures and texts were more commonly used in early work [7, 8, 9, 10]. With dynamic stimuli like video clips becoming more and more popular, researchers are now more interested in the joint identification of fixations, saccades and smooth pursuits [2, 11, 12, 13, 14, 16]. Existing eye movement classification methods can be roughly divided into four categories: threshold-based methods, hybrid methods, probability-based methods, and learning-based methods. The threshold-based methods, which include principal component analysis identification (I-PCA), velocity and velocity threshold identification (I-VVT), velocity and movement pattern identification (I-VMP), and velocity and dispersion threshold identification (I-VDT), usually use two fixed thresholds to distinguish the eye movement types. Berg et al. proposed the I-PCA algorithm, but its performance was not evaluated in detail [18]. Komogortsev et al. compared three threshold-based methods (I-VVT, I-VDT, and I-VMP) and found that I-VDT achieved the best performance on high-speed eye trackers [2]. Since performance varies with the thresholds, these methods require careful threshold configuration. To mitigate the impact of the threshold choice on the results, hybrid methods, which combine threshold-based and probability-based techniques, were proposed. Kasneci et al. used a Bayesian Mixture Model to detect saccades from data points, and then employed I-PCA to identify smooth pursuits [13]. Larsson et al. employed the Rayleigh test for preliminary segmentation and then defined four features to distinguish fixations and smooth pursuits [12, 14]. Furthermore, Larsson et al. used binocular data analysis to improve the accuracy of eye movement classification [19].
Generally, the accuracy of both the threshold-based methods and the hybrid methods depends on the setting of threshold values. To avoid this disadvantage, Santini et al. proposed a completely probability-based method, the Bayesian Decision Theory Identification (I-BDT) algorithm [16]. Probability-based methods, however, need some assumptions, for example that fixations and saccades have the same probability, or that the eye movements obey the Rayleigh distribution, and these assumptions are sensitive to tremors and microsaccades. More recently, to achieve classification in a single pass without thresholds, Hoppe et al. proposed a learning-based method built on convolutional neural networks (CNN) [20]. Zemblys et al. used a random forest to detect fixations, saccades, and post-saccadic oscillations (PSOs), and showed that the method can match the performance of human experts [25]. However, learning-based methods need large amounts of training data, and their identification accuracy is not high for noisy data with a low sampling rate.

In this paper, we propose an efficient algorithm based on segmentation and clustering for ternary eye movement classification. The proposed algorithm consists of four steps. First, preprocessing removes four kinds of inaccurate data. Second, the velocity feature of the eye data is used to identify saccade segments. Third, the spatial standard deviation feature is used to divide the remaining data into segments. Finally, a fast clustering algorithm classifies the segments automatically using the average direct distance feature of each segment. The proposed algorithm takes advantage of the continuity of human eye movements and takes into account the influence of the eye tracker’s system noise and imprecision on the eye data, in order to improve classification on eye trackers with a low sampling rate. Because the new segment feature is defined from the characteristics of eye movements, the clustering step can distinguish fixations from smooth pursuits without thresholds.

The paper is outlined as follows: the proposed algorithm and the evaluation procedures are presented in Section 2. A description of the dataset with eye movement signals is given in Section 3. The results are presented in Section 4, and finally, the algorithm and its potential are discussed in Section 5.

2. Methods

The raw eye data are classified into three eye movement types both by domain experts and by the I-SC algorithm, as shown in Fig. 1. The experts’ labels are used as ground truth, against which the performance of the I-SC algorithm is measured. The procedure of the proposed I-SC algorithm, shown in the upper part of Fig. 1, consists of four steps, each of which is described in detail below.

Figure 1.

The I-SC algorithm implementation diagram.

2.1 Preprocessing

In the preprocessing stage, four kinds of inaccurate data are removed: blinks, screen outliers, the beginning of the experiment, and one-sample spikes [21]. The raw data generated by the eye tracker in this experiment contain the sequence of gaze coordinates $(x_{i},y_{i})$ in pixels and the timestamps $(t_{i})$ in milliseconds. During blinks, the video-based eye tracker cannot detect the pupil, making the collected eye data inaccurate. According to the Harvard Database of Useful Biological Numbers, the duration of one blink is between 100 and 400 ms [15]. By analysing the timestamps $(t_{i})$ , a blink is detected when data are continuously missing for more than 100 ms, and the two gaze points immediately before and after the blink are then also removed.

Some of the collected points have coordinates beyond the screen display area. These points are defined as screen outliers and discarded. At the beginning of each test, the participant needs a short time to get used to the experimental environment, so the data collected at the very beginning are unreliable. In our experiment, the data sampled during the first second of each test are discarded. Moreover, in video-based eye trackers, one-sample spikes appear when the corneal reflex or the pupil cannot be detected correctly in the eye image, which makes the point coordinates jump rapidly in an unexpected direction and back again. One-sample spikes can be filtered by the heuristic filtering rule proposed in [24].
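As an illustration, the blink, screen-outlier and warm-up rules described above can be sketched as a boolean mask over the samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `preprocess_mask` and its parameters are hypothetical, the default screen size is taken from the display described in Section 3, one sample on each side of a blink gap is dropped (one possible reading of the rule), and the one-sample spike filter of [24] is not reproduced.

```python
import numpy as np

def preprocess_mask(x, y, t_ms, width=1440, height=900,
                    blink_gap_ms=100.0, warmup_ms=1000.0):
    """Return True for samples that survive the three rules sketched here."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t_ms = np.asarray(t_ms, dtype=float)
    keep = np.ones(t_ms.size, dtype=bool)

    # Blinks: a gap of more than blink_gap_ms between consecutive timestamps.
    # Assumption: drop the sample just before and the sample just after the gap.
    gaps = np.flatnonzero(np.diff(t_ms) > blink_gap_ms)
    keep[gaps] = False
    keep[gaps + 1] = False

    # Screen outliers: coordinates outside the display area.
    keep &= (x >= 0) & (x < width) & (y >= 0) & (y < height)

    # Beginning of the experiment: discard the first second of each test.
    keep &= (t_ms - t_ms[0]) >= warmup_ms
    return keep
```

Returning a mask rather than filtered arrays keeps the original sample indices available, which is convenient when the later segmentation steps need the timestamps of the removed samples.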

2.2 Preliminary segmentation for saccades

The purpose of this step is to detect the saccade segments. A saccade is a very rapid eye movement with a typical velocity of 30 to 500 deg/s and a duration of 30 to 80 ms [1]. The velocity feature of point $i$ is the measured eye movement speed between two neighbouring points, defined as

$\displaystyle V_{i}=\arctan\left(\frac{\sqrt{(S_{x}^{i})^{2}+(S_{y}^{i})^{2}}\times mp}{\textit{Dis}}\right)\times\frac{180}{\Delta t_{i}\times\pi},$ (1)

where $S_{x}^{i}=x_{i}-x_{i-1}$ , $S_{y}^{i}=y_{i}-y_{i-1}$ and $\Delta t_{i}=t_{i}-t_{i-1}$ . $S_{x}^{i}$ is the distance along the x axis between the point at $t_{i}$ and the point at $t_{i-1}$ , $S_{y}^{i}$ is the corresponding distance along the y axis, and $\Delta t_{i}$ is the time interval. The units of $x_{i}$ , $y_{i}$ , $S_{x}^{i}$ and $S_{y}^{i}$ are all pixels. mp is the conversion ratio between pixels and millimeters, in mm/px, which depends on the screen resolution and the physical screen dimensions. Dis is the distance between the participant’s eyes and the screen, in mm.
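Equation (1) can be sketched in a few lines of NumPy. The helper names `mm_per_pixel` and `angular_velocity` are hypothetical, and two assumptions are made that the text does not state explicitly: pixels are square, so mp can be derived from the screen diagonal and resolution, and $\Delta t_{i}$ is converted from milliseconds to seconds so that the result is in deg/s.

```python
import math
import numpy as np

def mm_per_pixel(diagonal_inch, res_x, res_y):
    """Pixel pitch in mm/px, assuming square pixels."""
    return diagonal_inch * 25.4 / math.hypot(res_x, res_y)

def angular_velocity(x, y, t_ms, mp, dis_mm):
    """Point-to-point velocity of Eq. (1) in deg/s; returns n - 1 values."""
    sx = np.diff(np.asarray(x, dtype=float))    # S_x^i in px
    sy = np.diff(np.asarray(y, dtype=float))    # S_y^i in px
    dt_s = np.diff(np.asarray(t_ms, dtype=float)) / 1000.0  # ms -> s (assumption)
    angle_deg = np.degrees(np.arctan(np.hypot(sx, sy) * mp / dis_mm))
    return angle_deg / dt_s
```

For a 19-inch 1440 $\times$ 900 display viewed from 650 mm, a 10 px step between two samples recorded 1/60 s apart corresponds to roughly 15 deg/s under these assumptions, which is below the saccade threshold used in the next step.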

In the preliminary segmentation procedure, we find all high-speed points, i.e., points whose velocity feature is greater than 40 ${}^{\circ}$ /s [1]. If the time interval between two high-speed points is less than 100 ms, the data sequence between them is regarded as one saccade segment. Most saccades are recognized in this step, and these saccade segments are removed.
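The grouping rule of this step can be sketched as follows. The function name `saccade_mask` is hypothetical, the velocity array is assumed to be aligned with the timestamps (one value per sample), and isolated high-speed points are also marked as saccade samples, which is an assumption because the text only describes the data between two high-speed points.

```python
import numpy as np

def saccade_mask(v, t_ms, t_v=40.0, w_l=100.0):
    """Mark samples that belong to saccade segments."""
    v = np.asarray(v, dtype=float)
    t_ms = np.asarray(t_ms, dtype=float)
    fast = np.flatnonzero(v > t_v)          # indices of high-speed points
    mask = np.zeros(v.size, dtype=bool)
    mask[fast] = True                       # assumption: isolated points count too
    # Join consecutive high-speed points that are less than w_l ms apart.
    for a, b in zip(fast[:-1], fast[1:]):
        if t_ms[b] - t_ms[a] < w_l:
            mask[a:b + 1] = True
    return mask
```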

2.3 Segmentation for remaining data

After removing the saccade segments, we continue to divide the remaining data into different segments according to the standard deviation of the dispersion feature. The dispersion feature $D_{j}$ reflects the space distribution of points in time window $j$ , defined as

$\displaystyle D_{j}=[\max(\{x_{k}\})-\min(\{x_{k}\})]+[\max(\{y_{k}\})-\min(\{y_{k}\})],$ (2)

where $j$ is the re-ranked time index of the remaining data, and $\{(x_{k},y_{k}),t_{j}\leq t_{k}\leq t_{j}+t_{w}\}$ are the point coordinates within the time window from $t_{j}$ to $t_{j}+t_{w}$ . After removing the saccade segments, we calculate the dispersion feature $D_{j}$ of the remaining data and then compute the standard deviation of the dispersion feature, denoted $\sigma_{\textit{Dj}}$ and defined as

$\displaystyle\sigma_{Dj}=\sqrt{\frac{\sum_{k=j}^{j+N-1}(D_{k}-\bar{D})^{2}}{N}},$ (3)

where $N$ is the length of the $D_{j}$ window, and $\bar{D}$ is the mean of $D_{j}$ in the window. The standard deviation of the dispersion feature reflects changes in the spatial distribution of gaze points. In our scheme, we pick out the points whose standard deviation is larger than 15 px. Fig. 2 gives an idealized example showing that eye movement change points can be detected with $\sigma_{\textit{Dj}}$ . Figure 2a shows idealized gaze points moving from one fixation to another, with a saccade between the two fixations. Figure 2b shows idealized gaze points moving from a fixation to a smooth pursuit. The two subfigures share the same legend: the orange line is the raw data, the yellow line is the dispersion feature, and the blue line is the standard deviation of the dispersion feature. However, the picked points cannot be used directly for segment division, because not every one of them reflects a real change of the gaze points; an example is the falling edge of $D_{j}$ in Fig. 2a. The feature extraction therefore keeps the starting position and removes the falling edge of the window. Finally, we use the extracted points to segment the remaining data. These segments are further analyzed in the fourth step.
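The dispersion feature of Eq. (2) and its windowed standard deviation of Eq. (3) can be sketched as below. The function names are hypothetical, the timestamps are assumed to be sorted in ascending order, and the window length `n` (in samples) is left as a free parameter because the text does not give its value. The rising-edge selection described above is not reproduced.

```python
import numpy as np

def dispersion(x, y, t_ms, t_w=100.0):
    """D_j of Eq. (2): x-range plus y-range of the points in [t_j, t_j + t_w]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t_ms = np.asarray(t_ms, dtype=float)
    d = np.empty(t_ms.size)
    for j in range(t_ms.size):
        # First index whose timestamp exceeds t_j + t_w (t_ms must be sorted).
        end = np.searchsorted(t_ms, t_ms[j] + t_w, side="right")
        d[j] = np.ptp(x[j:end]) + np.ptp(y[j:end])
    return d

def dispersion_std(d, n):
    """Eq. (3): population standard deviation over n consecutive D values."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(d, dtype=float), n)
    return windows.std(axis=1)   # ddof=0, i.e. division by N as in Eq. (3)
```

Candidate change points are then the indices where the result exceeds the 15 px threshold, for example `np.flatnonzero(dispersion_std(d, n) > 15)`.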

Figure 2.

The eye movement changes with the dispersion feature and the standard deviation of the dispersion feature.

2.4 Segment clustering

2.4.1 Feature of segment

A new feature is defined to describe the direction and speed of a segment, namely the average velocity vector between the starting and ending positions of the segment. The segment is defined as $S_{i}=\{x_{j},y_{j}\}$ with length $N_{s}$ . The new feature is represented as

$\displaystyle Sd_{i}=\frac{\sqrt{\left(\sum\limits_{j\in S_{i}}S_{x}^{j}\right)^{2}+\left(\sum\limits_{j\in S_{i}}S_{y}^{j}\right)^{2}}}{N_{s}}$ (4)

where $S_{x}^{j}=x_{j}-x_{j-1}$ , $S_{y}^{j}=y_{j}-y_{j-1}$ . When the value of $Sd_{i}$ is small, the segment is considered to be a fixation, even if noise exists in the eye tracker data. The directional consistency of a smooth pursuit is better than that of a fixation, so the $Sd_{i}$ values of smooth pursuits are greater than those of fixations. In addition, most saccades have already been removed in the second step; any remaining segment that is shorter than the shortest fixation duration (100 ms) is treated as a saccade segment. Therefore, $Sd_{i}$ is a good feature to distinguish fixations from smooth pursuits.
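A small sketch of Eq. (4) shows why the feature separates the two classes. The function name `average_direct_distance` is hypothetical, and the sums are taken over the differences inside the segment, so they telescope to the displacement between the first and last point; this reading is an assumption based on the description of the feature.

```python
import numpy as np

def average_direct_distance(x, y):
    """Sd_i of Eq. (4) for one segment given its pixel coordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx = np.diff(x).sum()    # telescopes to x[-1] - x[0]
    sy = np.diff(y).sum()    # telescopes to y[-1] - y[0]
    return np.hypot(sx, sy) / x.size

# A steadily drifting segment keeps its net displacement ...
pursuit = average_direct_distance(np.arange(10) * 3, np.arange(10) * 4)
# ... while jitter around a fixed point largely cancels out.
fixation = average_direct_distance([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], [0] * 10)
```

In this toy example the pursuit-like segment yields 4.5 px per sample and the jittering segment 0.1 px per sample, even though both contain sample-to-sample motion.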

2.4.2 Clustering

Feature $Sd_{i}$ is affected by noise, so a fixed threshold for separating fixations from smooth pursuits would cause misclassification. To overcome this problem, clustering by fast search and find of density peaks (CFSFDP) [22] is used to classify fixations and smooth pursuits without a threshold, since the $Sd_{i}$ value of a smooth pursuit is greater than that of a fixation. The CFSFDP algorithm consists of the following steps.

In the first step, for each segmental feature $Sd_{i}$ , we compute two quantities: its local density $\rho_{i}$ and its distance from segments of higher density $\delta_{i}$ . Both quantities depend only on the distances between segments, which are assumed to satisfy the triangle inequality. The distance between segment $i$ and segment $j$ is defined as $d_{ij}=\textit{dist}(Sd_{i},Sd_{j})$ , where dist is the Euclidean distance between the segment features. The local density of segment $i$ , denoted as $\rho_{i}$ , is defined as

$\displaystyle\rho_{i}=\sum\nolimits_{j}e^{-\left(\frac{d_{ij}}{d_{c}}\right)^{2}},$ (5)

where $d_{c}$ is the cutoff distance. $\delta_{i}$ is measured by computing the minimum distance between point $i$ and any other points with higher densities, which is expressed as

$\displaystyle\delta_{i}=\mathop{\min}\limits_{j:\rho_{j}>\rho_{i}}(d_{ij}).$ (6)

For the point with the highest density, we take $\delta_{i}={\max}_{j}(d_{ij})$ .

In the second step, according to the local density $\rho_{i}$ and distance $\delta_{i}$ , we form a decision map to determine the cluster center, which is shown in Fig. 3.

Figure 3.

An example of the decision graph using our dataset.

In the third step, each remaining segment that is not a cluster center is assigned to the cluster of one of the centers, and a segment whose density is higher than the border-region density $\rho_{b}$ of its cluster is considered part of the cluster core. The other segments are considered part of the cluster halo and are treated as noise.

Finally, according to the results from the third step, we can determine whether the segment is the fixation or the smooth pursuit.
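The clustering steps above can be sketched for the one-dimensional feature $Sd_{i}$ as follows. This is a simplified illustration rather than the authors' implementation: the function names are hypothetical, the self term is excluded from the density sum of Eq. (5), ties in density are broken by sort order, the two cluster centers are picked automatically as the densest point plus the point with the next largest product $\rho_{i}\delta_{i}$ (instead of being read off the decision map), each remaining segment inherits the label of its nearest neighbour of higher density, and the halo detection with $\rho_{b}$ is omitted. At least two segments are required.

```python
import numpy as np

def density_peaks_1d(sd, d_c):
    """Return rho (Eq. 5), delta (Eq. 6), nearest denser neighbour, density order."""
    sd = np.asarray(sd, dtype=float)
    n = sd.size
    d = np.abs(sd[:, None] - sd[None, :])             # Euclidean distance in 1-D
    rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0   # Gaussian kernel, self excluded
    order = np.argsort(-rho, kind="stable")           # decreasing density
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()               # densest point: max distance
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                         # points ranked denser than i
        k = higher[np.argmin(d[i, higher])]
        delta[i], parent[i] = d[i, k], k
    return rho, delta, parent, order

def two_clusters(sd, d_c):
    """Assign every segment to one of two clusters (labels 0 and 1)."""
    rho, delta, parent, order = density_peaks_1d(sd, d_c)
    gamma = rho * delta
    gamma[order[0]] = np.inf                          # densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:2]
    labels = np.full(gamma.size, -1)
    labels[centers] = [0, 1]
    for i in order:                                   # denser points are labelled first
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels

def name_clusters(sd, labels):
    """The cluster with the smaller mean Sd is taken to be the fixation cluster."""
    sd = np.asarray(sd, dtype=float)
    fixation = 0 if sd[labels == 0].mean() <= sd[labels == 1].mean() else 1
    return np.where(labels == fixation, "fixation", "smooth pursuit")
```

With well separated feature values, for example five segments near 0.1 and five near 4.0 with `d_c=0.5`, the sketch labels the first group as fixations and the second as smooth pursuits without any threshold on $Sd_{i}$ itself.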

The pseudo code of the I-SC algorithm is summarized in Algorithm 1.

Algorithm 1. Segmentation and clustering-based identification algorithm (I-SC)

Input: array of gaze coordinates $(x_{i},y_{i})$ and timestamps $t_{i}$ ; minimum saccade amplitude $T_{v}$ ; minimum change of the dispersion $T_{\textit{Dstd}}$ ; temporal window length $W_{L}$ ; clustering cutoff distance $D_{c}$

Output: arrays of fixations, saccades, and smooth pursuits

Calculate point-to-point velocity feature $(V)$ for each point;

Mark all points $V\geqslant T_{v}$ as segmentation points;

Mark the time interval between segmentation points $\leqslant W_{L}$ as saccade segments;

Initialize temporal window over first points in the remaining eye data;

Calculate dispersion feature of points in window and its standard deviation;

Mark all points $\sigma_{Dj}\geqslant T_{\textit{Dstd}}$ as the segmentation points;

if the time interval between segmentation points $\leqslant W_{L}$ then

    Mark these segments as saccade segments;

end if

Calculate $Sd_{i}$ of the remaining segments;

Calculate the $d_{ij}$ , $\rho_{i}$ and $\delta_{i}$ of the remaining segments;

Form a decision map according to $\rho_{i}$ and $\delta_{i}$ and choose the cluster center from this map;

Cluster the remaining segments into two classes $C_{1},C_{2}$ ;

if the mean $Sd_{i}$ of $C_{1}\leqslant$ the mean $Sd_{i}$ of $C_{2}$ then

    Mark $C_{1}$ as fixation segments and $C_{2}$ as smooth pursuit segments;

else

    Mark $C_{2}$ as fixation segments and $C_{1}$ as smooth pursuit segments;

end if

return saccades, fixations, and smooth pursuits;

3. Experiment and dataset

To evaluate the classification performance of the proposed algorithm, we designed the experiment to cover a wide range of induced as well as natural eye movements. Both static and dynamic stimuli were used: the static stimuli mainly produced fixation points, and the dynamic stimuli produced smooth pursuit points. To attract the attention of the participants, we used a small flashing orange dot with a radius of 10 px on a black background as the stimulus. Before the experiment, static stimuli were displayed on the screen, as shown in Fig. 4a, so that the participants could quickly get used to the experimental environment. There were four forms of stimuli: fixed flashing dots, a straight trail, a circular trail and an Archimedes spiral trail. Most fixation data were produced when participants watched the fixed flashing dots. Saccade data were produced when the flashing dot randomly switched to the next fixed point. The straight, circular and Archimedes spiral trails produced smooth pursuit eye movements. The speed of these trails ranged from 7 to 30 ${}^{\circ}$ /s. Figure 4b shows the different trajectories used in the experiment.

We used a low-cost, low-accuracy Tobii remote eye tracker to record gaze data at 60 Hz [23]. All participants sat comfortably at a distance of about 65 cm from the screen. The eye tracker allowed participants to move their heads naturally. The stimulus software was developed in C++ with the Tobii SDK and saves gaze data and timestamps. The display was a 19-inch Philips 190V1SB/93 with a resolution of 1440 $\times$ 900 px and a refresh rate of 60 Hz.

As shown in Table 1, the experimental data were sampled from 11 participants, four women and seven men, whose ages ranged from 20 to 60. Seven of them wore glasses. First, we explained the experimental procedure and the stimuli to the participants. Then, the participants were calibrated using the 9-point calibration provided by the Tobii software. Finally, the assistant started the experiment software, which automatically guided the participants through a sequence of stimuli. The dataset was manually labeled by two experts with eight years of eye-tracking experience (authors PHM and WJN), who tagged the raw data as fixations, saccades and smooth pursuits; these labels were used as ground truth. The average noise level of the dataset was 0.1837 $\pm$ 0.1106 ${}^{\circ}$ RMS.

Table 1
Overview of the participants in the dataset

Participant	Gender	Age	Glasses	Eye tracking experience
1	Female	24	Yes	Yes
2	Male	30	Yes	Yes
3	Male	38	No	No
4	Female	20	Yes	No
5	Male	60	No	Yes
6	Male	24	Yes	No
7	Male	27	No	Yes
8	Female	28	Yes	Yes
9	Male	25	No	Yes
10	Male	25	Yes	No
11	Female	26	Yes	No

Figure 4.

The stimuli in the experiment.

4. Results

The performance of our algorithm was compared with three approaches: the I-BDT algorithm, the I-VDT algorithm, and the CNN algorithm. The MATLAB code of the I-BDT algorithm proposed in [16] is available at [17]. The I-VDT algorithm was re-implemented according to the pseudo code in [2]. To adapt it to our dataset, the parameters of the I-VDT algorithm were set as follows: the velocity threshold was ${T_{v}}=$ 40 ${}^{\circ}$ /s, the temporal window length was ${T_{w}}=$ 100 ms, and the dispersion threshold was ${T_{D}}=$ 20 px. The parameter settings of the proposed I-SC algorithm can be found in Table 2. The values of ${T_{v}}$ and ${W_{L}}$ were the same as the velocity threshold and window length of the I-VDT algorithm. ${D_{c}}$ , the clustering cutoff distance, was set to 10% of the total number of segments $N_{\textit{seg}}$ .

The overall architecture of the CNN algorithm, which follows the design of [20], is shown in Fig. 5. The CNN model consists of six layers: the input layer, two convolution layers, a max pooling layer, a fully-connected layer, and the output layer. We used the gaze coordinates $(x_{i},y_{i})$ and the velocity feature $(V_{i})$ as the CNN input data. In the output layer, each sample was assigned the eye movement type with the highest probability generated by the softmax function.

Figure 6 shows an example of the classification results of fixations, saccades and smooth pursuits with four different algorithms based on the same data obtained from our experiment. The four subfigures in Fig. 6 are the results from the proposed I-SC algorithm, the I-VDT algorithm, the I-BDT algorithm and the CNN algorithm, respectively. In each subfigure, the abscissa is the time and the ordinate is the velocity magnitude in ${}^{\circ}$ /s. In the upper part of the subfigures in Fig. 6, the blue line represents the speed feature of the gaze data, and the lower part of each graph lists the three recognized eye movement types classified by experts’ manual process and the tested algorithms, respectively. The classification results from experts were treated as the ground truth, and were compared with the algorithms’ results to confirm their effectiveness. That means in the lower part of each subfigure, the first line is the ground truth of smooth pursuits and the second line is the smooth pursuits identified by each algorithm. The third and fourth lines are the fixations and the last two lines are the saccades.

Table 2
Settings for intrinsic parameters for the proposed algorithm.

${T_{v}}$	40 ${}^{\circ}$ /s	Minimum saccade amplitude
$T_{\textit{Dstd}}$	15 px	Minimum change of the dispersion
${W_{L}}$	100 ms	Window length
${D_{c}}$	10% $\times$ $N_{\textit{seg}}$	Clustering cutoff distance

Figure 5.

The detailed structure of the CNN algorithm.

It can be roughly concluded that the points with high velocity are mostly saccades, and the points with little fluctuation in velocity are usually fixations. Moreover, smooth pursuits often have a higher average velocity and a wider fluctuation range than fixations. Comparing the manual and the algorithmic results, we find that the I-SC algorithm achieves the best consistency with the ground truth, although some misclassification happens at the beginning or the end of smooth pursuits and saccades. Compared with the I-SC algorithm, the classification results of the I-VDT algorithm and the CNN algorithm are less consistent with the ground truth. As can be seen in Fig. 6b–d, the I-VDT algorithm and the CNN algorithm treat fixation points with a wide fluctuation range as smooth pursuits, and misjudge smooth pursuit points with little fluctuation as fixations. The I-VDT algorithm detects fewer saccades than the ground truth contains. The I-BDT algorithm identifies all eye movements as smooth pursuits.

Figure 6.

An example of fixations, saccades and smooth pursuit movements detected.

The performance is measured with four metrics per movement class, namely recall, precision, accuracy and F1 score.
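The text does not spell out how the four metrics are computed; a common choice, assumed here, is a per-sample one-vs-rest evaluation in which one movement class is treated as positive and the other two as negative. The function name `one_vs_rest_metrics` and the single-letter labels are hypothetical.

```python
import numpy as np

def one_vs_rest_metrics(truth, pred, positive):
    """Accuracy, precision, recall and F1 score for one movement class."""
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    tp = np.sum((truth == positive) & (pred == positive))
    fp = np.sum((truth != positive) & (pred == positive))
    fn = np.sum((truth == positive) & (pred != positive))
    tn = np.sum((truth != positive) & (pred != positive))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# F = fixation, S = saccade, P = smooth pursuit (toy labels, not from the dataset)
truth = list("FFFFSSPPPP")
pred = list("FFFPSFPPPP")
saccade_scores = one_vs_rest_metrics(truth, pred, "S")
```

In this toy example one of the two saccade samples is missed and no sample is wrongly labelled as a saccade, so the saccade precision is 1.0 while the recall is only 0.5. The same pattern, high precision with low recall, is what a detector that ignores many saccade samples would produce.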

Based on the above experimental analysis, the I-SC, I-VDT and CNN algorithms were compared in terms of accuracy, precision, recall, and F1 score. Table 3 shows their classification performance. The performance of I-SC is better than that of I-VDT on fixations and smooth pursuits on all four metrics. The performance of I-SC on saccades is not as good as its performance on fixations and smooth pursuits. On saccades, the accuracy, recall and F1 score of I-SC are better than those of I-VDT, especially the recall (72.0% versus 42.0%). However, the saccade precision of I-VDT is better than that of I-SC (84.4% versus 74.5%), because I-VDT ignores many saccade points and the saccade points it does detect are therefore mostly correct. As shown in Fig. 6 and Table 3, the overall performance of the I-SC algorithm is better than that of the other three algorithms.

To analyze the tolerance of our algorithm to noisy data, we added Gaussian white noise of different levels to our dataset. The performance of the I-SC and I-VDT algorithms under different noise levels is shown in Fig. 7. The left subfigure shows the results for fixations, the middle one for saccades, and the right one for smooth pursuits. In each subfigure, stars represent the I-SC algorithm and squares the I-VDT algorithm. The red line represents accuracy, the green line precision, the blue line recall, and the black line the F1 score. The abscissa is the signal-to-noise ratio (SNR). As shown in Fig. 7, the fixation results of the I-SC algorithm are superior to those of I-VDT. I-VDT is better than I-SC on saccades, but its recall is far lower than that of I-SC. Three indicators (accuracy, recall and F1 score) of I-SC on smooth pursuits are far superior to those of I-VDT. When the SNR is greater than 47, our method is better than I-VDT in recall. From the above results, we can see that the I-SC algorithm performs better than the I-VDT algorithm in the presence of noise.
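The noise injection can be sketched as below. The text does not define how the SNR is computed, so this sketch assumes the usual decibel definition, $\mathrm{SNR}=10\log_{10}(P_{\text{signal}}/P_{\text{noise}})$ , with the signal power taken as the mean square of the coordinate sequence; the function name `add_white_noise` is hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian white noise at a target SNR in dB (assumed definition)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

# Synthetic horizontal gaze trace in pixels (illustrative only).
rng = np.random.default_rng(0)
clean = 500.0 + 100.0 * np.sin(np.linspace(0.0, 20.0, 20000))
noisy = add_white_noise(clean, snr_db=47.0, rng=rng)
```

Applying the same function independently to the x and y coordinates, over a range of `snr_db` values, gives a family of noisy recordings on which both algorithms can be re-run.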

Table 3

Classification performance of I-SC, I-VDT and CNN based on our dataset

Unit (%)	Fixation			Saccade			Smooth pursuit
	I-SC	I-VDT	CNN	I-SC	I-VDT	CNN	I-SC	I-VDT	CNN
Accuracy	95.7	84.6	84.6	96.5	95.5	94.9	95.9	82.1	84.7
Precision	94.5	83.5	85.8	74.5	84.4	73.7	96.7	78.5	79.3
Recall	97.4	87.8	83.7	72.0	42.0	37.6	93.5	79.2	88.2
F1_score	95.9	85.4	84.7	73.0	55.8	49.8	95.1	78.6	83.5
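As a quick arithmetic check on Table 3, the unweighted mean of each I-SC metric over the three movement classes can be computed directly. The averaging itself is an illustration and is not described by the authors; the variable names are hypothetical and the numbers are copied from the I-SC columns of Table 3.

```python
# I-SC columns of Table 3: (fixation, saccade, smooth pursuit), in percent.
isc = {
    "accuracy": (95.7, 96.5, 95.9),
    "precision": (94.5, 74.5, 96.7),
    "recall": (97.4, 72.0, 93.5),
    "f1": (95.9, 73.0, 95.1),
}

# Unweighted mean over the three classes, rounded to one decimal place.
macro = {name: round(sum(vals) / len(vals), 1) for name, vals in isc.items()}
```

The resulting mean accuracy (96.0%) and mean recall (87.6%) coincide with the headline figures quoted in the abstract, which suggests that those figures are class averages rather than the score of a single class.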

Figure 7.

The performance of the I-SC and I-VDT algorithms under different noise levels.

5. Discussion

An algorithm for classifying fixations, saccades and smooth pursuits is developed for low-precision eye trackers. To discriminate between them, the algorithm takes the continuity and burstiness of eye movements into account: it uses the standard deviation of the dispersion to characterize burstiness and the segmental feature to characterize continuity. The algorithm is evaluated on data sampled with both static and dynamic stimuli, and compared with the I-VDT, CNN and I-BDT algorithms. We believe that the proposed I-SC algorithm is effective for noisy eye data.

According to the performance measures of the I-SC algorithm in Table 3, fixation and smooth pursuit classification is better than saccade classification in recall and precision. Saccade misclassification occurs after a significant eye movement, where the data are similar to smooth pursuit or fixation data. From the experimental point of view, when participants follow the stimulus from one point to another, their eyes first jump to an area around the destination point and then settle accurately on the point. The gaze data from this second phase are therefore submerged in the noise or resemble smooth pursuit data, due to the short distances involved. These gaze data are classified as saccade data by the experts. Compared with the I-VDT and CNN algorithms, the proposed I-SC algorithm has a much higher saccade recall, but its saccade precision is worse than that of I-VDT. This means that I-VDT treats only the high-speed points as saccades and ignores a lot of saccade data. Moreover, in both CNN and I-VDT, fixation points with large noise are judged to be smooth pursuits, while smooth pursuits with low dispersion are misjudged to be fixations. I-BDT identifies all data as smooth pursuits because the feature $r_{i}$ is always 1, so the smooth pursuit likelihood $p(r_{i}|pur)$ is always 1. Larsson’s algorithm is designed for eye trackers with a high frame rate and high precision. It is therefore not applicable to low-resolution eye trackers such as the one used in this work, mainly because the preliminary segmentation stage of Larsson’s algorithm depends on hypothesis testing.

It should be noted that the proposed I-SC algorithm has the following limitations. First, saccade detection and the standard deviation of the dispersion feature both need thresholds, which affect the results. In future work, these thresholds could be learned automatically from training data. Second, the segmentation is not ideal where a smooth pursuit changes into a fixation, which could be addressed by finding new features.

Early studies focused on distinguishing fixations from saccades. With dynamic stimuli becoming more popular and the demand for disease diagnosis increasing, the identification of smooth pursuits is becoming more and more important. Current classification methods for different eye movements mainly rely on eye trackers with a high frame rate and high accuracy. With the increasing popularity of eye tracker applications, accurate classification of natural movements on low-cost, low-accuracy eye trackers is of great significance. The proposed I-SC algorithm can be used in many scenarios, such as cognitive psychology, product availability assessment, attention analysis for commercial advertisements, and disease diagnosis.

6. Conclusions

In this paper, we propose and evaluate the segmentation and clustering-based identification (I-SC) algorithm to recognize fixations, saccades, and smooth pursuits with a consumer Tobii eye tracker. The algorithm takes the continuity and burstiness of eye movements into account. The burstiness of eye movements is captured by the standard deviation of the dispersion feature, and their continuity by the segmental feature. The clustering algorithm distinguishes fixations from smooth pursuits without thresholds. The proposed algorithm is shown to achieve better performance than the I-VDT and CNN algorithms, demonstrating its capability to provide meaningful ternary classification.

For future work, we are interested in analyzing additional features for segmentation to further improve the performance as well as evaluating the algorithm with higher-resolution eye trackers. Moreover, an important step to achieve fully automatic eye movement classification is the reliable detection of saccades without thresholds.

Acknowledgments

This research is supported by the National Science Foundation (91338115), National S/T Major Project (2015ZX03002006) and the 111 Project (B08038).

References

1. Leigh R.J. and Zee D.S., The neurology of eye movements, Oxford University Press, USA, 2015.

2. Komogortsev O.V. and Karpov, Automated classification and scoring of smooth pursuit eye movements in the presence of fixations and saccades, Behavior Research Methods 45 (2013), 203–215.

3. Cantoni and Porta, Eye tracking as a computer input and interaction method, Proceedings of the 15th International Conference on Computer Systems and Technologies (2014), 1–12.

4. Istance, Hyrskykari, Immonen, Mansikkamaa and Vickers, Designing gaze gestures for gaming: an investigation of performance, Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (2010), 323–330.

5. Pfeuffer, Vidal, Turner, Bulling and Gellersen, Pursuit calibration: Making gaze calibration less tedious and more flexible, Proceedings of the 26th annual ACM symposium on User interface software and technology (2013), 261–270.

6. Jansson, Medvedev, Axelson and Nyholm, Stochastic anomaly detection in eye-tracking data for quantification of motor symptoms in Parkinson's disease, Signal and Image Analysis for Biomedical and Life Sciences (2015), 63–82.

7. Salvucci D.D. and Goldberg J.H., Identifying fixations and saccades in eye-tracking protocols, Proceedings of the 2000 symposium on Eye tracking research & applications (2000), 71–78.

8. Blignaut, Fixation identification: The optimum threshold for a dispersion algorithm, Attention, Perception, & Psychophysics 71 (2009), 881–895.

9. Veneri, Piu, Rosini, Federighi, Federico and Rufa, Automatic eye fixations identification based on analysis of variance and covariance, Pattern Recognition Letters 32 (2011), 1588–1593.

10. Anantrasirichai, Gilchrist I.D. and Bull D.R., Fixation identification for low-sample-rate mobile eye trackers, Image Processing (ICIP), 2016 IEEE International Conference on (2016), 3126–3130.

11. Vidal, Bulling and Gellersen, Detection of smooth pursuits using eye movement shape features, Proceedings of the symposium on eye tracking research and applications (2012), 177–180.

12. Larsson, Nyström and Stridh, Discrimination of fixations and smooth pursuit movements in high-speed eye-tracking data, Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE (2014), 3797–3800.

13. Kasneci, Kübler T.C. and Rosenstiel, Online recognition of fixations, saccades, and smooth pursuits for automated analysis of traffic hazard perception, Artificial Neural Networks (2015), 411–434.

14. Larsson, Nyström, Andersson and Stridh, Detection of fixations and smooth pursuit movements in high-speed eye-tracking data, Biomedical Signal Processing and Control 18 (2015), 145–152.

15. Average duration of a single eye blink, http://bionumbers.hms.harvard.edu//bionumber.aspx?id=100706&ver=4/.

16. Santini, Fuhl, Kübler and Kasneci, Bayesian identification of fixations, saccades, and smooth pursuits, Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications (2016), 163–170.

17. https://www.ti.uni-tuebingen.de/perception/.

18. Berg D.J., Boehnke S.E., Marino R.A., Munoz D.P. and Itti, Free viewing of dynamic stimuli by humans and monkeys, Journal of Vision 9 (2009), 19.

19. Larsson, Nyström, Ardö, Åström and Stridh, Smooth pursuit detection in binocular eye-tracking data with automatic video-based performance evaluation, Journal of Vision 16 (2016), 20.

20. Hoppe and Bulling, End-to-end eye movement detection using convolutional neural networks, arXiv preprint arXiv:1609.02452, 2016.

21. Larsson, Nyström and Stridh, Detection of saccades and postsaccadic oscillations in the presence of smooth pursuit, IEEE Transactions on Biomedical Engineering 60 (2013), 2484–2493.

22. Rodriguez and Laio, Clustering by fast search and find of density peaks, Science 344 (2014), 1492–1496.

23. https://www.tobiipro.com/product-listing/tobii-pro-x2-60/.

24. Stampe D.M., Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems, Behavior Research Methods, Instruments, & Computers 25 (1993), 137–142.

25. Zemblys, Niehorster D.C., Komogortsev and Holmqvist, Using machine learning to detect events in eye-tracking data, Behavior Research Methods (2017), 1–22.

26. Wennmo, Henriksson N.G., Pyykkö and Schalén, Eye-velocity programming in brain-stem disorders, Annals of the New York Academy of Sciences 374 (1981), 774–783.

Table 1
Overview of the participants in the dataset

Table 2
Settings for intrinsic parameters for the proposed algorithm.