Abstract
Background
Eye-tracking-based communication systems open up opportunities for interaction in the lives of many patients with severe motor impairments. In these systems, users must keep their gaze on a key until it is entered. However, these systems are not yet refined in terms of design and interaction efficiency, and they may still cause discomfort for users, such as eye strain, or lead to typing errors. Some solutions have been proposed to address these issues, but so far no comprehensive solution has been found.
Objective
In this research, we propose a novel method to adjust data inputting speed in human-computer interfaces controlled by eye gaze and electroencephalography (EEG) data.
Methods
The method uses EEG data to extract the user’s attention level. We then flexibly adjust the time users need to keep their gaze on a key and personalize the user experience using their own data. This approach aims to enhance interaction efficiency by increasing interaction speed while maintaining accuracy. We evaluated the method on an eye-tracking-based spelling communication system for Vietnamese people, proposed by our team, with 20 healthy individuals and 4 people with motor function impairments.
Results
The results indicated that communication speed through the system increased by 20–80% for participants, and not only did the time improve, but the communication effectiveness also increased linearly. This outcome was achieved for both healthy individuals and patients.
Conclusions
Through experimentation, we have demonstrated the feasibility and effectiveness of our approach, showing improvements in typing speed over successive trials. This result is highly meaningful as it optimizes dwell time and interaction efficiency in the ET-based system without having to compromise by increasing the error rate when directly reducing dwell time.
Keywords
Introduction
Traditional human-computer interaction (HCI) systems are typically designed around conventional interfaces, utilizing devices like screens, keyboards, and mice. While effective for able-bodied individuals who can interact with these standard input methods, these systems may not adequately meet the needs of individuals facing severe motor impairments. Such conditions are often caused by neurological disorders such as muscular dystrophy, amyotrophic lateral sclerosis (ALS), or cerebral palsy. These diseases disrupt the communication between the brain and musculature, significantly affecting a person’s overall quality of life. 1 Consequently, individuals affected by these disorders often become completely paralyzed, lose all ability to communicate, and become entirely reliant on others to meet their daily needs. A system is therefore needed to help them interact with their surroundings again and meet their basic physical and mental needs. Augmentative and Alternative Communication (AAC) systems offer a vital solution for individuals with severe motor impairments who struggle to use traditional HCI interfaces. These systems are specifically designed to bridge the gap between a user’s intentions and their ability to communicate or control their environment, thereby significantly enhancing their quality of life.
Eye-tracking (ET) technology stands out as a highly effective form of AAC due to its ability to harness the residual motor functions of individuals with severe neurological conditions. When other motor abilities are compromised, eye movements often remain intact, providing a crucial channel for communication and interaction. 2 Based on this, numerous ET-based systems have been developed to support the interaction capabilities of this group with their environment.3–5 Among them, our previous study 5 proposed two new specialized onscreen keyboard layouts controlled by eye gaze for the Vietnamese language, which both speed up typing and reduce typos. ET-based systems incorporate an onscreen keyboard, and users control the keys on this virtual keyboard with their eye gaze.
There are two methods for the system to determine the key that the user wants to select: scanning and direct selection based on gaze. The first method involves the system scanning each key, and the user signals when the desired key is reached. While this method provides high accuracy in key selection, it is time-consuming, resulting in inefficient system interaction and slow text input. Additionally, prolonged screen interaction can cause eye strain.6,7 The second method allows users to select keys on the screen with their gaze, using eye blinks as confirmation signals. 8 However, this approach faces challenges because blinking is natural and often unintentional, which makes continuous key selection difficult and the overall interaction ineffective. Frequent blinking can cause eye strain and input errors due to misinterpretations by the system. Another method, proposed by Majaranta et al., 9 aims to improve the interaction efficiency of these systems. This method relies on normal eye movements and gaze stability, as well as the ability to maintain gaze on a “key” for a brief period (referred to as dwell time). When the user holds their gaze on a key for the dwell time, the system registers a “press” of that key. Systems using this method have proven highly effective in restoring general communication abilities and basic interaction with the environment in individuals with neurological disorders who are completely paralyzed. 10 The dwell time parameter is usually fixed in these systems; however, a fixed dwell time raises several challenges related to user comfort and virtual keyboard efficiency. 11 A dwell time that is too long may cause eye strain and reduce typing speed, while one that is too short could cause the user to make many errors. 12 Moreover, each individual has a different level of familiarity with the system, so a single fixed dwell time value proves ineffective. Instead, using each user’s personalized information to set an appropriate dwell time can make the interaction with the system more effective.
There is a category of research that strives to enhance the efficiency of ET-based systems by effectively adjusting the dwell time parameter. Jimin Pi et al. 13 introduced a probabilistic model for gaze-based selection that dynamically modifies the dwell time according to the likelihood of each letter, which is determined by previous selections. Another method proposed by Martez E. Mott et al. 14 recommended the dynamic adjustment of key dwell time based on their keyboard position. However, these studies frequently require the manual determination of hyperparameter values (such as thresholds), limiting their universal applicability. Another category of research develops these systems in a direction that eliminates dwell time from the system, referred to as the dwell-free method. In dwell-free eye-typing, users do not need to look at each key for a fixed period. Instead, the system recognizes sequences of words from users’ continuous eye-traces. Users need only gaze through the desired letters in their desired phrase or sentence. After looking at a designated area, the system processes the eye-trace and infers the desired word sequence. These dwell-free eye-typing systems have demonstrated a moderately superior text entry rate compared to their dwell-based counterparts.15–17 However, the use of this dwell-free method poses challenges in designing interfaces and accuracy while interacting with the system.
A promising approach to enhancing the effectiveness of ET-based systems is to integrate the use of multi-modal data. In particular, the combination of ET and electroencephalogram (EEG) data has been explored and applied in systems to accomplish many tasks and enhance the effectiveness of these systems.18–21 Additionally, research aimed at creating a more efficient BCI P300 speller system by incorporating ET data has shown positive results. 22 While this approach opens up new possibilities for the correlation and feasibility of combining ET and EEG data, no research has yet combined EEG data with the aim of improving the interaction effectiveness of ET-based systems.
The dwell time on a specific key is used to determine the user’s intended selection. If the user maintains their gaze on a key for a dwell time, the system confirms that key as the selected option. In essence, if the user concentrates their attention on a key by maintaining their gaze toward it, this signifies their intention to select that key. Therefore, we believe that by identifying moments of user attention in conjunction with their gaze on the current key, we can develop a solution to improve interaction speed in these systems. This attention state can be entirely obtained through EEG data collected from the user.23,24 More specifically, an idea for increasing the interaction speed of systems relies on combining ET data with additional data about the user’s level of attention.
In this study, we propose a novel method to adjust data inputting speed in human-computer interface systems controlled by eye gaze and EEG data. Based on the attention state extracted from EEG data, we determine whether the user is focused. This attention state is then used to adjust the dwell time of the ET-based system to increase interaction speed and interaction effectiveness with the system’s onscreen keyboard. We then evaluate the proposed method on the ET-based system proposed in our earlier work 5 to demonstrate its effectiveness.
The remainder of the paper is organized as follows: Section “Related works” reviews existing research in the field and describes our previous system, Section “Proposed method” describes the proposed acceleration method for keyboard input, Section “Experiment and results” presents the experimental process and results, and finally, Section “Conclusion” covers conclusions and future work.
Related works
Human-computer interface system controlled by eye gaze
Eye-tracking is a method in which the position of the eye is used to determine the gaze direction at a given time and the sequence in which the eyes move. 25 This technique is the basis of the primary ET-based systems that support people who cannot communicate in the usual way. To interact with these systems, the user looks at a key on an onscreen keyboard. If the user’s gaze remains fixed on the same key for the dwell time, the system assumes the user intended to “press” that key. However, these systems have a very limited typing speed, in the range of 7–20 wpm. An important parameter that determines the effectiveness of ET-based systems is the dwell time. It must be long enough to ensure accurate key selection on the onscreen keyboard; otherwise a high rate of false selections (known as the Midas touch problem) may occur, leading to increased user frustration and consequent delays in the overall process.26,27
The need for an effective dwell time has led researchers to propose adaptive strategies for its determination. For instance, Paivi Majaranta et al. 28 introduced a method that integrates combined visual and auditory feedback to select an efficient dwell time for eye-typing systems, demonstrating its improvement. Kari-Jouko Räihä et al. 29 emphasize the importance of considering individual differences and suggest interface improvements for eye-typing systems. They highlight the limitations of fixed dwell time settings and advocate for personalized dwell time adjustments based on user preferences.
In the system proposed by Oleg Spakov et al., 30 the dwell time was adjusted based on the exit time. The exit time is defined as the time from when a key is selected until the user moves their gaze away from that key. However, this online adjustment is plagued by delayed feedback and uncontrolled variations in the exit time.
In another system proposed by Paivi Majaranta et al., 31 the dwell time was fine-tuned by controlling the speed of the control keys. The dwell time is fine-tuned by adjusting the speed of the cursor movement. However, a major drawback of this method is that it requires additional selection time. This means that the user has to spend extra time to select a character, as they have to control the cursor to move to the position of the character on the screen. In addition, the study also shows that fine-tuning the dwell time can cause inaccuracies in text input.
EEG signal used in brain-computer interface system
The term “brain-computer interface” (BCI) refers to a field of research dedicated to utilizing signals from the brain to control or interact with external systems. 32 Within this context, the EEG-based BCI stands out as one of the most rapidly advancing domains within the broader field of BCI development. 33
The exploration and utilization of EEG signals have become feasible and prevalent, thanks to the research efforts of Hans Berger. 34 His work in 1924 demonstrated that electrical signals of the human brain could be measured from the scalp. 35 Based on this foundation, numerous studies have been conducted to create control systems using EEG signals. The systems proposed by Luzheng Bi et al. 36 focus on EEG-based brain-controlled mobile robots. These robots utilize BCI based on EEG to interpret human control, typically categorized into two types to assist people with disabilities: brain-controlled manipulators and mobile robots. Similar research endeavors also employ EEG signals as the foundation for creating control systems for assistive devices and functional restoration for people with disabilities, stroke survivors, and others with neurological deficits.37–39 In the context of the surge in online education, numerous studies have been conducted to enhance the quality of this form of training. Among these, the effective approach of utilizing electroencephalogram (EEG) signals to determine user attention has gained considerable attention. 40 In 2020, Kridsakon Yaomanee et al. 41 presented the methodology for identifying scalp locations to detect EEG signals related to attention. The results provided recommendations for optimal recording positions to extract the attention level of the subjects. Subsequently, many studies building upon those results have developed BCI systems capable of determining the attention level of a subject through EEG signals. A notable example is the system introduced in 2022 by Abber, 42 designed to assess the attention of students in online classrooms. This system experimented on a public dataset, extracting the power spectral density (PSD) feature using Fourier transform. 
EEG data has proven its effectiveness in the systems mentioned earlier. The aspect of EEG we are most interested in is the ability to obtain information about user attention, which we believe will be effective when integrated into ET-based communication systems. However, to the best of our knowledge, there have been no published studies combining the two types of data, ET and EEG, to enhance the effectiveness of communication systems using an onscreen keyboard controlled by eye gaze.
Combining EEG and gaze-tracking in human-computer interface system
The successful integration of these two types of data has been previously accomplished by Xujiong Dong and colleagues. 19 They describe a hybrid brain-computer interface that combines information from a four-class motor imagery-based EEG classifier with gaze trajectories from an eye tracker. The objective is to provide a more natural interaction with the BCI system than if gaze were used as an explicit command signal, as is commonly done. The overall results indicate that this system is more effective in terms of feedback time and accuracy of BCI tasks. In another study by Jing Zhu et al., 20 they successfully utilized EEG and eye-tracking data to improve a classification model for detecting depression. Their model outperformed traditional classification models in the field, achieving accuracies of 82.5% and 92.65% on respective datasets. Additionally, Wei-Long Zheng et al. 21 proposed a new emotion recognition method that combines EEG signals with pupillary response. The results indicated that this approach can enhance the effectiveness of emotion recognition models.
Although there have been many successful attempts to integrate EEG and ET data in the studies mentioned, our investigation reveals that there is currently no research integrating these modalities specifically to enhance the speed or accuracy of onscreen keyboards controlled by eye gaze input. Building on the ET-based system and the attention information obtained from EEG data, we propose a method that integrates ET and EEG data in human-computer interface systems to enhance the interaction effectiveness of these systems.
The eye-tracking-based spelling communication system
The eye-tracking-based communication system has been developed for patients with major neuro-locomotor disabilities, enabling them to communicate verbally through keys on an onscreen keyboard. 43 To interact with these systems, the user looks at a key on an onscreen keyboard. If the user’s gaze remains fixed on the same key for a set time period (the dwell time), the system assumes the user intended to “press” that key. The usability of virtual keyboard systems with gaze-based access controls is currently hindered by the challenge of setting optimal values for key system parameters, such as dwell time, which can vary depending on the user (e.g., fatigue and system familiarity). 44 To date, numerous systems have been developed to enhance the efficiency of the initial system. However, they all share a common approach: they rely on ET technology and use dwell time to select keys on a virtual keyboard.
We apply our method to an ET-based spelling communication system, which was proposed in our previous work 5 and is designed to assist Vietnamese individuals with impaired speech motor function, particularly those with ALS, in communicating with others by using their eye gaze to interact with keys on an onscreen keyboard, as illustrated in Figure 1. The onscreen keyboard layouts and selection mechanism were designed according to the requirements of eye-typing and the characteristics of the Vietnamese language, based on statistics from the Vietnamese Wikipedia Corpus dataset. This system is currently being used to support patients with severe motor impairments in several hospitals and in their homes. The use of this system has opened up opportunities for communication for them, thereby supporting patients in their treatment process and reintegration into daily life. In this system, the key selection method uses the dwell parameter. To select a key on the onscreen keyboard, the user must gaze at that key’s location for a certain dwell time. The timer for each key starts as soon as the user’s gaze targets the key. The key is recognized as the user’s intended selection when the timer value reaches the dwell time. A shorter dwell time allows for quicker key selection but with a higher error rate, while a longer dwell time reduces the selection error rate but slows down typing speed. Therefore, the dwell time parameter must be carefully chosen to optimize the user’s typing task.
Figure 1. The eye-tracking-based spelling communication system. 5

Proposed method
To improve interaction speed in eye gaze controlled interfaces, we propose a method that leverages EEG data alongside ET data, as EEG data provides valuable information about user attention state. Our approach utilizes moments of high user attention, identified through EEG analysis, in conjunction with the user’s current gaze on the screen, thereby dynamically adjusting dwell time based on attention level. Hence, we can potentially shorten the selection time compared to fixed dwell time. This approach personalizes the interaction by tailoring dwell time adjustments to each user’s unique EEG data, aiming to achieve faster interaction speeds and a more personalized experience in eye gaze controlled interfaces.
The proposed process to adjust data inputting speed in human-computer interface systems controlled by eye gaze and electroencephalography data is outlined in Figure 2. The process consists of two phases: extracting the state of attention and adjusting the inputting speed. Specifically, we use the EEG signal acquisition device to extract the user’s attention level while they interact with the system. Based on the user’s attention, we adjust the interaction speed of the system to enhance interaction efficiency. During the user’s interaction with the system, we continue to save the user’s data for retraining and refining the model; we call this the user personalization process.
Figure 2. Proposed process to adjust data inputting speed in human-computer interface systems.
Extract the state of attention
In the process of user interaction with an ET-based system, to select a key, users go through two stages: search and dwell-input. During the search stage, users scan positions on the keyboard to find the key they want to select. Then, in the dwell-input stage, they fixate their gaze on a specific key for a certain dwell time, allowing the system to recognize it as their intended choice. We assume that when users begin to find the desired key, their sustained gaze on that key indicates their concentration. If we can obtain their level of concentration, we could determine the desired key immediately after they find it, eliminating the need to wait for the full dwell time.
To address the attention degree determination problem, we utilized ET data to label the EEG data. This labeled data was then processed to extract features for classification. We defined “attention data” as the data gathered when participants were looking at a key for selection. Based on this definition, the EEG data was divided into two sample groups: attention data (positively labeled) and inattention data (negatively labeled), using the ET data. Subsequently, we extracted features such as PSD and common spatial patterns (CSPs) from the EEG data and fed them into a suitable classifier M to ascertain when the users were attentive.
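The PSD part of this feature extraction can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the band limits, the function names `band_power` and `psd_features`, and the naive DFT periodogram are all assumptions made for the example, and the CSP features and the classifier M are omitted.

```python
import cmath
import math

# Assumed frequency bands in Hz (half-open intervals); not specified in the text.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_power(samples, fs, f_lo, f_hi):
    """Mean periodogram power of one EEG channel window in [f_lo, f_hi) Hz."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the DC offset
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            # Naive DFT coefficient for frequency bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(centered)
            )
            powers.append(abs(coeff) ** 2 / (fs * n))
    return sum(powers) / len(powers) if powers else 0.0


def psd_features(window, fs=128):
    """Feature vector of band powers (theta, alpha, beta) for one window."""
    return [band_power(window, fs, lo, hi) for lo, hi in BANDS.values()]


if __name__ == "__main__":
    # One second of a synthetic 10 Hz oscillation sampled at 128 Hz.
    fs = 128
    signal = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]
    print(psd_features(signal, fs))  # the alpha entry dominates
```

Each labeled EEG window would yield one such vector per channel, and the vectors from attention and inattention windows would then be passed to the classifier.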
We experimented with this process while constructing an EEG and eye-tracking dataset of ALS patients and healthy individuals during the usage of an eye-tracking-based spelling system. 45 In our previous study, we experimented with extracting the state of attention using classifiers such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs). We chose to integrate the SVM model for classification due to its suitability within our system. As an integrated model, the SVM is lightweight and does not require excessive computational resources, while maintaining stable inference times with approximately 80% accuracy in cross-check cases.
Adjust inputting speed method
Building upon insights into user focus patterns obtained from previous interactions, we implemented a novel technique to enhance the responsiveness of our ET-based system. This approach dynamically adjusts the dwell time based on the accuracy and confidence of the classifier’s assessment of the user’s prior focus state. The underlying principle is that dwell time is inversely proportional to the user’s attention level and the immediacy of their focus upon identifying the desired key. The following provides a detailed description of this technique:
During the user’s utilization of the onscreen keyboard, when the user’s gaze resides within the scope of a control key (a button, a key, a word/character, label, or an icon), referred to as the dwell-input state, the system initializes a time-count parameter, denoted as t, at t = 0. Subsequently, the system updates t after each iteration, incorporating both the increment rate (Δ) and the adjustment rate (Δ_T), according to equation (1):
When t ≥ dwell time, the system determines that the user intends to select the key at their gaze position. If the user moves their gaze to a location outside the key, the time-count parameter t is reset to 0. The system cancels the dwell time state of the control key when t = 0.
Δ represents the time interval between two consecutive iterations. Typically, this value is set to 1/n_c seconds, where n_c is the number of iterations the system performs within one second; n_c is chosen in the range of about 100 to 2000, depending on the hardware. This parameter ensures that the time-count variable t increases linearly with the real-time clock.
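As a worked illustration, suppose the update takes the additive form below. This form is an assumption consistent with the roles of Δ and Δ_T described here, not a statement of the exact equation; the symbols T_d (the dwell time) and α (the ratio Δ_T/Δ, held constant for simplicity) are introduced only for this example.

```latex
t_{k+1} = t_k + \Delta + \Delta_T, \qquad \Delta = \frac{1}{n_c}

% With \Delta_T = \alpha\,\Delta held constant, the key is selected after N iterations:
N = \left\lceil \frac{T_d}{(1+\alpha)\,\Delta} \right\rceil
\quad\Longrightarrow\quad
\text{elapsed time} = N\,\Delta \approx \frac{T_d}{1+\alpha}

% Example: T_d = 2\,\mathrm{s},\quad n_c = 100,\quad \alpha = 0.5
N = \left\lceil \frac{2}{1.5 \times 0.01} \right\rceil = 134,
\qquad N\,\Delta = 1.34\,\mathrm{s}
```

Under this assumption, α = 0 recovers the fixed dwell time (200 iterations, 2 s in the example), a positive α shortens the selection, and a negative α greater than −1 lengthens it.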
Δ_T varies based on the user’s attention level, determined from EEG data. During the dwell time state, if the user exhibits a high attention state, Δ_T takes a positive value to accelerate the time-count variable t. Conversely, if the user is less focused, Δ_T assumes a negative value to decelerate the time-count variable t. The attention state is discerned through the analysis of EEG data collected while the user controls the computer or inputs text content. The attention state is integrated into the system to adjust the Δ_T parameter as follows:
Figure 3 describes the operating principle of the onscreen keyboard after applying the proposed speed adjustment method. As a specific example, consider the selection of the key “B” on the screen. Key “B” has a time-count parameter t_B, representing the time the user looks at that key. The starting value of t_B is set to 0. When the user’s gaze moves to key “B,” the system compares the coordinates of the user’s gaze point with the boundary coordinates of key “B” to determine whether the user is looking at key “B” or not. Next, the system compares t_B with dwell time. If t_B < dwell time, t_B will be updated after each iteration according to equation (1). The system then adjusts Δ_T according to equation (2) and repeats the step of checking whether the user is looking at key “B” or not. During this process, if the user looks away from key “B,” t_B will be reset to 0. When the user looks at key “B,” t_B will continuously increase, and when this value reaches dwell time, the system will confirm that the user has selected key “B.”
Figure 3. Operating principle of the onscreen keyboard after applying the speed adjustment method.
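The selection loop for a single key can be sketched as below. This is an illustrative sketch under stated assumptions, not the system’s implementation: the additive update of the time count, the `adjustment` rule (plus or minus a fixed fraction `gain` of Δ depending on a binary attention label), and the names `Key`, `adjustment`, and `step` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Key:
    """An onscreen key with its bounding box and time-count parameter t."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float
    t: float = 0.0

    def contains(self, gx, gy):
        # Compare the gaze point with the key's boundary coordinates.
        return self.x0 <= gx <= self.x1 and self.y0 <= gy <= self.y1


def adjustment(attentive, delta, gain=0.5):
    """Hypothetical rule for Delta_T: positive when attentive, negative otherwise.

    gain must stay below 1 so that t still increases while the user is inattentive.
    """
    return gain * delta if attentive else -gain * delta


def step(key, gaze, attentive, dwell_time, delta):
    """Run one iteration for `key`; return True when the key is selected."""
    if not key.contains(*gaze):
        key.t = 0.0  # gaze left the key: reset the time count
        return False
    key.t += delta + adjustment(attentive, delta)  # assumed additive update
    if key.t >= dwell_time:
        key.t = 0.0
        return True
    return False


if __name__ == "__main__":
    key_b = Key("B", 0, 0, 10, 10)
    iterations = 0
    while not step(key_b, (5, 5), attentive=True, dwell_time=1.0, delta=1 / 128):
        iterations += 1
    print("selected", key_b.label, "after", iterations + 1, "iterations")
```

With these assumed values, an attentive user reaches the dwell time in fewer iterations than an inattentive one, while a glance away from the key restarts the count.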
User personalization
User personalization is crucial in ET-based systems. This is because the level of expertise and usage patterns vary among users. Additionally, as individuals become more familiar with the system, their interaction requirements evolve, necessitating a personalized approach. To address these differences, we have developed a user personalization process. This process utilizes the user’s own data gathered over time through their interactions with the system to retrain the classification model, thereby creating an effective personalized system.
During the system’s operation, data is continuously saved. Once a certain amount of data is collected, the system recalculates the corresponding features and retrains the chosen classification model. This process is called user personalization. Figure 4 illustrates this process in detail. ET data plays an important role in the user personalization process. The newly collected EEG data is divided into two parts: negative and positive. The labeling of these two types of data depends on the ET value. If the ET value remains unchanged for a period of time and the key corresponding to that ET value is entered, the system determines that the user is concentrating, and the EEG data in that period is labeled positive. Conversely, if the key corresponding to the ET value is not entered and the ET value changes within a short period of time, the system determines that the user is not concentrating, and the EEG data in that period is labeled negative.
Figure 4. User personalization process.
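The labeling rule and the retraining trigger can be sketched as follows. This is a minimal sketch under assumptions: the threshold `min_samples`, the `retrain` callback, the decision to skip segments that match neither rule, and the choice to empty the buffer after retraining are hypothetical details introduced for the example.

```python
def label_segment(gaze_stable, key_entered):
    """Label an EEG segment from ET information.

    Returns 1 (positive, concentrating), 0 (negative, not concentrating),
    or None when neither rule applies (an assumed handling of mixed cases).
    """
    if gaze_stable and key_entered:
        return 1
    if not gaze_stable and not key_entered:
        return 0
    return None


class PersonalizationBuffer:
    """Collects labeled feature vectors and retrains once enough are stored."""

    def __init__(self, retrain, min_samples=200):
        self.retrain = retrain          # hypothetical callable(features, labels)
        self.min_samples = min_samples  # assumed threshold, not from the study
        self.features = []
        self.labels = []

    def add(self, feature_vector, gaze_stable, key_entered):
        """Store one segment; return True if this call triggered retraining."""
        label = label_segment(gaze_stable, key_entered)
        if label is None:
            return False
        self.features.append(feature_vector)
        self.labels.append(label)
        if len(self.labels) >= self.min_samples:
            self.retrain(self.features, self.labels)
            # Start fresh lists so the data handed to retrain() is not mutated.
            self.features, self.labels = [], []
            return True
        return False


if __name__ == "__main__":
    buffer = PersonalizationBuffer(
        retrain=lambda X, y: print("retraining on", len(X), "segments"),
        min_samples=2,
    )
    buffer.add([0.1, 0.2], gaze_stable=True, key_entered=True)
    buffer.add([0.3, 0.1], gaze_stable=False, key_entered=False)
```

In practice the `retrain` callback would refit the chosen classifier on the user’s own data, so that later sessions use the personalized model.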
Experiment and results
Given the specific nature of testing the ET-based system, comparing different methods using the same text throughout the entire process would yield inaccurate evaluation results. This is because, over successive interactions with the system, users become familiar with a particular text, allowing them to type faster in subsequent attempts without external influence. To ensure an effective and accurate evaluation process, we designed the following assessment scenario.
The experiment described in this study was approved by the Institutional Review Board in Human Research Dinh Tien Hoang Institute of Medicine, which has the operating codes as IRB-VN02010 issued by Vietnam Ministry of Health and as IRB00010830 and IORG0009080 issued by U.S. Department of Health and Human Services. The study was conducted in accordance with the guidelines established by Dinh Tien Hoang Institute of Medicine. Written informed consent was obtained from all participants, including both healthy individuals and patients or their legal representatives.
Experimental settings
The EEG and ET data were acquired using custom-developed Recorder Software, running on a Core i7 computer. This software, designed by our research team, facilitates the simultaneous recording of EEG and ET data, ensuring synchronized data collection. It also supports real-time capturing of ET data while participants interact with the Spelling Communication System. Both EEG and ET data streams are integrated using the Lab Streaming Layer (LSL) protocol, which enables synchronized acquisition of time-series data from multiple devices.
For EEG data acquisition, we utilized the Emotiv EPOC Flex22 device, equipped with 32 electrodes arranged according to the international 10–10 system (an extension of the 10–20 system). The sampling frequency was set to 128 Hz. To ensure participant comfort without compromising signal quality, saline sensors were employed.
ET data were collected using the Tobii Eye Tracker 524, operating at a sampling frequency of 30 Hz. This setup ensures robust and synchronized collection of both EEG and ET data, providing comprehensive insights into the participants’ cognitive and ocular responses during the experimental tasks.
Scenarios
Purposes
The purpose of the experiment is to assess whether using EEG signal processing to determine user attention levels enhances the effectiveness of the ET-based communication system. Performance is evaluated based on input time as well as the accuracy of the system after applying the proposed method. In addition, we test the effectiveness of the user personalization process by collecting data during participants’ use of the system to retrain the model. To evaluate the proposed method, we introduce an evaluation process with three stages. First, participants use the default ET system with a fixed dwell time. Next, they use the system with EEG integrated according to the proposed method. Finally, the users’ own data, collected during interaction with the system, are used to further increase interaction effectiveness according to the proposed method.
Participants
The participants in the experiment were divided into two groups:
The first group consisted of 20 healthy individuals (referred to as H).
The second group included patients, referred to as the motor speech disorders group (referred to as D). This group consisted of a total of 4 people with motor function impairments.
All participants, including both ALS patients and healthy individuals, were trained to use the eye-tracking-based spelling communication system, enabling them to form sentences by selecting letters using their eyes and communicate freely. This was the first exposure to such a system for all participants, ensuring that their interactions with the system were unbiased by prior experience.
Procedure
Table 1. The sentences used in the experiment.
The testing process is divided into 9 phases. In each phase, the user inputs the corresponding sentence from Table 1 using the proposed eye-tracking-based spelling communication system. After each phase, the time taken for the user to complete the sentence and the number of operations the user can perform in a minute are recorded to evaluate the effectiveness of the proposed method.
Participants will complete the first sentence using the conventional configuration (without using EEG signals for accelerate). For the following sentences, participants will complete them with the system integrated with the proposed acceleration method. Each subsequent session will use the training results of the model with data collected in the previous session. The rest time between typing sessions is 15 minutes to ensure that participants do not experience eye fatigue or adapt to the speed of the previous typing session (factors that can affect the completion time of the sentence). This process is illustrated in more detail in Figure 5. The process of integrating the acceleration method into the system.
Results
The test results are evaluated based on the changes in dwell time. If, in subsequent experiments, the dwell time decreases compared to the initial experiment (the one without the acceleration method) and the user’s input efficiency remains stable (that is, the time taken to complete a trial sentence improves in a manner similar to the dwell time), then the acceleration method is deemed effective.
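One possible way to make this criterion concrete is sketched below. It compares the relative reduction in dwell time with the relative reduction in sentence completion time. The function names and the `tolerance` threshold are assumptions chosen for illustration, since the text does not specify how similar the two improvements must be. In the example, the dwell times follow the averages reported for the healthy group, while the completion times are made up.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fractional decrease of `value` relative to `baseline` (positive = faster)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline


def acceleration_effective(
    base_dwell_s: float,
    dwell_s: float,
    base_completion_s: float,
    completion_s: float,
    tolerance: float = 0.15,  # assumed threshold, not taken from the paper
) -> bool:
    """True if dwell time dropped and completion time improved comparably."""
    dwell_gain = relative_reduction(base_dwell_s, dwell_s)
    completion_gain = relative_reduction(base_completion_s, completion_s)
    # Completion time must improve, and must not lag far behind the dwell-time
    # gain (a large lag would suggest extra time lost to selection errors).
    return (
        dwell_gain > 0
        and completion_gain > 0
        and completion_gain >= dwell_gain - tolerance
    )


# Dwell time 3.5 s -> 2.0 s; completion times are illustrative only.
print(acceleration_effective(3.5, 2.0, 200.0, 120.0))  # True
print(acceleration_effective(3.5, 2.0, 200.0, 195.0))  # False
```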
Figure 6 illustrates the trend in dwell time across the experiments conducted with healthy individuals. For the default system without the acceleration method, the average dwell time is 3.5 seconds. After the acceleration method is introduced, dwell time increases slightly during the initial usage (sentence 2), but after the model is retrained during the personalization process, dwell time decreases significantly to around 2 seconds and remains stable thereafter. This trend is common to the majority of participants. However, we also observed one individual with a different trend. Participant H5, from the healthy group, showed a significant decrease in dwell time, as well as in the time taken to complete the text, as soon as the acceleration method was applied (in sentence 2). For this individual, however, the personalization process yielded less positive results than simply using the pre-trained model in sentence 2. This occurred because the quality of the EEG signals obtained from this participant during the subsequent personalization process was suboptimal, leading to less effective personalization and slower progress compared to the other participants.
Figure 6. Dwell time variation across experiments in the healthy group.
The trend can be explained as follows. When the proposed acceleration method was applied for the first time (in sentence 2), the model detected the user’s focus using pre-trained data. This data might not be suitable for certain groups, which is why the dwell time in this experiment is often slightly higher than initially observed. However, the pre-trained model might work well for other groups. For instance, for participant H5, the acceleration method proved highly effective right from the first use. Subsequently, the user’s data was collected and used directly to retrain the model. As a result, the model became better adapted to the user, which made the personalization process highly effective, as evidenced by the sharp decrease in dwell time.
Figure 7 illustrates the changes in dwell time across the experiment trials for the group of patients. For the patients, we observe a trend similar to that of the healthy individuals. However, the results for the patients are less stable than those of the healthy group. This inconsistency could be attributed to two factors: first, the patients may find it more challenging to maintain focus during the experiment; second, the quality of the classification model may not be sufficient to classify this group reliably.
Figure 7. Dwell time variation across experiments in the patient group.
Figures 8 and 9 depict the time it takes for users to complete the required text using the eye-tracking communication system. While it is possible to reduce dwell time directly, reducing it too much can decrease communication efficiency, as users may repeatedly select the wrong keys. Therefore, the time taken by participants to complete the text is a crucial metric for measuring the effectiveness of communication through the system. The results show that the time required to complete the text decreases linearly with dwell time. This means that even though dwell time was shortened significantly, communication effectiveness was maintained and users’ interaction accuracy was preserved.
Figure 8. Total sentence completion time across multiple experiments in the healthy group.
Figure 9. Total sentence completion time across multiple experiments in the patient group.
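A linear relationship between dwell time and sentence completion time can be checked with an ordinary least-squares fit, as in the minimal sketch below. The `linear_fit` helper and the data points are assumptions for illustration only; the values are made up and are not the measurements plotted in the figures. A correlation coefficient close to 1 would indicate that completion time tracks dwell time, whereas extra time lost to selection errors at short dwell times would pull the points away from the fitted line.

```python
from math import sqrt
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Illustrative (made-up) per-phase values: dwell time and completion time in seconds.
dwell_s = [3.5, 3.0, 2.5, 2.0]
completion_s = [210.0, 180.0, 150.0, 120.0]
slope, intercept, r = linear_fit(dwell_s, completion_s)
print(f"slope={slope:.2f} s per s of dwell, intercept={intercept:.2f} s, r={r:.3f}")
```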

The results indicate that our proposed interaction acceleration method is effective. In particular, the proposed personalization process is highly efficient: it shows its most significant improvements after just 1–2 data collection sessions, and subsequent data collection sessions do not yield substantial improvements. Therefore, for the personalization process, it is sufficient to use data from only the previous 1–2 phases. This result is highly meaningful, as it optimizes dwell time and interaction efficiency in the ET-based system without the increase in error rate that comes from directly reducing dwell time.
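The observation that only the previous 1–2 phases are needed suggests keeping a small sliding window of recent data for retraining. The sketch below shows one way to do this. The `PersonalizationBuffer` class and the string placeholders for EEG samples are hypothetical, and the retraining step itself is omitted because it depends on the classifier in use.

```python
from collections import deque
from typing import Deque, Iterable, List


class PersonalizationBuffer:
    """Keeps the data of only the most recent phases (hypothetical helper)."""

    def __init__(self, max_phases: int = 2):
        if max_phases < 1:
            raise ValueError("max_phases must be at least 1")
        # A deque with maxlen silently drops the oldest phase when full.
        self._phases: Deque[List[str]] = deque(maxlen=max_phases)

    def add_phase(self, samples: Iterable[str]) -> None:
        """Store the samples collected during one completed phase."""
        self._phases.append(list(samples))

    def training_set(self) -> List[str]:
        """Flatten the retained phases, oldest first, for retraining."""
        return [sample for phase in self._phases for sample in phase]


# Placeholder strings stand in for labelled EEG feature vectors.
buffer = PersonalizationBuffer(max_phases=2)
buffer.add_phase(["p2_a", "p2_b"])
buffer.add_phase(["p3_a"])
buffer.add_phase(["p4_a", "p4_b"])
print(buffer.training_set())  # ['p3_a', 'p4_a', 'p4_b']
```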
Conclusion
In conclusion, this paper presents a novel approach to enhance the input speed of an ET-based system by using EEG data to determine the user’s level of attention during interaction. By capturing eye movements and EEG signals, we can identify when the user is focused on the task, allowing for more efficient input. Our proposed method involves three main steps. First, we determine the user’s attention level using classification models trained on extracted EEG features. Second, we use this attention state to accelerate the input process. Finally, we use the user’s own data, collected during interaction with the system, to implement personalization. Through experimentation, we have demonstrated the feasibility and effectiveness of our approach, showing improvements in typing speed over successive trials. This research opens up avenues for further exploration in the realm of assistive communication systems for individuals with motor impairments, potentially leading to more seamless and intuitive interfaces that enhance the quality of life for users.
Statements and declarations
Footnotes
Author contributions
Conception: Thi Duyen Ngo, Luu Tu Nguyen, and Thanh Ha Le.
Performance of work: Thi Duyen Ngo, Luu Tu Nguyen, and Thanh Ha Le.
Interpretation or analysis of Data: Thi Duyen Ngo, Luu Tu Nguyen, and Thanh Ha Le.
Preparation of the manuscript: Thi Duyen Ngo and Luu Tu Nguyen.
Revision for important intellectual content: Thi Duyen Ngo, Luu Tu Nguyen, and Thanh Ha Le.
Supervision: Thi Duyen Ngo and Thanh Ha Le.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.DA10.
