Abstract
BACKGROUND:
The use of computers as a communication tool by people with disabilities can serve as an effective alternative to promote social interaction and more inclusive and active participation in society.
OBJECTIVE:
This paper presents a systematic mapping of the literature that provides a survey of scientific contributions where Computer Vision is applied to enable users with motor and speech impairments to access computers easily, allowing them to exert their communicative abilities.
METHODS:
The mapping was conducted employing searches that identified 221 potentially eligible scientific articles published between 2009 and 2019, indexed by ACM, IEEE, Science Direct, and Springer databases.
RESULTS:
From the retrieved papers, 33 were selected and categorized into themes of this research interest: Human-Computer Interaction, Human-Machine Interaction, Human-Robot Interaction, Recreation, and surveys. Most of the chosen studies use sets of predefined gestures, low-cost cameras, and tracking a specific body region for gestural interaction.
CONCLUSION:
The results offer an overview of the Computer Vision techniques used in applied research on Assistive Technology for people with motor and speech disabilities, pointing out opportunities and challenges in this research domain.
Keywords
Introduction
Almost all people with disabilities need assistance to participate in social and economic life, and this support typically comes from family members, caregivers, and technology solutions that facilitate interaction with the environment through relevant Human-Computer Interaction (HCI). Assistive Technology (AT) helps people with disabilities to be socially included and to become or remain autonomous [1]. AT features are also intended to assist caregiving activities [2].
There are many input devices and different technologies that open new paradigms in HCI and can benefit people with disabilities [3]. However, developing assistive features must consider that end-users should perceive using the solution as productive, efficient, and satisfactory, thus reducing the risk of abandonment.
Today, practically all ordinary computers and even cell phones are equipped with a camera, so Computer Vision (CV) interfaces can be made readily available to everyone through these devices. Easier access to cameras has spawned a new generation of AT features that do not require custom, expensive electromechanical devices to accommodate special access needs but are instead based on software, allowing for cost savings and improved technology availability [3], as predicted by Betke et al. [4]. Non-invasive, camera-based CV techniques allow unconventional methods of interaction to be considered for the voluntary control of computer systems, including movements of the hands, head, and other parts of the body, eye-gaze tracking, eye blinks, and facial gestures [5]. Such alternatives are needed by people who are totally or partially unable to move or control their limbs and cannot rely solely on verbal communication [6]. An interface capable of interpreting their limited voluntary movements can enable conversations with friends, relatives, and care providers, or send commands to a system [7].
In a systematic mapping [8] that identified opportunities and challenges on research applied to mobile devices for Augmentative and Alternative Communication (AAC), we realized the need to promote the use and effective adoption of AT as well as the potential of using CV as an alternative mode of interaction, mainly for users who have motor and speech difficulties. Thus, in this paper, we present a new systematic mapping of literature to identify research initiatives regarding the application of computer vision to enable interaction and communication of people with motor disabilities. Results of systematic mapping summarize the selected studies and provide a mixed survey of scientific contributions related to this topic.
Interaction and communication of people with motor and speech disabilities
Computer vision-based interaction systems process the images coming from one or more cameras to extract features that are then interpreted by specialized software [9]. These systems present many advantages, mainly because they are non-invasive, that is, they require no physical contact with the device [10], and virtually any part of the body with mobility can be used to perform the interaction, which is especially important for people with severe physical disabilities [9]. Consequently, these systems are extremely flexible, because any change detected in the video can be interpreted by the computer and used to trigger some action [9]. It is also possible to combine CV techniques with several distinct tools for input and output, together with modern machine learning techniques, to explore the potential of multimodal interaction in an effective, real-time way. People with motor disabilities may not be able to produce the same gestures as other people. However, they may still be able to activate other muscles, even with limitations in the strength or duration of that activation [11].
AAC aims to complement or replace speech to compensate for difficulties of expression by using non-verbal communication systems and intervention strategies [12]. Vision-based AAC computational systems can empower users with motor disabilities by using their remaining functional movements. These technologies allow people to write words and phrases on the computer and to speak through a synthesized voice. Furthermore, with the advent of CV and infrared trackers, eye-gaze and head trackers are becoming cheaper and more portable, finding use in everyday products such as tablets, smartphones, and eyeglasses, or mounted on a wheelchair or bed [13].
Probably the earliest example of using video to recognize hand movements as an interaction mode is the study by Krueger et al. [14]. Jacob [15] was one of the first to introduce gaze-based interaction techniques into real-time applications aimed at people with disabilities, discussing factors and technical considerations that arise when attempting to use eye movements as an input medium. Several works have been, and continue to be, developed along this line of research, as shown in the results of the systematic mapping described below.
Systematic mapping
Systematic mappings are a particular type of systematic review with a broad scope, designed to cover and provide an overview of a research area, classifying and counting contributions according to predefined categories [16]. Systematic mappings make it possible to identify, analyze, and work on the available research relevant to a specific research question [17].
Study protocol used to conduct systematic mapping, specifying the research problem, objective, general question and research questions
Inclusion and exclusion criteria used in the first filter of systematic mapping for study selection
The possibilities of using CV to support the interaction and communication of people with motor difficulties are diverse, exciting, and challenging in terms of devices, interface, interaction, social, and economic issues. Thus, we considered a systematic mapping of the literature helpful to investigate recent research initiatives in the field of CV applied to assist the interaction and communication of people with motor disabilities. The mapping considered scientific articles/papers indexed by ACM (https://dl.acm.org/), IEEE (https://ieeexplore.ieee.org), Science Direct (https://www.sciencedirect.com/), and Springer (https://link.springer.com/).
The systematic mapping was developed according to the guidelines of Munzlinger et al. [18] and Petersen et al. [16]. The first step was to plan and formalize the study protocol, specifying the research problem, objective, general question, and research questions. Table 1 presents details about the study protocol employed.
For the study protocol, the selection criteria were defined and applied as a first filter for the retrieved studies. The selection criteria were divided into inclusion and exclusion criteria and used to classify the studies according to their metadata (title, abstract, and keywords). Studies that met at least one of the inclusion criteria were included, and studies that met at least one of the exclusion criteria were excluded. Table 2 presents the inclusion and exclusion criteria used to filter studies in the first filter.
Selection criteria used in the second filter of systematic mapping for study selection
The selection criteria for the second filter (see Table 3) were defined and applied in the complete reading of the studies resulting from the first filter. Only studies dealing with topics related to at least one of the selection criteria were maintained.
The searches were carried out in December 2019, returning 221 studies: the first filter excluded 167 papers, and the second filter 21, resulting in a set of 33 studies. The search expressions for each selected database were defined, calibrated, and adapted according to the available features. Table 4 presents an overview of the selection process.
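The selection figures reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration of the two-filter funnel; the variable names are illustrative and not part of the study protocol.

```python
# Counts reported in the text; variable names are illustrative only.
retrieved = 221
excluded_first_filter = 167   # screened on title, abstract, and keywords
excluded_second_filter = 21   # screened on complete reading

after_first_filter = retrieved - excluded_first_filter
selected = after_first_filter - excluded_second_filter

print(f"after first filter: {after_first_filter}")
print(f"selected studies: {selected}")
```

The result agrees with the 33 studies reported as the final set.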
Search expressions used and results obtained in each database considered in the systematic mapping: ACM, IEEE, Science Direct, and Springer
Data extraction form used to standardize the recording of information obtained from the studies selected in the systematic mapping
Table 5 shows the data extraction form used to standardize the data extracted from the selected publications to reduce the bias of results and the informality of the process. This data extraction form was inspired by the gesture taxonomy presented by Escalera et al. [19] on the various components involved in researching the recognition of actions/gestures, maintaining only the most relevant information for the research domain of this work.
The extraction was carried out by recording, for each article, the information requested on the extraction form based on the statements in the article, and then verifying its accuracy. The categories were created dynamically as the data were extracted, so that they reflect the data set resulting from the extraction process itself.
Threats to validity may have affected the results of this systematic mapping. To mitigate them, the review protocol was validated by the three authors to ensure that the research was as correct, complete, and objective as possible. However, potential limitations were identified in the selection of publications and in data extraction.
Number of articles published by year, among the studies selected in the systematic mapping.
One of the possible threats to the validity of this study is the omission of relevant research in the area. It is not possible to guarantee that all relevant published works are included in this mapping. The search for publications was carried out only in a limited set of databases, on the assumption that these search engines tend to contain the most relevant studies. The terms used in the search expressions can have many synonyms, and the search expression itself restricts the possible results. Relevant documents may not have been included because of divergence in the use of terms and expressions.
As another threat, some inaccuracy or incorrect classification may have occurred in the data extraction performed in this systematic mapping, mainly because the data extraction was done by a single researcher. The classification scheme itself can also introduce bias in the data analysis, and other researchers may propose different classification schemes. To reduce these threats, data extraction and classification were conducted by the first author of this work and revised by the second and third authors.
According to Petersen et al. [16], four types of validity must be taken into account to minimize threats to the validity of the study: descriptive validity, theoretical validity, generalization, and interpretive validity.
Descriptive validity is the extent to which observations are described accurately and objectively [16]. This threat is considered to be under control because a data extraction form was designed to support data recording. The form directed the data extraction process in an objective way and allowed the extracted information always to be revisited.
Theoretical validity is determined by the ability to capture what one wants to achieve [16]. To reduce this threat, the set of research questions was evaluated by the first author and later revised by the other authors. The first and second filters were applied, and the selected articles were read in full, enabling the extraction of data to answer these questions.
Generalization refers to the extent to which the results obtained through the proposed research process can be generalized. To avoid the threat to external generalization (generalization between groups or organizations), a specific and theoretically grounded protocol was used [16]. To avoid the threat to internal generalization (generalization within a group), research questions and specific inclusion/exclusion criteria were defined, allowing the research to be extended or reproduced in a different period, which supports the generalization of the study.
Interpretive validity is achieved when the conclusions drawn are reasonable given the data, and it therefore maps to conclusion validity [16]. A threat in interpreting the data is the researcher’s bias, which is minimized through the review process carried out by another researcher. The paper was revised and accepted by the three authors, reducing the possibility of bias and of misinterpreting the data. Repeatability requires detailed reporting of the research process [16]. This section reported the systematic mapping process that was followed, as well as the actions taken to reduce possible threats to validity.
Selected studies categorized by application, body region used for gestural interaction and participation of people with motor difficulties at some stage of the research development
Based on the data extracted from the selected studies, we identified some of the methods and approaches used, as well as some characteristics of the research that applied CV in the development of AT resources for people with motor difficulties. The information considered relevant is presented below.
Figure 1 shows the number of articles published by year, among the studies selected in the systematic mapping. The largest number of studies (eight) was published in 2015. Of these, five studies were published in distinct conferences, and three studies [20, 21, 22] were published in the same conference: the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments (PETRA). Another five studies selected in the systematic mapping [23, 24, 25, 26, 27] were also published at the PETRA conference in other years. This conference is considered highly interdisciplinary, focusing on computational and engineering approaches to improve quality of life and human performance in a wide variety of settings: in the workplace, at home, in public spaces, in urban environments, and more.
The execution of movements by people with disabilities is very particular, and different remaining movements can be explored for interaction. Research related to the detection and tracking of one or more parts of the human body has been identified, and the verified occurrence is shown in Table 6. In this table, the selected studies are categorized according to their application and identified regarding the participation of the target audience (people with motor difficulties) in the process of building or evaluating the corresponding solution.
Studies selected in systematic mapping, focused on HCI aiming to enable communication, using or not the term “Augmentative and Alternative Communication”
TagCloud highlighting the frequency of terms used in the keywords of the studies selected in the systematic mapping.
The keywords indicated in each selected study were extracted to create a representative image of the terms with the highest occurrence in a TagCloud presented in Fig. 2. TagClouds are visual displays of a set of words selected for some reason, in which text attributes such as size or color are used to represent relevant properties, such as the frequency of associated terms [28]. Figure 2 shows that among the most representative terms are “interaction”, “assistive”, “detection”, “recognition”, “gesture”, “human-computer”, “technology” that are related to the objectives of the systematic mapping carried out. The term “nonverbal communication” was found in some articles and can be an interesting option to be included in the keywords of search expressions in new mappings that may be carried out.
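The term frequencies that drive a TagCloud can be obtained by simple counting. The sketch below assumes that keyword phrases are lowercased and split on whitespace; the keyword lists are hypothetical placeholders, not the keywords of the selected studies, and the function name is illustrative.

```python
from collections import Counter

# Hypothetical keyword lists, NOT the keywords of the selected studies.
sample_keywords = [
    ["assistive technology", "gesture recognition", "human-computer interaction"],
    ["eye tracking", "assistive technology", "interaction"],
    ["gesture recognition", "head tracking"],
]

def term_frequencies(keyword_lists):
    """Count how often each single word occurs across all keyword phrases."""
    counts = Counter()
    for keywords in keyword_lists:
        for phrase in keywords:
            counts.update(phrase.lower().split())
    return counts

frequencies = term_frequencies(sample_keywords)
for term, count in frequencies.most_common(3):
    print(term, count)
```

In a rendered TagCloud, each count would then be mapped to a text attribute such as font size or color.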
Among the articles focused on the development of computational solutions for HCI, 29% (7 of 24) aim to enable the communication of people with motor difficulties, as shown in Table 7. Among the studies focused on communication, only four used the term “Augmentative and Alternative Communication”. However, even without being explicitly described in the study, the alternative forms of interaction investigated in all studies selected in this mapping can potentially contribute to the development of AAC solutions.
The influence of the study by Betke et al. [4] is evident in the theoretical background of several papers, and also because five of the selected studies [21, 23, 24, 26, 29] reported using the CameraMouse system in their research, either to propose improvements or to compare results. Only 26% (8 of 30) of the selected studies included people with motor difficulties in the process of building or evaluating the solution in question.
Human body parts investigated in the research selected in the systematic mapping, using CV techniques to generate AT resources for people with motor difficulties.
Figure 3 shows the body regions used in the selected studies (it was possible to identify this information in 29 of 33 studies) to enable gestural interaction with computational solutions, machines, or robots. Using the eyes for interaction remains a widely explored alternative (37%, 11 of 29), either alone or in combination with other body regions. Studies focused on the detection and recognition of hand gestures also represent a significant portion of the selected studies (20%, 6 of 29).
Table 8 presents information related to devices and source of visual data used (RGB and Depth). Publications related to the symbol “
Devices and data source used in selected studies via systematic mapping
Methods used in the selected studies in the systematic mapping to perform the pre-processing and objects segmentation, feature extraction and representation
The main objectives or applications of the selected studies are not the same. Consequently, the CV methods used, mainly in the pre-processing, segmentation, and feature extraction stages, vary widely, which makes a standardized presentation difficult. Table 9 therefore highlights the main methods cited in the selected studies for pre-processing and object segmentation, feature extraction, and representation, which generate the information needed for the subsequent steps of the recognition process.
Some studies [30, 31, 32, 33, 34, 35, 36] report using functions available in the OpenCV (Open Source Computer Vision) library. For calculating optical flow, the studies of Kumar et al. [37], Missimer and Betke [23], and Ascari et al. [36] mention using the Lucas-Kanade optical flow method.
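As an illustration of the idea behind the Lucas-Kanade method, the sketch below estimates a single flow vector for one window by solving the 2x2 least-squares system built from spatial and temporal image derivatives. It is a simplified, single-window, pure-Python version applied to synthetic frames, not the code of the cited studies (which report using OpenCV); the function names and the test pattern are hypothetical.

```python
def lucas_kanade_window(frame0, frame1, x0, y0, half):
    """Estimate one (u, v) flow vector for the square window centred on (x0, y0).

    frame0 and frame1 are lists of rows of grey levels; the window must lie
    at least one pixel inside the image because central differences are used.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0  # dI/dx
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0  # dI/dy
            it = frame1[y][x] - frame0[y][x]                  # dI/dt
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture (aperture problem)")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def make_frame(shift_x, shift_y, size=9, centre=4):
    """Synthetic quadratic pattern, translated by (shift_x, shift_y) pixels."""
    return [[(x - shift_x - centre) ** 2 + 2 * (y - shift_y - centre) ** 2
             for x in range(size)]
            for y in range(size)]


frame0 = make_frame(0.0, 0.0)
frame1 = make_frame(0.3, -0.2)   # hypothetical sub-pixel motion
flow = lucas_kanade_window(frame0, frame1, x0=4, y0=4, half=3)
print(flow)
```

Practical trackers typically apply this estimate iteratively and over an image pyramid to handle larger motions; the sketch covers only the core least-squares step.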
Techniques used to perform the step of recognizing gestures or movements in the studies selected in the systematic mapping
The techniques used to perform the step of recognizing gestures, expressions, or movements in the selected studies are presented in Table 10. The wide variety of methods and approaches indicates the need to evaluate the context of each application to choose the best way to implement this step.
Among the programming languages cited in the studies that mentioned this information, we can highlight: C
A brief description of the contributions of each study selected in the systematic mapping is presented in the following subsections.
Publications referring to bibliographic surveys are an important way to share knowledge among researchers interested in contributing to the growth of a given field. Three of the selected works refer to bibliographic reviews in which using CV for people with motor difficulties is mentioned.
Martins et al. [60] presented a study on possible technological solutions to improve the communication process for deaf people on e-learning platforms using sign language translation. Among the technological options used for the recognition, translation, and presentation of signs are CV-based approaches. According to these authors, capturing the whole body with a camera can be a solution for the correct translation of sign language. However, although existing devices can easily capture gestures and expressions, they face some problems in recognizing gestures, such as: a vast number of gestures and the similarity between them; different sign languages due to culture, individual social life, and the way gestures were taught; and the difficulty of segmenting the sequence of gestures that expresses a sentence, because it is hard to detect where one gesture ends and the next begins. Thus, the authors believe that some critical challenges remain to be solved and that these technologies are not yet effectively integrated into e-learning platforms. In addition, no immediate solution for synchronous, real-time communication between deaf and non-deaf people has been achieved.
Ghanem et al. [27] presented a bibliographical survey of the most recent techniques in sign language recognition systems based on mobile devices. The authors classify the existing solutions into sensor-based and vision-based approaches and focus on sign detection and classification algorithms. The vision-based approach refers to using the phone’s camera to capture an image or video of the hand signs being performed. The article compares the methods and techniques used in several CV-based works, mainly focused on the recognition of static signs.
The two studies previously mentioned cited research aiming at developing successful sign language recognition, generation, and translation systems, and are related to this study despite having as main target audience deaf and hard of hearing people. People with motor disabilities, in general, present difficulties in performing movements, and the correct execution of a sizable predefined gesture set, such as used in sign language, is a challenge. Even so, the contributions obtained from studies aimed at sign language recognition using Computer Vision can contribute positively to the development of technologies for people with motor and speech difficulties [3].
Tavares et al. [61] highlight assistive technologies that allow the digital inclusion of people with cerebral palsy and survey methods for their assessment, considering motor possibilities, task performance, and user satisfaction. CV-based studies are presented, such as the study by Manresa-Yee et al. [62], who investigated using head movements to replace the mouse. These authors cite existing solutions and present a series of design factors relevant to their use, proposing a list of design and evaluation parameters for future designers to use. Another study cited is that of Pauly and Sankar [63], who investigated non-intrusive methods for evaluating blink detection devices as an alternative form of interaction, with the HOG descriptor combined with the SVM classifier showing better performance than other methods.
The cited studies give a sample of the research related to the use of CV in inclusive approaches for people with motor difficulties or related disabilities. These studies provide a useful overview of the researched domain in a condensed form, and also represent sources of additional references.
Gestural interaction applied in leisure activities
The Kinect Virtual Art Program (KVAP) uses Microsoft Kinect’s gesture recognition technology to allow the creation of art by tracking the user’s different body regions. Diverse effects, such as shapes and glows, are activated by different gestures and speeds of the limbs. Diment and Hobbs [38] presented a pilot study conducted with five children with severe disabilities. KVAP encouraged physical activity and allowed children to create their works of art, an activity that was previously inaccessible to them using traditional approaches.
Graham-Knight and Tzanetakis [20] introduced a new approach for people with disabilities to play music through an adaptable music interface using Microsoft Kinect. A system was developed in which Kinect positional data are sent to the Max/MSP visual programming language through the Open Sound Control (OSC) protocol for analysis and reproduction. A test was carried out with the aim of teaching users (with and without physical disabilities) to play a non-touch instrument, which is not intuitive, and also to measure the overall latency of the system together with the users' reaction time. The test involved playing a drum sound, triggered by clicking with the mouse on a ‘bang’ message, which prompted the user to strum a guitar chord through a hand gesture. The study provides a baseline for future improvements by measuring the latency of the current Kinect system, as well as training time and repeatability. According to the authors, the latency was considered too high for a concert performance. Still, the system proved to be pleasant for a participant in a music therapy setting.
Based on the studies presented in this section, it is possible to observe that the interpretation of body interaction commonly explored in entertainment games also has applications in other leisure activities, such as art and music. The solutions presented expand the possibilities of interaction for people with motor difficulties, encouraging the learning of new skills.
Human-computer interaction
This section presents research aimed at enabling the interaction of people with motor difficulties with computer systems or devices in a generic way, for different applications.
Kumar et al. [37] created an optical mouse control system based on recognizing head movements and eye blinks to provide assistive HCI technology for people with motor difficulties. According to the authors, the proposed model proved robust, real-time, and cost-efficient, and covered different aspects of face, eye, and blink detection, even though it still lacked smoothness in the control of the mouse.
Missimer and Betke [23] proposed an algorithm that allows the user to interact with the computer using the blink of an eye to simulate the click of a traditional mouse. The algorithm can automatically locate the user’s eyes and learn the appearance of the user’s open and closed eyes, extending the functionality of camera-based binary switch systems by Grauman et al. [47] and Chau and Betke [48], providing a more intuitive method for controlling the mouse. When interpreting the movement of three facial regions, including the two eyes, the system allows the user to control the mouse pointer at a level similar to that of the traditional mouse.
Xu et al. [49] developed a system for people with upper-limb movement difficulties or amputation. The system detects and tracks facial movements to control the movement of the cursor on the screen and trigger the appropriate mouse events. According to the authors, it will be applied in a rehabilitation system for people with upper-limb amputation.
Paquette et al. [24] developed the Menu Controller for individuals with severely limited muscle control. This tool can collect menu entries from existing applications and present them to the user in a more accessible and usable way. The system was tested with the camera-based mouse replacement system CameraMouse [4].
McMurrough et al. [25] presented a low-cost solution for real-time tracking of the position of a user's head relative to a video display, for gaze estimation in an assistive environment. The solution uses a wearable headset equipped with sensors found in video game devices.
Parmar et al. [50] presented a system designed for users with motor disabilities. The system allows the detection of voluntary eye blinks, the duration of blinking, and the interpretation of blink sequences in real-time to control a nonintrusive human-computer interface.
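The general idea of using blink duration to separate voluntary from involuntary blinks, and then interpreting blink sequences as commands, can be sketched as follows. This is not the method of Parmar et al. [50]: the per-frame eye state is assumed to be given by some upstream detector, and the thresholds, symbol names, and command table are hypothetical.

```python
def blink_durations(eye_closed, fps):
    """Return the duration in seconds of each run of closed-eye frames."""
    durations, run = [], 0
    for closed in eye_closed:
        if closed:
            run += 1
        elif run:
            durations.append(run / fps)
            run = 0
    if run:  # sequence ended while the eye was still closed
        durations.append(run / fps)
    return durations


def decode_sequence(durations, min_voluntary_s=0.3, long_s=1.0):
    """Drop blinks shorter than min_voluntary_s (assumed involuntary) and
    label the remaining ones as 'short' or 'long'. Thresholds are assumptions."""
    symbols = []
    for duration in durations:
        if duration < min_voluntary_s:
            continue
        symbols.append("long" if duration >= long_s else "short")
    return tuple(symbols)


# Hypothetical mapping from blink sequences to interface commands.
COMMANDS = {("short", "short"): "select", ("long",): "cancel"}

# Synthetic per-frame eye state at 10 frames per second (True = eye closed).
frames = ([False] * 5 + [True] * 2 + [False] * 5 + [True] * 5
          + [False] * 3 + [True] * 6 + [False] * 4)
durations = blink_durations(frames, fps=10)
symbols = decode_sequence(durations)
command = COMMANDS.get(symbols, "no command")
print(durations, symbols, command)
```

In this synthetic example, the first blink lasts two frames and is discarded as involuntary, while the next two are read as a "short, short" sequence.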
Feng et al. [26] evaluated the performance of the target reverse crossing selection method for use with CameraMouse [4], a camera-based mouse replacement system for people with motor difficulties. The results showed that target reverse crossing is more efficient than dwell-time clicking, although its one-time success accuracy is lower. Target direction affects the accuracy of reverse crossing, and increasing the target size improves its performance significantly, which has implications for the future interface design of this selection method.
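For readers unfamiliar with dwell-time clicking, the baseline in the comparison above, the sketch below shows one common formulation: a click fires once the pointer has stayed inside the target for a fixed time. It is not the implementation evaluated by Feng et al. [26]; the function name, the rectangular target, and the 1000 ms dwell time are assumptions.

```python
def dwell_click_time(samples, target, dwell_ms=1000):
    """Return the timestamp at which a dwell click fires, or None.

    samples: iterable of (t_ms, x, y) pointer positions in time order.
    target: (left, top, right, bottom) rectangle in the same coordinates.
    """
    left, top, right, bottom = target
    entered_at = None
    for t_ms, x, y in samples:
        inside = left <= x <= right and top <= y <= bottom
        if not inside:
            entered_at = None          # leaving the target resets the timer
        elif entered_at is None:
            entered_at = t_ms          # pointer has just entered the target
        elif t_ms - entered_at >= dwell_ms:
            return t_ms                # pointer stayed long enough: click
    return None


# Synthetic pointer trace: a brief pass over the target, then a steady dwell.
target = (40, 40, 60, 60)
trace = [
    (0, 5, 5), (200, 50, 50), (400, 55, 52), (600, 120, 50),
    (800, 52, 51), (1000, 53, 50), (1400, 54, 50), (1800, 54, 49),
    (2000, 54, 49),
]
click_at = dwell_click_time(trace, target)
print(click_at)
```

The dwell time is the main tuning parameter: a short value speeds up selection but risks unintended clicks, which is one reason alternative selection methods are investigated.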
Karamchandani et al. [32] developed a simple, portable, non-invasive, and low-cost eye tracking system for children with severe disabilities. The system can differentiate between different gaze points, allowing the user to control an interface in a 4
Kurauchi et al. [21] presented the HMAGIC technique, which combines interaction based on head movement and gaze. The technique was incorporated into the CameraMouse [4] system: it positions the mouse pointer at the user’s gaze point on the screen and then allows the user to control the pointer by moving the head for fine manipulation.
Sambrekar and Ramdasi [40] presented a system developed for people with disabilities to interact with computers using the direction of the eye. The system features a 4-key keyboard that can be pressed by estimating the user’s eye direction.
Cristina and Camilleri [33] presented a method for three-dimensional gaze estimation under free head movement using a notebook camera, as an AAC tool to assist individuals with motor disabilities, such as cerebral palsy, who are affected by involuntary movements of the head and face. The validity of the proposed method was investigated on a publicly available data set and on real-life data captured through the voluntary collaboration of a group of subjects without disabilities and a person with cerebral palsy.
Utaminingrum et al. [51] presented a new framework to detect eye movement and signal a navigation command. The authors focused on sclera detection to track eye movement. This study is part of ongoing research, and as future work the authors plan to incorporate the solution in a mini electric seat to provide more accessibility for people with disabilities.
Chattoraj et al. [59] proposed a method based on scale-invariant features to recognize the different hand gestures used by deaf people to communicate. The main goal is to help in communication with people who do not know American Sign Language.
Pal et al. [29] developed an AAC application integrated with the Tobii EyeX eye-tracker and the CameraMouse system [4] to simplify communication for people with disabilities. The developed system allows the patient to express daily needs, send messages over the Internet and to contact numbers, chat using a voice synthesizer, and generate an alarm signal, in addition to using other basic mini-modules provided (calculator and reminder).
Rosales et al. [34] presented a prototype developed for the interaction between children with cerebral palsy and computers. The authors were able to identify certain expressions that a patient uses to try to communicate and that are more likely to be recognized by using CV. The developed interface detects and tracks movements of the user’s hand, foot, or head, and was used to identify body patterns related to headache, happiness, hunger, fear, and recreation.
Kakkoth and Gharge [35] presented a real-time hand gesture recognition system based on a visual descriptor. Ten gestures were recognized based on the detection of the fingertips, which people with disabilities can use to convey these gestures in the form of text and sound.
Nakazawa et al. [52] developed a wearable glasses-type switching device based on eyeball movement. Images of the area around the eyeball are obtained by a USB camera equipped with infrared LEDs, and the Hough transform extracts the pupil data. The gaze direction recognition system can distinguish twelve levels of “eyeball movement” by computing a centroid from the reflection images on the surface of the right cornea and the image of the pupil. The authors also developed a communication tool for patients with neurological impairments that uses eyeball movement as a form of interaction.
As gesture-based applications gain space in the field of special education, Sharma et al. [53] studied their potential in the Indian context. One of the main contributions of this study is the provision of guidelines for designing and developing gesture-based applications for individuals with developmental disabilities. The authors combined results and experiences from user studies of their applications (Kirana, Balloons, and HOPE) to present fourteen design guidelines for gesture-based systems, in the expectation that educators and caregivers worldwide can use them to teach social, motor, and life skills to individuals with developmental disabilities. The Kirana app employs socially appropriate gestures to teach life skills, such as buying everyday items from a local Indian grocery store. Balloons promotes joint attention skills through collaborative interaction. HOPE improves the motor coordination and the social and cognitive skills of users, with increasing levels of difficulty.
Ascari et al. [36] presented the computer vision system PGCA, developed according to the assumptions of a methodology [6] that provides for the creation of personalized gesture interaction to enable interaction and communication by people with disabilities. The authors presented results of interviews with professionals in the special education area and details of an experiment conducted with students who have motor and speech difficulties.
Krishnamurthi et al. [54] developed an alternative pointer device (Frontier Point Method – FPM) to manipulate a PowerPoint presentation
CV-based solutions developed to allow people with motor disabilities to interact with the computer make up the majority of the studies selected in this systematic mapping. In general, these studies explore the users’ ability to make distinct and recognizable gestures to develop AT resources using different approaches. A common concern of these works is to promote a more natural and intuitive way of interaction for people with disabilities, potentially contributing to improving their quality of life.
Human-machine interaction
In the same context as HCI, there are studies aimed at enabling gestural interaction between people and the machines, devices, or equipment available in their environment.
Fine and Tsotsos [30] investigated the feasibility of a system capable of obtaining visual user feedback (facial expressions) for use in an automatic wheelchair. Gao et al. [55] designed a robotic wheelchair control system for the elderly and disabled, using hand gesture interaction. A new detection method was introduced, combining skin color and depth information, to obtain real-time speed information according to the palm position.
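As an illustration of how a detected palm position could be turned into speed commands, the sketch below maps the palm's offset from the image center to a linear and an angular speed. It covers only this final mapping step, not the skin-color and depth-based detection, and it is not the method of Gao et al. [55]: the function `palm_to_velocity`, the dead zone, and the speed limits are all assumptions introduced here.

```python
def palm_to_velocity(palm_x, palm_y, frame_w, frame_h,
                     dead_zone=0.1, max_linear=1.0, max_angular=1.0):
    """Map a palm position (pixels) to (linear_speed, angular_speed).

    Hypothetical convention: palm above the center drives forward,
    palm to the right of the center turns right.
    """
    # Normalized offsets in [-1, 1] relative to the frame center.
    nx = (palm_x - frame_w / 2) / (frame_w / 2)
    ny = (frame_h / 2 - palm_y) / (frame_h / 2)  # image y grows downward

    def shape(value):
        # Clamp to [-1, 1] and ignore small offsets caused by tremor or noise.
        value = max(-1.0, min(1.0, value))
        return 0.0 if abs(value) < dead_zone else value

    return (max_linear * shape(ny), max_angular * shape(nx))
```

The dead zone is included because small involuntary movements around the neutral position should not move the chair; the right value would have to be tuned per user.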
Mohammad and Anas [39] introduced the design model of an electric wheelchair for people with quadriplegia, guided by tracking movements of the user’s retina. Eye movement is detected by a camera and captured as an electrooculography (EOG) signal. The EOG signals are processed and sent through a serial interface to a microcontroller, which, in turn, controls the movements of the wheelchair.
Lamb and Madhe [41] presented a study designed to automatically control the position of a bed using hand gesture recognition. Four movements can be recognized according to the hand gesture presented, moving the bed up, down, right, or left.
As observed in the mentioned studies, gestural interaction has been investigated as a way to obtain an effective and natural human-machine interaction, accessible to users with very limited movement capacity. One of the studies allowed the user to control the position of a bed using gestures. Three of the four studies cited investigated the use of gestures to interact with electric wheelchairs, since the target audience would not be able to control traditional electric wheelchairs. These initiatives contribute to increasing the independence of people with motor disabilities in daily life, using body signals that the user can perform to interact with machines.
Human-Robot Interaction (HRI)
Among the research focused on HRI, the study of Zhang et al. [56] stands out. It investigates the tracking of head movements in a prototype of a co-robot assistance system based on an egocentric view. Wearing a pair of glasses with a forward-looking camera, the user is actively involved in the robot’s control loop in navigation tasks. The intended application of this co-robot system is to help a person with severe disabilities to grab a target object using head movements.
Zhao et al. [57] presented an interactive platform, “Human-Robot Integration”, with multiple modalities, such as head posture, gaze, voice, and other natural body gestures. The authors propose an interactive method of intention judgment, combining pose estimation of the 3D model of the head and the direction of the gaze detected by the pupil-corneal reflection method. The research aims to increase the mutual understanding of the collaborative situation between humans and service robots.
Drawdy and Yanik [58] presented research on assistive robotics in which an estimate of the user’s point of interest (based on eye tracking) assists in issuing commands to a robot, providing better planning of the path to be taken. Different approaches for using vergence data to determine the depth of gaze were investigated: a vector intersection approach, a neural network approach, a combined approach, and a searching region approach. The best results were found by dividing the viewing frustum into defined search regions. However, the authors suggest that depth estimation using vergence should be considered as one among a larger set of depth search criteria.
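To make the idea of a vector intersection approach concrete, the following sketch estimates a 3D gaze point as the midpoint of the shortest segment between the two eyes' gaze rays, using the standard closest-point formula for two lines. This is a generic geometric construction, not necessarily the formulation used by Drawdy and Yanik [58]; the function name, the coordinate convention, and the `eps` tolerance are assumptions.

```python
def gaze_point_from_vergence(left_eye, left_dir, right_eye, right_dir, eps=1e-9):
    """Return the 3D point closest to both gaze rays, or None if they are parallel.

    left_eye, right_eye -- eye positions as (x, y, z)
    left_dir, right_dir -- gaze direction vectors as (x, y, z)
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = [p - q for p, q in zip(left_eye, right_eye)]
    a = dot(left_dir, left_dir)
    b = dot(left_dir, right_dir)
    c = dot(right_dir, right_dir)
    d = dot(left_dir, w0)
    e = dot(right_dir, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # rays are (nearly) parallel: no usable vergence depth

    # Parameters of the closest points along each ray.
    t_left = (b * e - c * d) / denom
    t_right = (a * e - b * d) / denom
    on_left = [p + t_left * v for p, v in zip(left_eye, left_dir)]
    on_right = [p + t_right * v for p, v in zip(right_eye, right_dir)]
    # Measured rays rarely intersect exactly, so take the midpoint.
    return tuple((p + q) / 2 for p, q in zip(on_left, on_right))
```

With the eyes on the x axis and z pointing away from the user, the z component of the returned point can be read as the depth of gaze. Small angular errors in the gaze directions produce large depth errors at long distances, which is consistent with treating vergence as only one of several depth criteria.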
Saleh and Berns [22] presented a biologically inspired model that performs verbal and non-verbal communication between humans and robots. The non-verbal communication used in this work includes head movements and eye behavior. The robot sends information via speech and receives human feedback through non-verbal cues.
Assistive robotic systems can be very valuable for providing assistance in the activities of daily living for people with motor disabilities. The results obtained in the mentioned studies indicate that auxiliary devices controlled by gestures can be a viable medium for non-verbal communication between humans and robots. The benefits of HRI for people with motor disabilities are important from both a humanitarian and a cost-saving perspective.
Answers to research questions
Based on the mapped studies, the research questions defined for the systematic mapping presented were answered. The main question of the mapping was to identify the theoretical or practical solutions developed using CV to support the communication and interaction of people with motor difficulties. The mapping presented instruments used as resources of AT to enable communication or interaction of people with motor difficulties, either through interaction with the environment in which they are inserted, with applications of AAC or for other purposes, as shown in Tables 6 and 7.
For each question presented in Table 1, an answer was prepared based on the information extracted from the mapped studies. Concerning question “1) How has the CV-based gestural interaction been explored to assist communication and interaction of people with motor difficulties?” Currently, there are many options for gestural interaction, offering different degrees of accessibility for various purposes. Among the initiatives identified, the following can be highlighted: using gestural interaction to perform leisure activities (music, art), to control equipment such as a wheelchair or bed, to interact with assistive robots, and to interact with a computer. All of these applications aim in some way to offer a better quality of life for people with motor difficulties by using CV techniques, whether by giving them some level of independence, facilitating monitoring by caregivers and communication, or enabling activities that would not otherwise be possible.
Concerning question “2) What types of cameras or devices have been used?” Among the devices identified, low-cost cameras and RGB images were widely used as a data source, while depth data were used less often, as shown in Table 8. There was a tendency to generate more accessible alternatives in terms of cost (through the use of simple cameras or mobile devices), including replacing eye-tracking devices, which are often based on high-cost infrared cameras (as in the study of Zhang et al. [53]).
As for question “3) Which body regions have been used for tracking?” Table 6 and Fig. 3 present information regarding body regions used as an alternative means of interaction for people with motor difficulties in the selected studies. Among the 29 studies in which the body regions used for tracking could be identified, about 38% (11 of 29) investigated using the eyes as a mode of interaction, whether through blink detection, estimation of eye direction, or detection of the pupil or sclera. In addition, about 14% of the studies (4 of 29) investigated using gaze associated with head movements. The tracking of head movements is used in several papers, either in isolation (2 studies of 29) or in conjunction with other body regions such as eyes, hands, and feet. A great variety of possibilities is already being used, exploring different body regions, as shown in Fig. 3.
On question “4) What techniques have been used to recognize gestures?” Several methods and algorithms were used for image processing and pattern recognition, implemented in different programming languages, as presented in Tables 9 and 10. As for the programming languages, there is a trend toward well-established languages for which CV-specific libraries, such as OpenCV, are available. Many of the techniques presented in Table 9 are employed through methods available in that library.
Regarding question “5) Do the solutions allow the customization of the gestures used?” In the study of Ascari et al. [36], a system is presented that allows users and caregivers to update the sets of gestures created and used for interaction or communication through CV. No other study mentioned allowing customized or personalized gestures. This result may be biased due to the non-inclusion of specific terms such as “tailoring”, “customization”, or “personalization” in the search expression used.
On question “6) Are there any methodologies linked to the development of solutions?” The prototype mentioned in the study of Rosales et al. [34] was developed using the software development methodology Iconix [64], which helped to analyze the requirements in the preliminary investigation, the design, and implementation. The study by Ascari et al. [36] cited using a methodology to guide the development of computational solutions based on gesture recognition as an alternative form of interaction or communication specifically for people with motor disabilities. With a similar objective, but aimed at a broader audience, the study of Sharma et al. [53], which presents fourteen design guidelines for designing and developing applications that employ gestures for individuals with developmental disabilities, can be highlighted. The relatively small number of studies that cited using a methodology suggests a gap in the literature and a demand for studies for this purpose, or even that the terminology “methodology” has not been used by studies to refer to the ways of developing AAC systems for people with motor and speech difficulties using gesture interaction.
On question “7) Do solutions aimed at enabling communication use the term Augmentative and Alternative Communication?” The term AAC was not used by all studies that presented solutions aimed at making communication feasible for people with motor difficulties. Only four of seven studies used the term AAC. In some studies, the term “nonverbal communication” was used to refer to using gestures to make communication possible.
And finally, regarding question “8) Did people with motor difficulties participate in the construction or evaluation of the proposed solutions?” Only eight of the 33 studies involved people with motor difficulties in building or evaluating the solutions presented. Of these studies, five included representatives of the target audience in stages of evaluating the solutions developed: [24, 38, 33, 29, 36]. In the studies of Graham-Knight and Tzanetakis [20], Rosales et al. [34], and Sharma et al. [53], a collaborative approach was adopted involving different stakeholders in the process of building and evaluating solutions.
Discussion
The studies presented in this paper represent initiatives aimed at different assistive contexts and for people with varying degrees of disability. Still, in general, they offer technological solutions to minimize problems faced by people who experience motor or speech limitations.
In different ways, the studies listed above highlight the feasibility of employing CV techniques to support user interaction with computers.
Involving the target audience and their caregivers in the design and evaluation of assistive technologies can be challenging because of the complexities of the technology and the requirements of this population. Possibly because of this, few selected studies reported the participation of representatives of the target audience in the construction or evaluation of the proposed solutions. Studies such as that of Rosales et al. [34], which identified body patterns of a child with cerebral palsy, highlight the potential of current AT. Disability is no longer as limiting as it was in the past. The focus of the studies is no longer on verifying whether it is possible to interact with a computer, but on making HCI better or more effective. Various solutions have been identified, and the next step in the AT knowledge field may be related to reducing the abandonment of existing solutions by making them accessible, more flexible, easier to use, and available at a low cost.
A trend in the development of low-cost solutions has been the use of simple cameras or webcams for data acquisition. However, depth cameras are also becoming more accessible and are, therefore, already an alternative. Other devices such as Eye-Tracker, Head-Tracker, and BCI, among others, remain in use, alone or together. Proposals that aim to generate an accessible structure often explore the use of more than one input and output modality to provide multimodal interaction.
Different feature descriptors were used in the selected studies. However, the choice of features is a complex task that requires considerable knowledge of the problem domain. Currently, deep learning networks are widely used to process complex data and, in particular, CNNs have been used efficiently in various pattern recognition tasks, including image recognition, as observed in some studies. In the studies that used CNNs, there is no concern about choosing the best feature descriptor, since a CNN does not require the usual separate steps of pre-processing and feature extraction. That is, the data are used in raw form, so the accuracy of a given classifier is no longer determined by the choice of descriptors that best represent the problem at hand.
Among the classifiers used, it was observed that traditional options such as Adaboost, ANN, KNN, and SVM remain popular and efficient for gesture recognition. However, new alternatives have been explored by the scientific community (such as CNN) and have shown promising results in classification applications.
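As a reminder of how one of these traditional classifiers operates, the sketch below implements a plain k-nearest-neighbors (KNN) vote over gesture feature vectors. It is a generic illustration rather than the pipeline of any mapped study: the two-dimensional feature vectors, the labels `open_hand` and `fist`, and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(samples, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled samples.

    samples -- list of (feature_vector, label) pairs
    query   -- feature vector with the same length as the stored vectors
    """
    # Sort the labeled samples by Euclidean distance to the query.
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set with made-up feature values (e.g. two shape descriptors).
training = [
    ((0.0, 0.0), "open_hand"), ((0.1, 0.2), "open_hand"), ((0.2, 0.1), "open_hand"),
    ((1.0, 1.0), "fist"), ((0.9, 1.1), "fist"), ((1.1, 0.9), "fist"),
]

print(knn_classify(training, (0.15, 0.1)))  # nearest neighbors are all "open_hand"
```

Because KNN only stores labeled examples, adding a new gesture amounts to adding samples, which is one reason such classifiers remain attractive when gesture sets must be adapted to individual users. The quality of the result still depends on the feature descriptor chosen.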
Considering the whole process of pattern or gesture recognition, from gesture acquisition to interaction with a computer interface, only one of the studies presents (or highlights) the possibility for the user to define in a flexible way the set of gestures to be used. Among the studies selected, many used predefined gesture sets, and the tracking of a specific body region was employed by most of the papers. Studies, in general, have been conducted with patients in clinical trials, proving only the technical feasibility of transforming biological signals into commands, but failing to provide insights into the device’s daily operation. In addition to the commercial approach, the successful adoption of a device depends on several factors, including the caregiver’s ability to learn and customize the new tool [65].
Another point observed was that, except for Ascari et al. [36], the studies generally do not present details about the difficulties encountered in creating the data set, possibly because they use existing datasets or datasets created with gestures performed by people without disabilities. This is a critical and somewhat understandable point, as the designers of AAC systems generally face difficulties in data collection for different reasons [66]: it is challenging to obtain enough users with disabilities to evaluate alternative projects, and collecting sufficiently large usage data for analysis is also problematic, as physical disabilities prevent users from working continuously for a long time and, in some cases, it takes several months to collect usage data for evaluation.
Among the studies surveyed, it was noticed that using the term “Augmentative and Alternative Communication” is not standard, since several studies were developed with a focus on human-computer interaction as a way to facilitate the communication of people with some type of disability. Still, they are not described as a specific contribution to the AAC field.
Most selected articles provided in their conclusions some insights about future research, largely related to improving their proposed systems or approaches (such as Ascari et al. [36], Feng et al. [26], Kurauchi et al. [21], Pal et al. [29], Sambrekar and Ramdasi [40]). Some studies proposed to investigate other input methods associated with those already employed (Nakazawa et al. [52], Feng et al. [26], McMurrough et al. [25]) or to carry out new tests including participants with disabilities (Feng et al. [26]).
Studies based on classification algorithms aim to investigate possibilities to improve classification accuracy (Ghanem et al. [27], Karamchandani et al. [32], Missimer and Betke [23], Cristina and Camilleri [33]). The conversion of the proposed system into a web-accessible technology (Missimer and Betke [23]) or the possibility of making solutions freely available for download (Paquette et al. [24]) are also mentioned.
Regarding the studies focused on wheelchair use, Fine and Tsotsos [30] expressed interest in investigating how to enhance the operational safety of a wheelchair by preventing movements toward the user’s blind spot. Another possible extension would be to recognize human emotions and to provide feedback about the user’s reaction to the actions of the wheelchair.
Diment and Hobbs [38] intend to explore the proposed KVAP as a therapeutic device, since it could be extended to help increase the range of motion of an impaired limb, in addition to being used to increase the physical activity of wheelchair users.
Among the studies aimed at mobile devices, Ghanem et al. [27] point out that, while smartphone hardware specifications are expected to continue to improve rapidly, cloud processing could push the boundaries further by alleviating the hardware requirements on the mobile device. However, maintaining interactivity and low latency while using cloud processing can also be challenging, and these are issues that future research could focus on.
To improve the communication possibilities by means of computers, Saleh and Berns [22] aim to include facial expressions, as well as body postures in nonverbal communication. The authors also suggest the use of a speech recognition system, the use of gestures of different cultures, and learning new gestures from people.
For Rivera and DeSouza [11], a persistent problem with Assistive Technology resources is that the burden of adapting to these new technologies still falls on the user. It is time to change: “time to create Assistive Technology devices that require fewer intrusions and fewer electrodes; several modes of interaction at the same time, so that the user can switch from one mode to another in case of short-term fatigue or loss of specific muscles in the long term; and, finally, devices that can adapt perfectly to the user’s intention.” [11]. We add to these possibilities by arguing for research that not only addresses people’s intentions to communicate, but is also able to learn from people’s characteristics, behaviors, and reactions, helping even in situations where intentions are compromised by intellectual disabilities. We argue that solutions must evolve from personalization and customization approaches to a personification approach, in which the solution is perfectly tailored to each user by learning from their different reactions and interactions with the solution, the environment, and other people.
Conclusion
This paper presented a systematic mapping of the literature on scientific contributions where Computer Vision is applied to improve the interaction or communication of people with motor impairments. The mapping resulted in the reading and categorization of 33 papers published between 2009 and 2019.
The main objective was to provide an overview of what has been investigated in this area, highlighting the leading techniques and approaches used for gesture or expression recognition in recent academic research. The research questions established for the mapping were answered, and a panoramic view of the results was discussed, building relationships between the various contributions presented in the literature and their general implications for the research problem addressed in this paper.
The results of this systematic mapping show the importance of developing adaptable, flexible, and accessible solutions, and how CV has enabled the development of AT resources essential for the promotion of autonomy to users who have motor and speech difficulties.
Author contributions
CONCEPTION: Rúbia E. O. Schultz Ascari, Luciano Silva and Roberto Pereira
PERFORMANCE OF WORK: Rúbia E. O. Schultz Ascari
INTERPRETATION OR ANALYSIS OF DATA: Rúbia E. O. Schultz Ascari, Luciano Silva and Roberto Pereira
PREPARATION OF THE MANUSCRIPT: Rúbia E. O. Schultz Ascari, Luciano Silva and Roberto Pereira
REVISION FOR IMPORTANT INTELLECTUAL CONTENT: Luciano Silva and Roberto Pereira
SUPERVISION: Luciano Silva and Roberto Pereira
Ethical considerations
This study, as a literature review, is exempt from Institutional Review Board approval.
Acknowledgments
The authors thank Capes and CNPq for their support of this research.
Conflict of interest
The authors have no conflicts of interest to report.
