Abstract
Several robot-mediated therapies have been implemented for the diagnosis and improvement of communication skills in children with Autism Spectrum Disorder. The proposed research combines an existing model, the Multi-robot-mediated Intervention System (MRIS), with a Hidden Markov Model (HMM) to develop an infrastructure for categorizing the severity of autism in children. The observable states are joint attention type (low, delayed, and immediate) and imitation type (partial, moderate, and full), whereas the non-observable states are the levels of autism (minimal and mild). The research was conducted on 12 subjects: 8 children took part in the training session with 72 experiments over 9 weeks, and the remaining 4 took part in the prediction test with 25 experiments over 6 weeks. The predicted category was compared with the actual category of autism assessed by the therapist using the Childhood Autism Rating Scale. The accuracy of the proposed model is 76%. Further, the Kappa measure of agreement between the Childhood Autism Rating Scale and the proposed model was moderate and statistically significant (n = 25, k = 0.52, p = 0.009). This research demonstrates the usefulness of a Hidden Markov Model integrated with joint attention and imitation modules for categorizing the level of autism using multi-robot therapies.
Keywords
Autism spectrum disorder (ASD) is a neuro-developmental disorder that covers a wide range of impairments, including impairments in social and cognitive development (Ali et al., 2019). The Diagnostic and Statistical Manual of Mental Disorders (DSM), published by the American Psychiatric Association, has categorized the range of disorders for children with autism (Bell, 1994). The word autism is derived from the Greek word “autoismos”, “autos” (self), and “ismos” (action), referring to children with an extreme inability to relate to others, which limits and impairs daily life communication and activity (Bell, 1994). The spectrum of autism is defined with borders that overlap normality on one end and extreme intellectual impairment caused by brain malfunctions on the other (Rapin & Tuchman, 2008; Wang et al., 2019). Autism has been divided into three levels, from mild as level 1 to severe as level 3 (Schopler et al., 1980). High-functioning autism, called mild autism or “level 1” on the spectrum, is often described as Asperger’s syndrome. “Level 2”, referred to as “autism”, requires substantial support, and “level 3” is severe autism, in which the patient's social and communication skills are severely impaired. This research focuses on autism level 1 and level 2.
According to one report, ASD is the fastest-growing developmental disorder in the USA, with a 6% to 15% increase in rate per year (Bonis, 2016). The CDC, which tracks the prevalence of autism, reports an increase from 1 in 59 (2019) to 1 in 54 (2020) (Andréasson et al., 2020). This highlights the need for more advanced technological therapies, such as robotic interventions, to improve the communication skills of children. Along with awareness of this neurodevelopmental disorder, advancement in diagnosis and treatment should also be a focus.
Psychiatric therapy has been considered the most common approach for the treatment of ASD children. In this approach, a psychologist examines a child’s actions to identify the level of autism using one of the available autism rating scales. Based on this assessment, different cognitive therapies are suggested to improve the condition of the child (Eack et al., 2013). Recently, robots have been introduced into these cognitive-behavioral therapies to enhance the focus and interest of the autistic child. Under the current COVID-19 situation in particular, research trends are focusing on technology-based therapies. For this purpose, robotic therapies, including multi-robot therapeutic interventions, are becoming popular among technology-based applications for children with ASD (Ali et al., 2019; Ali, Mehmood, Ayaz, et al., 2020; Ali, Mehmood, Khan, et al., 2020; Mehmood et al., 2019, 2020). This is because ASD children are more inclined towards robots because of their predictable behavior (Begum et al., 2016). Robots are also gaining attention for autism therapy because of the controllable environment they provide, their accuracy, low cost, and adaptability to the environment (Y. Feng et al., 2017; Pennisi et al., 2016). These therapies help improve the social and communication skills of children with ASD; however, no existing model labels the autism category based on the behavioral patterns of children with ASD using these technology-based applications. This research presents the application of the Hidden Markov Model (HMM) in the domain of autism spectrum disorder using robots. The proposed model predicts the autism category based on HMM, using the performance of the child in joint attention and imitation modules as baseline parameters. This research uses the existing Multi-robot-mediated Intervention System (MRIS) model for measuring the joint attention and imitation of children with ASD (Ali et al., 2019).
Previously, for children with ASD, HMM has been used to automatically segment conversational audio into semantically relevant components (Yu et al., 2018), to redress the attention deficit in autistic children by solving the problem of focus attention (Motamed et al., 2015), and to study the influence of autism on the functioning of the brain by quantifying statistical properties of the time-varying brain states (Dammu & Bapi, 2019). Another study attempted to determine a person’s level of autism using HMM; however, it failed to produce hidden Markov models that are indicative of a person’s level of autism (Lancaster Jr, 2008). In another study, an effective prediction model based on ML techniques was proposed for predicting ASD in young people (Omar et al., 2019). However, this technique did not focus on the level of autism using robotic therapy and therefore needs a therapist to conduct the intervention.
The research presented in this paper is an extension of existing work on the MRIS model (Ali et al., 2019). The interventions for robotic therapy use robots to address the core deficits of ASD, i.e., joint attention and imitation, rather than choosing free play as a mode of interaction, an approach that has proved successful (Tariq et al., 2016). This work presents a novel model for predicting two different levels of autism, i.e., minimal and mild. The observable states for this model use the results of joint attention and imitation skill improvement of the child. The current model uses the previously presented MRIS architecture to measure the improvement using multiple robots. The proposed model infers the unobservable states (level of autism: minimal or mild) from the observable states (joint attention and imitation performance). Based on the results, the parameters chosen for the visible states of the HMM were able to estimate the category of autism, i.e., the hidden layer of the HMM.
Architecture for Autism Categorization
The proposed architecture uses HMM for the categorization of two autism levels, i.e., minimal and mild. The current model uses the MRIS architecture (Ali et al., 2019) from previous research to measure joint attention and imitation of the child using multi-robot interactive therapy. The MRIS architecture is designed to focus on two core impairments, i.e., joint attention and imitation. The joint attention model of MRIS uses three cues in least to most (LTM) order, i.e., visual, speech, and motion cues. The imitation model of MRIS implemented in the current research is adaptive, as it uses joint attention for the activation of the robot in this module. After eye contact of the child is established with the robot, the robot starts the imitation tasks, which include moving forward, moving backward, raising hands, and hands down gestures. These motion gestures are imitated by the child and are measured using Kinect to calculate the success rate. Based on this, the current HMM architecture predicts the category of the autism spectrum disorder (hidden states/transition probabilities) using information about the performance of children in joint attention and imitation modules (observable states/emission probabilities) as shown in Figure 1.

The 2-Step HMM-Based System Architecture Explaining Observable and Non-Observable States in Multi-Robot Therapy for Children With ASD.
In Figure 1, a two-layer network is introduced. Layer one comprises all observable states, while layer two comprises the non-observable states. We have categorized joint attention (JA) and imitation (IM) into three different categories as shown in Table 1. Table 1 lists the two main evaluation parameters used as observable states, i.e., joint attention and imitation, and maps the categories of each parameter to the percentage performance of the child. The categories for joint attention are low, delayed, and immediate, with success rates of “≤50%”, “>50% and <80%”, and “≥80% and ≤100%”, respectively. The categories for the imitation module are partial, moderate, and full, with the same success-rate ranges of “≤50%”, “>50% and <80%”, and “≥80% and ≤100%”. For joint attention, “low” represents the lowest level of accuracy, “delayed” represents a medium level, and “immediate” represents the quickest response to the stimulus. Similarly, for the imitation module, partial, moderate, and full represent least to most in terms of success rate.
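The mapping from success rate to category described above can be sketched as a small function. This is a minimal illustration of the thresholds quoted in the text; the function and constant names are hypothetical and not taken from the MRIS implementation.

```python
# Ordered category labels for each module, from lowest to highest success rate.
JA_LABELS = ("low", "delayed", "immediate")
IM_LABELS = ("partial", "moderate", "full")


def categorize(success_rate, labels):
    """Map a success rate in percent (0 to 100) to one of three ordered labels.

    Thresholds follow the ranges quoted in the text:
    <=50 -> first label, >50 and <80 -> second label, >=80 -> third label.
    """
    if not 0 <= success_rate <= 100:
        raise ValueError("success rate must lie between 0 and 100")
    lowest, middle, highest = labels
    if success_rate <= 50:
        return lowest
    if success_rate < 80:
        return middle
    return highest


# Example: a child with 65% joint attention success and 80% imitation success.
ja_category = categorize(65, JA_LABELS)
im_category = categorize(80, IM_LABELS)
```

Keeping a single thresholding function for both modules reflects the fact that the two modules share the same percentage ranges and differ only in their labels.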
Mapping Among Different Categories Related to Percentage Performance.
The model estimates the category of autism (hidden state) via the observable states (joint attention and imitation). All types of probabilities are shown in Figure 2. The observable states have been divided into two different categories, i.e., joint attention and imitation. Further, joint attention has been divided into three types of response (low, delayed, and immediate), and imitation into three types (partial, moderate, and full). These categories have been derived in discussion with the therapist and according to the child’s performance in the joint attention and imitation modules. This technique focuses on the prediction of the level of autism using robotic therapy, and therefore does not need a therapist for intervention or categorization prediction.
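To make the inference step concrete, the following is a minimal sketch of HMM forward filtering over a sequence of (joint attention, imitation) observations. All probability values are hypothetical placeholders chosen for illustration, not parameters estimated in this study, and the factorization of the emission probability into independent joint attention and imitation terms is an assumption of this sketch.

```python
STATES = ("minimal", "mild")

# Hypothetical parameters (illustrative only, NOT estimated from the study).
START = {"minimal": 0.5, "mild": 0.5}
TRANS = {
    "minimal": {"minimal": 0.9, "mild": 0.1},
    "mild": {"minimal": 0.1, "mild": 0.9},
}
JA_EMIT = {
    "minimal": {"low": 0.1, "delayed": 0.3, "immediate": 0.6},
    "mild": {"low": 0.5, "delayed": 0.3, "immediate": 0.2},
}
IM_EMIT = {
    "minimal": {"partial": 0.1, "moderate": 0.3, "full": 0.6},
    "mild": {"partial": 0.5, "moderate": 0.3, "full": 0.2},
}


def emission(state, observation):
    """P(observation | state), assuming JA and IM are independent given the state."""
    ja, im = observation
    return JA_EMIT[state][ja] * IM_EMIT[state][im]


def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {state: value / total for state, value in weights.items()}


def forward_posterior(observations):
    """Return P(hidden state after the last session | all observations)."""
    alpha = normalize({s: START[s] * emission(s, observations[0]) for s in STATES})
    for obs in observations[1:]:
        # Propagate through the transition model, then weight by the emission.
        alpha = normalize({
            s: emission(s, obs) * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })
    return alpha


def predict_category(observations):
    """Pick the hidden state with the highest filtered posterior."""
    posterior = forward_posterior(observations)
    return max(posterior, key=posterior.get)


sessions = [("immediate", "full"), ("delayed", "full"), ("immediate", "moderate")]
predicted = predict_category(sessions)
```

Normalizing after every step keeps the forward variables from underflowing on long sequences and does not change the final posterior.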

Detailed Hidden Markov Model (HMM) for Autism Categorization.
The equations which help in finding out the posterior probabilities in HMM model represented in Figure 2 are given:
Probabilities Notations
Probability notations used for experimentation are represented by Equation 4 to Equation 7:
Equation 5 shows that the probability for category of the child is mild given that joint attention and imitation belong to immediate and full categories, respectively. Therefore, for the probability of mild category, Equation 6 becomes:
where
and
Similarly, the probability for minimal category of autism is shown in Equation 7. It can be calculated as follows:
where
and
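The equations themselves are not reproduced in this text. As a hedged sketch only, a posterior of the kind described for Equation 5 can be written with Bayes' rule as follows. The symbols C, JA, and IM are notation introduced here for illustration, and the conditional independence of joint attention and imitation given the category is an assumption of this sketch rather than something the text confirms.

```latex
P(C = \text{mild} \mid JA = \text{immediate},\, IM = \text{full})
  = \frac{P(JA = \text{immediate} \mid C = \text{mild})\,
          P(IM = \text{full} \mid C = \text{mild})\,
          P(C = \text{mild})}
         {\sum_{c \in \{\text{minimal},\, \text{mild}\}}
          P(JA = \text{immediate} \mid C = c)\,
          P(IM = \text{full} \mid C = c)\,
          P(C = c)}
```

Under the same assumption, the posterior for the minimal category is obtained by replacing mild with minimal in the numerator, so the two posteriors sum to one.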
Hardware for Robotic Therapy
The multi-robot intervention for joint attention and imitation uses two NAO humanoid robots for therapy. NAO robots are the most popular choice for therapeutic interventions because of their human-like appearance and programmability options (Andréasson et al., 2020).
The therapy was based on the MRIS protocol for both interventions (Ali et al., 2019). This research uses the NaoqiPeoplesPerception module from the Naoqi SDK. The API offers the module ALGazeDetection, which provides information about the human’s gaze behavior. The joint attention module analyzes the direction of the child's gaze to determine whether he/she is looking at the robot. For this purpose, gaze tracking is done using the NAO robots’ cameras to calculate (1) the delay in making eye contact with the robot and (2) the time duration for which eye contact is made. In the second module, i.e., imitation, the child's movements were recorded and evaluated by Kinect based on the joint movements of the child. The child’s action (captured by Kinect) and the robot’s action were compared to determine whether the child had imitated the action. Real-time tracking of the joints of the ASD child is done using Kinect, whereas the robot was programmed using the NAO API to perform the imitation tasks.
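The two gaze measurements listed above can be illustrated with a short sketch that works on already-collected gaze samples. The sample format, a time-sorted list of (timestamp in seconds, looking-at-robot flag) pairs, is a hypothetical assumption; the sketch does not call the NAO API and does not reproduce the study's implementation.

```python
def eye_contact_metrics(samples, cue_time=0.0):
    """Compute eye-contact delay and duration from gaze samples.

    samples: time-sorted list of (timestamp_seconds, looking_at_robot) pairs.
    cue_time: time at which the robot presented its cue.
    Returns (delay, duration); delay is None if eye contact never occurred.
    Each sample's flag is assumed to hold until the next sample arrives.
    """
    first_contact = next((t for t, looking in samples if looking), None)
    delay = None if first_contact is None else first_contact - cue_time
    duration = sum(
        (t_next - t for (t, looking), (t_next, _) in zip(samples, samples[1:]) if looking),
        0.0,
    )
    return delay, duration


# Hypothetical samples at 0.5 s intervals after a cue presented at t = 0.
gaze = [(0.0, False), (0.5, False), (1.0, True), (1.5, True), (2.0, False)]
delay, duration = eye_contact_metrics(gaze)
```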
Joint Attention (JA) Module
The joint attention module of the MRIS system (Ali et al., 2019) provides three different types/levels of cues to ASD children in least to most (LTM) order: visual, visual + speech, and visual + speech + motion. The visual level comprises two types of visual cues: “Rasta” (changing the eye color of the robot in a cyclic manner) and “Blinking”. At the second level, the speech cues “Hi” and “Hello” are added to the visual cues. At the third level, the motion cues “Move forward”, “Move backward”, “Stand-up”, and “Sit-down” are added to the visual and speech cues. At each stage, the child’s joint attention is observed using NAO’s cameras.
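The least to most escalation of cues can be sketched as a loop over cue levels that stops as soon as the child responds. The callback `child_responds` is a hypothetical stand-in for presenting the cues on the robot and checking gaze; it is not part of the MRIS code or the NAO API.

```python
# Cue levels in least-to-most (LTM) order, as described in the text.
CUE_LEVELS = (
    ("visual",),
    ("visual", "speech"),
    ("visual", "speech", "motion"),
)


def run_joint_attention_trial(child_responds):
    """Present cue levels in LTM order and stop at the first response.

    child_responds: callable taking a tuple of cue types and returning True
    if the child made eye contact (hypothetical stand-in for robot + gaze check).
    Returns the 1-based level that elicited a response, or None if none did.
    """
    for level, cues in enumerate(CUE_LEVELS, start=1):
        if child_responds(cues):
            return level
    return None


# Example: a simulated child who only responds once speech is included.
level_reached = run_joint_attention_trial(lambda cues: "speech" in cues)
```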
Imitation Module (IMI)
The imitation module of the MRIS system uses the child’s joint attention to activate the imitation module for both robots (Ali et al., 2019). The child is required to focus on a robot for at least 5 seconds to activate it. After eye contact is established with a particular robot, the robot starts the imitation tasks, i.e., “Move Forward”, “Move Backward”, “Raise Hands”, and “Hands Down”. The child is expected to imitate the motions of the robot, and the accuracy of imitation is measured using Kinect. The child can activate either of the two robots by choosing which one to make eye contact with.
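The activation rule and the imitation success rate can be sketched as follows. The sample format, the robot identifiers, and the function names are hypothetical; how the study actually compared Kinect joint data with the robot's motion is not described here, so each task result is represented simply as a boolean.

```python
ACTIVATION_SECONDS = 5.0
TASKS = ("Move Forward", "Move Backward", "Raise Hands", "Hands Down")


def robot_to_activate(gaze_samples, threshold=ACTIVATION_SECONDS):
    """Return the first robot watched continuously for at least `threshold` seconds.

    gaze_samples: time-sorted list of (timestamp_seconds, robot_id or None),
    where None means the child is looking at neither robot.
    """
    current, start = None, None
    for timestamp, target in gaze_samples:
        if target != current:
            # Gaze moved to a different target, so restart the timer.
            current, start = target, timestamp
        if current is not None and timestamp - start >= threshold:
            return current
    return None


def imitation_success_rate(results):
    """Percentage of imitation tasks performed correctly.

    results: dict mapping task name to True (imitated) or False (not imitated).
    """
    return 100.0 * sum(results.values()) / len(results)


samples = [(0, "A"), (2, "A"), (3, None), (4, "B"), (6, "B"), (9, "B")]
active = robot_to_activate(samples)
rate = imitation_success_rate(dict(zip(TASKS, (True, True, False, True))))
```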
Materials and Methods
Subjects
Twelve children with ASD were recruited from the Autism Resource Center (ARC). The study was approved by the autism specialist and the director board of ARC. The recruited participants were also evaluated clinically by autism experts using the Childhood Autism Rating Scale (CARS). Parents signed a consent form for the intervention. Of the 12 children (11 males and 1 female), 8 were used for the training session of the Hidden Markov Model (HMM), and the remaining 4 were randomly selected for the prediction test. The children were of Asian background and ranged in age from 4.2 to 7.5 years (M = 6.5, SD = 0.98 years). Only children from the mild and minimal categories participated in the experimentation. Because the study focused on children with ASD, only subjects under the age of 8 years were considered. Table 2 shows the details of the 12 ASD participants, including age, gender, type of autism, and average performance in the joint attention and imitation modules of the robotic intervention. Based on these values, the model predicts the category of autism for the child, as further detailed in Table 4.
Subjects’ Details.
Details of Participants and Number of Experiments.
Comparison of Actual and Predicted Autism Category.
Experimental Setup
An overview of the experimental setup is shown in Figure 3. It represents the arrangement of the child and robots during the intervention. The robots were kept at 1 m from the child in an arc-like arrangement. During the joint attention module, the child sat on a comfortable chair; for the imitation module, the child had to stand in order to imitate the actions performed by the robot. These actions were recorded by a Kinect placed behind the robots at a suitable distance. In total, 97 experiments were performed for the joint attention and imitation modules with 12 children with ASD: 72 experiments were conducted during the training session with 8 children, and 25 experiments were performed with 4 children during the testing session. The total duration of experimentation was 15 weeks.

Proposed System Architecture.
In the testing session, the groups (Mild/Minimal) were determined in two different ways: (1) by the CARS scale, and (2) by the trained HMM model. Using the JA and IM performance of the ASD children (from 25 experiments) as input to the trained HMM model, the group (Mild/Minimal) was predicted. The predicted group was then compared with the one obtained through CARS to determine the accuracy of the trained model. The experimental details for the training and testing sessions are:
Training Session
Number of autistic children = 12
Number of autistic children who participated in the training session of Hidden Markov Model = 8
Number of experiments conducted for training session = 72
Number of experiments performed by each child (selected for the training session) = 9
Number of weeks experiments were conducted for training session = 9
Testing Session
Number of autistic children who participated in the testing session of Hidden Markov Model = 4
Number of experiments conducted for testing session = 25
Number of experiments performed by each child (selected for the testing session) = 6 (approximately)
Number of weeks experiments were conducted for testing session = 6
Whole duration of experiment = 9 + 6 = 15 weeks.
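The counts listed above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those reported in the text.

```python
# Figures reported for the two sessions.
training = {"children": 8, "experiments": 72, "weeks": 9}
testing = {"children": 4, "experiments": 25, "weeks": 6}

# Experiments per child in each session.
per_child_training = training["experiments"] / training["children"]  # 9.0
per_child_testing = testing["experiments"] / testing["children"]     # 6.25

# Totals across both sessions.
total_children = training["children"] + testing["children"]
total_experiments = training["experiments"] + testing["experiments"]
total_weeks = training["weeks"] + testing["weeks"]
```

The testing figure of 6.25 experiments per child is why the list gives the per-child count as approximately 6.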
The details of experimentation and participants are reflected in Table 3.
Figure 3 shows the overview of the system’s setup for therapy for both the joint attention and imitation modules using multi-robot interaction. The two robots were placed in front of the child at approximately 1 m from the child. The child sits on a comfortable plastic chair during the joint attention module of the MRIS model to make eye contact with the robot (Ali et al., 2019). In the imitation module, the child stands in front of the robots to perform the imitation tasks of the MRIS model (Ali et al., 2019). During the intervention, Kinect was used for measuring the imitation skills of the child, whereas the robot’s camera was used for measuring the joint attention of the ASD child. To make the intervention replicable, the participant setting for the experiment is shown in Figure 4.

Interaction of ASD Child With Multi-Robot System From Experiment: (a) Joint Attention Module and (b) Imitation Module.
Results and Discussion
Average joint attention and imitation performance (over all experiments), along with the categorization of each subject, is shown in Figures 5 and 6, respectively. The results show the response of each subject for joint attention as low, delayed, or immediate, whereas for imitation the results are represented as partial, moderate, or full depending upon the number of imitations done correctly by the ASD child. The categorization of joint attention and imitation by percentage of success is given in Table 1. The relation between categorization and percentage success rate was discussed with therapists and autism experts. Table 4 shows the details of all the subjects along with the actual and predicted autism category for both the joint attention and imitation modules. In Table 4, the category evaluated by autism experts is represented by “actual category of autism”, whereas “predicted category of autism” is based on the HMM algorithm, which uses the joint attention and imitation categories represented in the table. The joint attention and imitation categories are based on child performance as given in Table 1. Average performance in the joint attention and imitation modules was 65.47% and 76.19%, respectively. In 19 out of 25 instances, the predicted category of autism matched the actual category identified by the autism expert. The percentage accuracy of the algorithm was therefore 76%, as shown in Table 5.

Average Joint Attention Performance of ASD Children.

Average Imitation Performance of ASD Children.
Summary of Result.
Researchers have been developing several technical tools for the support of children with ASD. Early diagnosis and proper interventions play a vital role in the improvement of the communication and social skills of an ASD child. However, clinical inspection for early-age diagnosis of ASD in young children is still a challenge. Therefore, social robots are one of the most popular techniques to treat autism. This research focuses on predicting the category of autism using a Hidden Markov Model (HMM) for a robot-led therapy. A lot of research has already been done on the prediction of autism using HMM based on various clinical factors, e.g., the likelihood of autistic parents having autistic children (Carvalho et al., 2020). In another study, an HMM was used for classification analyses to understand face exploration dynamics in boys with ASD (Vettori et al., 2020). Similarly, in another study autism was predicted based on skeleton-driven action recognition (Silva et al., 2021). However, all these models focused on predicting the category of autism, unlike the proposed model, which predicts the level of autism. The observable states in the proposed model are based on robotic interactions rather than clinical findings. Moreover, the current model uses two main impairments in its observable state, i.e., joint attention and imitation, unlike previous models, which usually use only one parameter. Based on the results and statistical analysis, the proposed model was found to predict the level of autism in an ASD child with statistically significant agreement with CARS.
Similarly, another study discusses the reliability of HMM and VMM models for distinguishing between the gaze patterns of TD and ASD children (H. Feng, 2014), while our work concerns the reliability of the HMM model for discriminating between the categories of ASD children (minimal and mild). Differentiating various levels of autism has not been done previously. Moreover, H. Feng et al. focus on using HMM and VMM models on the gaze patterns of ASD children in a visit session in order to decide (manually) which tasks should be adopted in the intervention to improve the targeted social skills (e.g., basic question understanding, joint attention, emotional facial expression recognition). In contrast, the proposed work applies the HMM model to the joint attention and imitation skills of ASD children in all conducted test sessions in order to categorize the severity level of autism, rather than to decide which tasks to use in the intervention. Moreover, the research of H. Feng et al. involved manual labelling of gaze responses using a single robot, while our research was based on multi-robot interaction, and the data collection for joint attention and imitation was programmed using the NAO libraries for gaze analysis and Kinect.
Unlike previous research, the proposed model uses multi-robot interaction, representing a triadic human communication scenario, a common social setting, to predict the autism category of the ASD child. The implemented MRIS model addresses two core impairments, i.e., joint attention and imitation, that are further used in the proposed HMM model. The presented model categorizes and predicts the level of autism in children with ASD and thereby explores whether an HMM model based on MRIS can help psychologists to categorize the level of autism. This paper contributes to the literature in terms of the reliability of the HMM model for categorizing the severity level of autism, with a statistically significant Kappa measure of agreement between CARS and our proposed model. Moreover, this research presents the first prediction model for categorizing autism based on multi-robot interaction for two impairments, i.e., joint attention as well as imitation, using MRIS. Previously, no research has focused on predicting the severity of autism in a multi-robot interaction scenario for multiple skill training parameters of an ASD child.
To assess the accuracy of the HMM model presented in this article, 72 experiments were performed on 8 subjects during the training session. For the testing session, a total of 25 experiments were performed on 4 subjects. Therefore, a total of 97 experiments were performed with 12 ASD children. The presented work is based on previous research that also uses a similar number of subjects (Ali et al., 2019). Furthermore, to ensure the correctness of the results, we have used suitable statistical analysis techniques so that the findings can be interpreted correctly. The reliability of the proposed model is also verified using the Kappa measure of agreement (k). A moderate and statistically significant agreement was found between CARS and the HMM for categorizing the level of autism: n = 25, k = 0.52, p = 0.009.
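For readers unfamiliar with the Kappa measure, the following is a minimal sketch of how accuracy and Cohen's kappa are computed from two label sequences. The toy labels are invented for illustration and are not the study's data, since the confusion matrix is not given in this text. The final check uses a chance agreement of 0.5, which is an assumption back-solved from the reported accuracy (19/25) and k = 0.52, not a value stated by the authors.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of cases where the two raters assign the same label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two raters corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    labels = set(actual_counts) | set(predicted_counts)
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(actual_counts[l] * predicted_counts[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy example (NOT the study's data): expert labels versus model labels.
expert = ["mild", "mild", "minimal", "minimal"]
model = ["mild", "minimal", "minimal", "minimal"]
toy_kappa = cohen_kappa(expert, model)

# Consistency check on the reported figures, assuming chance agreement of 0.5.
reported_accuracy = 19 / 25
implied_kappa = (reported_accuracy - 0.5) / (1 - 0.5)
```

Under that assumption the implied kappa is approximately 0.52, which is consistent with the value reported above.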
Conclusion
Autism spectrum disorder is a neurodevelopmental disorder that affects communication and social skills and is accompanied by developmental delays in the child. One of the main impairments considered in children with ASD is a lack of visual coordination and focus (Mundy, 1995; Mundy & Gomes, 1998). Robots are used in cognitive therapy interventions to increase the focus (Kim & Paul, 2012) and imitation skills of ASD children (Fujimoto et al., 2011). Apart from autism rating scales used by the therapist, such as the Diagnostic and Statistical Manual of Mental Disorders, currently DSM-V (American Psychiatric Association, 2013), and the Autism Diagnostic Observation Schedule (ADOS) (Lord et al., 2008), the proposed robotic therapy uses an HMM-based model for categorizing the level of autism as minimal or mild. The uniqueness of the model lies in the fact that it uses two main impairments, i.e., joint attention and imitation, during the robotic therapy to predict the level of autism without requiring a therapist. The two impairments are categorized based on the child’s performance during the intervention: joint attention is categorized as low, delayed, or immediate, whereas imitation is categorized as partial, moderate, or full. The probabilities of the observable states are then fed into the Hidden Markov Model, which predicts the level of autism in the child. The predicted category is then compared with the actual categorization made by the psychologist based on the Childhood Autism Rating Scale (CARS). The study involved 12 children with ASD over a period of 15 weeks; across the 25 test experiments conducted with 4 of these children, the accuracy achieved was 76%. The trial experiments provide evidence that robots integrated with a Hidden Markov Model are useful in studying the categorization of the severity level of ASD children.
Footnotes
Acknowledgments
The authors would like to acknowledge the support of the Autism Resource Center to conduct experiments and especially the parents and children who participated in this study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
