Abstract
Intelligent service robots are being developed for emerging areas of robotics applications. Human-friendly interactive features are preferred for these service robots since they are expected to be operated by non-experts. Humans prefer to use voice instructions to exchange ideas with their peers. Such voice instructions often include distance- and direction-related language descriptors that are fuzzy in nature. Therefore, these service robots must be capable of interpreting the meaning of such fuzzy notions in language instructions in order to enhance the rapport between the robots and their users. This paper proposes a method to interpret the directional notions in motional and positional navigational commands by considering the fuzziness associated with linguistic notions. A fuzzy inference system has been developed to adapt a robot’s perception of fuzzy directional notions based on the environment. This adaptation is realized by weighting the output membership function with the distribution of free space around the robot or a reference object. Experiments have been conducted in an artificially created domestic environment with heterogeneous characteristics. According to the experimental results, the proposed system is capable of enhancing the understanding of navigational commands with fuzzy notions.
Introduction
An intelligent service robot is a machine that is able to perceive the environment and use its knowledge to operate safely in a meaningful and purposive manner [2]. Developing intelligent service robots with human-like interactive features has been a major research niche in the field of robotics for quite some time due to their potential use in emerging fields of robotics applications such as health care [15], rehabilitation [8], caretaking [5], and entertainment [1]. However, the development of intelligent service robots with human-friendly interactive features is a challenging and complicated task since human-like cognitive abilities must be embodied in the design of the service robots in order to give them human-like interactive features [31].
Humans prefer to use voice directives to convey instructions to their peers. Therefore, human-like voice communication between human users and robotic systems would enhance the rapport between the robots and the users [20,25]. However, the information conveyed through voice instructions is often ambiguous due to the inclusion of uncertain information, lexical symbols and notions. As an example, humans prefer to use the command “move far to the left” instead of “move 120 cm at 25 degrees from north”. The linguistic notions related to distances and directions are often fuzzy in nature since their meanings are subject to various factors such as the arrangement of the environment, experience, and context. Therefore, the ability to interpret such uncertain information in a human-like manner is mandatory for a human-friendly service robot.
Methodologies have been proposed to operate robotic systems using natural language voice instructions [10,32]. However, these methods mainly aim at enhancing the understanding of natural language representations, and they do not discuss how to infer the quantitative meaning of fuzzy information in voice instructions. The method proposed in [29] is capable of describing the surrounding environment through dialogues that contain uncertain spatial information. However, the quantitative meaning of uncertain information is fixed and hence, the interpretation is not human-like. The methods proposed in [4,26] use a fuzzy neural network that quantifies fuzzy notions in natural language user instructions. However, the meanings of fuzzy notions are predetermined since the quantitative meanings of fuzzy notions are defined as a linear modification factor that depends on the current state of the robot. The method proposed in [17] is capable of quantifying distance-related uncertain information in user instructions through a fuzzy inference system. This fuzzy inference system considers the distance between the robot and its user for the interpretation. However, the interpretation is predetermined since the membership functions are fixed entities defined based on expert knowledge. The method proposed in [16] assumes that the meaning of uncertain information related to distances depends on the immediately previous state of a robot, and hence it evaluates quantitative values for uncertain information through a fuzzy inference system that considers the immediately previous movement. However, according to [13], the immediate past state is not enough for effective interpretation of uncertain information, and that work proposes a method to adapt the perception of uncertain information based on a set of previous states instead of only the immediately previous state.
Methods have been proposed for adapting a robot’s perception of uncertain information towards the perception of its user by acquiring knowledge through fuzzy neural networks that learn from user feedback [14,19]. However, the above-mentioned methods are not capable of adapting the perception of uncertain information according to the environment, even though the meaning of uncertain information is directly subject to the arrangement of the environment. Therefore, the interpretation of uncertain information by those systems is not human-like.
The method proposed in [12] is capable of quantifying the meaning of distance-related uncertain information in object manipulation instructions by adapting the perception based on a fuzzy inference system that considers the average distance between objects in its field of vision. The method proposed in [27] is capable of inferring the meaning of uncertain positional information related to distances such as “close” and “far” by scaling the meaning according to the size of the frame of interest. However, relying only on the size of the room to adapt the perception of uncertain information in navigational instructions is not effective [21]. The cited work proposed a method to adapt the perception of uncertain information related to navigational commands based on environmental factors such as the size of the room, available free space and the arrangement of objects. However, that system is only capable of interpreting fuzzy notions related to distances, and the meanings of directional notions are fixed even though natural language directional notions are inherently fuzzy [28,30]. Similarly, many methods have been developed to interpret uncertain information in language instructions [24]. However, the scope of those approaches is mainly limited to the interpretation of distance-related uncertain notions in language instructions, and the methods assume that the meaning of directional notions is fixed [24]. Therefore, the interpretation of navigation commands in those approaches is neither effective nor human-like when determining the exact direction of the movement symbolized by language notions such as “left”, “right” and “front”.

System overview.
The method proposed in [23] is capable of interpreting the required direction of a movement symbolized by language descriptors based on the surrounding environment. The main argument of the method is that humans tend to move towards an area where the congestion of objects is low. The direction of the movement symbolized from the language descriptors is interpreted by a fuzzy inference system that is capable of modifying the perception of a robot about the directional notions according to the surrounding environment of the robot. This modification is done by weighting the output membership function of the fuzzy inference system with the distribution of the free space around the robot. Even though the system is capable of replicating the natural directional perception of humans that depends on the environment to a greater extent, the system is only capable of interpreting the directional notions in simple motional commands such as “move little left” which express directions with respect to the robot. The system is not capable of interpreting the directions expressed with the aid of landmarks/objects in the environment when a positional command is issued to the robot. For example, the system is capable of interpreting the motional command “move little right” but not the positional command “move near to the left of the table”. Furthermore, the concept has not been demonstrated for operations in a domestic environment with different characteristics.
Therefore, this paper proposes a method to interpret the directional notions in both motional and positional navigation commands by adapting the robot’s directional perception according to the environment. Moreover, the capabilities of the system proposed in [23] have been improved in order to interpret positional navigation commands such as “move near to the right side of the chair”. The proposed system is capable of interpreting uncertain language notions related to distance and direction in navigational commands based on the characteristics of the environment. Furthermore, the proposed system has been designed in such a way that it can be operated in a typical domestic environment with heterogeneous characteristics. The functional overview of the proposed system is presented in Section 2. The fuzzy notions interpretation system is explained in Section 3 with the rationales behind the techniques used. Particulars on the experimental validation of the proposed system are discussed in Section 4. Finally, concluding remarks including future improvements are given in Section 5.
The overall functionality of the proposed system is depicted in Fig. 1. The goal of the system is to provide an effective way to navigate a robot by using natural language instructions that include uncertain information related to distances and directions such as “move a little to the right” and “move near the left of the TV”. The voice instructions issued by the user are converted into text and then parsed by the Voice Recognition and Analyzing section. Voice recognition functionality is implemented using the Speech Recognition 3.1 library, which converts voice into text with the support of Google Speech Recognition. The voice responses of the robot are generated by the Voice Response Synthesizing section, which is a text-to-speech converter implemented using Microsoft Speech API. The keywords, lexical symbols, and basic dialogue patterns required for construing the language instructions and responses are stored in the Language Memory. The interaction between the robot and a user is managed by the Interaction Manager (IM) by determining the required robot actions. The IM has been implemented as a finite state intention module as explained in [22]. The Action Planner executes the required sequence of robot actions for accomplishing a particular user instruction with the aid of the Action Knowledge Base. The fuzzy notions in a particular user instruction are interpreted by the Fuzzy Information Interpreter (FII), which has two modules for interpreting distance- and direction-related uncertain notions respectively. The module used for interpreting fuzzy information related to distances has been implemented similarly to the system explained in [21], and it has two submodules for interpreting uncertain information related to motional and positional information. The required environmental parameters for the interpretation of uncertain information are fed into the FII by the Environment Information Organizer (EIO). The low-level navigation functionalities of the robot such as localization and path planning within a given navigation map are handled by the Navigation Controller. The required navigation maps can be created with the Mapper 3 software. The Sensor Handler deals with the low-level sensors of the robot such as range sensors. The EIO organizes the robot’s knowledge about the environment by extracting the information from the Sensor Handler and the Navigation Maps.
Structure of the user commands
The ability to issue flexible user instructions to the robot enhances the overall rapport between a robot and its user. Therefore, the command parsing ability of the system proposed in [23] has been improved in this work in order to give users flexibility in issuing navigation instructions. Furthermore, this enhancement allows the users to issue positional navigation commands to the robot. The command parsing is done by analyzing tokens in a given user instruction with the keywords, lexical symbols, and basic grammar patterns available in the memory, similarly to the system explained in [21]. The redundant words in a user instruction, such as articles, are filtered out before parsing it. Structures of motional and positional navigational commands accepted by the system are given below (without redundant words) in JSpeech Grammar Format [11].
The command type,
The command type
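The token-based parsing described in this section could be sketched roughly as follows. This is only an illustrative sketch: the vocabularies `REDUNDANT`, `ACTIONS`, `DISTANCES`, and `DIRECTIONS`, the function name `parse_command`, and the rule that any leftover token names the reference object are all assumptions made for the example, not the keyword lists or grammar stored in the system's Language Memory.

```python
import re

# Hypothetical vocabularies (assumptions for illustration only).
REDUNDANT = {"a", "an", "the", "to", "of", "side"}
ACTIONS = {"move", "go"}
DISTANCES = {"little", "near", "close", "medium", "far"}
DIRECTIONS = {"left", "right", "forward", "front", "backward", "back"}


def parse_command(text):
    """Split a navigation command into action, distance, direction, and
    an optional reference object, after dropping redundant words."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in REDUNDANT]
    parsed = {"action": None, "distance": None, "direction": None, "reference": None}
    leftovers = []
    for token in tokens:
        if token in ACTIONS and parsed["action"] is None:
            parsed["action"] = token
        elif token in DISTANCES and parsed["distance"] is None:
            parsed["distance"] = token
        elif token in DIRECTIONS and parsed["direction"] is None:
            parsed["direction"] = token
        else:
            leftovers.append(token)
    if leftovers:
        # Assumption: any remaining tokens name the reference object.
        parsed["reference"] = " ".join(leftovers)
    # A command that names a reference object is treated as positional.
    parsed["type"] = "positional" if parsed["reference"] else "motional"
    return parsed
```

Under these assumptions, `parse_command("move a little to the right")` is classified as a motional command, while `parse_command("move near the left of the TV")` is classified as a positional command with `tv` as the reference object.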
Interpretation of directional notions
Rationale behind the proposed method
The navigation system proposed in [21] assumes that the meaning of directional notions is fixed as explained in Fig. 2. Figure 2(a) shows an instance where the directions are defined with respect to the robot and this definition is used when the robot is commanded with a motional command. According to the definition, the meaning of language descriptors for the directions are fixed as “forward” exactly in the direction of the current heading of the robot (i.e., θ), “left” as

This illustrates how the directional notions are defined in the system proposed in [21]. The colored shaded areas represent the objects in the environment. (a) represents a situation where the directions are defined with respect to the robot. Here, θ is the heading angle of the robot. (b) represents a situation where the directions are defined with respect to a reference object. Here, the angle between the front direction and the X-axis is annotated as θ. In this kind of instance, the orientation frame is defined from the point of view of the robot.
The method for interpreting the directional notions in navigation command has been designed with a single input and single output fuzzy inference system that is capable of perceiving the environment for adapting the perception. The directional keyword in a particular command (i.e.,

(a) shows the input membership function of the Direction Interpreter (DirI). It has singleton sets to represent the direction keywords (
Therefore, in order to adapt the perception of the directional notions according to the environment setting, the output membership function is modified based on the distribution of free space around the robot (if

This explains the ways to obtain
The center of ith Direction Set,
Due to the weighting with free space distribution, the range of the activation degree of a Directional Set
In the rule base, the ith direction keyword in the input membership function is directly mapped to the ith set of the output membership function, yielding a single-input, single-output fuzzy system. The aggregation of the output fuzzy sets is done using the fuzzy union operator. The required crisp output of the fuzzy inference system, ψ, is obtained by defuzzifying the aggregated output membership function using the center of area method. Then, the defuzzified output of the fuzzy inference system, ψ, can be obtained as given in Eq. (5), where
The output of the Direction Interpreter, ϕ, can be obtained as given in Eq. (6), and the meaning of ϕ depends on the type of the corresponding user command. If the user command is a motional command, then ϕ is the interpreted moving direction for the robot, and it is achieved by changing the heading angle of the robot to ϕ. If the user command is a positional command, then ϕ is the angle to the destination position of the robot, measured around the center of the reference object from the X-axis.
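The inference steps described above (singleton input, output sets weighted by free space, union aggregation, and center of area defuzzification) can be illustrated with a minimal sketch. The triangular shape of the Direction Sets, the centers in `CENTERS`, the spread `HALF_WIDTH`, the multiplicative form of the weighting, and the `free_space` callable returning a value in [0, 1] are assumptions made for this example; they do not reproduce the membership functions or equations of the paper.

```python
def triangular(x, center, half_width):
    """Triangular membership value of x for a set peaking at `center`."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


# Hypothetical Direction Set centers in degrees, relative to the reference heading.
CENTERS = {"right": -90.0, "forward": 0.0, "left": 90.0}
HALF_WIDTH = 45.0  # assumed spread of each Direction Set


def interpret_direction(keyword, free_space, step=1.0):
    """Return a crisp angle (degrees) for a direction keyword.

    `free_space(angle)` is an assumed callable giving the normalized
    free space in [0, 1] available in the direction `angle`.
    """
    if keyword not in CENTERS:
        raise ValueError(f"unknown direction keyword: {keyword!r}")
    # Singleton input: only the uttered keyword fires, with degree 1.
    activation = {k: 1.0 if k == keyword else 0.0 for k in CENTERS}
    numerator = 0.0
    denominator = 0.0
    samples = int(round(360.0 / step)) + 1
    for i in range(samples):
        angle = -180.0 + i * step
        # Clip each output set by its activation, weight it by the free
        # space, and aggregate the sets with max (fuzzy union).
        mu = max(
            min(activation[k], triangular(angle, c, HALF_WIDTH)) * free_space(angle)
            for k, c in CENTERS.items()
        )
        numerator += angle * mu
        denominator += mu
    if denominator == 0.0:
        # No free space inside the active set: fall back to the default center.
        return CENTERS[keyword]
    # Center of area (centroid) over the sampled angles.
    return numerator / denominator
```

With a uniform free-space distribution the sketch returns the default center of the selected set, whereas more free space on one side of the center pulls the output towards that side. The sketch ignores angle wrap-around at ±180°, which a real implementation would need to handle.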
Distance-related fuzzy notions in user commands are interpreted by the Distance Interpreter (DisI), which has been implemented with two submodules for interpreting the distance related uncertain descriptors in motional commands and positional commands. This module has been implemented with two fuzzy inference systems similar to the system proposed in [21].

This depicts the functional overview of the fuzzy inference systems used in the Distance Interpreter (DisI) for interpreting distance-related fuzzy notions in user commands. These two fuzzy inference systems have been implemented similarly to the system proposed in [21] for interpreting distance-related uncertain information in navigation commands. (a) depicts the overview of the fuzzy inference system in the motional submodule of the DisI. (b) depicts the overview of the fuzzy inference system in the positional submodule of the DisI.
The functional overview of the Motional submodule of DisI is depicted in Fig. 5(a). This submodule of the DisI is used to interpret fuzzy distance notions in motional navigation commands (i.e.,

This explains the parameters related to the perceptive distance, D. In motional module,
The functional overview of the Positional submodule of the DisI is depicted in Fig. 5(b). This module is used to interpret the fuzzy distance notions in positional commands (i.e.,

This shows MIRob during experimental scenarios.

The initial and final positions of the robot during the execution of the cases given in Table 1 are marked on the map with corresponding letter indexes. This map is drawn to scale in order to visualize the characteristics of the experimental environment. However, it should be noted that the markers do not represent the actual size of the robot.
Experimental setup
The proposed system has been implemented on the MIRob platform [22], and experiments have been conducted in an artificially created domestic environment to evaluate its behavior and performance. MIRob during a few experimental instances is shown in Fig. 7. The arrangement of the experimental environment is given in the map shown in Fig. 8. It had 3 different rooms with heterogeneous characteristics. At the start, the robot was initialized with an updated navigation map of the environment. Furthermore, the lexical labels and the locations of the objects in the environment had been taught to the robot through discussion as explained in [22]. Therefore, the robot was well aware of the arrangement and the characteristics of the environment during the experiments.
In order to increase the computational efficiency of the direction interpreter, α is considered as a discrete variable instead of a continuous one. Furthermore, this discretization simplifies the implementation complexity of the system. Therefore, the weighting of the default perception with
The performance of the proposed system (i.e., the system with the adaptable directional perception) has been evaluated against a system with a fixed directional perception (i.e., a system similar to [21]). The evaluation has been conducted with the aid of a user study. Since user studies are highly subjective in nature, due attention has been paid to the guidelines and recommendations given in [3] for designing and performing human studies for human–robot interaction experiments in order to minimize the subjectivity of the outcomes.
Sample results of the experiment: Parameters related to the interpretation of directional notions by the DirI
The destination positions are decided based on the outputs of both DisI and DirI. The parameters related to the DisI in interpreting the distance notions in the corresponding cases are given in Table 2.
It should be noted that the effects to the interpretation from
The user study was conducted with the participation of 12 users, whose mean age was 26.8 years with a standard deviation of 4.1 years. The users were taken to the experiment one by one, and they were advised about the structures of the user commands that can be understood by the robot. Each user was given 6 occasions to interact with the robot for each of the two systems (i.e., the system with fixed directional perception and the system with adaptable directional perception). These instances were chosen by randomly deciding the initial position of the robot. The users were given the freedom to decide the user instructions. However, the users were asked to include 3 motional commands and 3 positional commands in those 6 instances, and the same 6 instances were repeated with the other system. In order to minimize the subjectivity, the users were not informed which system (either with fixed directional perception or with adaptable directional perception) they were interacting with in a particular run. After each run, the user was asked to rate the action of the robot on a scale of 0–100, similar to the evaluation approach used in [15], where 100 indicates perfect agreement and 0 indicates no agreement. This kind of evaluation method has been chosen over the evaluation method proposed in [23] because the action of the DirI modifies the perceptive distance, which affects the output of the DisI; hence, the evaluation should be conducted for the entire action of the robot instead of merely the interpreted direction as in [23]. A User Rating (UR) given by a user depends on the final position of the robot. Therefore, it reflects the assessment of both the direction and the distance interpreted by the robot.
The results obtained from the 1st user for 6 runs using both systems are given in Table 1 as sample results. The parameters related to the DisI for the corresponding cases are given in Table 2. The initial and final positions of the robot during these experimental runs are marked on the map shown in Fig. 8 with corresponding indexes given in Table 1. The modified output membership functions of the DirI due to the weighting with the free space in these cases are shown in Fig. 9.
Parameters related to the interpretation of distance notions by the DisI

The output membership functions plotted here show the adaptation of the perception of directional notions after weighting the default perception with the available free space for the cases given in Table 1. (a), (b), (c), (d), (e), and (f) represent case 1, 2, 3, 4, 5, and 6 respectively. It should be noted that only the effective Direction Sets (DS) are plotted here for a particular instance and non-effective
In case 1, the robot was initially placed on the location ‘
In case 2, the initial position of the robot was ‘
In case 3, the robot was commanded with the motional command, “move far forward”. In the system with fixed directional perception case, the robot moved to location ‘
In case 4, the initial position of the robot was ‘
In cases 5 and 6, the robot was commanded with positional commands. In the runs with adaptable directional perception, the robot moved to locations that deviated slightly towards low-congestion areas relative to the positions reached in the runs with fixed directional perception. The system with adaptable directional perception received a higher UR than the system with fixed directional perception. However, in case 5, the URs for the two systems were almost the same (71 and 72) even though the interpreted directions differed by a fairly large angle (13°). In the run with fixed directional perception, the robot could not settle exactly on the direction decided by the system because there was not enough free space for the robot to occupy that position. Therefore, its final location had already deviated towards the free area. This would be the reason for the almost identical user ratings of the two systems.

(a) shows the mean values of the user ratings for the two systems with error bars. The error bars represent the standard error. (b) shows the distribution of user ratings as a boxplot. The boxplot has the usual standard notation; box: Interquartile range, horizontal line: Median, whiskers: Minimum and maximum, and plus sign: Outliers.
Similarly, all 12 participants were asked to operate the robot for 6 runs with each of the systems (i.e., the system with fixed directional perception and the system with adaptable directional perception). This yields 72 effective cases for each system. The mean value of the User Rating (UR) was calculated for both systems based on the individual UR for each instance. The calculated mean UR scores for the two systems are given in Fig. 10(a) with error bars. The distributions of the UR scores are given in Fig. 10(b) as boxplots for better visualization of the results. The system with the adaptable directional perception (i.e., the system proposed in this paper) received a mean user rating of 77.7, while the system with fixed directional perception received a mean UR of 56.2. The difference between the means of UR is statistically significant (
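As a small illustration of how the mean UR and the standard error shown as error bars in Fig. 10(a) can be computed from a list of individual ratings, a minimal sketch is given below. The function name `mean_and_standard_error` is an assumption, and the sketch assumes the sample standard deviation is used; it does not reproduce the significance test applied to the experimental data.

```python
import math
import statistics


def mean_and_standard_error(ratings):
    """Return (mean, standard error of the mean) for a list of user ratings."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are required")
    mean = statistics.fmean(ratings)
    # Standard error = sample standard deviation / sqrt(number of ratings).
    standard_error = statistics.stdev(ratings) / math.sqrt(n)
    return mean, standard_error
```

For each system, the function would be called once with the 72 individual ratings collected in the user study.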
Many approaches have been developed to improve the interpretation of fuzzy notions in language instructions by adapting the perception according to the environment [24]. However, the scope of the existing approaches is limited to adapting the perception of distance notions, and the directional perception is assumed to be fixed [24]. Moreover, the existing approaches are not capable of accounting for the fuzziness inherent in directional linguistic notions when interpreting a user command. The method proposed in this paper is capable of adapting both distance and directional perception based on the environment by considering the fuzziness associated with them. According to the experimental results, the adaptation of both entities significantly improves the navigational command understanding ability of a robot. Therefore, the proposed method extends the scope of the state-of-the-art approaches in this particular research niche, which is the key improvement of the proposed work.
The overall performance of the command understanding ability of the robot depends on the voice recognition accuracy. Therefore, measures have been taken to minimize the adverse effects of voice recognition issues on the evaluation of the proposed method. In order to improve the voice recognition accuracy, a wireless headset microphone, which can be placed in close proximity to the mouth, was given to the users for issuing the voice instructions during the experiments. The voice recognition accuracy is around 70–80%. Therefore, there could be situations where the robot misinterprets user instructions due to issues in voice recognition. For proper evaluation of the proposed concept, the ambiguities arising from voice recognition issues should be cleared out. As a remedy, the system has been designed in such a way that the robot requests a confirmation of the received user instruction from the user before the execution of actions. If the voice instruction is not properly recognized, the user can repeat the instruction. If the instruction is correctly recognized, the user can confirm it to the robot, and then the robot will execute the actions to fulfill the user instruction. Since the robot asks for a confirmation after receiving every instruction, this may impose some overhead on users. However, both systems used for evaluating the performance (i.e., the system proposed in this paper and the system proposed in [21]) have the same behavior in the Voice Recognition and Analyzing section. Therefore, the effect of this overhead on the evaluation is nullified in the overall comparison. Moreover, it can be concluded that the experimental evaluation has properly assessed the performance gain in the interpretation of fuzzy notions in commands, independently of the issues related to voice recognition.
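The confirmation behavior described above can be summarized as a simple loop. The sketch below is illustrative only: `recognize` and `ask_confirmation` are hypothetical callables standing in for the speech recognizer and the robot's confirmation dialogue, and the `max_attempts` limit is an assumption, since no retry limit is stated here.

```python
def confirm_instruction(recognize, ask_confirmation, max_attempts=3):
    """Return the instruction text once the user confirms it, or None.

    recognize():            hypothetical callable returning the recognized text
                            (or None/"" if nothing was recognized).
    ask_confirmation(text): hypothetical callable returning True when the user
                            confirms that `text` is the intended instruction.
    """
    for _ in range(max_attempts):
        text = recognize()
        # Execute only instructions that were recognized and confirmed.
        if text and ask_confirmation(text):
            return text
        # Otherwise the user repeats the instruction on the next attempt.
    return None
```

Only an instruction returned by this loop would be passed on for interpretation and execution, so misrecognized instructions do not influence the rated robot action.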
Conclusion
A method has been proposed for enhancing the interpretation of fuzzy notions in motional and positional navigation commands by adapting a robot’s directional perception based on the environmental setting. The major improvement of the proposed system over the existing approaches is that the system is capable of interpreting the directional notions in motional and positional navigation commands by considering the fuzziness associated with natural language descriptors instead of relying on fixed interpretations.
The directional notions in user instructions are interpreted by a fuzzy inference system that has been designed in such a way that it can replicate the natural human behavior. The perception of fuzzy directional notions is adapted by modifying the output membership function of the fuzzy inference system according to the available free space around the robot or the reference object.
Experiments have been conducted in order to evaluate how much the proposed method for adapting the robot’s directional perception improves the robot’s understanding of navigational commands. The performance of the system with the adaptable directional perception (i.e., the system proposed in this work) has been compared against a system with a fixed directional perception (i.e., a system similar to [21]) through a user study. According to the experimental results, the fuzzy navigational command understanding ability of the system with the adaptable directional perception surpasses that of the system with fixed directional perception by a significant margin.
The proposed system has unimodal interaction abilities. The interaction between the robot and a user is limited to voice communication, and the system is not capable of capturing information conveyed through nonverbal instructions that may accompany a voice instruction for decision making. The interpretation of uncertainties in voice instructions could be improved by fusing information conveyed through nonverbal means such as pointing gestures [33]. Furthermore, nonverbal cues can be used to identify the intention of a user or to shift the attention towards a direction/position [7,18]. Therefore, mechanisms that are capable of evaluating nonverbal cues could be fused with the proposed method to further improve the interpretation of fuzzy notions. The establishment of multimodal interaction abilities to improve the interpretation of fuzzy notions is proposed as future work.
Acknowledgements
This work was supported by the University of Moratuwa Senate Research Grant No. SRC/CAP/17/03.
