Improving the understanding of navigational commands by adapting a robot’s directional perception based on the environment

Abstract

Intelligent service robots are being developed for emerging areas of robotics applications. Human-friendly interactive features are preferred for these service robots since these robots are anticipated to be operated by non-experts. Humans prefer to use voice instructions for exchanging the ideas between peers. Such voice instructions often include distance and direction related language descriptors that are fuzzy in nature. Therefore, these service robots must be capable of interpreting the meaning of such fuzzy notions in language instructions in order to enhance the rapport between the robots and their users. This paper proposes a method to interpret the directional notions in motional and positional navigational commands by considering the fuzziness associated with linguistic notions. A fuzzy inference system has been developed in order to adapt a robot’s perception of fuzzy directional notions based on the environment. This adaptation is realized by weighting the output membership function with the distribution of free space around the robot or a reference object. Experiments have been conducted in an artificially created domestic environment with heterogeneous characteristics. According to the experimental results, the proposed system is capable of enhancing the understanding of navigational commands with fuzzy notions.

Keywords

Understanding of fuzzy language notions; human–robot interactions; human-friendly robotics; service robotics; social robotics

1. Introduction

An intelligent service robot is a machine that is able to perceive the environment and use its knowledge to operate safely in a meaningful and purposive manner [2]. Developing intelligent service robots with human-like interactive features has been a major research niche in the field of robotics for quite some time due to their potential use in emerging fields of robotics applications such as health care [15], rehabilitation [8], caretaking [5], and entertainment [1]. However, the development of intelligent service robots with human-friendly interactive features is a challenging and complicated task, since human-like cognitive abilities must be embodied in the design of the service robots in order to give them human-like interactive features [31].

Humans prefer to use voice directives to convey instructions between peers. Therefore, human-like voice communication between human users and robotic systems would enhance the rapport between the robots and the users [20,25]. However, the information conveyed through voice instructions is often ambiguous due to the inclusion of uncertain information, lexical symbols and notions. As an example, humans prefer to use the command, “move far to the left” instead of “move 120 cm with 25 degrees with north”. The linguistic notions related to distances and directions are often fuzzy in nature since their meanings depend on various factors such as the arrangement of the environment, experience, and context. Therefore, a human-like ability to interpret such uncertain information is mandatory for a human-friendly service robot.

Methodologies have been proposed in order to operate robotic systems using natural language voice instructions [10,32]. However, the methods are mainly proposed for enhancing the understanding of natural language representations and the methods for inferring the quantitative meaning of fuzzy information in voice instructions are not discussed in those approaches. The method proposed in [29] is capable of describing the surrounding environment through dialogues that have uncertain information related to the spatial information. However, the quantitative meaning of uncertain information is fixed and hence, the interpretation is not human-like. The methods proposed in [4,26] use a fuzzy neural network that quantifies fuzzy notions in natural language user instructions. However, the meanings of fuzzy notions are predetermined since quantitative meanings of fuzzy notions are defined as a linear modification factor that depends on the current state of the robot. The method proposed in [17] is capable of quantifying distance related uncertain information in user instruction through a fuzzy inference system. This fuzzy inference system considers the distance between the robot and its user for the interpretation. However, the interpretation is predetermined since the membership functions are fixed entities defined based on expert knowledge. The method proposed in [16] assumes that the meaning of uncertain information related to distances depends on the immediately previous state of a robot and hence it evaluates quantitative values for uncertain information through a fuzzy inference system that considered the immediately previous movement. However, according to [13], the immediate past state is not enough for effective interpretation of uncertain information and it proposed a method to adapt the perception of uncertain information based on a set of previous states instead of the immediate previous state. 
Methods have been proposed for adapting the perception of a robot about uncertain information towards the perception of its user by acquiring knowledge through fuzzy neural networks that learn from user feedback [14,19]. However, the above-mentioned methods are not capable of adapting the perception of uncertain information according to the environment, even though the meaning of uncertain information is directly subject to the arrangement of the environment. Therefore, the interpretation of uncertain information by those systems is not human-like.

The method proposed in [12] is capable of quantifying the meaning of distance related uncertain information in object manipulation instructions by adapting the perception based on a fuzzy inference system that considers the average distance between objects in its field of vision. The method proposed in [27] is capable of inferring the meaning of uncertain positional information related to distances such as “close” and “far” by scaling the meaning according to the size of the frame of interest. However, depending only on the size of the room for adapting the perception of uncertain information in navigational instructions is not effective [21]. The cited work proposed a method to adapt the perception of uncertain information related to navigational commands based on environmental factors such as the size of the room, available free space and the arrangement of objects. However, the system is only capable of interpreting fuzzy notions related to distances, and the meanings of directional notions are fixed even though natural language directional notions are inherently fuzzy [28,30]. Similarly, many methods have been developed to interpret uncertain information in language instructions [24]. However, the scope of those approaches is mainly limited to the interpretation of distance related uncertain notions in language instructions, and the methods assume that the meaning of directional notions is fixed [24]. Therefore, the interpretation of navigation commands in those approaches is neither effective nor human-like when determining the exact direction of the movement symbolized by language notions such as “left”, “right” and “front”.

Fig. 1.

System overview.

The method proposed in [23] is capable of interpreting the required direction of a movement symbolized by language descriptors based on the surrounding environment. The main argument of the method is that humans tend to move towards an area where the congestion of objects is low. The direction of the movement symbolized from the language descriptors is interpreted by a fuzzy inference system that is capable of modifying the perception of a robot about the directional notions according to the surrounding environment of the robot. This modification is done by weighting the output membership function of the fuzzy inference system with the distribution of the free space around the robot. Even though the system is capable of replicating the natural directional perception of humans that depends on the environment to a greater extent, the system is only capable of interpreting the directional notions in simple motional commands such as “move little left” which express directions with respect to the robot. The system is not capable of interpreting the directions expressed with the aid of landmarks/objects in the environment when a positional command is issued to the robot. For example, the system is capable of interpreting the motional command “move little right” but not the positional command “move near to the left of the table”. Furthermore, the concept has not been demonstrated for operations in a domestic environment with different characteristics.

Therefore, this paper proposes a method to interpret the directional notions in both motional and positional navigation commands by adapting the robot’s directional perception according to the environment. Moreover, the capabilities of the system proposed in [23] have been improved in order to interpret the positional navigation command such as “move near to the right side of the chair”. The proposed system is capable of interpreting uncertain language notions related to distance and direction in navigational commands based on the characteristics of the environment. Furthermore, the proposed system has been designed in such a way that it can be operated on a typical domestic environment with heterogeneous characteristics. The functional overview of the proposed system is presented in Section 2. The fuzzy notions interpretation system is explained in Section 3 with the rationales behind the used techniques. Particulars on experimental validation of the proposed system are discussed in Section 4. Finally, concluding remarks including future improvements are given in Section 5.

2. System overview

The overall functionality of the proposed system is depicted in Fig. 1. The goal of the system is to provide an effective way to navigate a robot by using natural language instructions that include uncertain information related to distances and directions such as “move a little to the right” and “move near the left of the TV”. The voice instructions issued by the user are converted into text and then parsed by the Voice Recognition and Analyzing section. Voice recognition functionality is implemented using the Speech Recognition 3.1 library,¹ which converts voice into text with the support of Google Speech Recognition. The voice responses of the robot are generated by the Voice Response Synthesizing section, a text-to-speech converter implemented using Microsoft Speech API. The keywords, lexical symbols, and basic dialogue patterns required for construing the language instructions and responses are stored in the Language Memory. The interaction between the robot and a user is managed by the Interaction Manager (IM) by determining the required robot actions. The IM has been implemented as a finite state intention module as explained in [22]. The Action Planner executes the required sequence of the robot actions for accomplishing a particular user instruction with the aid of the Action Knowledge Base. The fuzzy notions in a particular user instruction are interpreted by the Fuzzy Information Interpreter (FII), which has two modules for interpreting distance and direction related uncertain notions respectively. The module used for interpreting fuzzy information related to distances has been implemented similarly to the system explained in [21], and it has two submodules for interpreting uncertain information related to motional and positional information. The required environmental parameters for the interpretation of uncertain information are fed into the FII by the Environment Information Organizer (EIO).

¹ www.pypi.org/project/SpeechRecognition/3.1.1

The low-level navigation functionalities of the robot such as localization and path planning within a given navigation map are handled by the Navigation Controller. The required navigation maps can be created from Mapper 3 software. The Sensor Handler deals with the low-level sensors of the robots such as range sensors. The EIO organizes the robot’s knowledge about the environment by extracting the information from the Sensor Handler and the Navigation Maps.

3. Interpretation of fuzzy notions in navigation instruction

3.1. Structure of the user commands

The ability to issue flexible user instructions to the robot enhances the overall rapport between a robot and its user. Therefore, the command parsing ability of the system proposed in [23] has been improved in this work in order to give users flexibility in issuing navigation instructions. Furthermore, this enhancement allows the users to issue positional navigation commands to the robot. The command parsing is done by analyzing tokens in a given user instruction against the keywords, lexical symbols, and basic grammar patterns available in the memory, similarly to the system explained in [21]. The redundant words in a user instruction such as articles are filtered out before parsing it. Structures of motional and positional navigational commands accepted by the system are given below (without redundant words) in JSpeech Grammar Format [11]. $\begin{array}{l} < {com}_{M} > = < actn > < {dist}_{M} > < {dir}_{K} >; \\ < {com}_{P} > = < actn > < {dist}_{P} > [< {dir}_{K} >] < Ref >; \\ < actn > = (go | move); \\ < {dist}_{M} > = (far | medium | little); \\ < {dist}_{P} > = (near | close); \\ < {dir}_{K} > = (forward | backward | left | right | < sub\_dir >); \\ < sub\_dir > = (front | back) (left | right); \end{array}$
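The grammar above can be illustrated with a minimal parsing sketch. This is not the parser used in the system; the set of filtered words (`REDUNDANT`), the function names, and the returned dictionary layout are assumptions made for illustration, and the reference object is simply taken to be whatever tokens remain after the optional direction keyword.

```python
# Hypothetical sketch of a parser for the <com_M> / <com_P> grammar.
# REDUNDANT is an assumed filter list; the paper only says that
# redundant words such as articles are removed before parsing.
REDUNDANT = {"a", "an", "the", "to", "of"}
ACTIONS = {"go", "move"}                      # <actn>
DIST_M = {"far", "medium", "little"}          # <dist_M>
DIST_P = {"near", "close"}                    # <dist_P>
SIMPLE_DIRS = {"forward", "backward", "left", "right"}


def match_direction(tokens):
    """Return (direction, tokens_consumed) for a leading <dir_K>, else (None, 0)."""
    # <sub_dir> = (front | back) (left | right)
    if len(tokens) >= 2 and tokens[0] in ("front", "back") and tokens[1] in ("left", "right"):
        return f"{tokens[0]} {tokens[1]}", 2
    if tokens and tokens[0] in SIMPLE_DIRS:
        return tokens[0], 1
    return None, 0


def parse_command(text):
    """Parse a command into its tokens, or return None if it fits neither rule."""
    tokens = [t for t in text.lower().split() if t not in REDUNDANT]
    if len(tokens) < 3 or tokens[0] not in ACTIONS:
        return None
    actn, dist, rest = tokens[0], tokens[1], tokens[2:]
    direction, used = match_direction(rest)
    if dist in DIST_M:
        # <com_M> requires exactly one direction keyword and nothing else.
        if direction is None or used != len(rest):
            return None
        return {"type": "com_M", "actn": actn, "dist": dist, "dir": direction}
    if dist in DIST_P:
        # <com_P> has an optional direction followed by a mandatory <Ref>.
        ref = " ".join(rest[used:])
        if not ref:
            return None
        return {"type": "com_P", "actn": actn, "dist": dist, "dir": direction, "ref": ref}
    return None
```

For instance, `parse_command("move a little forward")` yields a `com_M` record with `dist` set to "little" and `dir` set to "forward", while a positional command without a direction keyword yields a `com_P` record whose `dir` is `None`.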

The command type, $< {com}_{M} >$ is considered as motional commands. When this kind of a command is issued to the robot, the robot is expected to travel the distance meant by the token, $< {dist}_{M} >$ . This distance is measured from the robot’s initial position. The direction to be moved is decided by the direction keyword, $< {dir}_{K} >$ . The distance related uncertain information in a command (i.e., $< {dist}_{M} >$ ) is interpreted by the motional submodule of the Distance Interpreter (DisI) of the FII. The Direction Interpreter (DirI) of the FII interprets the direction keyword (i.e., $< {dir}_{K} >$ ). It is assumed that the direction is expressed with respect to the robot’s frame. For example, “move a little forward” can be considered as a ${com}_{M}$ command and it will be tokenized and parsed as $< actn >$ : “move”, $< {dist}_{M} >$ : “little” and $< {dir}_{K} >$ : “forward”.

The command type $< {com}_{P} >$ is defined for positional commands. When this kind of command is issued to the robot, the robot is expected to travel to a location that is expressed with respect to a reference object (i.e., $< Ref >$ ). The command, “move close to the left of the table” can be considered as an example for this type. The distance between the goal position and the reference object has to be decided by inferring the meaning of the descriptor, $< {dist}_{P} >$ . This is done by the positional submodule of the DisI of the FII. The direction that has to be maintained with the reference is decided by inferring the meaning of $< {dir}_{K} >$ (only if $< {dir}_{K} >$ is available) by the DirI of the FII. The details of the reference object such as its location are identified based on the knowledge of the EIO.

3.2. Interpretation of directional notions

3.2.1. Rationale behind the proposed method

The navigation system proposed in [21] assumes that the meaning of directional notions is fixed as explained in Fig. 2. Figure 2(a) shows an instance where the directions are defined with respect to the robot and this definition is used when the robot is commanded with a motional command. According to the definition, the meanings of the language descriptors for the directions are fixed as “forward” exactly in the direction of the current heading of the robot (i.e., θ), “left” as $θ + 90^{\circ}$ , “right” as $θ - 90^{\circ}$ , and “backward” as $θ + 180^{\circ}$ . Figure 2(b) shows an instance where the directions are defined with respect to a reference object and this definition is used when the robot is commanded with a positional command such as “move near to the left of Obj A”. The meaning of the directional language notions is fixed here too. The point of view of the robot is considered for defining the orientation frame of the object similar to the work proposed in [9]. However, the directional notions in language instructions are inherently fuzzy to a certain degree [28,30]. Therefore, such fixed interpretations of directional notions are not effective and not human-like [23]. The work [23] proposed a method for adapting the perception of the fuzzy directional notions based on the surrounding environment. The main argument of this work is that humans tend to move to free locations in an environment rather than to highly congested areas. The method is capable of replicating the natural behavior of humans to a greater extent. However, it is not capable of interpreting the directions involved in a positional command such as “move near to the left of the TV”. Therefore, the capabilities of that method have been improved in this work in order to interpret directional notions in positional commands based on the natural tendencies of humans.
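For contrast with the adaptive method, the fixed interpretation described for Fig. 2(a) amounts to a simple lookup of a constant offset added to the heading θ. The sketch below is only an illustration of that fixed scheme; the names `FIXED_OFFSETS` and `fixed_direction` are hypothetical, and angles are left unwrapped.

```python
# Fixed directional perception for motional commands, as described for
# Fig. 2(a): each keyword is a constant offset (degrees) from the heading.
FIXED_OFFSETS = {
    "forward": 0.0,
    "left": 90.0,
    "right": -90.0,
    "backward": 180.0,
}


def fixed_direction(theta, keyword):
    """Return the fixed moving direction (degrees) for heading theta."""
    return theta + FIXED_OFFSETS[keyword]
```

Because the offsets never change, the result ignores any obstacles around the robot, which is the limitation the adaptive approach addresses.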

Fig. 2.

This illustrates how the directional notions are defined in the system proposed in [21]. The shaded color areas represent the objects in the environment. (a) represents a situation where the directions are defined with respect to the robot. Here, θ is the heading angle of the robot. (b) represents a situation where the directions are defined with respect to a reference object. Here, the angle between the front direction and the X-axis is annotated as θ. The orientation frame is considered based on the point of view of the robot for this kind of instance.

3.2.2. Adapting the perception of directional notions

The method for interpreting the directional notions in navigation commands has been designed with a single input and single output fuzzy inference system that is capable of perceiving the environment for adapting the perception. The directional keyword in a particular command (i.e., $< {dir}_{K} >$ ), is taken as the input of the system. The input membership function has singleton sets as shown in Fig. 3(a) in order to represent the language terms of $< {dir}_{K} >$ . The meaning of directional terms “left” and “right” has to be interchanged for motional and positional commands. Therefore, the indexes of the input sets are separately defined for motional and positional commands. The output membership function has been defined with eight triangular fuzzy sets with overlapping boundaries for representing the basic linguistic Direction Sets (DS) as shown in Fig. 3(b). The input sets are directly mapped to output sets. If the user command is a motional command, the output is the required change of the heading angle of the robot. If the command is a positional command, the output is the angle to the destination position measured from the absolute front of the reference object. If this system were used for the interpretation of the directional notions directly in this way, the meaning of the notions would be predetermined, and the system would not be capable of adapting the directional perception in a human-like manner.

Fig. 3.

(a) shows the input membership function of the Direction Interpreter (DirI). It has singleton sets to represent the direction keywords ( $< {dir}_{K} >$ ). For motional commands and positional commands the indexes have to be interchanged for “left” and “right”. Therefore, the keywords and indexes are linked differently as shown. (b) shows the output membership function. It should be noted that the output membership function is a continuous one and the ends represented by ${DS}_{- 4}$ are physically at the same position (−180° and 180° are physically the same).

Therefore, in order to adapt the perception of the directional notions according to the environment setting, the output membership function is modified based on the distribution of free space around the robot (if ${com}_{M}$ ) or the reference object (if ${com}_{P}$ ) in such a way that the robot tends to move towards a low congestion area. This is achieved by weighting the output membership function with the available free space around the robot or the reference object.

$ω_{{DS}_{i}} (α)$ is the weighted activation degree of the ith Direction Set $({DS}_{i})$ for a given angle α. It can be computed as given in Eq. (1). Here, $μ_{{DS}_{i}} (α)$ is the activation degree of ${DS}_{i}$ for the angle α, and $d_{α}$ is the distance to the nearest obstacle in the direction that creates an angle α with the current heading of the robot (for ${com}_{M}$ ) or the absolute front of the reference object (for ${com}_{P}$ ). The ways to obtain $d_{α}$ are explained in Fig. 4(a) and Fig. 4(b) for motional and positional commands respectively, $\begin{matrix} (1) & ω_{{DS}_{i}} (α) = μ_{{DS}_{i}} (α) \cdot d_{α}, \quad \forall α \in [{({DS}_{i})}_{L}, {({DS}_{i})}_{U}] \end{matrix}$

${({DS}_{i})}_{L}$ and ${({DS}_{i})}_{U}$ are the lower and upper bound of ${DS}_{i}$ and are defined as in Eq. (2), where δ is a scalar constant, $\begin{matrix} (2) & \begin{matrix} {({DS}_{i})}_{L} = {({DS}_{i})}_{C} - δ \\ {({DS}_{i})}_{U} = {({DS}_{i})}_{C} + δ \end{matrix} \end{matrix}$

Fig. 4.

This explains the ways to obtain $d_{α}$ in order to modify the output membership function of the Direction Interpreter (DirI). (a) shows an instance where $d_{α}$ is obtained for a motional command. For motional commands, free space around the robot is considered for the weighting. Hence, $d_{α}$ is the distance to the nearest obstacle/object from the robot in the direction that creates angle of α with the current heading of the robot. (b) shows an instance where $d_{α}$ is obtained for a positional command. For positional commands, free space around the reference object indicated by $< Ref >$ is considered for the weighting. Hence, $d_{α}$ is the distance to the nearest obstacle/object from the reference object in the direction that creates angle of α with the absolute front of the reference object. The absolute front of the reference object is defined based on the point of view of the robot.

The center of ith Direction Set, ${({DS}_{i})}_{C}$ is defined as in Eq. (3). The scalar constant, δ is taken as 45° in order to have the default directional perception of the system similar to that of the system explained in [28], $\begin{matrix} (3) & {({DS}_{i})}_{C} = i δ \end{matrix}$
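Eqs. (1)–(3) can be sketched in a few lines. The text states that the Direction Sets are triangular with centers at iδ and bounds at ±δ around the center; the peak activation of 1 at the center and the function names below are assumptions made for this illustration.

```python
# Sketch of Eqs. (1)-(3). DELTA follows the 45 degree value stated in the
# text; a unit peak at the set center is an assumption.
DELTA = 45.0


def ds_center(i):
    """Eq. (3): center of the i-th Direction Set, in degrees."""
    return i * DELTA


def ds_bounds(i):
    """Eq. (2): lower and upper bounds of the i-th Direction Set."""
    center = ds_center(i)
    return center - DELTA, center + DELTA


def mu_ds(i, alpha):
    """Triangular activation degree of DS_i at angle alpha (degrees)."""
    lower, upper = ds_bounds(i)
    if not lower <= alpha <= upper:
        return 0.0
    return 1.0 - abs(alpha - ds_center(i)) / DELTA


def weighted_activation(i, alpha, d_alpha):
    """Eq. (1): activation degree weighted by the free distance d_alpha."""
    return mu_ds(i, alpha) * d_alpha
```

With these definitions, a direction with a nearby obstacle (small `d_alpha`) contributes little to the set even when its unweighted activation is high, which is what shifts the interpreted direction towards free space.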

Due to the weighting with the free space distribution, the range of the activation degree of a Direction Set $({DS}_{i})$ may vary significantly for different situations. Therefore, $ω_{{DS}_{i}} (α)$ is normalized in order to keep its values within [0,1]. The normalized weighted activation degree of the ith Direction Set, ${\hat{ω}}_{{DS}_{i}} (α)$ is obtained from Eq. (4), $\begin{matrix} (4) & {\hat{ω}}_{{DS}_{i}} (α) = \frac{ω_{{DS}_{i}} (α)}{\max (ω_{{DS}_{i}} (α))} \end{matrix}$

In the rule base, ith direction keyword in the input membership function is directly mapped to the ith set of the output membership function yielding to a single input single output fuzzy system. The aggregation of the output fuzzy sets is done by considering the fuzzy union operator. The required crisp output of the fuzzy inference system, ψ is obtained from defuzzifying the aggregated output membership function using the center of area method. Then, the defuzzified output of the fuzzy inference system, ψ can be obtained as given in Eq. (5), where $μ_{{dir}_{K}} (i)$ is the activation degree of ith set of input membership function for the direction keyword (i.e., $< {dir}_{K} >$ ), $\begin{matrix} (5) & ψ = \frac{\sum_{i = - 4}^{3} \sum_{α = {({DS}_{i})}_{L}}^{{({DS}_{i})}_{U}} {\hat{ω}}_{{DS}_{i}} (α) . α . μ_{{dir}_{K}} (i)}{\sum_{i = - 4}^{3} \sum_{α = {({DS}_{i})}_{L}}^{{({DS}_{i})}_{U}} {\hat{ω}}_{{DS}_{i}} (α) . μ_{{dir}_{K}} (i)} \end{matrix}$
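The normalization of Eq. (4) and the center of area of Eq. (5) can be sketched as follows, assuming a one-hot singleton input (only the set matching the keyword is active) and a 1° sampling of α. The callback `free_distance`, the function names, and the sampling step are hypothetical choices for illustration; the triangular set shape with a unit peak is likewise assumed.

```python
# Sketch of Eqs. (4)-(5) for a singleton input. free_distance(alpha) is a
# hypothetical callback returning d_alpha for an angle in degrees.
DELTA = 45.0


def mu_ds(i, alpha):
    """Triangular activation of DS_i centered at i * DELTA (assumed unit peak)."""
    return max(0.0, 1.0 - abs(alpha - i * DELTA) / DELTA)


def defuzzify(keyword_index, free_distance, step=1.0):
    """Return psi, the center of area of the weighted, normalized output sets."""
    numerator = denominator = 0.0
    for i in range(-4, 4):                      # the eight Direction Sets
        mu_in = 1.0 if i == keyword_index else 0.0
        if mu_in == 0.0:
            continue
        lower = i * DELTA - DELTA
        samples = int(round(2 * DELTA / step))
        alphas = [lower + k * step for k in range(samples + 1)]
        omega = [mu_ds(i, a) * free_distance(a) for a in alphas]   # Eq. (1)
        peak = max(omega)
        if peak == 0.0:
            continue
        for a, w in zip(alphas, omega):
            w_hat = w / peak                                       # Eq. (4)
            numerator += w_hat * a * mu_in                         # Eq. (5)
            denominator += w_hat * mu_in
    if denominator == 0.0:
        raise ValueError("no free space in the activated Direction Set")
    return numerator / denominator
```

When the free space is uniform, the result falls back to the set center (for example, 90° for index 2); when more free space lies on one side, the result shifts towards that side.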

The output of the Direction Interpreter (DirI), ϕ can be obtained as given in Eq. (6) and the meaning of ϕ depends on the type of the corresponding user command. If the user command is a motional command, then ϕ is the interpreted moving direction for the robot, and it is achieved by changing the heading angle of the robot to ϕ. If the user command is a positional command, then ϕ is the angle to the destination position of the robot measured around the center of the reference object from the X-axis. $\begin{matrix} (6) & ϕ = θ + ψ \end{matrix}$
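Eq. (6) is a plain sum, but in practice the result has to be expressed as a valid heading. The sketch below wraps the sum into [−180°, 180°); this wrapping is an assumption added for illustration and is not stated in the text, and the function names are hypothetical.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def interpreted_direction(theta, psi):
    """Eq. (6): phi = theta + psi, wrapped to a valid heading (assumed)."""
    return wrap_degrees(theta + psi)
```

For a heading of θ = −4° and ψ = 101°, this gives ϕ = 97°; for θ = 90° and ψ = 96°, the unwrapped sum of 186° becomes −174°.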

3.3. Interpretation of distance information

Distance-related fuzzy notions in user commands are interpreted by the Distance Interpreter (DisI), which has been implemented with two submodules for interpreting the distance related uncertain descriptors in motional commands and positional commands. This module has been implemented with two fuzzy inference systems similar to the system proposed in [21].

Fig. 5.

This depicts the functional overview of the fuzzy inference systems used in the Distance Interpreter (DisI) for interpreting distance related fuzzy notions in user commands. These two fuzzy inference systems have been implemented similarly to the system proposed in [21] for interpreting distance related uncertain information in navigation commands. (a) depicts the overview of the fuzzy inference system in the motional submodule of the DisI. (b) depicts the overview of the fuzzy inference system of the positional submodule of the DisI.

3.3.1. Motional submodule of the DisI

The functional overview of the Motional submodule of the DisI is depicted in Fig. 5(a). This submodule of the DisI is used to interpret fuzzy distance notions in motional navigation commands (i.e., $< {dist}_{M} >$ ) such as “little” and “far”. The fuzzy inference system used for the interpretation of $< {dist}_{M} >$ has two inputs: the uncertain distance token (i.e., $< {dist}_{M} >$ ) and the available free space of the room. The input membership function for the free space is adjusted according to the size of the room. Hence, the size of the room is also an input for the Motional submodule of the DisI. The output of the DisI is the corresponding quantified distance symbolized by $< {dist}_{M} >$ in a particular user command. The output membership function of the system is adjusted according to the perceptive distance (D) that represents the arrangement of the environment. In this case, $D = d_{r}$ , where $d_{r}$ is the distance to the closest obstruction for the movement in the anticipated moving direction, as illustrated in Fig. 6.

Fig. 6.

This explains the parameters related to the perceptive distance, D. In the motional module, $D = d_{r}$ and any obstacle in the moving direction is considered for $d_{r}$ . In the positional module, $D = \min (d_{r}, d_{obj}) / 2$ and the distance between the robot and $< Ref >$ is considered for $d_{r}$ in this case. The dashed-lined arrow indicates the intended moving direction of the robot.

3.3.2. Positional submodule of the DisI

The functional overview of the Positional submodule of the DisI is depicted in Fig. 5(b). This module is used to interpret the fuzzy distance notions in positional commands (i.e., $< {dist}_{P} >$ ) such as “near”. The fuzzy inference system used for this has two inputs and an output. The available free space and the size of the reference object (i.e., size of $< Ref >$ ) are the inputs of the fuzzy system. The input membership functions for the free space and the size of $< Ref >$ are adjusted according to the size of the room. Therefore, the size of the room is also taken as an input for this module. The output of this submodule is the quantified distance meant by the fuzzy notion in a positional command and this distance is measured from $< Ref >$ . The output membership function in this module is also adjusted according to the perceptive distance (D) and $D = \min (d_{r}, d_{obj}) / 2$ . The parameters related to the calculation of perceptive distance are explained in Fig. 6.
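The two definitions of the perceptive distance D can be summarized in a small helper. This is only a sketch: the function name and the `command_type` labels are hypothetical, and the second distance in the positional formula is passed in as `d_obj` exactly as it appears in the formula, since its geometric meaning is given only in Fig. 6.

```python
def perceptive_distance(command_type, d_r, d_obj=None):
    """Return the perceptive distance D used to scale the output sets.

    motional:   D = d_r
    positional: D = min(d_r, d_obj) / 2
    """
    if command_type == "motional":
        return d_r
    if command_type == "positional":
        if d_obj is None:
            raise ValueError("d_obj is required for positional commands")
        return min(d_r, d_obj) / 2.0
    raise ValueError(f"unknown command type: {command_type!r}")
```

Note that `d_r` itself is measured differently in the two cases: to the closest obstruction in the moving direction for motional commands, and between the robot and the reference object for positional commands.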

Fig. 7.

This shows MIRob during experimental scenarios.

Fig. 8.

The initial and final positions of the robot during the execution of the cases given in Table 1 are marked on the map with corresponding letter indexes. This map is drawn to a scale in order to visualize the characteristics of the experimental environment. However, it should be noted that the markers do not represent the actual size of the robot.

4. Results and discussion

4.1. Experimental setup

The proposed system has been implemented on the MIRob platform [22] and experiments have been conducted in an artificially created domestic environment for the evaluation of its behavior and performance. MIRob during a few experimental instances is shown in Fig. 7. The arrangement of the experimental environment is given in the map shown in Fig. 8. It had 3 different rooms with heterogeneous characteristics. At the start, the robot was initialized with an updated navigation map of the environment. Furthermore, the lexical labels and the locations of the objects in the environment had been taught to the robot through discussion as explained in [22]. Therefore, the robot was well aware of the arrangement and the characteristics of the environment during the experiments.

In order to increase the computational efficiency of the direction interpreter, α is considered as a discrete variable instead of a continuous one. Furthermore, this discretization reduces the implementation complexity of the system. Therefore, the weighting of the default perception with $d_{α}$ is done only for a defined set of α values. For the ith DS, α is defined as $α = \{ α_{1} = {({DS}_{i})}_{L}, α_{2} = {({DS}_{i})}_{C} - 30^{\circ}, α_{3} = {({DS}_{i})}_{C} - 10^{\circ}, α_{4} = {({DS}_{i})}_{C} + 10^{\circ}, α_{5} = {({DS}_{i})}_{C} + 30^{\circ}, α_{6} = {({DS}_{i})}_{U} \}$ . The intermediate values are linearly interpolated in order to form a continuous set for the weighted output membership function. The center of area of this weighted output membership function is considered as the defuzzified output.
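The six-sample discretization can be sketched as a piecewise-linear center-of-area computation. The sketch assumes triangular sets with a unit peak at the center and integrates each linear segment in closed form; the normalization of Eq. (4) is omitted because a common scale factor cancels in the ratio when a single set is active. The function name and argument layout are hypothetical.

```python
# Center of area of a weighted Direction Set sampled at the six angles
# alpha_1..alpha_6, with linear interpolation between the samples.
DELTA = 45.0
OFFSETS = (-DELTA, -30.0, -10.0, 10.0, 30.0, DELTA)   # relative to the center


def sampled_center_of_area(i, distances):
    """distances: the six d_alpha values measured at alpha_1..alpha_6."""
    if len(distances) != len(OFFSETS):
        raise ValueError("expected six distance samples")
    center = i * DELTA
    alphas = [center + offset for offset in OFFSETS]
    # Weighted activation at each sample (triangular set, assumed unit peak).
    omega = [(1.0 - abs(offset) / DELTA) * d for offset, d in zip(OFFSETS, distances)]
    area = moment = 0.0
    for k in range(len(alphas) - 1):
        a0, a1 = alphas[k], alphas[k + 1]
        w0, w1 = omega[k], omega[k + 1]
        width = a1 - a0
        # Exact integrals of w(x) and x * w(x) over a linear segment.
        area += width * (w0 + w1) / 2.0
        moment += width * (w0 * (2.0 * a0 + a1) + w1 * (a0 + 2.0 * a1)) / 6.0
    if area == 0.0:
        raise ValueError("no free space in the sampled directions")
    return moment / area
```

The first and last samples always carry zero weight because the activation degree vanishes at the set bounds, so only the four inner distances influence the result.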

The performance of the proposed system (i.e., the system with the adaptable directional perception) has been evaluated against a system with a fixed directional perception (i.e., a system similar to [21]). The evaluation has been conducted with the aid of a user study. Since user studies are highly subjective in nature, due attention has been paid to the guidelines and recommendations given in [3] for designing and performing human studies for human–robot interaction experiments in order to minimize the subjectivity of the outcomes.

Table 1
Sample results of the experiment: Parameters related to the interpretation of directional notions by the DirI

Column groups: Fixed = with fixed directional perception; Adaptable = with adaptable directional perception. UR = User Rating.

Case	User command	Initial position (X cm, Y cm, $θ^{\circ}$ )	Fixed: $ϕ^{\circ}$	Fixed: Destination¹ (X cm, Y cm, $θ^{\circ}$ )	Fixed: UR	Adaptable: $d_{α_{2}}$ (cm)²	Adaptable: $d_{α_{3}}$ (cm)²	Adaptable: $d_{α_{4}}$ (cm)²	Adaptable: $d_{α_{5}}$ (cm)²	Adaptable: $ψ^{\circ}$	Adaptable: $ϕ^{\circ}$	Adaptable: Destination¹ (X cm, Y cm, $θ^{\circ}$ )	Adaptable: UR
1	move far to the left	$I_{1} (369, 259, - 4)$	85	$F_{1} (378, 290, 85)$	24	10	45	187	105	101	97	$F_{1}^{'} (346, 455, 96)$	84
2	move a little left	$I_{2} (186, 392, 90)$	180	$F_{2} (162, 392, 180)$	67	28	53	122	60	96	−175	$F_{2}^{'} (161, 388, - 175)$	70
3	move far forward	$I_{3} (- 188, - 64, - 64)$	−64	$F_{3} (- 150, - 121, - 63)$	50	89	184	110	29	−6	−69	$F_{3}^{'} (114, - 235, - 67)$	84
4	move near to the front of the sink	$I_{4} (274, 134, 176)$	−90	$F_{4} (371, 491, 90)$	64	140	289	227	60	−5	−94	$F_{4}^{'} (367, 492, 90)$	72
5	move near to the left of the switch board	$I_{5} (- 224, 130, 0)$	90	$F_{5} (382, 440, - 90)$	71	5	33	138	99	−77	103	$F_{5}^{'} (364, 439, - 90)$	72
6	go near to the front of the cupboard	$I_{6} (- 169, - 65, - 73)$	90	$F_{6} (- 122, - 279, - 90)$	66	42	128	268	128	6	96	$F_{6}^{'} (- 128, - 279, - 90)$	78

¹ The destination positions are decided based on the outputs of both DisI and DirI. The parameters related to the DisI in interpreting the distance notions in the corresponding cases are given in Table 2.

² It should be noted that the effects to the interpretation from $d_{α_{1}}$ and $d_{α_{6}}$ are null since $μ_{{DS}_{i}} (α_{1})$ and $μ_{{DS}_{i}} (α_{6})$ are zero. Therefore, those two values are not displayed here.

The user study was conducted with 12 participants (mean age 26.8 years, standard deviation 4.1 years). The users took part in the experiment one at a time and were briefed on the structures of the user commands that the robot can understand. Each user was given 6 occasions to interact with the robot for each of the two systems (i.e., the system with fixed directional perception and the system with adaptable directional perception). These instances were chosen by randomly deciding the initial position of the robot. The users were free to decide the instructions; however, they were asked to include 3 motional commands and 3 positional commands among the 6 instances, and the same 6 instances were repeated with the other system. In order to minimize the subjectivity, the users were not informed which system (fixed or adaptable directional perception) they were interacting with in a particular run. After each run, the user was asked to rate the action of the robot on a scale of 0–100, similar to the evaluation approach used in [15], where 100 indicates perfect agreement and 0 indicates no agreement. This evaluation method was chosen over the one proposed in [23] because the action of the DirI modifies the perceptive distance, which affects the output of the DisI; hence, the evaluation should cover the entire action of the robot rather than merely the interpreted direction as in [23]. A User Rating (UR) given by a user depends on the final position of the robot. Therefore, it reflects the assessment of both the direction and the distance interpreted by the robot.

4.2. Results and performance comparison

The results obtained from the 1st user for 6 runs using both systems are given in Table 1 as sample results. The parameters related to the DisI for the corresponding cases are given in Table 2. The initial and final positions of the robot during these experimental runs are marked on the map shown in Fig. 8 with corresponding indexes given in Table 1. The modified output membership functions of the DirI due to the weighting with the free space in these cases are shown in Fig. 9.

Table 2
Parameters related to the interpretation of distance notions by the DisI

Fixed: with fixed directional perception. Adaptable: with adaptable directional perception.

Case	Room size (m²)	Free space (m²)	Fixed: D (cm)	Fixed: $D_{out}$ (cm)	Adaptable: D (cm)	Adaptable: $D_{out}$ (cm)
1	15.08	12.95	34	28	243	198
2	15.08	12.95	60	25	65	27
3	11.50	9.27	108	86	234	186
4	15.08	12.95	100	38	100	38
5	15.08	12.95	130	37	130	37
6	11.50	9.27	94	36	94	36

Fig. 9.

The output membership functions plotted here show the adaptation of the perception of directional notions after weighting the default perception with the available free space for the cases given in Table 1. (a), (b), (c), (d), (e), and (f) represent cases 1, 2, 3, 4, 5, and 6, respectively. Note that only the effective Direction Sets (DS) are plotted for a particular instance; the non-effective ${DS}_{i}$, for which $μ_{{dir}_{K}} (i) = 0$, are not shown.

In case 1, the robot was initially placed at location ‘ $I_{1}$ ’ and user 1 issued the command “move far to the left”. This is a motional command; $< {dis}_{M} >$ and $< {dir}_{K} >$ were “far” and “left” respectively. Therefore, the robot had to interpret the distance meant by “far” and the direction meant by “left” in order to fulfill the user command. In the run with fixed directional perception (i.e., the system similar to the work proposed in [21]), the direction interpreted by the robot for “left” was fixed as the current heading (i.e., θ) +90°, as explained in Section 3.2.1. Therefore, the heading angle for the movement was decided by the robot as 85°. The distance meant by $< {dis}_{M} >$ was quantified by the Distance Interpreter (DisI) as 28 cm based on the perceptive distance ( $D = 34$ cm), the room size (15.08 m²), and the free space (12.95 m²). As a result of this interpreted distance and direction, the robot moved to location ‘ $F_{1}$ ’. User 1 rated the action of the robot in this run with a User Rating (UR) of 24. In the run with adaptable directional perception (i.e., the system proposed in this work), the robot was initially placed at the same location and issued the same user command. In this run, the direction interpreted by the system was a change of 101° from the current heading, yielding a movement heading of 97°. The modified membership function due to the arrangement of the environment is shown in Fig. 9(a). In this instance, only ${DS}_{2}$ was effective since the effects of the other sets are null due to $μ_{{dir}_{K}} (i) = 0$ for $i \in \{-4, -3, -2, -1, 0, 1, 3\}$. As a result of adapting the directional perception according to the environment, the defuzzified output of the system was different from the Centre of Area (COA) of the default perception. This exhibits the adaptation of the directional perception based on the environmental setting.

Because of this interpreted direction, the perceptive distance (D) was significantly different from that in the previous run (i.e., the system with fixed directional perception). Therefore, the distance interpreted by the system was 198 cm ( $D = 243$ cm, with the same free space and room size as in the previous run) and the robot’s destination position was ‘ $F_{1}^{'}$ ’. The user rated the action of the robot as 84. The increase in the UR shows that, in this case, the user agreed more with the system with adaptable directional perception (i.e., the work presented in this paper) than with the system with fixed directional perception.
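The arithmetic of case 1 can be checked with a short sketch. It assumes, beyond what this section states, that a motional command moves the robot in a straight line from its initial position along the interpreted heading, with angles measured in degrees in the map frame of Table 1. The helper names `wrap_angle` and `motional_destination` are hypothetical.

```python
import math


def wrap_angle(deg):
    """Wrap an angle in degrees to the interval (-180, 180]."""
    return 180.0 - (180.0 - deg) % 360.0


def motional_destination(x, y, theta, psi, distance):
    """Ideal destination of a straight-line move (assumed motion model).

    theta: current heading, psi: interpreted direction change (degrees),
    distance: interpreted distance D_out (cm).
    """
    heading = wrap_angle(theta + psi)
    rad = math.radians(heading)
    return (x + distance * math.cos(rad),
            y + distance * math.sin(rad),
            heading)


# Case 1, adaptable directional perception: I_1 = (369, 259, -4),
# psi = 101 degrees, D_out = 198 cm (values from Tables 1 and 2).
x, y, heading = motional_destination(369.0, 259.0, -4.0, 101.0, 198.0)
print(round(x), round(y), heading)
```

Under this idealized model the computed destination lies within a couple of centimetres of the reported ‘ $F_{1}^{'}$ ’ (346, 455), and the heading equals the reported 97°. For case 2 (θ = 90°, ψ = 96°), `wrap_angle` gives −174°, which is within one degree of the −175° reported in Table 1.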

In case 2, the initial position of the robot was ‘ $I_{2}$ ’ and the robot was given the motional command “move a little left”. In the run with fixed directional perception, the heading angle decided by the system was 180°, and the robot moved 25 cm, resulting in a destination at location ‘ $F_{2}$ ’. The UR for this run was 67. In the run with adaptable directional perception, the heading angle for the movement was −175° and the robot moved 27 cm, resulting in a destination at location ‘ $F_{2}^{'}$ ’. The UR for this run was 70. The URs for the two systems were not significantly different since the resulting destination positions differed only slightly.

In case 3, the robot was given the motional command “move far forward”. In the run with fixed directional perception, the robot moved to location ‘ $F_{3}$ ’ by taking the direction meant by “forward” as the current heading (i.e., $-64^{\circ}$ ), while in the run with adaptable directional perception, the robot moved to location ‘ $F_{3}^{'}$ ’ by taking the direction meant by “forward” as a heading of $-69^{\circ}$ . The corresponding URs for the two runs were 50 and 84 respectively.

In case 4, the initial position of the robot was ‘ $I_{4}$ ’ and it was given the positional command “move near to the front of the sink”. In this case, the robot had to quantify the distance meant by “near” and the direction meant by “front” with respect to $< Ref >$ (i.e., the sink). The robot moved to location ‘ $F_{4}$ ’ in the run with fixed directional perception. In the run with adaptable directional perception, the robot moved to location ‘ $F_{4}^{'}$ ’, which deviates slightly towards the free area with respect to the final position of the previous run. Accordingly, the UR for the system with adaptable directional perception (UR = 72) was slightly higher than that for the system with fixed directional perception (UR = 64).

In cases 5 and 6, the robot was given positional commands. In the runs with adaptable directional perception, the robot moved to locations that deviate slightly towards less congested areas compared with the positions reached in the runs with fixed directional perception. In both cases, the system with adaptable directional perception received a higher UR than the system with fixed directional perception. However, in case 5, the URs for the two systems were almost the same (71 and 72) even though the direction had a comparatively large deviation (13°). In the run with fixed directional perception, the robot did not settle exactly in the direction decided by the system because it could not reach that position owing to the limited space available for the robot to occupy. The location had therefore already deviated towards the free area, which would explain the almost identical user ratings for the two systems.

Fig. 10.

(a) shows the mean values of the user ratings for the two systems with error bars. The error bars represent the standard error. (b) shows the distribution of user ratings as a boxplot. The boxplot has the usual standard notation; box: Interquartile range, horizontal line: Median, whiskers: Minimum and maximum, and plus sign: Outliers.

Similarly, all 12 participants were asked to operate the robot for 6 runs with each of the systems (i.e., the system with fixed directional perception and the system with adaptable directional perception). This yields 72 effective cases for each system. The mean value of the User Rating (UR) was calculated for both systems based on the individual UR for each instance. The calculated mean UR scores for the two systems are given in Fig. 10(a) with error bars. The distributions of the UR scores are given in Fig. 10(b) as boxplots for better visualization of the results. The system with the adaptable directional perception (i.e., the system proposed in this paper) received a mean UR of 77.7, while the system with fixed directional perception received a mean UR of 56.2. The difference between the mean URs is statistically significant ( $P < 0.05$ ) according to a t-test. Therefore, it can be concluded with 95% confidence that the user agreement for the system with adaptable directional perception is higher than the user agreement for the system with fixed directional perception. Furthermore, the improvement in the understanding of fuzzy notions brought about by the adaptable directional perception is noteworthy: the user ratings yield a Cohen’s d value greater than 0.7, which is considered a large effect [6].
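For readers who wish to see how such statistics are computed, the following sketch evaluates a paired t statistic and Cohen’s d on the six sample ratings of user 1 from Table 1. This section does not state which variant of the t-test or of Cohen’s d was used, so the paired form and the pooled-standard-deviation form below are assumptions, and the function names are hypothetical. Because only 6 of the 72 cases are used, the outputs do not reproduce the statistics reported for the full study.

```python
import math
import statistics

# User ratings of user 1 for cases 1-6 (Table 1).
fixed = [24, 67, 50, 64, 71, 66]
adaptable = [84, 70, 84, 72, 72, 78]


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (equal group sizes)."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2.0)
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def paired_t(a, b):
    """t statistic of a paired t-test on the per-case differences b - a."""
    diffs = [bi - ai for ai, bi in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


print("mean fixed:", statistics.mean(fixed))
print("mean adaptable:", statistics.mean(adaptable))
print("Cohen's d:", cohens_d(fixed, adaptable))
print("paired t:", paired_t(fixed, adaptable))
```

The paired form is used here because the same six instances were repeated with both systems; an unpaired test would be the alternative if the runs were treated as independent samples.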

4.3. Discussion

Many approaches have been developed to improve the interpretation of fuzzy notions in language instructions by adapting the perception according to the environment [24]. However, the scope of the existing approaches is limited to adapting the perception of distance notions, and the directional perception is assumed to be fixed [24]. Moreover, the existing approaches are not capable of accounting for the fuzziness inherent in directional linguistic notions when interpreting a user command. The method proposed in this paper adapts both distance and directional perception based on the environment while considering the fuzziness associated with them. According to the experimental results, the adaptation of both entities significantly improves the ability of a robot to understand navigational commands. Therefore, the proposed method extends the abilities of the state-of-the-art approaches in this particular research niche, and this extension in scope is the key improvement of the proposed work.

The overall performance of the command understanding ability of the robot depends on the voice recognition accuracy. Therefore, measures have been taken to minimize the adverse effects of voice recognition issues on the evaluation of the proposed method. In order to improve the voice recognition accuracy, the users were given a wireless headset microphone, which can be placed in close proximity to the mouth, for issuing the voice instructions during the experiments. The voice recognition accuracy is around 70–80%. Therefore, there could be situations where the robot misinterprets user instructions due to voice recognition errors. For a proper evaluation of the proposed concept, the ambiguities arising from such errors should be resolved. As a remedy, the system has been designed in such a way that the robot requests a confirmation of the received instruction from the user before executing any action. If the voice instruction is not properly recognized, the user can repeat the instruction. If the instruction is correctly recognized, the user can confirm it and the robot will then execute the actions to fulfill the instruction. Since the robot asks for a confirmation after every instruction, this may impose some overhead on users. However, both systems used for evaluating the performance (i.e., the system proposed in this paper and the system proposed in [21]) have the same behavior in the Voice Recognition and Analyzing section. Therefore, this overhead affects both systems equally and cancels out in the overall comparison. Consequently, it can be concluded that the experimental evaluation has properly assessed the performance gain in the interpretation of fuzzy notions in commands, independently of the issues related to voice recognition.
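The confirmation step described above can be sketched as a simple loop. This is only an illustration of the interaction pattern, not the robot’s actual software: the callables `listen`, `ask_confirmation`, and `execute`, as well as the `max_attempts` limit, are hypothetical and do not appear in the original system description.

```python
def confirm_and_execute(listen, ask_confirmation, execute, max_attempts=3):
    """Execute a voice command only after the user confirms it.

    listen():              returns the recognized command text, or None
    ask_confirmation(cmd): returns True if the user confirms cmd
    execute(cmd):          performs the robot action for cmd
    Returns the executed command, or None if nothing was confirmed.
    """
    for _ in range(max_attempts):
        command = listen()
        if command is None:
            continue  # nothing recognized; the user repeats the instruction
        if ask_confirmation(command):
            execute(command)
            return command
        # Not confirmed: recognition was wrong, so listen again.
    return None


# Usage with stand-in callables: the first recognition is wrong,
# the second one is confirmed and executed.
heard = iter(["move far to the rift", "move far to the left"])
executed = []
result = confirm_and_execute(
    listen=lambda: next(heard),
    ask_confirmation=lambda cmd: cmd == "move far to the left",
    execute=executed.append,
)
print(result, executed)
```

Because the same loop would wrap both the fixed and the adaptable system, any overhead it introduces is identical for the two and does not bias the comparison.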

5. Conclusion

A method has been proposed for enhancing the interpretation of fuzzy notions in motional and positional navigation commands by adapting a robot’s directional perception based on the environmental setting. The major improvement of the proposed system over the existing approaches is that the system is capable of interpreting the directional notions in motional and positional navigation commands by considering the fuzziness associated with natural language descriptors instead of relying on fixed interpretations.

The directional notions in user instructions are interpreted by a fuzzy inference system that has been designed in such a way that it can replicate the natural human behavior. The perception of fuzzy directional notions is adapted by modifying the output membership function of the fuzzy inference system according to the available free space around the robot or the reference object.

Experiments have been conducted in order to evaluate how much the proposed method for adapting the robot’s directional perception improves the robot’s understanding of navigational commands. The performance of the system with the adaptable directional perception (i.e., the system proposed in this work) has been compared against a system with a fixed directional perception (i.e., a system similar to [21]) through a user study. According to the experimental results, the ability of the system with the adaptable directional perception to understand fuzzy navigational commands surpasses that of the system with fixed directional perception by a significant margin.

The proposed system has unimodal interaction abilities. The interaction between the robot and a user is limited to voice communication, and the system is not capable of using information conveyed through nonverbal instructions that may accompany a voice instruction for decision making. The interpretation of uncertainties in voice instructions could be improved by fusing information conveyed through nonverbal means such as pointing gestures [33]. Furthermore, nonverbal cues can be used to identify the intention of a user or to shift the attention towards a direction or position [7,18]. Therefore, mechanisms that are capable of evaluating nonverbal cues could be fused with the proposed method to further improve the ability to interpret fuzzy notions. The establishment of such multimodal interaction abilities is proposed for future work.

Footnotes

Acknowledgements

This work was supported by the University of Moratuwa Senate Research Grant No. SRC/CAP/17/03.

References

1. Aaltonen, Arvola, Heikkilä and Lammi, Hello pepper, may I tickle you?: Children’s and adults’ responses to an entertainment robot at a shopping mall, in: Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human–Robot Interaction, ACM, 2017, pp. 53–54.

2. R.C. Arkin, Behavior-Based Robotics, MIT Press, 1998.

3. C.L. Bethel and R.R. Murphy, Review of human studies methods in HRI and recommendations, International Journal of Social Robotics 2(4) (2010), 347–359. doi:10.1007/s12369-010-0064-9.

4. Chatterjee, Pulasinghe, Watanabe and Izumi, A particle-swarm-optimized fuzzy-neural network for voice-controlled robot systems, IEEE Transaction on Industrial Electronics 52(6) (2005), 1478–1489. doi:10.1109/TIE.2005.858737.

5. M.-T. Chu, Khosla, S.M.S. Khaksar and Nguyen, Service innovation through social robot engagement to improve dementia care quality, Assistive Technology (2016).

6. P.D. Ellis, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results, Cambridge University Press, 2010.

7. J.F. Ferreira and Dias, Attentional mechanisms for socially interactive robots – a survey, IEEE Transactions on Autonomous Mental Development 6(2) (2014), 110–125, ISSN 1943-0604. doi:10.1109/TAMD.2014.2303072.

8. H.-M. Gross, Scheidig, Debes, Einhorn, Eisenbach, Mueller, Schmiedel, T.Q. Trinh, Weinrich, Wengefeld, Bley and Martin, ROREAS: Robot coach for walking and orientation training in clinical post-stroke rehabilitation – prototype implementation and evaluation in field trials, Autonomous Robots (2016), 1–20.

9. Guadarrama, Riano, Golland, Go, Jia, Klein, Abbeel, Darrell et al., Grounding spatial relations for human–robot interaction, in: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2013, pp. 1640–1647. doi:10.1109/IROS.2013.6696569.

10. Hemachandra, Duvallet, T.M. Howard, Roy, Stentz and M.R. Walter, Learning models for following natural language directions in unknown environments, in: 2015 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2015, pp. 5608–5615. doi:10.1109/ICRA.2015.7139984.

11. Hunt, Jspeech grammar format, W3C Note, June 2000.

12. A.G.B.P. Jayasekara, Watanabe and Izumi, Understanding user commands by evaluating fuzzy linguistic information based on visual attention, Artificial Life and Robotics 14(1) (2009), 48–52. doi:10.1007/s10015-009-0716-8.

13. A.G.B.P. Jayasekara, Watanabe, Kiguchi and Izumi, Interpreting fuzzy linguistic information by acquiring robot’s experience based on internal rehearsal, Journal of System Design and Dynamics 4(2) (2010), 297–313. doi:10.1299/jsdd.4.297.

14. A.G.B.P. Jayasekara, Watanabe, Kiguchi and Izumi, Adaptation of robot’s perception of fuzzy linguistic information by evaluating vocal cues for controlling a robot manipulator, Artificial Life and Robotics 15(1) (2010), 5–9. doi:10.1007/s10015-010-0755-1.

15. Jayawardena, Kuo, Broadbent and B.A. MacDonald, Socially assistive robot HealthBot: Design, implementation, and field trials, IEEE Systems Journal (2014), 1–12, ISSN 1932-8184. doi:10.1109/JSYST.2014.2337882.

16. Jayawardena, Watanabe and Izumi, Controlling a robot manipulator with fuzzy voice commands using a probabilistic neural network, Neural Computing and Applications 16(2) (2007), 155–166. doi:10.1007/s00521-006-0056-8.

17. Kawamura, Bagchi and Park, An intelligent robotic aid system for human services, in: NASA Conference Publication, NASA, 1994, pp. 413–420.

18. Lanillos, J.F. Ferreira and Dias, Designing an artificial attention system for social robots, in: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 4171–4178. doi:10.1109/IROS.2015.7353967.

19. C.-T. Lin and M.-C. Kan, Adaptive fuzzy command acquisition with reinforcement learning, IEEE Transaction on Fuzzy Systems 6(1) (1998), 102–121. doi:10.1109/91.660811.

20. Mavridis, A review of verbal and non-verbal human–robot interactive communication, Robotics and Autonomous Systems 63 (2015), 22–35. doi:10.1016/j.robot.2014.09.031.

21. M.A.V.J. Muthugala and A.G.B.P. Jayasekara, Enhancing human–robot interaction by interpreting uncertain information in navigational commands based on experience and environment, in: 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 2915–2921. doi:10.1109/ICRA.2016.7487456.

22. M.A.V.J. Muthugala and A.G.B.P. Jayasekara, MIRob: An intelligent service robot that learns from interactive discussions while handling uncertain information in user instructions, in: 2016 Moratuwa Engineering Research Conference (MERCon), IEEE, 2016, pp. 397–402.

23. M.A.V.J. Muthugala and A.G.B.P. Jayasekara, Interpreting fuzzy directional information in navigational commands based on arrangement of the surrounding environment, in: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017, pp. 1–7. doi:10.1109/FUZZ-IEEE.2017.8015542.

24. M.A.V.J. Muthugala and A.G.B.P. Jayasekara, A review of service robots coping with uncertain information in natural language instructions, IEEE Access 6 (2018), 12913–12928. doi:10.1109/ACCESS.2018.2808369.

25. Portet, Vacher, Golanski, Roux and Meillon, Design and evaluation of a smart home voice interface for the elderly: Acceptability and objection aspects, Personal and Ubiquitous Computing 17(1) (2013), 127–144. doi:10.1007/s00779-011-0470-5.

26. Pulasinghe, Watanabe, Izumi and Kiguchi, Modular fuzzy-neuro controller driven by spoken language commands, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(1) (2004), 293–302. doi:10.1109/TSMCB.2003.811511.

27. Schiffer and Ferrein, Decision-theoretic planning with fuzzy notions in GOLOG, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 24 (2016), 123–143. doi:10.1142/S0218488516400134.

28. Schiffer, Ferrein and Lakemeyer, Reasoning with qualitative positional information for domestic domains in the situation calculus, Journal of Intelligent & Robotic Systems 66(1–2) (2012), 273–300. doi:10.1007/s10846-011-9606-0.

29. Skubic, Matsakis, Chronis and Keller, Generating multi-level linguistic spatial descriptions from range sensor readings using the histogram of forces, Autonomous Robots 14(1) (2003), 51–69. doi:10.1023/A:1020927503616.

30. Tan, Ju and Liu, Grounding spatial relations in natural language by fuzzy representation for human–robot interaction, in: 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, 2014, pp. 1743–1750. doi:10.1109/FUZZ-IEEE.2014.6891797.

31. Tapus, M.J. Mataric and Scasselati, The grand challenges in socially assistive robotics, IEEE Robotics and Automation Magazine 14(1) (2007), 35–42. doi:10.1109/MRA.2007.339605.

32. Tellex, Kollar, Dickerson, M.R. Walter, A.G. Banerjee, Teller and Roy, Understanding natural language commands for robotic navigation and mobile manipulation, in: Proc. Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI Press, 2011, pp. 1507–1514.

33. Whitney, Eldon, Oberlin and Tellex, Interpreting multimodal referring expressions in real time, in: 2016 IEEE Int. Conf. Robotics and Automation (ICRA), IEEE, 2016, pp. 3331–3338. doi:10.1109/ICRA.2016.7487507.

¹ www.pypi.org/project/SpeechRecognition/3.1.1