Machine vision system for robotic navigation in a residential environment

Abstract

This article examines the integration of CNN-based machine vision with robotic navigation algorithms to achieve efficient autonomous driving of a tracked mobile robot in residential environments. The development is based on a machine vision system that uses a camera mounted on the robot to capture scenes from the different rooms of a home and identify the robot’s current location.

PROBLEM:

Robot navigation kinematics are usually expressed in the spatial coordinates of an unknown environment. This limits human-robot interaction to the naive execution of commands and ignores the potential of the environmental context in which the robot operates. The integration of artificial vision into robotic navigation is expected to enhance a robot’s performance in supporting domestic tasks.

METHODOLOGY:

To achieve the identification of the robot’s location and its direction of movement, a convolutional neural network is employed, which has two branches that identify different aspects of the environment from the robot’s perspective. Once a destination is set within the environment, a branched exploration algorithm is implemented, allowing the robot to navigate while knowing its location.

RESULTS:

Path planning and obstacle avoidance algorithms for mobile robots were implemented along with a CNN that reached 98.33% accuracy in identifying residential rooms from the robot’s first-person perspective. Incorporating these algorithms resulted in the successful guidance of a tracked differential mobile robot through the rooms of a virtual residential environment, avoiding obstacles along the way and identifying the locations the robot passes through.

Keywords

Autonomous driving, computer vision, obstacle avoidance, path planning, residential environments

1 Introduction

The technological development of the 21st century has led to the creation of so-called intelligent systems, which are now evident in mobile phones and computational algorithms, among other things. These advancements go hand in hand with the development of robots that, since the last century, have supported industrial cutting-edge tasks, and are now even present in homes. The interaction between humans and robots has become increasingly common, but it still requires numerous investigations and developments in this field. For instance, enabling a human to instruct a robot to move (navigate) to a specific location known by a given name (e.g., kitchen) rather than using numerical coordinates requires the robot to visually recognize that location.

Robotic navigation algorithms continuously evolve to adapt to the displacement requirements of various robot types. In home or residential settings, service robots necessitate trajectory planning [1], obstacle avoidance [2], environmental adaptation models [3], and perception [4] as essential components. In the context of assistive robotics, the navigation requirements dictate the selection of algorithms to be applied [5], encompassing image prediction [6], reinforcement learning [7, 8], fuzzy systems [9], fuzzy neuro systems [10], and deep learning [11].

In the current state of the art, deep learning stands out as one of the most effective techniques in pattern recognition [12, 13]. Research utilizing these techniques spans diverse applications, from smart traffic lights [14, 15] and hyperspectral image analysis [16] to robotic vision systems [17]. In robotics, deep learning is applied to robot navigation primarily through Convolutional Neural Networks (CNNs) [18].

The versatility of Convolutional Neural Networks (CNNs) extends to various fields, such as the detection of pandemic diseases [19] and fraud detection in web environments [20]. Some applications modify the base structure of the CNN architecture, as discussed in [21], where the authors employ a quantum graph convolutional network to predict traffic congestion.

Robotic navigation also requires an analysis of the environment to avoid collisions. For instance, in [22], sensing methods utilizing obstacle clustering are proposed to identify road boundaries. In [23], algorithms for automated parking systems are presented, including regional differentiation accuracy requirements for Automated Valet Parking (AVP), such as ramp area, surface fluctuation area, and narrow area, which are essential in the final stage of robotic navigation.

The cooperation between pattern recognition and navigation algorithms in assistive robots is an ongoing area of development and research [24–26]. This reinforces the need for cooperation between artificial intelligence and machine vision algorithms in achieving autonomous navigation [27], with the perception of the robot’s surroundings being a fundamental factor in this process [28]. In [29], a navigation algorithm based on nodes is presented and validated in closed simulation environments for drone navigation. It employs transfer learning with convolutional networks, using the AlexNet network architecture, with the aim of reducing energy consumption. Similarly, in [30], the development of a drone navigation algorithm using convolutional networks is presented, focusing on reducing energy consumption in open environments.

Based on the referenced state of the art, this document presents the development of a robotic navigation algorithm in a residential environment. It relies on environmental perception through convolutional networks and a navigation algorithm, serving as a foundation for the design and implementation of a home-assistant robot. Given the high computational cost of convolutional architectures used for transfer learning [31, 32], the proposed design introduces a dual-branch Convolutional Neural Network (DAG-CNN) with a maximum of 6 convolution layers. This architecture is intended for use with embedded systems tailored for mobile robots. The network identifies different environments within a residential setting, accompanied by a navigation algorithm that enables the robot to move between these environments.

The main contribution of this project lies in the design of a network architecture and the evaluation of a navigation algorithm that operates not on coordinates, as is commonly done, but on the direct recognition of the robot’s environment through the proposed convolutional network. Sensors are employed to avoid surrounding obstacles, allowing a residential robot to adjust its navigation even when the environment changes, for example when a dining table is swapped or a sofa is moved.

This article is divided into four sections: the current introduction, the methodology section where recognition and navigation algorithms are presented, the results section validating robotic navigation in a virtual residential environment, and finally, the conclusions attained.

2 Methodology

The navigation of robots within a residential environment, as outlined in this paper, is achieved by integrating two distinct modules. The first module uses convolutional networks to recognize the various spatial zones within the residential setting. The second module is a robot displacement algorithm based on trajectory planning. Each module is explained in detail below.

2.1 Machine vision algorithm

The navigation algorithm is designed for a mobile robot operating within a residential environment, and its initial functionality revolves around identifying the robot’s position. To determine the robot’s location, a deep learning algorithm based on convolutional neural networks is employed [33].

Considering the unique characteristics associated with recognizing individual spaces within a residential environment, a two-branch DAG-CNN network was implemented. This network architecture is tailored to distinguish both the general and specific features of spaces such as bedrooms, bathrooms, dining rooms, or TV rooms. To train the chosen network architecture, a database containing images of various rooms within residential environments was constructed. These images were captured from the viewpoints and heights anticipated for the robot’s perspective. The database includes a total of 426 training images and 120 validation images. Figure 1 shows a sample of the database. The network gives the robot a general awareness of its location within a house or apartment.

Fig. 1

Sample from the database.

The architecture of the two-branch DAG-CNN network is shown in Fig. 2. It was developed with the primary objective of classifying images into one of four categories: bedroom, living room, dining room, and bathroom. No distinction was made between rooms of the same type, as residential environments may contain multiple instances of spaces falling within the four selected categories.

Fig. 2

Architecture of the implemented two-branch DAG-CNN.

In Fig. 2, the design parameters of the network are established for each convolutional layer (C): the kernel size (k), the number of filters (F), the padding (P), and the stride (s). The kernel of the pooling stage is denoted KMP, where M indicates that max pooling is used. These parameters are defined up to the final classification stage of the network in the fully connected layer (FC) [33]. The network is designed using equations (1), (2), and (3), which give the width, height, and depth, respectively, of the volume produced by each layer j:

$W_{j} = \frac{W_{j - 1} - f + 2 P}{S} + 1$ (1)

$H_{j} = \frac{H_{j - 1} - f + 2 P}{S} + 1$ (2)

$D_{j} = K_{j - 1}$ (3)

where f is the kernel size, P the padding, S the stride, and K the number of filters.
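As an illustration of equations (1) to (3), the following minimal Python sketch traces the volume dimensions through one branch of a network. The input size and the layer parameters are hypothetical placeholders chosen for the example; they are not the values of the architecture in Fig. 2.

```python
def conv_output_size(size_in, kernel, padding, stride):
    """Spatial size after a convolution or pooling stage, per equations (1) and (2)."""
    span = size_in - kernel + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel, padding and stride do not tile the input evenly")
    return span // stride + 1


def trace_volumes(width, height, depth, layers):
    """Return the (W, H, D) volume after each layer of one branch.

    Each layer is (kernel, padding, stride, filters). filters=None marks a
    pooling stage, which leaves the depth unchanged.
    """
    volumes = []
    for kernel, padding, stride, filters in layers:
        width = conv_output_size(width, kernel, padding, stride)
        height = conv_output_size(height, kernel, padding, stride)
        depth = depth if filters is None else filters  # equation (3)
        volumes.append((width, height, depth))
    return volumes


# Hypothetical branch: 64x64 RGB input, two conv layers, each followed by 2x2 max pooling.
branch = [
    (5, 2, 1, 16),    # conv: kernel 5, padding 2, stride 1, 16 filters
    (2, 0, 2, None),  # max pool: kernel 2, stride 2
    (3, 1, 1, 32),    # conv: kernel 3, padding 1, stride 1, 32 filters
    (2, 0, 2, None),  # max pool: kernel 2, stride 2
]
volumes = trace_volumes(64, 64, 3, branch)
print(volumes)
```

Checking the dimensions this way before training makes it easy to verify that the two branches produce volumes that can be merged ahead of the fully connected layer.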

The network’s hyperparameters, which define the filter size, number of filters, stride (step displacement between filters), and padding, were established through an iterative process. As depicted in Fig. 3, the best training run reached an accuracy of 98.33% after 100 iterations. The training process took 36 minutes on a computer equipped with an Intel Core i7 processor and an NVIDIA RTX 3070 GPU with 16 GB of memory. The optimizer is based on Stochastic Gradient Descent (SGD), and 100 epochs are employed to limit the training time. The kernel used in all cases is a randomly initialized square filter whose size decreases as the network deepens, so that deeper layers capture more specific features of the visual area to be learned.

Fig. 3

DAG-CNN training progress.

The performance of the implemented DAG-CNN was evaluated through the confusion matrix presented in Fig. 4, in which precision and accuracy metrics are reported for each category on the vertical and horizontal axes, respectively. A precision of 100% was obtained when classifying dining rooms and bedrooms, and 100% accuracy was achieved when distinguishing bathrooms and dining rooms from the other room types.
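To show how such metrics follow from a confusion matrix, the sketch below uses a hypothetical matrix of 120 validation images. The counts are invented for illustration and are not the values of Fig. 4; they were only chosen to be consistent with the figures quoted in the text (98.33% overall, which corresponds to 118 of 120 images). The per-class "accuracy" is computed here as recall, which is an assumption about the metric reported on the matrix axes.

```python
LABELS = ["bathroom", "bedroom", "dining room", "living room"]

# Hypothetical counts: rows are true classes, columns are predicted classes.
CONFUSION = [
    [30, 0, 0, 0],
    [0, 29, 0, 1],
    [0, 0, 30, 0],
    [1, 0, 0, 29],
]


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_precision(matrix):
    """Correct predictions of a class divided by all predictions of that class."""
    n = len(matrix)
    return [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]


def per_class_recall(matrix):
    """Correct predictions of a class divided by all true samples of that class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


accuracy = overall_accuracy(CONFUSION)
precision = per_class_precision(CONFUSION)
recall = per_class_recall(CONFUSION)
for label, p, r in zip(LABELS, precision, recall):
    print(f"{label}: precision={p:.3f}, recall={r:.3f}")
print(f"overall accuracy={accuracy:.4f}")
```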

Fig. 4

Confusion matrix of the DAG-CNN.

The root cause of the misclassifications was identified by inspecting the error samples individually. The limiting factor is the semi-lateral viewpoint, which omits key room features needed to differentiate room types.

2.2 Trajectory planning algorithm

The four components of trajectory planning, namely perception, localization, cognition, and control [27], were addressed using the MATLAB® Robotics System Toolbox. This toolbox offers a comprehensive set of tools and algorithms integrated into Simulink, making it well suited for communication with a virtual residential environment previously built with “3D World Editor”, MATLAB’s VRML development tool.

The trajectory planning algorithm consists of four essential modules. The first is the kinematic model, which determines the mobile robot’s pose (x, y, θ) with respect to the global reference frame. The second is the Rapidly-exploring Random Tree star (RRT*) global path planning algorithm, which calculates a feasible and optimal path from an initial to a final point within the virtual residential environment. The third is the inverse dynamics method, which computes the rotation speed of each track to ensure accurate tracking of the expected pose. The fourth is the Vector Field Histogram (VFH) obstacle avoidance module, which adjusts the robot’s trajectory in real time to avoid collisions with any obstacles encountered along the previously planned path.
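The relation between the robot pose and the track speeds can be sketched with a simple differential-drive model. This is a purely kinematic simplification that ignores track slip, assumed here for illustration; it is not the model implemented in the Simulink environment, and the track separation and sprocket radius are hypothetical values.

```python
import math

TRACK_SEPARATION = 0.30  # m, hypothetical distance between the two tracks
SPROCKET_RADIUS = 0.05   # m, hypothetical drive sprocket radius


def track_speeds(v, omega):
    """Angular speed (rad/s) of the left and right tracks for a body velocity (v, omega)."""
    v_left = v - omega * TRACK_SEPARATION / 2.0
    v_right = v + omega * TRACK_SEPARATION / 2.0
    return v_left / SPROCKET_RADIUS, v_right / SPROCKET_RADIUS


def body_velocity(w_left, w_right):
    """Linear (m/s) and angular (rad/s) body velocity from the track speeds."""
    v_left = w_left * SPROCKET_RADIUS
    v_right = w_right * SPROCKET_RADIUS
    return (v_right + v_left) / 2.0, (v_right - v_left) / TRACK_SEPARATION


def integrate_pose(pose, w_left, w_right, dt):
    """Advance the pose (x, y, theta) by one Euler step of length dt."""
    x, y, theta = pose
    v, omega = body_velocity(w_left, w_right)
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    theta = math.atan2(math.sin(theta), math.cos(theta))  # keep theta in (-pi, pi]
    return x, y, theta


# Drive straight at 0.2 m/s for one second, starting at the origin.
w_left, w_right = track_speeds(0.2, 0.0)
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = integrate_pose(pose, w_left, w_right, 0.1)
print(pose)
```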

The obstacle evasion algorithm relies on the ‘Vector Field Histogram’ (VFH) technique, allowing the robot to determine the best obstacle-free direction for following a predefined path. This is achieved by continuously creating a polar histogram, as shown in Fig. 5, using data from a distance Lidar sensor [34].

Fig. 5

Representation of the polar histogram generated in the virtual environment for obstacle avoidance using VFH. HT: Histogram Threshold, RR: Robot Radius, SD: Safety Distance, StD: Turning Direction.
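The idea of the polar histogram can be sketched as follows. This is a simplified illustration that assumes a planar lidar scan given as range and bearing lists; the sector count, maximum range, and threshold are hypothetical, and the enlargement of obstacles by the robot radius (RR) and safety distance (SD) is omitted for brevity.

```python
import math


def polar_histogram(ranges, angles, num_sectors=36, max_range=2.0):
    """Accumulate obstacle density per angular sector from lidar returns.

    ranges and angles hold the lidar distances (m) and bearings (rad) in the
    robot frame. Closer returns contribute more, and returns at or beyond
    max_range are ignored.
    """
    sector_width = 2.0 * math.pi / num_sectors
    histogram = [0.0] * num_sectors
    for r, a in zip(ranges, angles):
        if r >= max_range:
            continue
        sector = int((a % (2.0 * math.pi)) / sector_width) % num_sectors
        histogram[sector] += 1.0 - r / max_range
    return histogram


def free_sectors(histogram, threshold):
    """Indices of sectors whose density is below the histogram threshold (HT)."""
    return [i for i, density in enumerate(histogram) if density < threshold]


# One obstacle 0.5 m straight ahead and one far return that is ignored.
histogram = polar_histogram([0.5, 3.0], [0.0, math.pi / 2])
candidates = free_sectors(histogram, threshold=0.5)
print(histogram[0], len(candidates))
```

The sectors returned by `free_sectors` are the candidate obstacle-free directions among which the steering direction is then selected.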

The VFH algorithm’s programming involves three critical parameters that require fine-tuning through trial and error. These parameters, known as the “weights of the cost function,” include the target steering weight, current steering weight, and previous steering weight. They are calculated to prioritize an obstacle-free path among available options. The specific values for these parameter weights were determined through iterative simulations of a straightforward trajectory involving the avoidance of two obstacles, as demonstrated in Fig. 6.

Fig. 6

Tuning the weights of the cost function. a) Global trajectory determined with RRT*. b) Target direction weight = 3, current direction weight = 3, previous direction weight = 3. c) Target direction weight = 3, current direction weight = 5, previous direction weight = 1.
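The role of the three weights can be illustrated with a cost function of the kind commonly used in VFH implementations, in which each candidate direction is penalized by its angular distance to the target, current, and previous directions. The exact cost function of the toolbox is not reproduced here; the function names and the angles in the example are assumptions made for illustration.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def steering_cost(candidate, target, current, previous, weights):
    """Weighted sum of the deviations from the target, current and previous directions."""
    w_target, w_current, w_previous = weights
    return (w_target * angle_diff(candidate, target)
            + w_current * angle_diff(candidate, current)
            + w_previous * angle_diff(candidate, previous))


def best_direction(candidates, target, current, previous, weights):
    """Obstacle-free candidate direction with the lowest cost."""
    return min(candidates,
               key=lambda c: steering_cost(c, target, current, previous, weights))


# Two obstacle-free candidates on either side of an obstacle (radians).
candidates = [-0.4, 0.5]
target, current, previous = 0.0, 0.4, -0.5

equal_weights = best_direction(candidates, target, current, previous, (3, 3, 3))
tuned_weights = best_direction(candidates, target, current, previous, (3, 5, 1))
print(equal_weights, tuned_weights)
```

In this example the equal weights of Fig. 6(b) select the candidate close to the previous direction, whereas the weights of Fig. 6(c) favor the candidate close to the current heading, which reduces back-and-forth steering.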

While the VFH algorithm performs well on its own, it can be enhanced, especially when confronted with abrupt changes in direction. These situations can compromise the safety distance initially set to safeguard the mobile robot from collisions due to the continuous updates between the target direction and the obstacle-free direction. To tackle this issue, an integrative factor and a derivative factor were introduced. These factors are designed to amplify the importance of deviations recommended by the VFH algorithm.

The gains of the integrative and derivative factors were determined iteratively, considering their impact on the robot’s trajectory tracking behavior. A significantly increased derivative gain amplifies the deviations signaled by the VFH algorithm, whereas the integrator gain adds a smoothing effect and retains a memory of the path covered by the mobile robot up to that point.
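One possible reading of this correction is sketched below: the deviation between the obstacle-free direction proposed by VFH and the target direction is integrated and differentiated, and both terms are added to the steering command. The class name, the gains, and the way the terms are combined are assumptions made for illustration and may differ from the implemented controller. Angle wrapping is ignored for brevity.

```python
class DeviationFilter:
    """Adds integral and derivative terms to the deviation suggested by VFH."""

    def __init__(self, k_i, k_d, dt):
        self.k_i = k_i        # integrative gain (hypothetical value)
        self.k_d = k_d        # derivative gain (hypothetical value)
        self.dt = dt          # sample time in seconds
        self.integral = 0.0
        self.previous = 0.0

    def update(self, target_direction, free_direction):
        """Return the corrected steering direction for the current sample."""
        deviation = free_direction - target_direction
        self.integral += deviation * self.dt
        derivative = (deviation - self.previous) / self.dt
        self.previous = deviation
        return free_direction + self.k_i * self.integral + self.k_d * derivative


steering = DeviationFilter(k_i=0.5, k_d=0.1, dt=0.1)
first = steering.update(target_direction=0.0, free_direction=0.2)
second = steering.update(target_direction=0.0, free_direction=0.2)
print(first, second)
```

When an obstacle first forces a deviation, the derivative term boosts the command (the first sample), and once the deviation holds steady only the slowly growing integral term remains (the second sample).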

The practical implications of fine-tuning these parameters become evident in Fig. 6, where iterative adjustments were made. In Fig. 6(a), the occupancy map and the selected navigation route are displayed. Figure 6(b) illustrates the robot’s navigation with obstacle avoidance during the third iteration of the steering parameters, while Fig. 6(c) represents the fifth iteration, showcasing a noticeable improvement in the mobile robot’s path taken when evading the same presented obstacles.

3 Results and discussion

Both algorithms were seamlessly integrated into the overall system, contributing to route planning, route tracking, object avoidance, and location identification. The system was configured with starting and destination points strategically positioned within a virtual residential environment, aligning with the robot’s anticipated daily tasks, such as transporting specific objects from various locations within a house to the main bedroom. This operation followed the algorithm delineated in the flowchart depicted in Fig. 7.

Fig. 7

Navigation flowchart.

Validating how the DAG-CNN network learns to discriminate the residential environment shows that training focuses on the objects that demarcate each environment, as shown in Fig. 8, where the heat map (right) of the input image (left) highlights what was learned. For example, Fig. 8a shows that recognition of the bathroom is centered on the toilet, and Fig. 8b shows that recognition of the bedroom is centered on the bed. In other words, learning is based on the particular elements that characterize each environment, regardless of where they are located within it.

Fig. 8

Heat map of filter activations.

During the robot’s navigation within the real environment, the recognition of each environment is validated by the network’s classification of images captured from the robot’s perspective, as shown in Fig. 9. Figure 9a illustrates the recognition of the dining area from a perspective that is not included in the training database. Figure 9b illustrates the successful recognition of the television area.

Fig. 9

Final environment recognition. a. Dining room. b. TV room. c. Bedroom. d. Bathroom.

Figure 9c illustrates the recognition of the bedroom area from an untrained perspective, where the intersection between two spaces (left and right) stands out, which would eventually allow the robot to determine where to go. Figure 9d (right) shows a misclassification: the scene is an untrained and unlabeled environment, and because the network can only generate the four established outputs, it assigns each image to the category that most closely resembles the database.
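The closed-set behavior described above can be illustrated with a short sketch. It assumes that the final layer of the network produces four scores that are normalized with a softmax function, which is a common design but is not stated explicitly for this network; the scores in the example are invented.

```python
import math

ROOMS = ["bedroom", "living room", "dining room", "bathroom"]


def softmax(scores):
    """Normalize raw scores into probabilities that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable room label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ROOMS[best], probs[best]


# Hypothetical scores for a scene that belongs to none of the four classes.
label, probability = classify([0.2, 0.1, 0.3, 0.25])
print(label, round(probability, 3))
```

Even though no class is clearly supported by the scores, one of the four labels is still returned, here with a probability close to the chance level of 0.25.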

With the robot’s location identified and a destination set within the virtual environment, the navigation algorithm generates potential trajectory branches, ultimately determining the optimal route among the available choices. This process is exemplified in Fig. 10 (on the left), where it seeks to minimize both travel time and distance (as depicted in Fig. 10 on the right). In Fig. 11, the two red dots within the corridor represent dynamic objects that the robot must avoid, underscoring the algorithm’s effective performance.
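The selection among trajectory branches can be sketched as a comparison of estimated travel time and path length. The waypoints, speed, and turn rate below are hypothetical, and the travel time estimate (straight-line time plus turning time) is an assumption made for illustration rather than the criterion implemented in the simulation.

```python
import math


def path_length(waypoints):
    """Total Euclidean length of a piecewise-linear path."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def total_turning(waypoints):
    """Sum of the absolute heading changes along the path, in radians."""
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(waypoints, waypoints[1:])]
    turning = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        turning += abs(math.atan2(math.sin(h1 - h0), math.cos(h1 - h0)))
    return turning


def travel_time(waypoints, speed=0.3, turn_rate=1.0):
    """Estimated time: driving at constant speed plus turning in place."""
    return path_length(waypoints) / speed + total_turning(waypoints) / turn_rate


def best_branch(branches):
    """Name of the branch with the lowest travel time, then the shortest length."""
    return min(branches,
               key=lambda name: (travel_time(branches[name]), path_length(branches[name])))


# Two hypothetical branches between the same start and destination (meters).
branches = {
    "corridor": [(0, 0), (4, 0), (4, 3)],
    "living room": [(0, 0), (2, 2), (2, 5), (4, 5), (4, 3)],
}
chosen = best_branch(branches)
print(chosen, round(path_length(branches[chosen]), 2))
```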

Fig. 10

Direct trajectory test.

Fig. 11

Obstacle trajectory test.

4 Conclusion

The implementation of derivative and integrative gains on the deviation signals has been shown to significantly enhance the performance of the VFH obstacle avoidance algorithm. Combined with an optimized route planning algorithm such as RRT*, it provides an integral path planning solution for an assistant mobile robot within a residential environment.

The utilization of the DAG-CNN architecture has demonstrated remarkable proficiency in accurately identifying rooms within residential environments. This achievement is notable, especially considering its shorter sequence of layers when compared to other well-established CNN designs like AlexNet, which consumes longer computation times in achieving comparable results.

The analysis of learning activations within the network facilitated the swift refinement of parameters, leading to faster convergence and enhanced accuracy during the training process. These insights also provided a clear view of the filter learning approach unique to each environment within the residential context. Similarly, an iterative process was applied to the navigation algorithm settings, resulting in smoother robot movement.

The results obtained up to this point lead to a subsequent stage of the project, in which a voice command recognition system is to be added to instruct the robot where in a residential environment it should go and what action it is expected to perform.

Acknowledgments

The authors are grateful to Nueva Granada Military University for the funding of this project with code IMP-ING-3405 and titled “Prototipo robótico móvil para tareas asistenciales en entornos residenciales”.

References

1. Wang, Tian and Shao, Home service robot task planning using semantic knowledge and probabilistic inference, Knowledge-Based Systems 204, 2020.

2. Yang C.-H. and Kang S.-C., Collision avoidance method for robotic modular home prefabrication, Automation in Construction 130, 2021.

3. Wang, Zhang, Sheng, Chen and Liu, Multi-style learning for adaptation of perception intelligence in home service robots, Pattern Recognition Letters 151 (2021), pp. 243–251.

4. Suwa, Tsujimura, Kodate, Donnelly, Kitinoja, Hallila, Toivonen, Ide, Bergman-Kärpijoki, Takahashi, Ishimaru, Shimamura and Yu, Exploring perceptions toward home-care robots for older people in Finland, Ireland, and Japan: A comparative questionnaire study, Archives of Gerontology and Geriatrics 91, 2020.

5. Möller, Furnari, Battiato, Härmä and Farinella G.M., A survey on human-aware robot navigation, Robotics and Autonomous Systems 145, 2021.

6. Ishihara and Takahashi, Empirical study of future image prediction for image-based mobile robot navigation, Robotics and Autonomous Systems 150, 2022.

7. Zhang, Tian, Zhang and Duan, Service skill improvement for home robots: Autonomous generation of action sequence based on reinforcement learning, Knowledge-Based Systems 212, 2021.

8. Zieliński and Markowska-Kaczmar, 3D robotic navigation using a vision-based deep reinforcement learning model, Applied Soft Computing 110, 2021.

9. Mishra D.K., Thomas, Kuruvilla, Kalyanasundaram, Ramalingeswara Prasad and Haldorai A., Design of mobile robot navigation controller using neuro-fuzzy logic system, Computers and Electrical Engineering 101, 2022.

10. Alroshan, Asgher, Hussain, Shahzad, Rasool and Abu-Khadrah, Virtual Trust on Driverless Cars Using Fuzzy Logic Design, 2022 International Conference on Business Analytics for Technology and Security (ICBATS) (2022), pp. 1–7.

11. Siddarth K.S., Barathraj, Dhipika, Vignesh, Supriya and R S.S., Path Planning for Mobile Robots using Deep Learning Architectures, 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA) (2021), pp. 1–6.

12. Krizhevsky, Sutskever and Hinton G.E., Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012), pp. 1097–1105.

13. LeCun, Bengio and Hinton, Deep learning, Nature 521(7553) (2015), pp. 436–444.

14. Hassan, Ming K.W. and Wah C.K., A Comparative Study on HSV-based and Deep Learning-based Object Detection Algorithms for Pedestrian Traffic Light Signal Recognition, 3rd International Conference on Intelligent Autonomous Systems (ICoIAS), Singapore, (2020), pp. 71–76.

15. Lee, Chung and Sohn, Reinforcement Learning for Joint Control of Traffic Signals in a Transportation Network, IEEE Transactions on Vehicular Technology 69(2) (2020), pp. 1375–1387.

16. , Zhou, Song, Gong, Zhao and Chang C.-I., Unsupervised Hyperspectral Band Selection via Hybrid Graph Convolutional Network, in IEEE Transactions on Geoscience and Remote Sensing.

17. Wan and Goudos, Faster R-CNN for multi-class fruit detection using a robotic vision system, Computer Networks 168.

18. Sartori, Zou, Pei and Yu, CNN-based path planning on a map, 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO) (2021), pp. 1331–1338.

19. Ajagbe S.A. and Adigun M.O., Deep learning techniques for detection and prediction of pandemic diseases: a systematic literature review, Multimed Tools Appl (2023).

20. Rawat, Oki, Chakrawarti R.K., Adekunle T.S., Lukose J.M. and Ajagbe S.A., Autonomous Artificial Intelligence Systems for Fraud Detection and Forensics in Dark Web Environments, Informatica 47(9) (2023), pp. 51–62.

21. , Liu and Zheng, Temporal-Spatial Quantum Graph Convolutional Neural Network Based on Schrödinger Approach for Traffic Congestion Prediction, in IEEE Transactions on Intelligent Transportation Systems 24(8), pp. 8677–8686.

22. Han, et al., Research on Road Environmental Sense Method of Intelligent Vehicle Based on Tracking Check, in IEEE Transactions on Intelligent Transportation Systems 24(1) (2023), pp. 1261–1275.

23. Jiang, Zhao, Zhu, Wang and Du, A Practical and Economical Ultra-wideband Base Station Placement Approach for Indoor Autonomous Driving Systems, Journal of Advanced Transportation 2022.

24. Zhang, Zhang C.-H. and Shao, User preference-aware navigation for mobile robot in domestic via defined virtual area, Journal of Network and Computer Applications 173, 2021.

25. Calderita L.V., Vega, Bustos and Núñez, A new human-aware robot navigation framework based on time-dependent social interaction spaces: An application to assistive robots in caregiving centers, Robotics and Autonomous Systems 145, 2021.

26. , Rameshkumar, V A.P., B J.T. and N A.M., An Intelligent Robot Assisting Medical Practitioners to Aid Potential Covid-19 Patients, 2021 Sixth International Conference on Image Information Processing (ICIIP) (2021), pp. 413–417.

27. Wei Pe., Yu, Di, Dai, Wang and Zeng, Design of Robot Automatic Navigation under Computer Intelligent Algorithm and Machine Vision, Journal of Industrial Information Integration, 2022.

28. Ran, Yuan and Zhang J.b., Scene perception based visual navigation of mobile robot in indoor environment, ISA Transactions 109 (2021), pp. 389–400.

29. Anwar and Raychowdhury, Autonomous Navigation via Deep Reinforcement Learning for Resource Constraint Edge Nodes Using Transfer Learning, in IEEE Access 8 (2020), pp. 26549–26560.

30. Osei Agyemang, Zhang, Acheampong, Adjei-Mensah, Gyamfi E.O., Arhin J.R., Ayivi and Amuche C.I., RPNet: Rotational pooling net for efficient Micro Aerial Vehicle trail navigation, Engineering Applications of Artificial Intelligence 116 (2022), 105468.

31. Shao, Zhu and Li, Transfer Learning for Visual Categorization: A Survey, in IEEE Transactions on Neural Networks and Learning Systems 26(5) (2015), pp. 1019–1034.

32. Maligalig K.C., Amante A.D., Tejada R.R., Tamargo R.S. and Santiago A.F., Machine Vision System of Emergency Vehicle Detection System Using Deep Transfer Learning, 2022 International Conference on Decision Aid Sciences and Applications (DASA) (2022), pp. 1464–1468.

33. Zeiler M.D. and Fergus, Visualizing and Understanding Convolutional Networks, in: European Conference on Computer Vision (ECCV), Zurich, Switzerland, Lecture Notes in Computer Science 8689 (2014), pp. 818–833.

34. Borenstein and Koren, The vector field histogram - fast obstacle avoidance for mobile robots, IEEE Journal of Robotics and Automation 7(3) (1991), pp. 278–288.