Investigation on position and attitude estimation and control of manipulator based on machine vision

Abstract

As science and technology advance, the demands placed on robots keep rising. Robots are increasingly used in industrial production and in daily life, so they need a better ability to perceive and process information about the external environment. Visual servo systems were developed to meet this need. Pose estimation is a major problem in current vision systems, with great application value in positioning and navigation, target tracking and recognition, virtual reality, and motion estimation. This paper therefore studies robot arm pose estimation and control based on machine vision. It first analyzes machine vision technology and then compares, through experiments, the accuracy and stability of two methods for robot arm pose estimation. The experimental results showed that, with Kalman-based centralized data fusion at a noise level of 1 pixel, the maximum error of the X-axis angle was only 0.55 and the average error was 0.02. With Kalman-based distributed data fusion, the average error of the X-axis displacement was 0.06 and the maximum was 17.66. The centralized method was better in both accuracy and stability. Overall, however, both methods performed well and could accurately control the position and posture of the manipulator.

Keywords

Position and attitude estimation of manipulator; machine vision; Kalman filter; world coordinate system

1 Introduction

Robots were initially used mainly in industrial production. Their working environments were relatively simple, and they performed specific tasks such as welding, grasping, and handling. With rising living standards and rapid progress in science and technology, large numbers of mobile robots have also appeared in daily life. In complex working environments, robot tasks are becoming more complex, which demands a high degree of intelligence and automation. Machine vision is a key technology for robots. It can sense the surrounding environment and find and identify targets, giving the robot autonomy, intelligence, and vision similar to that of humans, which makes it an attractive research field. Machine vision began in 1980 and has developed rapidly in recent years. Manipulators are used more and more widely in industry. However, their self-regulation ability is weak, and they can only be used when the environment is known in advance. When environmental conditions change, it is difficult for robots to adjust automatically. Manipulators based on machine vision have therefore become an urgent need for manufacturers and researchers and an important topic in robotics, because they improve the autonomy and flexibility of robots and enable more complex work. Applying machine vision to the pose estimation and control of manipulators addresses challenges in practical engineering and improves the applicability of manipulators in complex environments.

In research on manipulators, researchers have built test platforms and carried out detailed analysis and design on them. Since the dipole model can approximate most magnetic field sources, the problem discussed covers a broad range of pose estimation techniques. Taddese Addisu Z experimentally demonstrated closed-loop control of a tethered magnetic device using the developed pose estimation technology, to determine its applicability to robot-guided capsule endoscopy. This position and attitude estimation system could help move closed-loop control and intelligent automation of magnetically driven capsule endoscopes toward clinical use [1]. Billings Gideon introduced a new method that predicts the 6-degrees-of-freedom pose of objects from monocular images [2]. Tracking the 6-degrees-of-freedom pose of an object from video provides rich information for robot tasks such as manipulation and navigation. Deng Xinke formulated 6-degrees-of-freedom object pose tracking in a particle filter framework in which the three-dimensional rotation and translation of objects were decoupled [3]. Machine vision technology can give robots a vision system similar to that of humans.

Machine vision enables robots to recognize and locate objects automatically and to carry out motion planning and operation. Industry 4.0 requires a large number of mobile manipulators with high autonomy and intelligence. The goal is to complete dexterous manipulation tasks in unstructured environments without knowing the object state in advance. For a mobile manipulator, it is very important to identify and detect objects quickly and accurately and to determine the manipulation position, so that it can adjust its position and posture in the workspace. Chen Fei developed a stereo vision algorithm that estimates the position and pose of objects using point cloud data from multiple stereo vision systems, and proposed an improved iterative closest point algorithm for pose estimation. Using the estimated pose as input, the work also studied an algorithm and criteria for selecting and adjusting the position and posture of the robot by maximizing its operability for a given task [4]. Estimating the pose of multiple animals is a challenging computer vision problem: frequent interaction leads to occlusion and complicates the assignment of detected key points to the correct individuals. In addition, compared with typical multi-person scenes, animals that look very similar interact more closely. To meet this challenge, Lauer Jessy provided high-performance animal assembly and tracking functions for multi-animal scenarios based on an open-source pose estimation toolkit [5]. To solve the problem of grasping block food in automatic food packaging machines, Yuan Fei took a pick-and-place parallel manipulator as the research object and proposed a manipulator positioning control method that combines visual feedback with predictive control [6]. However, these studies did not compare which method estimates and controls the position and attitude of the manipulator better.

Traditional methods usually perform poorly under occlusion or in complex environments. Frequent occlusion or a complex background may cause key points to be lost or mismatched, which reduces the accuracy of attitude estimation. A machine vision system obtains information about the target object in a non-invasive way through sensors such as cameras, without direct contact or interference with the object, which helps to realize more flexible and efficient robot operation.

In recent years, microgrid systems have become a research hotspot in the field of power systems, with significant implications for enhancing the reliability, security, and economy of power systems. However, due to the nonlinear characteristics of microgrid systems and the presence of external disturbances, controlling and managing them effectively remains a challenge. Liu et al. proposed an optimal voltage restoration control method based on nonzero-sum and zero-sum game strategies to address voltage restoration in microgrid systems [7]. They also introduced a consensus optimal control method, based on multi-agent game strategies, for microgrids affected by external disturbances, further enhancing the robustness and stability of microgrid systems [8]. Additionally, they presented a fault-tolerant control method based on event-triggered fuzzy adaptive consensus to handle nonlinear factors in microgrid systems and achieve predefined performance within a fixed time [9]. Turning to the application of machine vision in estimating and controlling the position and attitude of mechanical arms, this paper examines the relevant issues in this field and proposes a novel approach. To enhance the intelligence and adaptability of robot applications, it studies robot arm pose estimation and control based on machine vision. The paper first outlines the technical requirements in a general environment and then introduces machine vision technology in detail. This technology is then applied to robots, giving them a visual system similar to that of humans. Finally, in the experimental section, two methods under the servo system are compared. The experimental results demonstrate that both methods can accurately control the robot arm, with Kalman-based centralized data fusion exhibiting superior accuracy and stability.

The innovation of this paper lies in not only introducing the various technologies but also applying them and comparing them experimentally, which highlights their advantages and disadvantages.

The article makes significant contributions in three aspects:

• Advancement of Machine Vision Technology: The paper extensively discusses the application of machine vision technology in the industrial sector, emphasizing its importance in product inspection, quality control, defect detection, and integration with industrial robots. It lays the groundwork for further research and development in machine vision technology.
• Development of Vision-Based Manipulator Control Systems: The research explores methods and techniques for precise control of manipulator using visual feedback, including direct visual servo and look-and-move systems. It provides crucial insights for the advancement of robotics.
• Application of Kalman Filter Fusion Methods: By comparing centralized and distributed data fusion techniques, the article evaluates their performance in manipulator position and attitude estimation. This offers vital technical support for developing more reliable and efficient robotic systems.

Overall Structure of the Paper: In Section 1, the background and significance of research on robot pose estimation and control based on machine vision are introduced. In Section 2, the application of machine vision technology in the industrial domain is discussed. In Section 3, the use of multi-camera systems to improve the accuracy of robot pose estimation is explored, and its performance is evaluated. Section 4 concludes the paper.

2 Machine vision algorithm

2.1 Machine vision technology and its application in industry

Machine vision refers to the use of image processing technology to let machines recognize objects and scenes, obtaining high-quality images and accurate information for the detection, measurement and control of industrial processes. It is widely used in areas such as product inspection, logistics and transportation, and factory automation. In short, it uses computer vision and related technologies in place of the human eye to complete visual perception tasks, including tasks that are considered impossible for the human eye.

Machine vision is one of the most widely used technologies in the industrial field, with broad market prospects and great application value. It is mainly used for product quality inspection, including: inspection of appearance attributes such as product contour and length; object surface profile detection; inspection of workpiece size and shape; classification and identification of workpiece defects; workpiece surface roughness detection; defect detection of mechanical parts; and detection of part positioning accuracy in assembly.

Machine vision technology and information technology play an important role in enhancing industrial competitiveness. In addition, compared with conventional industrial robots, industrial robots with visual functions can improve production efficiency and effectively address production safety problems. Therefore, it is necessary to study vision technology for industrial robots.

When industrial robots are integrated with machine vision, the representation of the robot's position and posture and the conversion between the image coordinate system and the robot coordinate system become very important [10, 11]. The conversions between the robot's coordinate systems must be convenient and have acceptable computational complexity. The most important reference relationships are the relative positions between the image and camera coordinate systems, between the image and world coordinate systems, and between the camera and world coordinate systems. Once these relative position relationships are determined, the transition from the image coordinate system to the world coordinate system is realized by setting up the matrix equations and calculating the corresponding transformation matrices.
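The chain of conversions from image to camera to world coordinates can be illustrated with a minimal sketch. It assumes a pinhole camera model, a known depth for the pixel, and a known camera pose in the world frame; the intrinsic parameters (`fx`, `fy`, `cx`, `cy`), the depth, and the matrix `T_WORLD_CAMERA` are hypothetical values chosen for illustration and do not come from the experiments in this paper.

```python
from typing import List, Sequence


def pixel_to_camera(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> List[float]:
    """Back-project pixel (u, v) with known depth into the camera frame (pinhole model)."""
    return [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]


def apply_transform(T: Sequence[Sequence[float]], p: Sequence[float]) -> List[float]:
    """Apply a 4x4 homogeneous transform T to the 3D point p."""
    ph = [p[0], p[1], p[2], 1.0]
    return [sum(T[i][k] * ph[k] for k in range(4)) for i in range(3)]


# Hypothetical camera pose in the world frame:
# rotation of 90 degrees about Z, translation (1.0, 0.0, 0.5).
T_WORLD_CAMERA = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# Image -> camera -> world for one pixel (all numbers are illustrative).
p_cam = pixel_to_camera(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = apply_transform(T_WORLD_CAMERA, p_cam)
print(p_cam, p_world)
```

In practice the intrinsic parameters and the camera pose would come from camera calibration and hand-eye calibration rather than being written by hand.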

At present, the mature machine vision systems on the market are built around special-purpose equipment, with general-purpose computers typically used for image processing, data acquisition, system analysis and control. Common application areas include electronics industry robots, packaging machinery and auto parts, all of which use machine vision systems for automatic control. Because the objects handled have well-defined shapes, machine vision has broad application prospects. In industrial production, the manipulator is an important executive tool and plays a vital role. Precise control of its position and attitude has a very important impact on the entire production process [12, 13].

Comparing the Kalman centralized data fusion method with the Kalman distributed data fusion method helps to understand their performance under different conditions and to determine whether both methods estimate the position and posture of the robot arm accurately. This is especially important for tasks that require high-precision control, such as fine manipulation or industrial applications that demand high positioning accuracy.

2.2 Vision based manipulator control system

Target detection refers to the use of computer image processing technology to detect the contour and shape of the target, so as to identify the state of the end of the manipulator. Because the imaging position of the camera changes, image processing is easily affected by environmental factors (such as illumination, object occlusion, movement, and posture change) and by the structural characteristics of the manipulator itself (such as joint movement, joint angle, rotation, and length change). The relationship between arm length and arm span needs to be considered in the position and attitude estimation of the manipulator. Image enhancement algorithms are generally used to reduce the impact of illumination changes on image quality.

In industrial production, manipulators often operate under special conditions, such as relatively complex surroundings and large temperature variations in the working environment. In such cases, accurate control of the position and attitude of the manipulator is of great significance.

As can be seen from Fig. 1, the vision based closed-loop control system of the robot arm mainly includes visual feedback and pose estimation. Through image processing and feature extraction, the relative position of the target and the robot arm is estimated. The visual controller can determine the robot’s action according to different situations of the robot arm and make corresponding actions [14, 15].

Fig. 1

General control system of robot arm based on vision.

According to the positional relationship between the robot and the camera, vision systems can be divided into two types: eye-in-hand and eye-to-hand. In the former, the camera is mounted on the end of the robot arm and can only see the target object. In the latter, the camera is mounted at a fixed position relative to the robot and can observe both the end of the robot arm and the object. In the eye-in-hand configuration, the transformation between the camera coordinate system and the end-effector coordinate system of the robot is constant, while in the eye-to-hand configuration, the transformation between the camera coordinate system and the world coordinate system is constant. Because the camera sees only the target, the eye-in-hand structure is usually called endpoint open-loop (EOL); because the camera sees the end of the manipulator arm and the target at the same time, the eye-to-hand structure is also called endpoint closed-loop (ECL). The control accuracy of the ECL system is not affected by the calibration error between the camera and the end of the robot arm. However, when the robot approaches the target during operation, the arm may block the camera's view of the target and thus disturb the work.

In short, the camera is mounted either eye-in-hand or eye-to-hand. In the eye-in-hand case, the camera is installed on the end of the mechanical arm and only the target object can be seen. In the eye-to-hand case, the camera is installed at a fixed position relative to the robot, and both the end of the robot and the object can be observed. Visual feedback provides real-time information about the state of the target and the end of the manipulator, which helps the system perceive and understand the external environment.

Depending on whether the visual controller directly controls the joint angles, systems can be divided into two types: direct visual servo and look-and-move. In direct visual servo, the visual servo controller replaces the joint controller of the robot arm and directly controls the position of each joint. In a look-and-move system, visual information provides the set-points for the joint controllers, and closed-loop control is achieved through the joint controllers of the robot [16, 17]. It is difficult for a vision system to reach a high sampling frequency, and even when it can, the hardware requirements are high. Therefore, the look-and-move system is generally used.

According to the type of feedback information, systems can be divided into position-based and image-based visual servo control [18, 19]. The former defines the feedback in three-dimensional spatial coordinates, so it is also called a three-dimensional visual servo system; the latter defines it in the two-dimensional image plane, so it is also called a two-dimensional visual servo system. The position-based motion control system of the robot arm is shown in Fig. 2. In this method, the relative pose of the camera and the object is estimated, and the motion of the robot arm is specified in three-dimensional Cartesian space. Separating pose estimation from the controller makes the system easy to implement. However, the final result depends on the calibration of the relative position between the camera and the robot arm and on the accuracy of the target model.

Fig. 2

Manipulator control system based on posture.

2.3 Position and attitude estimation based on Kalman filter

Methods based on the Kalman filter have been widely used in position and attitude estimation. Pose estimation is a nonlinear problem, and the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) extend the Kalman filter to nonlinear systems, so both can be used for pose estimation. This paper briefly describes the construction of the linear Kalman filter and then uses the EKF and UKF for estimation.

EKF deals with nonlinear problems by linearizing the system model and the measurement model. The state is predicted using the nonlinear state transition function. The Kalman gain is then calculated from the linearized observation model, and the state estimate is updated with the measurement.

UKF uses unscented transformation to deal with nonlinear transformation more accurately, avoiding explicit linearization of system model and measurement model.

In centralized data fusion, the data of all sensors are processed centrally, and the updating steps of Kalman filter include the updating of the overall system state and the updating of the overall error covariance.

In distributed data fusion, the data of different sensors are processed separately, and each sensor has its own Kalman filter. The updating step of Kalman filter is carried out on each sensor separately, and then the estimation results of each sensor are fused in some way.

(1) Linear Kalman filter

The Kalman filter provides an optimal estimate of the state of a linear system from noisy measurements. It therefore requires models of the system dynamics and of the measurement process.

The Kalman filter estimates a continuous state $φ$ of a linear system whose measurements contain noise. At each time step, the state $φ_{r}$ depends on the previous state $φ_{r-1}$ and contains some uncertainty:

$φ_{r} = X φ_{r - 1} + p_{r}$ (1)

Among them, $X$ is the dynamic matrix, and $p_{r}$ is the zero-mean Gaussian process noise.

In addition, the measurement $B_{r}$ is related to the state vector of the system through $G_{r}$:

$B_{r} = G_{r} φ_{r} + k_{r}$ (2)

Among them, $G_{r}$ is the observation matrix and $k_{r}$ is the measurement noise, assumed to be zero-mean Gaussian. The Kalman filter computes its optimal state estimate from the models in Formula (1) and Formula (2).
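To make Formulas (1) and (2) concrete, the following minimal sketch runs the standard Kalman predict and update steps for a scalar state. It reuses the symbols of this section ($X$, $G$, $B_{r}$, and $Q$ for the error covariance) and assumes scalar noise variances `P` (process) and `R` (measurement); the numerical values are illustrative only.

```python
from typing import Tuple


def kf_predict(phi: float, Q: float, X: float, P: float) -> Tuple[float, float]:
    """Propagate the state estimate and its error variance through the dynamics."""
    return X * phi, X * Q * X + P


def kf_update(phi_prior: float, Q_prior: float,
              B: float, G: float, R: float) -> Tuple[float, float]:
    """Correct the predicted state with the measurement B."""
    K = Q_prior * G / (G * Q_prior * G + R)      # Kalman gain
    phi_post = phi_prior + K * (B - G * phi_prior)
    Q_post = (1.0 - K * G) * Q_prior
    return phi_post, Q_post


# Illustrative run: static state (X = 1, no process noise), three measurements of 1.0.
phi, Q = 0.0, 1.0
for B in [1.0, 1.0, 1.0]:
    phi, Q = kf_predict(phi, Q, X=1.0, P=0.0)
    phi, Q = kf_update(phi, Q, B=B, G=1.0, R=1.0)
print(phi, Q)
```

With each measurement the estimate moves toward the measured value and the error variance shrinks, which is the behavior the optimal linear estimator is expected to show under these assumptions.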

Formula (1) and Formula (2) are the state and measurement equations of a linear dynamic system. For nonlinear systems, the filter equations must be modified accordingly, which extends the application range of the Kalman filter to nonlinear dynamic and measurement functions. Two such filters are used in this paper. The EKF linearizes the system around the current state estimate, while the UKF uses sigma points. Both methods are applied to pose estimation, as described below.

(2) Position and attitude estimation based on EKF

Because the camera projection model is nonlinear, the EKF is used to estimate the relative pose of the robot arm and the object. The state vector contains the relative pose and its velocity:

$φ_{r} = [d_{L}^{c}, α_{L}^{c}, {\dot{d}}_{L}^{c}, {\dot{α}}_{L}^{c}]$ (3)

Among them: φ _r is the state vector at time step r.

In the prediction phase, the system state and its error covariance are predicted based on system dynamics:

$\begin{matrix} {\hat{φ}}_{r|r-1} = X {\hat{φ}}_{r-1} \\ Q_{r|r-1} = X Q_{r-1} X^{D} + P \end{matrix}$ (4)

Among them, ${\hat{φ}}_{r|r-1}$ and ${\hat{φ}}_{r-1}$ are the prior and posterior state estimation vectors, respectively; $Q_{r|r-1}$ and $Q_{r-1}$ are the a priori and a posteriori error covariance matrices, respectively; and the superscript $D$ denotes the transpose. The process noise matrix $P$ is defined as follows:

$P = C\{p_{r} p_{r}^{D}\}$ (5)

Among them: $C\{•\}$ represents the expectation operator.
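The prediction step above is followed by a measurement update that uses a linearized observation model. The sketch below shows one such EKF update for a deliberately simplified scalar case: the state is the camera-to-object distance, and the measurement is the projected pixel width of an object of known size. The projection function, the focal length `F_PIXELS`, the object width `WIDTH_M`, and the noise values are assumptions made for illustration, not the measurement model used in this paper.

```python
from typing import Tuple

F_PIXELS = 500.0   # hypothetical focal length in pixels
WIDTH_M = 0.1      # hypothetical object width in meters


def project_width(d: float) -> float:
    """Nonlinear measurement model: pixel width of the object at distance d."""
    return F_PIXELS * WIDTH_M / d


def project_width_jacobian(d: float) -> float:
    """Derivative of the measurement model with respect to d (the linearization)."""
    return -F_PIXELS * WIDTH_M / (d * d)


def ekf_update(d_prior: float, Q_prior: float, B: float, R: float) -> Tuple[float, float]:
    """One EKF measurement update using the model linearized at the prior estimate."""
    G = project_width_jacobian(d_prior)
    K = Q_prior * G / (G * Q_prior * G + R)        # Kalman gain from the linearized model
    d_post = d_prior + K * (B - project_width(d_prior))
    Q_post = (1.0 - K * G) * Q_prior
    return d_post, Q_post


# Illustrative update: true distance 2.2, prior guess 2.0, noise-free measurement.
true_d = 2.2
d_post, Q_post = ekf_update(d_prior=2.0, Q_prior=0.25, B=project_width(true_d), R=1.0)
print(d_post, Q_post)
```

The innovation is computed with the full nonlinear model, while the gain uses only its local derivative, which is why the EKF is accurate only near the current estimate.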

(3) Pose estimation based on UKF

UKF propagates the mean and covariance through the nonlinear function using the Unscented Transform (UT).

UKF estimates the system state and error covariance from samples. First, a set of 2m+1 sampling points, called the sigma point set, is selected. The weights of these sampling points are calculated as follows:

$S_{j} = \frac{1}{2 (m + η)}$ (6)

Among them: m refers to the dimension of the state, and η refers to the scaling parameter.

The predicted state vector and its covariance matrix are obtained as weighted sums over the propagated sigma points:

${\hat{φ}}_{r|r-1} = \sum_{j = 0}^{2 m} S_{j} {\hat{φ}}_{r|r-1}^{j}$ (7)

$Q_{r|r-1} = \sum_{j = 0}^{2 m} S_{j} ({\hat{φ}}_{r|r-1}^{j} - {\hat{φ}}_{r|r-1}) ({\hat{φ}}_{r|r-1}^{j} - {\hat{φ}}_{r|r-1})^{D} + P_{r}$ (8)
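The weighted sums in Formulas (7) and (8) can be sketched for a scalar state (m = 1). Two points in this sketch are assumptions from the standard unscented transform rather than statements from this paper: the central sigma point is given the weight η/(m+η), with Formula (6) applied to the remaining 2m points, and the sigma points are placed at the mean plus or minus the square root of (m+η) times the variance.

```python
import math
from typing import Callable, Tuple


def unscented_transform(mean: float, var: float, f: Callable[[float], float],
                        eta: float = 2.0, process_noise: float = 0.0) -> Tuple[float, float]:
    """Propagate a scalar mean and variance through f using 2m+1 = 3 sigma points."""
    m = 1
    spread = math.sqrt((m + eta) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    # Assumed central weight eta/(m+eta); the others follow Formula (6).
    weights = [eta / (m + eta)] + [1.0 / (2.0 * (m + eta))] * (2 * m)
    propagated = [f(s) for s in sigma_points]
    mean_pred = sum(w * y for w, y in zip(weights, propagated))            # Formula (7)
    var_pred = sum(w * (y - mean_pred) ** 2
                   for w, y in zip(weights, propagated)) + process_noise   # Formula (8)
    return mean_pred, var_pred


# Illustrative check with a quadratic function of a variable with mean 1 and variance 0.5.
mean_pred, var_pred = unscented_transform(1.0, 0.5, lambda x: x * x)
print(mean_pred, var_pred)
```

No derivative of `f` is needed, which is the practical difference from the EKF linearization.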

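In standard notation, the state equation describes how the state of the system evolves and is usually expressed as:

$x_{k} = F x_{k - 1} + B u_{k} + w_{k}$ (9)

where $F$ is the state transition matrix, $B$ is the control input matrix, $u_{k}$ is the control input, and $w_{k}$ is the process noise. The observation equation is:

$z_{k} = H x_{k} + v_{k}$ (10)

where $H$ is the observation matrix and $v_{k}$ is the measurement noise.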

Compared with EKF, UKF estimates the state more accurately. However, it is still an approximation for nonlinear systems and is therefore suboptimal. It should be pointed out that the design of the Kalman filter is based on zero-mean Gaussian noise. When the system noise is non-Gaussian, the Kalman filter is no longer the optimal estimator.

3 Position and attitude estimation of multi-camera system based on robot visual servo

3.1 Data fusion of multi-camera system for robotic arm visual servo

Machine vision system extracts and matches key feature points or markers by analyzing the images captured by the camera, thus establishing the corresponding relationship between the camera and the robot or the target object. This helps to determine the position of the object in the field of view relative to the camera. Machine vision system can monitor and identify the target object in real time, and provide real-time feedback to the robot by constantly updating the estimation results, so that it can adjust its position and posture quickly and accurately.

As mentioned earlier, many Robot Visual Servo (RVS) systems require highly accurate and robust pose estimation. In some RVS systems, such as single-camera servo systems, the accuracy of pose estimation has a great impact on performance. However, the limited Field of View (FOV) of a single camera restricts the accuracy of pose estimation. The multi-camera structure is proposed to solve this problem. It improves the accuracy and robustness of the estimate and enlarges the overall FOV. In most cases, multiple cameras benefit from viewing the scene from different perspectives, because the target is visible in each camera's view. In addition, errors in object modeling seriously affect the whole estimation, and so far few works have addressed this problem. Therefore, this paper proposed a method based on sensor fusion to make maximum use of the data of each camera and enhance robustness. Using the data collected by multiple cameras effectively improves the accuracy and robustness of the pose estimate.

Two fusion modes suitable for the proposed pose estimation method are introduced: centralized and distributed. Centralized fusion has high accuracy but requires a large amount of computation. Distributed fusion improves the speed of the system at the expense of some accuracy. In the centralized mode, the measurements of all cameras are combined and the fused data are used as the filter input; in the distributed mode, the pose is first estimated separately from each camera, and the resulting state estimates are then fused. These two methods are therefore called measurement fusion and state fusion, respectively.
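The difference between measurement fusion and state fusion can be illustrated with a scalar sketch in which two cameras measure the same quantity directly. The fusion rule used for the distributed case (adding the information gained by each local filter to a shared prior) is one common choice and an assumption of this sketch; the paper does not specify its fusion rule at this point, and all numbers are illustrative.

```python
from typing import List, Tuple

Measurement = Tuple[float, float]  # (measured value, noise variance)


def update(x: float, Q: float, z: float, R: float) -> Tuple[float, float]:
    """Scalar Kalman measurement update with a direct observation (G = 1)."""
    K = Q / (Q + R)
    return x + K * (z - x), (1.0 - K) * Q


def centralized_fusion(x0: float, Q0: float,
                       measurements: List[Measurement]) -> Tuple[float, float]:
    """Measurement fusion: one filter processes the measurements of every camera."""
    x, Q = x0, Q0
    for z, R in measurements:
        x, Q = update(x, Q, z, R)
    return x, Q


def distributed_fusion(x0: float, Q0: float,
                       measurements: List[Measurement]) -> Tuple[float, float]:
    """State fusion: each camera runs its own update, then the estimates are combined."""
    info, info_state = 1.0 / Q0, x0 / Q0
    for z, R in measurements:
        xi, Qi = update(x0, Q0, z, R)          # local filter of one camera
        info += 1.0 / Qi - 1.0 / Q0            # information gained by this camera
        info_state += xi / Qi - x0 / Q0
    Q_fused = 1.0 / info
    return Q_fused * info_state, Q_fused


# Two hypothetical cameras observing a distance of about 2.2.
cameras = [(2.3, 0.04), (2.1, 0.09)]
x_c, Q_c = centralized_fusion(2.0, 1.0, cameras)
x_d, Q_d = distributed_fusion(2.0, 1.0, cameras)
print(x_c, Q_c, x_d, Q_d)
```

In this linear, fault-free case both modes give the same result. Differences of the kind discussed in this paper arise once the models are nonlinear, the local filters are approximate, or faulty estimates are excluded before fusion.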

(1) Centralized data fusion based on Kalman

Sensor fusion makes it possible to fully exploit the potential of multiple sensors. Fusion schemes are generally divided into centralized and decentralized ones. Centralized fusion is also called measurement fusion: it collects all measurement data in a central unit and then estimates the pose from these measurements. Centralized fusion has the advantages of small information loss and high accuracy, but its computing cost rises significantly as the number of sensors increases. In addition, because the data of all sensors are fused without first identifying faulty measurements, errors are easily introduced.

Kalman-based centralized data fusion estimates the position and attitude of the target and improves accuracy by fusing information from multiple sensors. The procedure is as follows. First, the Kalman filter is initialized with the initial state of the target and its uncertainty. Next, the Kalman gain is calculated; it weighs the uncertainty of the prediction against that of the actual measurement and is used to correct the predicted value. Finally, the state vector and covariance matrix are updated to fuse the measurement information and improve the accuracy of the estimate.

The Kalman filter is a recursive estimation method that yields the optimal estimate under certain conditions. Its biggest advantages are simplicity and optimality, which make it well suited to real-time use. Because many measurement systems are nonlinear, the EKF is used in many fields, such as tracking, navigation, and position and attitude estimation. However, when the system noise statistics are unknown, EKF performance degrades significantly. In addition, the linearized model of the EKF is valid only locally, so the filter is very sensitive to high-order dynamic systems.

(2) Distributed data fusion based on Kalman

A distributed data fusion method is adopted. It provides accurate and robust position and attitude estimates while keeping the computational cost under control. The camera information of Eye to Hand (ETH) and Eye in Hand (EIH) is used to estimate the relative position and attitude of the end of the manipulator and the target. The estimates can also be used for fault detection: when an EKF result is found to be faulty, the erroneous pose is excluded from the fusion process. The distributed fusion technique then fuses the remaining pose estimates.
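The idea of excluding a faulty estimate before fusion can be sketched as follows. The fault test used here (a gate on the normalized squared innovation) and the inverse-variance weighting of the remaining estimates are assumptions chosen for illustration, since the paper does not state its fault criterion; the inverse-variance rule also assumes that the errors of the remaining estimates are independent. The function names, the gate value, and the numbers are hypothetical.

```python
from typing import List, Tuple

# (estimate, estimate variance, innovation, innovation variance) for one local filter
LocalResult = Tuple[float, float, float, float]


def is_consistent(innovation: float, innovation_var: float, gate: float = 9.0) -> bool:
    """Accept a filter result if its normalized squared innovation is within the gate."""
    return innovation * innovation / innovation_var <= gate


def fuse_estimates(results: List[LocalResult]) -> Tuple[float, float, int]:
    """Fuse the local estimates that pass the fault check by inverse-variance weighting."""
    kept = [(x, Q) for x, Q, nu, S in results if is_consistent(nu, S)]
    if not kept:
        raise ValueError("all local estimates were rejected as faulty")
    info = sum(1.0 / Q for _, Q in kept)
    fused = sum(x / Q for x, Q in kept) / info
    return fused, 1.0 / info, len(kept)


# Hypothetical local results: the second one has a very large innovation.
eth_result = (2.21, 0.01, 0.5, 1.0)
eih_result = (3.50, 0.01, 12.0, 1.0)
fused_value, fused_var, n_used = fuse_estimates([eth_result, eih_result])
print(fused_value, fused_var, n_used)
```

Here only the first estimate survives the check, so the fused result equals that estimate; when both pass, the result is their variance-weighted average.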

3.2 Evaluation of simulation results

The model was simulated in the MATLAB environment. The setup contained one robot and two cameras. Among them, one had four round feature points (black), and the other had four feature points (black). The object was stationary, and the camera was mounted on the end of the robot and faced the object, so the image of the object in the camera was always one face of the cube or another. During the simulation, the robot arm started from a singularity-free starting point and then moved to the desired position.

The manipulator starts from a singularity-free starting point, and the motion of the end effector is simulated by Euler numerical integration. Using robot kinematics, the manipulator is controlled to move to the desired position. Noise is introduced to simulate the real environment.
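Euler numerical integration of the end-effector motion can be sketched in a few lines. The sketch assumes a simple proportional velocity command toward the desired position, which is a stand-in for the kinematic controller rather than the controller used in the simulation; the gain and the start position are hypothetical, while the 2 ms step and the target (0, 0, 2.2) follow the values quoted in this section.

```python
from typing import List, Sequence


def simulate_end_effector(start: Sequence[float], target: Sequence[float],
                          gain: float = 2.0, dt: float = 0.002,
                          steps: int = 5000) -> List[float]:
    """Integrate the end-effector position with explicit Euler steps of length dt."""
    pos = list(start)
    for _ in range(steps):
        # Assumed proportional velocity command toward the target.
        vel = [gain * (t - p) for p, t in zip(pos, target)]
        # Euler step: position += velocity * dt.
        pos = [p + v * dt for p, v in zip(pos, vel)]
    return pos


final_pos = simulate_end_effector(start=[0.0, 0.0, 0.0], target=[0.0, 0.0, 2.2])
print(final_pos)
```

To imitate a real environment, noise would be added to the image measurements fed to the estimator at each step rather than to the integration itself.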

The robot was controlled using its kinematics, and the resulting motion was compared with the position and attitude estimation results. The initial phase of ETH was A1, while the initial phase of EIH was A2. The end effector of the robot was simulated using the Euler numerical integration method with a sampling time of 2 milliseconds. The relative position of the robot arm and the target object was transl (0,0,2.2), and the measurement noise R was set to 1 pixel, 2 pixels and 3 pixels in turn. The standard deviation and variance of the attitude estimation error were calculated. Standard deviation measures the dispersion of the error distribution, and variance is its square. These quantities help to evaluate the stability of the estimation: a smaller standard deviation and variance usually indicate more stable performance.
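The error statistics described above (mean, maximum absolute value, standard deviation and variance) can be computed as in the following sketch. It uses population statistics from the Python standard library; the function name and the sample values are illustrative and are not data from the simulation.

```python
import statistics
from typing import Dict, Sequence


def error_stats(estimates: Sequence[float], truths: Sequence[float]) -> Dict[str, float]:
    """Summarize the estimation error over a run."""
    errors = [e - t for e, t in zip(estimates, truths)]
    return {
        "mean": statistics.mean(errors),            # average signed error
        "max": max(abs(e) for e in errors),         # maximum absolute error
        "std": statistics.pstdev(errors),           # dispersion of the error
        "var": statistics.pvariance(errors),        # square of the standard deviation
    }


# Illustrative values only.
stats = error_stats(estimates=[1.5, 1.5, 3.5, 3.5], truths=[1.0, 2.0, 3.0, 4.0])
print(stats)
```

A mean near zero with a small maximum indicates an accurate estimate, while a small standard deviation indicates a stable one.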

(1) Simulation result analysis of centralized data fusion based on Kalman

In this part, simulations were run with image noise of 1 pixel, 2 pixels and 3 pixels, and the results were compared (angle comparison). Figure 3 shows the relative pose estimation of Kalman-based centralized data fusion, and Fig. 4 shows the corresponding relative pose estimation error. Table 1 lists the relative pose estimation error statistics of Kalman-based centralized data fusion. The maximum value refers to the maximum absolute value of the estimation error. As shown in Table 1, under each image noise condition, both the mean and the maximum of the estimation error of Kalman-based centralized data fusion were small. When the noise was 1 pixel, the maximum error of the X-axis angle was only 0.55, and the mean error was 0.02, which is negligible. This shows that Kalman-based centralized data fusion has good accuracy and stability.

Fig. 3

Kalman-based centralized data fusion relative pose estimation.

Fig. 4

Relative pose estimation error of centralized data fusion based on Kalman.

Table 1

Statistical table of relative pose estimation error of centralized data fusion based on Kalman

	Mean (1 pixel)	Mean (2 pixels)	Mean (3 pixels)	Max (1 pixel)	Max (2 pixels)	Max (3 pixels)
Angle(X)	0.02	0.02	0.03	0.55	0.51	0.48
Angle(Y)	-0.01	-0.01	-0.02	0.35	0.38	0.40
Angle(Z)	0	0	0	0.22	0.30	0.30

Fig. 3 (a), (b) and (c) show the relative pose estimation angles about the X-axis, Y-axis and Z-axis, respectively, for Kalman-based centralized data fusion. The figure shows that the deviation of the pose estimate increased with the noise: the greater the noise, the further the estimate deviated from the standard value.

Fig. 4 (a), (b) and (c) show the angle errors of the relative pose estimation about the X-axis, Y-axis and Z-axis, respectively, for Kalman-based centralized data fusion. The figure shows that the pose estimation error of Kalman-based centralized data fusion was relatively small and did not grow much as the noise increased.
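This section does not list the filter equations, so the following is only a generic sketch of what centralized fusion means: all camera measurements are processed together in one Kalman update. It assumes a single scalar state (for example one angle) observed directly by every camera with independent noise, and writes the update in information form. The names `kf_predict` and `centralized_update`, the random-walk model and all numbers are hypothetical.

```python
def kf_predict(x, p, q):
    """Random-walk prediction: the state is kept, the variance grows by q."""
    return x, p + q


def centralized_update(x, p, measurements, variances):
    """Fuse all measurements of one scalar state in a single update.

    Information form: every measurement z with noise variance r adds 1/r
    to the information and z/r to the information state.
    """
    info = 1.0 / p
    info_state = x / p
    for z, r in zip(measurements, variances):
        info += 1.0 / r
        info_state += z / r
    p_new = 1.0 / info
    return p_new * info_state, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0  # prior estimate and variance (made up)
    x, p = kf_predict(x, p, q=1e-4)
    # Two cameras observe the same angle in the same sampling period.
    x, p = centralized_update(x, p, [0.52, 0.48], [0.01, 0.01])
    print("fused estimate:", x, "variance:", p)
```

Because the fusion center sees every raw measurement, the posterior variance is never larger than that of a filter using only one camera.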

(2) Simulation result analysis of distributed data fusion based on Kalman

In this part, Kalman-based distributed data fusion was simulated with image noise of 1 pixel, 2 pixels and 3 pixels, and the resulting pose estimates were compared in terms of displacement. Fig. 5 shows the relative pose estimation of Kalman-based distributed data fusion, and Fig. 6 shows the corresponding relative pose estimation error. Table 2 gives the relative pose estimation error statistics of Kalman-based distributed data fusion. As shown in Table 2, when the noise was 1 pixel, the mean X-axis error was 0.06 and the maximum was 17.66. Compared with the centralized results above, the accuracy of this method was lower than that of the Kalman centralized data fusion method.

Fig. 5

Relative pose estimation of distributed data fusion based on Kalman.

Fig. 6

Relative pose estimation error of distributed data fusion based on Kalman.

Table 2

Statistics of relative pose estimation error of distributed data fusion based on Kalman

	Mean (1 pixel)	Mean (2 pixels)	Mean (3 pixels)	Max (1 pixel)	Max (2 pixels)	Max (3 pixels)
Displacement(X)	0.06	-0.02	0.55	17.66	18.99	19.63
Displacement(Y)	-0.38	-0.56	0.44	14.89	15.62	21.77
Displacement(Z)	-0.26	-0.21	-0.46	4.66	5.88	5.62

Fig. 5 (a), (b) and (c) show the relative pose estimation displacements along the X-axis, Y-axis and Z-axis, respectively, for Kalman-based distributed data fusion. As the figure shows, the deviation of the pose estimate increased with the image noise. Although the estimation error of Kalman-based distributed data fusion was still relatively small, it showed no significant improvement over the centralized method.

Fig. 6 (a), (b) and (c) show the displacement errors of the relative pose estimation along the X-axis, Y-axis and Z-axis, respectively, for Kalman-based distributed data fusion. The figure shows that the distributed method was slightly less accurate than the Kalman centralized data fusion method and was also inferior to it in stability.
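For contrast, a distributed scheme can be sketched as follows: each camera runs its own local Kalman update, and only the local estimates are sent on to be combined. The combination below is a naive inverse-variance weighting that ignores the correlation between local estimates. This is an assumed simplification for illustration and not necessarily the fusion rule used in the paper. The names `local_update` and `fuse_tracks` and all numbers are hypothetical.

```python
def local_update(x, p, z, r):
    """Standard scalar Kalman measurement update with H = 1."""
    k = p / (p + r)  # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


def fuse_tracks(estimates):
    """Combine local (estimate, variance) pairs by inverse-variance weighting.

    Cross-correlation between the local estimates is ignored.
    """
    info = sum(1.0 / p for _, p in estimates)
    x = sum(xi / pi for xi, pi in estimates) / info
    return x, 1.0 / info


if __name__ == "__main__":
    prior = (0.0, 1.0)  # shared prior estimate and variance (made up)
    # Each camera filters its own measurement locally.
    cam_a = local_update(*prior, z=1.0, r=1.0)
    cam_b = local_update(*prior, z=1.0, r=1.0)
    print("local estimates:", cam_a, cam_b)
    print("fused estimate:", fuse_tracks([cam_a, cam_b]))
```

In this example both local filters start from the same prior, so the naive fusion counts that prior twice and returns 0.5 with variance 0.25, whereas a single centralized update on the same data would give 2/3 with variance 1/3. Effects of this kind are one reason a distributed scheme can be less accurate than a centralized one unless the shared information is accounted for.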

To sum up, both methods could accurately estimate and control the position and posture of the robot arm. Both performed well in accuracy, with relatively small errors. In terms of stability, the distributed data fusion method was slightly inferior to the centralized data fusion method, but its error still fluctuated within a limited range and its mean error was close to 0. This underlines the potential of machine vision-based robot arm pose estimation and control for achieving both accuracy and stability.

4 Conclusions

This paper briefly introduced the position and attitude estimation methods used in traditional visual servo systems. To cope with the uncertainty and noise of the system, a hybrid vision system based on multiple cameras and multiple sensors was adopted, and sensor fusion technology was used to combine the data from the cameras. Using multiple cameras gave better accuracy and robustness. Centralized and distributed fusion methods were used for data fusion, and they showed good scalability and stability. Finally, the effectiveness and stability of the algorithm were verified by simulation. Some problems that need further study remain. The fusion technology was limited to feature point estimation, and one possible improvement is to extend the fusion algorithm to other fields. Because the redundant information provided by multiple cameras can improve the robustness of the system, fusion at the imaging stage would also be very useful.

After analyzing machine vision technology, this paper compared by experiment the accuracy and stability of two robot arm pose estimation methods: the Kalman centralized data fusion method and the Kalman distributed data fusion method. The experimental results show that the Kalman centralized data fusion method is superior in both accuracy and stability. The research provides effective methods and empirical support for improving the ability of robot arms to perceive and respond to the external environment, and it has practical significance for the wider application of robots in industry and daily life.

The limitations of the control method in this paper are mainly manifested in several aspects. Firstly, although a multi-camera and multi-sensor hybrid vision system is adopted to enhance the accuracy and robustness of the system, there still exist uncertainties and noise issues, which may affect the accuracy of position and pose estimation. Secondly, the fusion technique is limited to feature point estimation and has not been extended to other areas, which may restrict the system’s understanding and response capabilities in complex environments.

Several directions could further improve the results. First, more advanced sensor fusion techniques, such as vision perception algorithms based on deep learning, could be explored to estimate the position and pose of the robotic arm more accurately. Second, more types of sensors, such as LiDAR or Inertial Measurement Units (IMUs), could be introduced to enhance the system's environmental perception capabilities. Third, control methods based on Model Predictive Control (MPC) or reinforcement learning could be studied to further enhance the system's robustness and responsiveness.

By continuously exploring and improving sensor fusion techniques, introducing more sensor types, and researching new control methods, the performance and stability of machine vision-based robotic arm position estimation and control systems can be further improved. This will enable better adaptation to complex industrial and living scenarios, facilitating the widespread application of robots.

Funding

The work is supported by the National Natural Science Foundation of China (No. 61774107) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 20KJA460008).
