Abstract
This work compares the performance of three neural network models in predicting the intermediary pose of a robot end effector for visual servoing tasks. Robotic applications in complex workspaces benefit from non-contact sensor technology such as vision. Visual feedback control of a two-camera robotic system combines the global visibility of a fixed camera guiding the robot with the local visibility of an end effector camera that achieves convergence for pick and place tasks. Neural networks replace the control law for visual guidance when the target is initially outside the field of view of the eye-in-hand camera. Visual features collected by the eye-to-hand camera and the robot pose form the input to the three types of networks: Multilayer Perceptron Neural Network (MLPNN), Radial Basis Function Neural Network (RBFNN) and Elman Neural Network (ENN). The robot moves to the predicted pose, which is favorable for switching to Image Based Visual Servoing (IBVS), limiting the number of discrete events. Simulation studies and experiments with an ABB robot are used to draw conclusions about network performance.
Introduction
Vision provides a non-contact sensor technology in robotic systems for feedback and generation of control signals. Compared with other sensors, an image contains an unmatched amount of information about the position, orientation, condition and identity of the components of the environment, which expands the scope of industrial automation. Simple calibration, accuracy despite degrading factors such as ageing or thermal wear, high speed and up-to-date image processing techniques make vision sensors robust in such applications. Reliable handling and assembly of industrial parts demands intelligent systems that use soft computing techniques capable of decision making and transfer of control. The use of a camera in the feedback loop of a control system constitutes Visual Servoing (VS). Fixed vision sensors give eye-to-hand systems with a global view of the scene, and end effector mounted cameras yield eye-in-hand systems with local but precise visibility.
Basic strategies in visual servoing include Image Based Visual Servoing (IBVS) and Position Based Visual Servoing (PBVS). In IBVS, an error between the selected visual features of the scene and their desired values drives the system towards the target. PBVS uses the pose error of the robot end effector, with a geometrical model of the system under consideration providing the necessary references [1]. The former is a proven strategy for local asymptotic stability suited to small motions, but it suffers from local minima, camera retreat and Jacobian singularities. Coupling between translational and rotational components, especially in the third axis, may cause unnecessary motion because the camera velocity is controlled directly. The latter strategy provides globally stable trajectories, but the tracked object can leave the field of view during servoing. Hybrid strategies and configurations in visual servoing combine the strengths of the core strategies to attain convergence [2–4]. Two dimensional image data and three dimensional pose data are conveniently combined by decoupling rotation when generating the error vector or the control law. There have been numerous attempts to make VS more efficient by modifying the control law of the basic procedures. The ability of the system to keep the features within the image space and to respect joint limits provides benchmarks for stable operation. The complementary nature of IBVS and PBVS in setting limits for stability favours good outcomes with these hybrid technologies [5–7]. Pose controllers utilising feature information and image controllers relying on position feedback make robust hybrid approaches.
Switching between drivers and configurations is another way to tackle application based problems in robot manipulation using visual servoing. Control scheme switching avoids local minima, overcomes joint space singularities and keeps the object in the field of view while maintaining joint limit criteria [8]. The change in control depends on the benchmark set by the norm of error in IBVS or PBVS causing discrete events in the system more than once. Decomposition of the homography matrix between current and desired views enables the extraction of rotation from camera motion. A discontinuity at the borders leads to more time for convergence. Assuming the performance of IBVS and PBVS to be excellent over a given subspace of input domain, the control can change between these methods [9–12]. They share or combine the duration, space or configurations of regions of servoing. Even the transfer of control or cooperation is possible between an eye-in-hand camera and an eye-to-hand camera [13, 14] considering their local and global visibility. The switching approaches select their input depending on the visibility, stability or error derived from the possible outcomes of the available sensors. Sharing workspaces, parallel operation, multiple targets and visibility constraints suggest supervisory control modes in visual servoing [15]. Visual guidance assists visual servoing for targets not visible for the eye-in-hand camera initially. Condition number of the image Jacobian can provide a criterion for switching the control.
Incorporating elements of artificial intelligence for industrial automation in the field of engineering accommodates complexity in modelling. Prediction from a set of input data can help a system overcome regions of uncertainty and nonlinearity rather than going for rational approaches. Artificial Neural Networks (ANN) can provide excellent approximation in overcoming ambiguity which may otherwise lead to failure of servoing [16, 17]. Multilayered perceptron neural networks with sigmoid transfer functions and radial basis function neural networks are proven structures for universal approximation. For MLPNN, the performance of the network depends on the number of hidden layers, their size and the type of activation function. A feedforward topology having one layer of hidden neurons with any arbitrary activation function can form a disjoint decision region [18]. As the number of layers increases, the training time increases; RBF networks, however, have comparatively fast linear learning algorithms for complex and nonlinear mapping. They find applications not only in classification, prediction and approximation but also in interpolation and differential equations [19]. Adding extra neurons, called context neurons, to the hidden layer to incorporate a sense of memory of past training inputs gives recurrent neural networks. They are useful mainly in statistical language modelling [20] but are not restricted to optimization problems.
Robotic control models involve forward and inverse kinematic mapping as well as sensor feedback requiring nonlinear transformations. The continuous feedback provided by visual servo systems increases the flexibility of robotic manipulator motion when combined with the adaptation capability of neural networks. The applications of heuristic methods like ANN in visual servoing are numerous owing to their ability to learn from experience, their nonlinear mapping and their fast data processing due to a massively parallel architecture. They find applications in approximating the image Jacobian [21], coping with calibration errors and geometry changes [22], in prediction servoing [23], where the predicted pose is used as the desired pose, and in solving the inter-related visual Jacobian [24]. Neural networks learn the control in an unknown environment [25] with visual servoing and derive the joint angles from the image Jacobian without camera calibration or robot kinematics, with a genetic algorithm approximating the initial weights and thresholds. Neural networks even provide the basis for behavioral system performance driven by a cumulative reward [26], with deep learning or extreme learning machine capacity [27]. Data driven controls can account for the data loss in networks due to time delay and/or packet dropouts [28]. When the discrete time events due to switching strategies are modelled, piecewise control of intermediate zones is essential. Such approaches [29] are characterized by the division of the state input space into subspaces with affine state-update equations.
This paper focuses on the modelling of a nonlinear system where the control is transferred from a global controller to a local controller. The overall control domain can be divided into regions where different controllers perform better than the others. A supervisory controller can converge to the goal in a coarse manner while a dedicated controller accomplishes the task in a fine manner. The transfer of control is a discrete event depending on a criterion derived from the state of the system. Trained neural networks can be used to transfer the control directly to the local controller such that this criterion is met. This can be applied to a two-camera robotic system in assisting visual servoing. The master camera should guide the robot (supervisory mode) to a pose from which the end effector camera (local control mode) can take over the task for accurate pick and place applications. A trained ANN can predict an intermediate pose for the end effector that ensures convergence by the eye-in-hand camera. It receives the necessary input data from the master camera and the robot controller. This paper aims at modelling a two-camera robotic system and comparing the performance of three different types of neural networks used for prediction. The simulation and experimentation of machine vision [30] are done with an ABB 6 Degrees of Freedom (DOF) industrial manipulator [31]. The following sections explain the task, neural network implementation, simulation and experimental evaluation, followed by discussion and conclusion.
System components and problem formulation
A 6 DOF industrial robot needs to identify the target and follow it for a pick and place application through visual guidance and feedback. The robot used is the ABB IRB 1200, suitable for applications like machine tending, parts assembly and material handling. It can support a payload of 7 kg with a reach of 703 mm and a position repeatability of 0.02 mm. The robot has a pneumatic gripper attached to the end effector for grasping the objects. A master camera (Logitech C170, resolution 768×1024, focal length 2.3 mm) is fixed such that it can see the robot as well as its workspace. Another camera (Logitech C270, resolution 1280×960, focal length 4 mm) is attached to the end effector. Calibration of the cameras is necessary as radial and tangential distortion affect the performance.
The operating system, memory and software constitute the control module of the robot controller. The drive module has the necessary power electronics for motor operations. These functions are available with the IRC5 (Industrial Robot Controller), which is programmed in RAPID, either through RobotStudio® or the FlexPendant. RAPID is the programming language for ABB robots and RobotStudio® is a Graphical User Interface (GUI) enabling the configuration, modelling and off-line programming of ABB robots. Work cells are created by incorporating components available in the library or importing CAD models. A Virtual Controller (VC) allows the programmer to validate programs and test the created paths for reachability. This also helps in debugging a program before it is uploaded to the teach pendant. Direct programming is done through the FlexPendant by adding instructions in RAPID.
Image processing applications and computations are done in MATLAB®, which also allows the simulation study of the robot structure with the RVC (Robotics, Vision and Control) toolbox. MATLAB® programs on the processor and RAPID programs on the controller run in parallel for the visual feedback of the robot. Communication between an external client like MATLAB® and the robot depends on the available options. The FTP (File Transfer Protocol) server on the IRC5 modifies the Transmission Control Protocol/Internet Protocol (TCP/IP) address of the computer and responds to the requests from the client. Information written in files enables the communication between the IRC5 and MATLAB®. Figure 1 shows the components of the system and their interconnections.

Figure 1. System components.
The target consists of a rectangular block with distinguishable point features. The target is identified and its bounding box edges provide the necessary features for the control law. The robot has to reach the target from any initial position. This includes conditions where the eye-in-hand camera does not see the target, in which case the fixed camera guides the robot. The problem is modelled as a supervisory control scheme with discrete events. Figure 2 shows the experimental station as seen from the master camera. Initially the target is not visible to the eye-in-hand camera (a), but the final pose (b) allows the gripper to grab the object. Hence the task consists of identifying and picking a predefined object with the cooperation of two cameras, one fixed and the other attached to the end effector.

Figure 2. Experimental station: (a) initial pose, (b) final pose.
When the target is in the field of view of the eye-in-hand camera, there is no role for the master camera. The control law is derived in the image domain. Perspective projection of a world point P (X, Y, Z) computes its 3D to 2D transformation, giving p (x, y) in the image plane, where x = X/Z and y = Y/Z are its normalised coordinates. The pixel coordinates in the camera frame depend on the camera parameters focal length (f), principal point ($c_u$, $c_v$) and pixel dimensions $\rho_u$ and $\rho_v$ along the u, v directions and are given by
$$u = \frac{f}{\rho_u}\,x + c_u, \qquad v = \frac{f}{\rho_v}\,y + c_v$$
The variation of P with respect to camera is
Equation (1) is of the form
A control law is required to move the camera in such a direction that the error $e_s = s_p(x,y) - s^*$ in image features is reduced. For a point feature p (x, y) under consideration, $s_p(x,y)$ represents the current features and $s^*$ their desired values. If the target is predefined, the desired values can be predicted or estimated through simple geometric operations. From equation (1), allocating a control gain of λ
Certain robot poses do not provide the target view for deriving the control law. The master camera having the view of the robot and its workspace must guide the robot. A knowledge of the robot pose and target pose is necessary to move the end effector so that the eye-in-hand camera can identify the features. Robot manipulator pose is estimated either from the controller data or by the master camera through pose estimation algorithms.
A homogeneous transformation matrix,
A control law tries to reduce the error in pose $e_p = \Delta T_e$ such that the end effector moves towards the target. The translational and angular components of the vector $\Delta T_e$ can be stacked to form a vector d so that
During the movement as in equation (3), the eye-in-hand camera can see and identify the target so that the control switches to (2). Hence the local visibility of the manipulator camera can be combined with the global sight of the master camera.
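For reference, the relations referred to above as (1) to (3) commonly take the following forms in the visual servoing literature for a point feature in normalised image coordinates. This is only a sketch under stated assumptions: the symbols $J_p$ (image Jacobian), $\nu$ (camera velocity screw), $J_p^{+}$ (pseudo-inverse), $\Delta t$ and $\Delta\theta$ (translational and angular parts of $\Delta T_e$), as well as the sign and gain conventions, are not taken from the original and may differ from the authors' notation.

```latex
% (1) feature velocity as a function of camera velocity (assumed standard form)
\dot{p} = J_p(x, y, Z)\,\nu, \qquad
J_p = \begin{pmatrix}
-\dfrac{1}{Z} & 0 & \dfrac{x}{Z} & xy & -(1+x^{2}) & y \\[4pt]
0 & -\dfrac{1}{Z} & \dfrac{y}{Z} & 1+y^{2} & -xy & -x
\end{pmatrix}

% (2) image based control law with gain \lambda, giving \dot{e}_s = -\lambda e_s
\nu = -\lambda\, J_p^{+}\, e_s, \qquad e_s = s_p(x,y) - s^{*}

% (3) pose based step: move by a fraction \lambda of the stacked pose error d
\nu = \lambda\, d, \qquad d = \begin{pmatrix} \Delta t \\ \Delta\theta \end{pmatrix}
```

With several point features, the 2×6 blocks of $J_p$ are stacked row-wise before the pseudo-inverse is taken.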
A criterion regarding the norm of errors in the two regions of control can decide appropriate switching. However, even when the error is within specified range, the translational and angular components of the output variable camera velocity are prone to interaction between each other resulting in abnormal camera motion. The multivariable nature of the control system demands a qualitative approach. For any control system with a transfer function G (s), input
The matrix is said to be well conditioned if the condition number (CN) is not too large, the best condition being CN = 1. When the camera approaches the target, the features move away from each other in the image plane, making the system better conditioned. The switching control law is summarized in Figure 3, where the system condition determines the type of control law.
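A condition-number check of this kind can be sketched in plain Python as follows. The condition number is the ratio of the largest to the smallest singular value of the stacked image Jacobian, computed here from the eigenvalues of JᵀJ with a small Jacobi eigenvalue routine. The point-feature Jacobian is the common normalised-coordinate form; the function names, the common depth `Z`, the feature coordinates and the threshold `cn_threshold` are illustrative assumptions, not values from the study.

```python
import math

def point_jacobian(x, y, Z):
    """2x6 image Jacobian of one point feature in normalised coordinates."""
    return [
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ]

def stacked_jacobian(points, Z):
    """Stack the 2x6 blocks of several points (all assumed at depth Z)."""
    rows = []
    for x, y in points:
        rows.extend(point_jacobian(x, y, Z))
    return rows

def symmetric_eigenvalues(A, sweeps=100, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def condition_number(J):
    """Ratio of largest to smallest singular value of J (inf if rank deficient)."""
    n = len(J[0])
    gram = [[sum(r[i] * r[j] for r in J) for j in range(n)] for i in range(n)]
    eig = symmetric_eigenvalues(gram)
    smallest, largest = min(eig), max(eig)
    return math.inf if smallest <= 0.0 else math.sqrt(largest / smallest)

def choose_controller(target_visible, cn, cn_threshold):
    """Hypothetical switching rule: IBVS only when the Jacobian is well conditioned."""
    return "IBVS" if target_visible and cn <= cn_threshold else "supervisory"

# Four corner features of a bounding box, assumed to lie at a common depth of 1 m.
corners = [(-0.1, -0.1), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]
cn = condition_number(stacked_jacobian(corners, Z=1.0))
mode = choose_controller(target_visible=True, cn=cn, cn_threshold=500.0)
```

Evaluating `condition_number` for the same corners at different depths gives a quick way to explore how the conditioning changes as the camera approaches the target.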

Figure 3. Control system.
The control strategy is designed to converge in its final stage with the eye-in-hand camera, motivated by its precise sight and local asymptotic stability. If the features leave the field of view, the servoing may fail unless the control switches back to the supervisory mode. The image Jacobian is analysed to evolve a qualitative approach for switching. If the switching occurs in the minimum neighborhood of the target that ensures the stability and convergence of the strategy, multiple switching can be avoided. Switching between control strategies having different magnitudes of error can affect the performance of the system. The role of the neural network is to predict such an intermediary pose for any initial state of visibility of the target for the eye-in-hand camera.
ANN, inspired by biological neural networks, is an adaptive information processing system with parallel connected structure. The general topology consists of an input layer followed by one or more hidden layers and the output layer. An activation function is responsible for a nonlinear decision boundary through nonlinear combinations of weighted inputs to form the output. ‘Logistic’, ‘hyperbolic tangent’ and ‘linear’ transfer functions are commonly used. After fixing the network structure, it needs to be trained for adapting the weights. The learning can be effected with supervised learning by providing the input and desired output or by unsupervised/adaptive learning where only inputs are provided.
The input and output vectors determine the number of neurons in the corresponding layers while the type and nature of application fix the hidden layers and its size. One hidden layer is sufficient to tackle a general problem if there is no function discontinuity or model uncertainty. Training is burdensome with more layers. Insufficient number of neurons in the hidden layer will result in poor fit and too many neurons will cause over fitting of the data. The optimal value mostly lies between the number of neurons in the input and output layers.
Multi-layered perceptron neural network
MLP neural networks find applications in most of the engineering research fields and are configured in feedforward topology starting with an input layer, one or more hidden layers and finishing with an output layer. The number of neurons in the input and output layers depends on the number of corresponding variables while the number of hidden layers and its size can vary based on the application and will modify the performance. The network is usually trained by standard backpropagation algorithm by supervised learning. The weighted sum of input data along with bias applied will pass through the transfer function to produce the output. Parameters like the number of hidden layers, its size, type of activation function and the training algorithm or method can modify the performance of an MLP network for a given set of inputs and samples.
This study includes two MLP networks, each with one hidden layer, having 8 and 26 hidden neurons respectively, for comparison with the RBF network and the Elman network. The input layer consists of thirteen neurons receiving data from the master camera and the robot controller: the visual features, the end effector pose, and a scalar representing the radial distance of an imaginary space assumed to favour the convergence of visual servoing. The output layer consists of 6 neurons predicting the components of the intermediary pose to which the manipulator must move. Training continued until the criterion set for the mean square error was met. 70% of the data is allocated for training, while 15% each is used for testing and validation. The input and hidden layers use the 'tansig' transfer function while the output layer has a linear activation function. The two networks under consideration differ only in the number of neurons in the hidden layer.
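The 13-8-6 topology described above can be illustrated with a minimal forward pass. This is a sketch only: the weights are random placeholders rather than trained values, the hyperbolic tangent stands in for 'tansig' on the hidden layer, the input layer is treated as a pass-through, and the names `init_layer`, `dense` and `mlp_predict` are hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random placeholder weights and zero biases; a trained network would load its own."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases, activation):
    """Weighted sum of the inputs plus bias, passed through the transfer function."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_predict(x, hidden, output):
    """One hidden tanh layer followed by a linear output layer."""
    h = dense(x, hidden[0], hidden[1], math.tanh)
    return dense(h, output[0], output[1], lambda a: a)

rng = random.Random(0)
hidden = init_layer(13, 8, rng)   # 13 inputs: visual features, pose, proximity scalar
output = init_layer(8, 6, rng)    # 6 outputs: components of the intermediary pose
pose = mlp_predict([0.1] * 13, hidden, output)
```

Replacing `init_layer(13, 8, rng)` and `init_layer(8, 6, rng)` with 26 hidden neurons gives the second topology, since the two networks differ only in hidden layer size.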
Table 1 gives a comparative tabulation of Mean Square Error (MSE) and Condition Number (CN) against the number of hidden layer neurons and proximity. Proximity refers to the distance assigned for the intermediate pose from the target. MSE is the error recorded during the training phase of the network. CN is the condition number of the image Jacobian matrix at the time of switching to IBVS. The condition number indicates how well the image Jacobian is conditioned when the system enters image based visual servoing, a lower value indicating better conditioning. Generally, for MLP applications, the MSE during training can be chosen as a criterion for selecting the number of hidden layer neurons. In this case, the performance of the overall system depends on the condition of the Jacobian matrix. Hence the condition number at the time of switching is also taken into consideration. In terms of MSE, MLP network 1 with 8 neurons performed well on most occasions. Similarly, MLP network 2 with 26 neurons gave a fairly well conditioned matrix most often. The data in boldface correspond to these networks, which were chosen for comparison with RBF and ENN in the study.
Table 1. MLP training: mean square error (MSE) and condition number (CN)
The universal approximation potential of RBF networks makes them a useful choice for many science and engineering applications, including classification problems and time series prediction. In configuration, an RBF network is simpler than an MLP and has three layers. The input layer feeds data from the environment through source nodes, while the hidden layer transforms the input space to a hidden space, enabling the solution to the problem. The output layer produces the output, which is a linear combination of the scaled hidden layer responses. Unlike in MLP networks, the weights appear between the last two layers only.
A vector termed the centre and a scalar termed the width determine the hidden layer parameters and the smoothing properties of the interpolation function. Supervised learning using the least mean square algorithm is fast and determines the weights necessary to transform the hidden layer response into the final output. The radial basis function neural network structure used in the study has a Gaussian density function as the activation function. The input and output structures are similar to those of the MLP network. The model is treated as a discrete set of variables, posing a classification problem, rather than as a continuous function.
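The structure just described, Gaussian hidden units defined by a centre and a width with weights only on the output side trained by a least mean square rule, can be sketched as follows. The one-dimensional toy data, the centres placed on the samples, the shared width and the learning rate are illustrative assumptions and are unrelated to the data used in the study.

```python
import math

def gaussian(x, centre, width):
    """Gaussian response of one hidden unit to the input vector x."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-dist2 / (2.0 * width ** 2))

def hidden_response(x, centres, width):
    return [gaussian(x, c, width) for c in centres]

def rbf_predict(x, centres, width, weights):
    """Output is a linear combination of the hidden layer responses."""
    phi = hidden_response(x, centres, width)
    return [sum(w * p for w, p in zip(row, phi)) for row in weights]

def lms_train(samples, centres, width, n_out, rate=0.5, epochs=200):
    """Least mean square (Widrow-Hoff) update of the output weights only."""
    weights = [[0.0] * len(centres) for _ in range(n_out)]
    for _ in range(epochs):
        for x, target in samples:
            phi = hidden_response(x, centres, width)
            for k in range(n_out):
                err = target[k] - sum(w * p for w, p in zip(weights[k], phi))
                for j in range(len(phi)):
                    weights[k][j] += rate * err * phi[j]
    return weights

# Toy one-dimensional mapping with one centre per training sample.
samples = [([0.0], [0.0]), ([1.0], [1.0]), ([2.0], [0.0])]
centres = [[0.0], [1.0], [2.0]]
weights = lms_train(samples, centres, width=0.5, n_out=1)
fitted = rbf_predict([1.0], centres, 0.5, weights)
```

Because only the output weights are adapted, each update is a linear correction, which is why this kind of training is fast compared with backpropagation through several layers.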
Elman neural network
A sense of memory can make a neural network more similar to the human brain, as in the case of Recurrent Neural Networks (RNN). The context layer in an RNN provides at least one feedback path between layers, enabling temporal processing. These networks are best suited to language modelling algorithms where learning sequences is necessary, but they also have proven capacity in time series prediction problems. Learning is achieved by gradient descent procedures as in MLP networks. The network has a strong capacity for nonlinear mapping, with a structure similar to MLP and additional loops incorporating feedback in time to imitate awareness of what happened in the past.
The backpropagation algorithm extends to backpropagation through time for a recurrent network. This provides scope for associative memory in engineering applications. A simple form of recurrent type network, Elman Neural Network (ENN) has its context layer neurons fed back from the hidden layer. Figure 4 shows the structure of ENN used in the problem.
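The feedback from the hidden layer to the context layer can be illustrated by a single Elman time step. This is a minimal sketch with hand-picked weights and small layer sizes chosen for readability; the function name `elman_step` and all numerical values are hypothetical and do not reflect the network used in the study.

```python
import math

def elman_step(x, context, w_in, w_ctx, b_hidden, w_out, b_out):
    """One time step: the hidden layer sees the input and the previous hidden state."""
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * ci for w, ci in zip(w_ctx[j], context))
            + b_hidden[j]
        )
        for j in range(len(b_hidden))
    ]
    output = [
        sum(w * h for w, h in zip(w_out[k], hidden)) + b_out[k]
        for k in range(len(b_out))
    ]
    return output, hidden  # the hidden state becomes the next context

w_in = [[0.5, -0.3], [0.2, 0.4]]    # input -> hidden
w_ctx = [[0.1, 0.0], [0.0, 0.1]]    # context -> hidden
b_hidden = [0.0, 0.0]
w_out = [[1.0, -1.0]]               # hidden -> output
b_out = [0.0]

x = [1.0, 0.5]
context = [0.0, 0.0]                # empty memory at the first step
out1, context = elman_step(x, context, w_in, w_ctx, b_hidden, w_out, b_out)
out2, context = elman_step(x, context, w_in, w_ctx, b_hidden, w_out, b_out)
```

Feeding the same input twice produces two different outputs, because the second step also sees the hidden state stored in the context neurons.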

Figure 4. Structure of Elman neural network.
The performance of the neural networks depends not only on the design parameters but also on the training attributes, including supervision and algorithm. A robust set of input-target combinations gives a good data base for ANN training. Modelling the system mathematically with its geometrical configuration provides a means of generating data from measurable physical parameters. Experimenting with the real time system contributes further to the data set by incorporating situations that lead to multiple solutions and similar input data. The output of the network is a vector comprising the components of a robot pose that ensures the visibility of the target. The input information consists of visual data acquired by the master camera and the current robot pose obtained from the robot controller. It also contains a distance factor denoting the proximity of the predicted pose to the target, which represents the size of the imaginary neighborhood of the target where IBVS shall converge. 70% of the available data is allocated for training, while 15% each is dedicated to validation and testing.
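The 70/15/15 allocation can be sketched as a shuffled split. The helper name `split_dataset`, the fixed random seed and the use of integer indices as stand-ins for input-target pairs are assumptions made for illustration.

```python
import random

def split_dataset(samples, train=0.70, val=0.15, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train * len(shuffled))
    n_val = round(val * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

# Indices stand in for 1000 input-target pairs.
train_set, val_set, test_set = split_dataset(range(1000))
```

Shuffling before splitting keeps simulated and experimentally collected samples mixed across the three subsets.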
The MLPs with 8 and 26 neurons in the only hidden layer represent network 1 and network 2 respectively, while network 3 and network 4 stand for RBFNN and ENN respectively. Figure 5 shows the regression plots for all four networks after training, relating the outputs to the targets of the data set. As the regression value approaches 1, the relationship between outputs and targets becomes linear. Values above 0.9 already give satisfactory performance; as the value decreases, the relationship becomes nonlinear. Regarding data fitting, all the networks have similar results, except for RBF, where the data scatters more. The regression values for the MLP networks MLP 1 and MLP 2 increased considerably, from 0.7813 and 0.76781 to 0.99769 and 0.9991 respectively, when the size of the data set was increased from 700 to 1000. The plots in Figure 6 show that the errors in data fitting concentrate mostly around zero. The RBF network training differs noticeably, while all the other networks have similar performance with an even distribution of the available data.
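The regression value discussed above can be illustrated as the correlation coefficient between network outputs and targets. This assumes the reported value is the usual Pearson coefficient R shown in regression plots; the function name `regression_r` and the sample numbers are made up for the example.

```python
import math

def regression_r(outputs, targets):
    """Pearson correlation coefficient between network outputs and targets."""
    n = len(outputs)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in outputs))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in targets))
    return cov / (spread_o * spread_t)

targets = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [0.1, 0.9, 2.2, 2.8, 4.1]   # illustrative network outputs
r = regression_r(outputs, targets)
```

A value close to 1 means the outputs follow the targets almost linearly, which matches the reading of the regression plots given in the text.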

Figure 5. Regression plot for selected network topologies during training: (a) Network1-MLP, (b) Network2-MLP, (c) Network3-RBF, (d) Network4-ENN.

Figure 6. Error histogram after training: (a) Network1-MLP, (b) Network2-MLP, (c) Network3-RBF, (d) Network4-ENN.
The Robotics, Vision and Control toolbox on the MATLAB® platform provides a simulation tool for visual servoing applications and modelling of robot structures. RobotStudio® offers offline simulation and testing of ABB robots and work cells, and the virtual controller confirms the feasibility of the generated path before it is applied to real time systems. The supervisory control explained in section 2 allows the cooperation of two cameras for achieving visual servoing by a transfer of control once the robot is placed in a pose where the features are visible to the eye-in-hand camera. Simulation studies were conducted and experimentally validated on the ABB IRB 1200 robot. Real time experiments are generally affected not only by the speed of the image processing algorithms but also by the transfer of data between the PC and the robot. The target is a rectangular block with visible markings so that the camera can identify them as feature points. The bounding box containing the target is extracted using standard image processing techniques and its edges serve as the features under consideration. The Speeded Up Robust Features (SURF) in MATLAB® give good results, especially when the scene is cluttered. For known shapes, the reference signals are derived by projecting the target onto a plane parallel to the camera view.
For certain configurations, a lower value of the condition number does not ensure successful servoing by the eye-in-hand camera. Yielding the control back to the master camera can prevent failure of the strategy. This may lead to multiple switches, affecting the stability of the control system and hence the performance of the robot in complex and hazardous workspaces. The change in velocity when the controlling error changes from a few meters to a few hundred pixels can also cause jerky motion. Neural networks predict the pose favorable for switching the control to the end effector camera. Figure 7 shows the variation of the norm of error on a logarithmic scale plotted against time for a target not initially visible to the eye-in-hand camera.

Figure 7. Norm of error during servoing.
As the robot moves towards the predicted pose, the features fall in the camera field of view and the control can be transferred. Data tips show the time and error for all four methods at the time of convergence. The error at the time of switching is very high for ENN compared with the other approaches. As the norm of error at the time of switching decreases, the change in velocity becomes moderate. Network 3 with RBF reached the target fastest.
With adequate training data, the overall computational complexity of the system reduces. When the features are not visible to the eye-in-hand camera, the supervisory control provides incremental changes in the pose until the system switches to IBVS. The neural networks, through prediction, place the robot in a pose suitable for control by the end effector camera in a single step. This results in a large saving in computation time in simulation as well as experimentation. Real time experiments are further delayed by the speed of image acquisition and processing, the transfer rate of data between the PC and the robot, and the processor speed, as well as by the quality of the image data. The efficiency of implementation hence improves with the accuracy of prediction and the proximity attained in single step switching.
Figure 8 shows the components of the camera trajectory along the X, Y and Z directions. Data tips show the time at which the strategy switches to IBVS and the corresponding X component of the pose (Figure 8a). The maximum change in pose is observed for MLP1, which is indicative of high jerk or change in velocity during switching. Figure 9 shows the components of the orientation vector of the camera pose along the X, Y and Z axes. The four networks produce similar trajectories, with RBF showing the minimum deviation.

Figure 8. Components of camera pose (meters) during servoing along (a) X axis, (b) Y axis and (c) Z axis.

Figure 9. Components of camera orientation (radians) during servoing along (a) X axis, (b) Y axis and (c) Z axis.
Table 2 summarises the performance of all 4 networks under consideration for a given initial position and common target. Different sets of readings were taken by varying the proximity assigned for the intermediary pose from 40 to 160 cm. The condition number is calculated from the image Jacobian obtained from the eye-in-hand camera output at the time of switching. A lower condition number indicates clarity of features in the acquired image. The norm of error in pose and the norm of image error in pixels are tabulated for a comparative analysis. The larger their difference, the greater the change in velocity. Even though the table has only seven entries with the same target and initial condition, the overall performance of the networks was similar under the different conditions used in the study.
Table 2. Performance of the networks
MLP networks are proven structures with continuity in features and output, suited for regression, classification and function approximation. Comparing the MLP networks, MLP2 with 26 neurons in the hidden layer gave more reliable results than MLP1 with eight neurons. As seen from the table, the variation of CN is smaller for MLP2. CN shows the effectiveness of the strategy at the time of switching by illustrating how well the system matrix is conditioned. The improvement in condition number as the neighborhood increases is equivalent to assigning different condition numbers for switching in the case of supervisory control. CN improves as the predicted pose approaches the target. Proximity affects the switching characteristics of the system and decides the velocity of the end effector.
RBF networks provide a global approximation to the function, which can be treated as a linear combination of local nonlinear functions. Each range of data can have a different approximation suitable for that region. Unlike in the training phase, the RBF network provided a consistent output. The input variable proximity did not affect the output, which can be viewed as the range of the input data set where RBF has a local approximation. The accuracy is very much dependent on the range and quality of the training data. Even though the CN did not reach the best value in the table and the switching velocity is not the minimum, the output is consistent and reliable.
ENN, with its context neurons imparting storage capacity for ancillary memory, is used in many robotic applications. However, for the given problem, its performance was inferior over the given spectrum of the input space. When the proximity increased, ENN failed to give a valid output pose that allows the camera to collect the features. Even though the condition number improved with nearer poses, the large pixel error caused a high velocity at switching, degrading its performance.
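The memory provided by the context neurons can be illustrated with a minimal forward-pass sketch of an Elman network. The layer sizes, the tanh activation, and the random untrained weights are assumptions chosen for illustration; the sketch shows only the feedback mechanism, not the trained network evaluated here.

```python
import numpy as np

class ElmanNetwork:
    """Single-hidden-layer Elman network (forward pass only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
        self.W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_out)
        self.context = np.zeros(n_hidden)

    def reset(self):
        """Clear the context units (forget all previous inputs)."""
        self.context = np.zeros_like(self.context)

    def step(self, x):
        """One time step: hidden state depends on input and context."""
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context + self.b_h)
        self.context = h.copy()   # context units store the hidden state
        return self.W_out @ h + self.b_o

# Hypothetical sizes: 10 inputs (features and pose), 6 outputs (pose).
net = ElmanNetwork(n_in=10, n_hidden=12, n_out=6)
x = np.ones(10)

y_first = net.step(x)    # context is zero
y_second = net.step(x)   # same input, but context now holds a history

net.reset()
y_after_reset = net.step(x)
```

The same input produces different outputs on consecutive steps because the context units carry the previous hidden state forward. A feedforward MLP or RBF network, by contrast, maps identical inputs to identical outputs.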
The results show that modelling the system piecewise, and using neural networks for overcoming the regions of ambiguity and for decision making, allows stable operation of the system. The type of network, its structure and its suitability vary with the application. In practice, collection of adequate data is essential for the successful induction of the neural networks, since it affects how well the network fits a task. Similar targets and parallel operation can make the system even more complex. The execution time of a pick and place task is affected by the speed of the processor, of the image processing algorithm, and of the data communication between the processor and the robot controller. In the experiments, the sluggishness of the system is caused mainly by the image processing, even though the tasks are accomplished successfully.
Neural networks can impart some degree of intelligence in automation and robotic systems, the performance being dependent on the type of system and its characteristics. This work aimed at comparing the performance of three types of networks, namely MLP, RBF and ENN, in assisting visual servoing through prediction. A two-camera robotic system has to identify and grasp a target for pick and place applications. A master camera guides the end effector for targets not in the field of view of the eye-in-hand camera. The supervisory action is successful, but the system is prone to discrete events due to multiple switches in control. Neural networks replace this region of ambiguity by predicting a pose favorable for the end effector camera. Proper replacement effects a switching in control, ensuring not only convergence but also a smooth transition in velocity. Prediction by neural networks performs better than supervisory control by reducing the computational complexity due to image processing and data transfer between the PC and the robot, thus improving the efficiency of hardware implementation.
The study involved three types of neural networks with different structures. The kind of activation function, the training method, and the number and size of the hidden layers can affect the performance of MLP networks. Though this study concentrated on three topologies, a comparison was also made regarding the number of hidden layer neurons to show its effect on the output. During training, RBF showed more scatter in the regression plot than MLP and ENN, even though it had the better performance concerning training time. RBF is known to respond to a small region of the input space, in contrast to MLP, which has continuous output for continuous features. ENN provided a feedback network, in contrast to the other two feedforward networks, with a proven trait of associative memory. However, it failed to render reliable output in terms of a smooth velocity at switching or consistency. For RBF, unlike MLP and ENN, the input variable proximity affected neither the predicted pose nor the condition number of the system at switching. Even though the online response of the MLP is better than that of RBF, the piecewise nonlinear capability of RBF makes it a better choice for the given application.
This paper is motivated by the enhancement in system performance with the inclusion of local controllers. The system switches from supervisory mode to an affine control depending on the system condition. Convergence gains pace with the prediction of a possible and favorable future state, which is implemented on the two-camera robotic system with an ANN. Considering the loss of reliability due to the possibility of multiple switches in control and the uncertainty during switching, prediction with neural networks gives an intelligent outlook to the robotic system. Future work will search for the best replacement network from the literature. As parallel operations with other robots and humans increase the complexity of the system, reinforcement learning or deep learning can be adopted. The reliability of such a system will depend on its performance in a networked setting, where the loss of data due to packet dropouts, time delays and quantization will be critical. The switching surfaces may be modelled as a general nonlinear system based on piecewise dynamic models, and relevant controllers may be assigned.
