Abstract
People rely on a complex tactile system to perceive the basic physical properties of an object and to apply the appropriate action and pressure when grasping it. This task is difficult for a mechanical manipulator: the feedback of most manipulators is limited to pressure, so they cannot readily identify other properties of an object by touch. This paper presents a new manipulator design. In this system, a flexible tactile sensor produces tactile images, and a large number of these images form a training set on which a capsule network is trained to evaluate them. When the manipulator grasps an object, parameters such as the shape, rigidity, weight and density of the object are analyzed. A pseudo-color transformation is applied to the tactile images so that changes in each parameter of the tactile signal can be observed visually. The novelty of the work lies in processing tactile images with a capsule network to realize an evolutionary bionic tactile function.
Introduction
Bionics is a discipline that studies the functions and behavior of biological systems [1] in order to build technical systems, and the manipulator is one of its applications. A manipulator is a mechanical device that simulates certain movements of human hands and arms and can be used to move objects or operate tools. Manipulators have been applied in many fields to replace humans in a variety of tasks and to improve quality of life [2]. The earliest manipulators were used in industry, especially for heavy labor and in dangerous environments, and they are now widely used in mechanical manufacturing, metallurgy, light industry, and the atomic energy industry. With the development of science and technology, manipulators have come to exceed humans in durability and, in some respects, flexibility, and they can repeat the same mechanical movement for a long time without fatigue. However, existing manipulators are still inferior to the human hand in certain circumstances. Human fingers are so sensitive that they can sense vibrations with an amplitude as small as 0.00002 mm, and the tactile nerves distributed in the hand can feel ambient temperature and material quality. The traditional manipulator is a bionic mechanical device that only simulates simple hand movements, with pressure feedback at most. It can sense the force exerted on the object it interacts with, but it cannot sense tactile information on the surface, such as vibration, roughness and texture [3].
In this paper, a newly designed manipulator is introduced with improvements in tactile information processing and mechanical structure. A flexible tactile sensor in the manipulator system produces tactile images. After being processed in MATLAB, the images are used to build the training and test sets, on which a capsule network is trained to evaluate them. When the manipulator grasps an object, parameters such as the shape, rigidity, weight and density of the object are analyzed. A pseudo-color transformation is applied to the tactile images so that changes in each tactile signal parameter can be observed visually. The novelty of the work lies in processing tactile images with a capsule network to realize an evolutionary bionic tactile function. In the future, the approach could be applied to fine-processing tasks in fields such as medicine, biology, and the military.
Section 2 reviews related work on manipulator design, new sensors, and recent neural network technology. Section 3 describes the system framework of the manipulator, including the multi-vision system and gesture capture module, the image preprocessing algorithm, tactile image generation and pseudo-color enhancement, and the capsule network used for tactile feature perception. Section 4 presents the simulation experiments with the capsule network, and Sections 5 and 6 give the discussion and conclusion.
Related work
There are two main difficulties in the development of manipulators: (1) the design of suitable tactile sensors, and (2) the implementation of tactile algorithms. The research progress in these two areas is introduced below.
In human anatomy, the tactile nervous system is the organ that forms the sense of touch. Similarly, a tactile sensor is a device that captures tactile information [4]. Robotics research has produced flexible tactile sensors [5], array tactile sensors and artificial skin; however, these cannot feel certain properties of the objects being grasped, such as shape, texture, and softness, which remain common technical challenges in robotics [6]. Guo et al. proposed a flexible fabric pressure sensor [7]. The sensor is made by a silk-screen printing process on a flexible fabric substrate, and each sensor unit is manufactured from a carbon dielectric and an organosilicon conductive silver gel. Its measurement range is 0 to 700 kPa, its hysteresis error is 5.6%, its dynamic response time is 89 ms, and its sensitivity is 0.02536 %/kPa. Wang et al. developed a self-powered intelligent tactile sensor based on a dual-mode triboelectric nanogenerator. For six typical materials it produces a current of more than 0.4 mA, even when the pressure is as small as 100 Pa. The sensor can not only detect tiny pressures but also sense the hardness of a material from the fluctuation of the peak current, which may make it possible to design artificial skin in the future [8]. Qian designed a 12-DOF bionic manipulator based on multi-sensor data fusion. The manipulator uses polyvinylidene fluoride (PVDF) sensors to collect the charge signals of four fingers, extracts the useful signals and sends them to the controller, and fuses the touch, slip and thermal sensations with the state information of the object. It then derives further action instructions and adjusts the grip [9]. Choi et al. proposed a fast and accurate level-of-detail control method for contact in tactile collision detection. The method defines a special bounding-volume data structure that contains all the mesh information of the body and achieves rapid collision detection using bounding spheres [8].
Yule et al. proposed a method for recognizing soft and hard objects from tactile perception based on a convolutional neural network. A thin-film pressure sensor is used to collect hardness data from objects, training and test datasets are built, and a network is trained in Caffe to simulate tactile recognition of soft and hard objects [10].
An artificial neural network (ANN) is a mathematical model from the fields of machine learning and cognitive science. Processing tactile images with a neural network is more flexible than using traditional algorithms, and in image classification the accuracy of neural networks has exceeded the average human level. An ANN simulates the structure and function of biological neural networks in order to estimate or approximate functions, and it computes through a large number of connected artificial neurons. In most cases an ANN can change its internal structure based on external information, so it is an adaptive system with a learning capability. The modern neural network is a non-linear statistical data modeling tool. Like other machine learning methods, neural networks have been used to solve problems such as machine vision and speech recognition, which are difficult to solve with traditional rule-based programming. LeNet, developed from the late 1980s and culminating in LeNet-5 in 1998, laid the foundation of modern convolutional networks. However, because of the limited computing power available at the time, neural networks did not attract much attention. After 2006, with improvements in machine learning theory [11], especially the emergence of layer-by-layer learning and parameter fine-tuning, the convolutional neural network (CNN) began to develop rapidly, its structure became deeper, and various learning and optimization theories were introduced. Since AlexNet [12] in 2012, CNN architectures have dominated the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), including ZFNet in 2013, VGGNet [13] and GoogLeNet [14] in 2014, and ResNet [15] in 2015.
CNNs are well suited to machine vision problems such as object detection, semantic segmentation, and image classification. In this paper, neural networks are introduced into the field of robotic tactile perception. The bionic manipulator grasps different objects and generates tactile grayscale images of them, and the physical properties of the target objects are labeled in the images. We then build a dataset of 10000 tactile images, which is sufficient for training the capsule network.
System framework
In this system, the multi-vision module identifies target objects in three dimensions and analyzes their basic properties (size and location). When the manipulator grasps an object, its action is captured by inertial sensors located on various parts of the manipulator. At the same time, cameras film the manipulator from different directions, and the result captured by the inertial sensors is corrected by the multi-vision module. In this process, 3D reconstruction is used to capture the gesture and motion of the manipulator. The sensor array on the surface of the manipulator converts the tactile signal into measurable quantities such as current and voltage. After signal processing, these values are stored in a matrix, the matrix is mapped to a grayscale image, and the image is processed by the capsule neural network. In this way, the processor analyzes the physical properties of the object and the system gives precise feedback to the manipulator. The grayscale image can also be transformed into a pseudo-color image, which makes it easier for the operator to analyze the state of the manipulator. The system framework is shown in Fig. 1.

System framework.
Before an object is grasped, cameras capture its motion from different directions [16]. The system reconstructs its shape with a 3D reconstruction algorithm so that the manipulator can execute the grasping action accurately. When the manipulator grasps the target object, the markers on the surface of the manipulator are captured by the multi-vision module. This module obtains the posture and action information and feeds it back to the processor to plan further movement.
To reconstruct 3D objects from image sequences, we first restore the 3D information of points, which is a classical problem in photogrammetry and computer vision [17]. The target fixing technique is based on a multi-vision location algorithm. A 3D point on the plane template is expressed as $M = [x, y, z]^T$, and the corresponding 2D point is expressed as $m = [u, v]^T$. For camera $i$, the 3D point $x$ on the template plane and the plane coordinate point $u_i$ of its projection are related by the projection model of that camera.
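A commonly used form of this relationship is the pinhole projection model in homogeneous coordinates. The expression below is a sketch under that assumption rather than a formula confirmed by the text; the intrinsic matrix $K_i$, rotation $R_i$, translation $t_i$ and scale factor $s_i$ are symbols introduced here for illustration.

```latex
\[
  s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
  = K_i \begin{bmatrix} R_i & t_i \end{bmatrix}
    \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
  \qquad i = 1, 2, \dots, n
\]
```

With three or more cameras, each camera contributes one such relation for the same 3D point, which is what allows the remaining cameras to reduce the error of any single stereo pair.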
The multi-vision system [18] contains three or more cameras. By filming the same object from different angles, the system calculates the 3D coordinates of the object from the differences between its pixel coordinates in the respective cameras. Any two cameras can be selected to form a binocular stereo pair, while the remaining cameras provide additional information for stereo matching, which reduces the error of binocular stereo vision. When one camera is blocked, the system can still find unoccluded points and calculate the 3D coordinates of the target point, as shown in Fig. 2.

Multi-vision model.
When the evolutionary bionic manipulator is working, it needs to feed its current posture back to the system in real time so that the system can execute the next movement. Motion posture is commonly captured either with inertial sensors or optically. Inertial sensors are inexpensive; they measure the angular velocity of the object and obtain the rotation angle of the current frame relative to the reference frame by integration. Because of sensor noise and error, the integrated result drifts considerably over time [19]. Images taken by the multi-vision system, on the other hand, may be blocked by mechanical structures in certain circumstances. To address both problems, the system combines the advantages of optical capture and inertial capture. Inertial capture is the primary strategy, and optical capture is used to correct the error accumulated by the inertial sensors.
Optical capture is described in Section 2.1. The inertial sensors are explained next. The gyroscope model accounts for high-frequency noise, exponentially correlated drift and constant drift. The terms of its mathematical model are as follows [20].
ω is the coordinate value of the gyroscope's output angular velocity in the manipulator system.
d is described as the coordinate value of the rotational speed of the manipulator relative to inertial space; it carries the exponentially correlated part of the gyroscope drift, which is usually modeled as a first-order Markov process.
b is the constant part of the gyroscope drift.
$n_g$ is the measurement noise [18].
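The effect of these terms on the integrated angle, and the role of the optical correction, can be illustrated with a minimal Python sketch. It assumes a simple additive measurement model (true rate plus first-order Markov drift, constant drift and noise), which is an assumption made here and not the equation of [20]; the function names, the time constant `tau`, the noise levels and the `optical_fixes` mapping are all hypothetical.

```python
import random


def simulate_gyro(true_rate, dt, steps, bias=0.01, tau=5.0,
                  sigma_d=0.002, sigma_n=0.005, seed=0):
    """Return simulated gyro readings under an ASSUMED additive model:
    reading = true_rate + d + bias + n_g."""
    rng = random.Random(seed)
    d = 0.0  # exponentially correlated drift (first-order Markov process)
    readings = []
    for _ in range(steps):
        # Discretized first-order Markov process: d' = -d / tau + white noise
        d += (-d / tau) * dt + rng.gauss(0.0, sigma_d) * dt ** 0.5
        n_g = rng.gauss(0.0, sigma_n)  # high-frequency measurement noise
        readings.append(true_rate + d + bias + n_g)
    return readings


def integrate_angle(rates, dt, optical_fixes=None):
    """Integrate angular rate into an angle. optical_fixes maps a sample
    index to an angle measured optically, which overwrites the drifting
    inertial estimate at that sample (hypothetical correction scheme)."""
    optical_fixes = optical_fixes or {}
    angle = 0.0
    history = []
    for k, rate in enumerate(rates):
        angle += rate * dt
        if k in optical_fixes:
            angle = optical_fixes[k]
        history.append(angle)
    return history


if __name__ == "__main__":
    rates = simulate_gyro(true_rate=0.0, dt=0.01, steps=1000)
    drifted = integrate_angle(rates, 0.01)
    corrected = integrate_angle(rates, 0.01, optical_fixes={499: 0.0, 999: 0.0})
    print("angle error without correction:", drifted[-1])
    print("angle error with optical fixes:", corrected[-1])
```

Even with a stationary manipulator, the constant drift alone makes the integrated angle grow linearly, which is why a periodic optical fix is useful as a secondary capture channel.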
The system uses the intelligent tactile sensor based on a dual-mode triboelectric nanogenerator. For six typical materials it produces a current of more than 0.4 mA, even when the pressure is as small as 100 Pa. This complementary design enables the sensor to obtain complex information from external objects. Its pressure detection range is 40 to 140 N, with a correlation coefficient of up to 0.98. When the duration of the dynamic force changes from 0.4 s to 0.8 s and the pressure changes from 55 N to 38 N, the sensor still shows a good dynamic pressure response. Its planar resolution can reach 2 mm. It is therefore a good choice as the sensing device for an evolutionary bionic manipulator [8].
The bionic manipulator consists of nine mechanical fingers, each with three degrees of freedom, labeled A to I [21]. The surface of each finger is equipped with tactile sensors, which produce a signal when an object is grasped. The sensors are not placed at the joints of the manipulator, which avoids interference caused by joint deformation. Taking finger A as an example, the signal generated by each tactile sensor unit is filtered, converted by the A/D converter, and amplified. The resulting data are stored in a 180 × 180 matrix.
Combining the matrices of fingers A to I gives the initial sensor data for the bionic manipulator.
To determine the gray value of each point, we convert the voltage $U_x$ into a gray value.
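One simple way to perform such a conversion is a linear mapping from a voltage range to 8-bit gray levels. The following Python sketch assumes this linear form and a hypothetical sensor range of 0 to 5 V; neither the mapping nor the range is stated in the text.

```python
def voltage_to_gray(u, u_min=0.0, u_max=5.0):
    """Map a voltage to an 8-bit gray value with a linear scale.
    u_min and u_max are assumed sensor limits, not values from the paper."""
    if u_max <= u_min:
        raise ValueError("u_max must be greater than u_min")
    clipped = min(max(u, u_min), u_max)  # saturate out-of-range readings
    return int(round(255 * (clipped - u_min) / (u_max - u_min)))


def matrix_to_image(voltages, u_min=0.0, u_max=5.0):
    """Convert a 2D list of voltages into a 2D list of gray values."""
    return [[voltage_to_gray(u, u_min, u_max) for u in row] for row in voltages]


if __name__ == "__main__":
    sample = [[0.0, 1.0], [4.0, 5.0]]
    print(matrix_to_image(sample))
```

Clipping before scaling keeps occasional out-of-range readings from wrapping around or exceeding the 0 to 255 interval.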
In a real environment, mechanical jitter, signal error and other uncontrollable factors introduce interference into the grayscale image. To improve image quality and highlight the useful information, the image needs to be sharpened; this paper uses Laplacian sharpening. Finally, edge detection is applied to highlight the interface between the manipulator and the target object. The result is shown in Fig. 3.

Image edge detection.
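The Laplacian sharpening step mentioned above can be sketched in a few lines of Python. This is a minimal illustration assuming the common 4-neighbor Laplacian kernel and untouched border pixels; the paper does not specify the kernel or the border handling.

```python
def laplacian_sharpen(img):
    """Sharpen a grayscale image given as a 2D list of ints in 0..255.
    Uses the 4-neighbor Laplacian (an assumed kernel choice) and
    computes g = f - laplacian(f), clamped to the 8-bit range."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] +
                   img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            out[y][x] = min(255, max(0, img[y][x] - lap))
    return out


if __name__ == "__main__":
    patch = [[50, 50, 50],
             [50, 60, 50],
             [50, 50, 50]]
    print(laplacian_sharpen(patch))
```

A flat region is left unchanged because its Laplacian is zero, while a local peak such as the center of the sample patch is amplified, which is the intended effect at contact boundaries.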
When the evolutionary bionic manipulator grasps an object, it generates a tactile image like the one shown above. Each tactile sensor unit feeds back a pressure value $p_{ij}$. Image edge processing is used to distinguish the contact surface $s_{ij}$ in the image, from which the pressure value of the contact surface is obtained.
The fingers of the evolutionary bionic manipulator move at a constant speed V. While grasping an object, the manipulator judges how hard or soft the object is.
The tactile image generated by the evolutionary bionic manipulator is well suited to machine learning, but it is difficult for people to interpret. The human visual system can distinguish only about 20 levels of gray, whereas it can distinguish thousands of colors, and color evokes a stronger visual response. Pseudo-color processing is therefore an important enhancement method in the evolutionary manipulator system. The processed image highlights the target, shows rich color detail and has high contrast with the background, which greatly improves the ability of the human eye to recognize details in the tactile image. It enables operators to observe the state of the evolutionary bionic manipulator and the changes in the tactile signal intuitively. The gray level to color transform is a common pseudo-color method. Its principle is to send the gray values of the original image through three converters for R, G and B, each with a different transformation rule, and to combine the three resulting components into an RGB color [22].
Through pseudo-color enhancement, the tactile image is transformed into a pseudo-color image in which changes in each parameter of the tactile signal can be observed visually. The result is shown on the right of Fig. 4.

Pseudo-color enhancement.
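The gray level to color transform can be sketched as three piecewise-linear functions, one per channel. The breakpoints below (64, 128, 192) follow a widely used blue to green to yellow to red scheme and are an assumption; the transformation rules actually used in the paper are not given.

```python
def gray_to_pseudocolor(g):
    """Map a gray level 0..255 to an (R, G, B) tuple using three
    piecewise-linear transforms (assumed breakpoints at 64, 128, 192)."""
    if not 0 <= g <= 255:
        raise ValueError("gray level must be in 0..255")
    if g < 64:
        return (0, 4 * g, 255)               # blue to cyan
    if g < 128:
        return (0, 255, 255 - 4 * (g - 64))  # cyan to green
    if g < 192:
        return (4 * (g - 128), 255, 0)       # green to yellow
    return (255, 255 - 4 * (g - 192), 0)     # yellow to red


def pseudocolor_image(gray_img):
    """Apply the transform to every pixel of a 2D list of gray levels."""
    return [[gray_to_pseudocolor(g) for g in row] for row in gray_img]


if __name__ == "__main__":
    for level in (0, 64, 128, 192, 255):
        print(level, gray_to_pseudocolor(level))
```

Low pressures then appear blue and high pressures red, so small gray-level differences that are hard to see in the original image become distinct hues.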
Geoffrey Everest Hinton [23] is one of the scholars who introduced the backpropagation algorithm into the training of multi-layer neural networks. In 2017 he and his co-authors proposed the capsule network. The capsule network (CapsNet) is a new neural network structure that achieves top performance on the MNIST dataset and recognizes overlapping images better than a CNN does. The activity vector of a capsule represents the instantiation parameters of a specific type of entity: the length of the vector represents the probability that the entity exists, and its orientation represents the instantiation parameters [23].
Because the manipulator can grasp the target object from different directions, the orientation of the tactile pressure image is uncertain, and the capsule network handles images with arbitrary orientation well. The capsule network is therefore used to process the tactile image datasets and extract their features, so that tactile images can be analyzed reliably afterwards. The capsule network structure is shown in Fig. 5.

Capsule network structure.
A total of 10000 tactile images were simulated in MATLAB: 7000 images were used as the training set, 2000 as the cross training set, and 1000 as the validation set. Fig. 6 shows samples from the dataset. Comparative experiments with SVM and ResNet were also carried out.

Test data.
The simulation was carried out in the MATLAB 2014a modeling and development environment. The machine learning framework was TensorFlow 1.6.0 and the programming language was Python 3.5. The experimental platform consisted of an Intel i9-7900X processor, an NVIDIA TITAN V 12G GDDR5 graphics card and 64 GB of DDR4 memory, running Windows 10 Enterprise Edition. The grayscale images were produced by simulating the mechanical hand grasping a plastic cylinder (5 cm × 5 cm × 5 cm). The original image resolution is 180 × 180.
TensorFlow is used to build the capsule neural network. The parameters of the network are shown in Table 1. A comparative experiment was designed with three algorithms: SVM, ResNet, and the capsule network. The structure of the capsule network is explained below.
Parameters of CapsNet
The capsule network is composed of two convolution layers and a fully connected layer. The first convolution layer is a conventional convolution layer that performs local pixel-level detection to find the touch range. The second convolution layer is the primary capsule layer, which can be seen as a stack of eight ordinary convolutions with a total of 32 channels. The primary capsules extract the features of the tactile image and reconstruct them into a vector. It is a 5-dimensional vector: the first dimension is the magnitude of the force along the X axis, the second is the magnitude of the force along the Y axis, the third is the magnitude of the force along the Z axis, and the fourth is the deformation (shape variable) of the object. The vector is then normalized so that its length is less than 1. The output of this layer is the input of the next capsule layer, which simulates the recognition process of the human brain; the capsule network analyzes the tactile image in reverse, as humans do. The primary capsule layer predicts the grasped object, and the fully connected layer votes on and judges the overall properties of the object.
The first convolution layer uses 256 convolution kernels (9 × 9, depth 1, stride 1) with the ReLU activation function, and its output tensor is 172 × 172 × 256. The receptive field of the convolution kernels in CapsNet is 9 × 9. The number of weights between the two layers is 9 × 9 × 1 × 256 + 256 = 20992, where the final 256 are the bias terms.
The second convolution layer constructs the tensor structure that serves as the input of the capsule layer. It uses 32 × 8 convolution kernels (9 × 9, depth 256, stride 2). The output tensor has 1893376 elements (86 × 86 × 32 capsule vectors of dimension 8), and the number of weights between the two layers is 9 × 9 × 256 × 8 × 32 + 8 × 32 = 5308672.
The Primary Caps layer has 86 × 86 × 32 capsules, so each of the 32 feature maps contains 86 × 86 capsules. Different feature maps represent different types, and different capsules within the same map represent different positions.
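The layer sizes and weight counts quoted above can be checked with a short Python sketch. The helper names are hypothetical. Note that an output size of 86 for the second layer corresponds to "same"-style padding (ceil(172 / 2) = 86), whereas unpadded "valid" convolution with a 9 × 9 kernel and stride 2 would give 82; the padding mode is an inference here and is not stated in the text.

```python
import math


def conv_output_size(size, kernel, stride, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


def conv_param_count(kernel, in_channels, out_channels):
    """Number of weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


if __name__ == "__main__":
    conv1 = conv_output_size(180, 9, 1)                   # first layer
    caps_same = conv_output_size(conv1, 9, 2, "same")     # matches the text
    caps_valid = conv_output_size(conv1, 9, 2, "valid")   # for comparison
    print("conv1 output:", conv1, "params:", conv_param_count(9, 1, 256))
    print("primary caps output (same):", caps_same, "(valid):", caps_valid)
    print("primary caps params:", conv_param_count(9, 256, 8 * 32))
    print("primary caps elements:", caps_same * caps_same * 32 * 8)
```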
The conventional convolution layer of the capsule network processes the original tactile image and produces the value $u_i$, which then undergoes an affine transformation.
In the primary capsule layer, after the stacked convolution, the output value of the first capsule layer is $s_j$.
The output of the first capsule layer needs to be normalized, so that it is converted to a vector whose length is less than one.
The capsule network updates parameters according to the routing-by-agreement [23].
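The normalization and routing steps can be illustrated with a compact pure-Python sketch of the squashing function and routing-by-agreement as described in the capsule network literature [23]. It is a didactic version under simplifying assumptions (plain lists, three routing iterations, prediction vectors given directly), not the TensorFlow implementation used in the experiments; the names `u_hat`, `squash` and `route` are chosen here for illustration.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) while keeping direction."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / (math.sqrt(sq) + eps)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Routing-by-agreement. u_hat[i][j] is the prediction vector of lower
    capsule i for upper capsule j. Returns (outputs v, coupling coeffs c)."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for _ in range(iterations):
        c = [softmax(row) for row in b]       # coupling coefficients
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(dim)]
            v.append(squash(s_j))
        # Increase the logit where a prediction agrees with the output
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][k] * v[j][k] for k in range(dim))
    return v, c


if __name__ == "__main__":
    # Two lower capsules agree on upper capsule 0 and disagree on capsule 1
    u_hat = [[[1.0, 0.0], [1.0, 0.0]],
             [[1.0, 0.0], [-1.0, 0.0]]]
    v, c = route(u_hat)
    print("output lengths:", [math.sqrt(sum(x * x for x in vec)) for vec in v])
    print("coupling coefficients:", c)
```

In the example, the two lower capsules agree about upper capsule 0, so its output vector grows and the coupling coefficients shift toward it, while the conflicting predictions for capsule 1 cancel out.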
In the final output, the capsule network analyzes the tactile image and classifies it according to its tactile characteristics.
After training on 7000 simulated tactile images, the accuracy of the model on the training set reached 91%. The model was 89.6% accurate on the final test set, which is in line with expectations. The results of the comparison experiment in Table 2 show that the capsule network performs very well.
Experimental result
Research on bionic manipulators began in the 1970s. Because of technical limitations, early bionic manipulators had only simple mechanical functions, such as grasping and moving, and differed greatly from real biological hands. Later, with the development of tactile sensor technology, bionic manipulators gained a tactile function, but it was confined to pressure feedback and could only make simple judgments of the hardness or softness of an object. Such manipulators therefore cannot perform fine movements as hands do, especially in surgical operations.
The brain depends on neurons in the cerebral cortex to process tactile signals and produce a corresponding movement. In bionics, artificial intelligence is combined with tactile sensors to simulate the human sense of touch. Convolutional neural networks have been introduced to recognize tactile images and obtain the properties of objects. However, CNNs and other machine learning algorithms have limitations: they mainly recognize images in a single, upright orientation, and they recognize overlapping images poorly. The capsule network can compensate for these shortcomings of the CNN.
The evolutionary bionic manipulator simulates the tactile system of biological hands. The novelty of the work lies in processing tactile images with a capsule network to realize an evolutionary bionic tactile function. The manipulator can perform fine movements and may even offer more functions than a hand. Tactile sensing for manipulators based on the capsule network therefore has broad application prospects.
Conclusion
In this paper, a new manipulator method was designed to realize an evolutionary bionic function. The traditional manipulator cannot feed back tactile information or report the properties of the grasped object, so it is difficult to apply in fine processing. To solve this problem, a new manipulator framework with flexible tactile sensor arrays was designed, and recent progress in machine learning provided a suitable method to fit into that framework. SVM, ResNet and the capsule network can all be used for image classification. SVM is a traditional image classification algorithm. In the test it proved difficult to apply to large-scale training samples, it does not handle multi-class problems directly (the classical SVM algorithm supports only binary classification), and it is sensitive to the choice of parameters and kernel function. ResNet is a deep network that took more than one week to train, so the cost of applying it in a real scenario is very high. Like other traditional CNN structures, ResNet is not robust to rotated images, on which its accuracy is very low, so its average accuracy is moderate. Compared with these two methods, the capsule network performed as expected: it took only 20 hours to train, its accuracy was better than that of the other two methods, and it remained effective on rotated images. The results are shown in Table 3. The capsule network can compensate for the shortcomings of traditional machine learning algorithms such as SVM and CNN. In the experiment, the capsule network outperformed the other methods in every aspect considered, and it is suitable for implementing a manipulator tactile algorithm. The tactile method for manipulators based on the capsule network is of great significance to the field of robotics.
Comparison between SVM, ResNet & CapsNet
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant: 51502209), the Government Support Enterprise Development Funding of Hubei Province (Grant: 16441), the Three-dimensional Textiles Engineering Research Center of Hubei Province, and the Anning Technology Transfer Center of Wuhan Textile University.
