Robust Grasping of a Variable Stiffness Soft Gripper in High-Speed Motion Based on Reinforcement Learning

Abstract

Industrial robots are widely deployed to perform pick-and-place tasks at high speeds to minimize manufacturing time and boost productivity. When dealing with delicate or fragile goods, soft robotic grippers are better end effectors than rigid grippers due to their softness and safe interaction. However, high-speed motion causes the soft robotic gripper to vibrate, leading to damage of the objects or failed grasping. Soft grippers with variable stiffness are considered to be effective in suppressing vibrations by adding damping devices, but it is quite challenging to compromise between stiffness and compliance. In this article, a controller based on deep reinforcement learning is proposed to control the stiffness of the soft robotic gripper, which can accurately suppress the vibration with only a minor influence on its compliance and softness. The proposed controller is a real-time vibration control strategy, which estimates the output of the controller based on the current operating environment. To demonstrate the effectiveness of the proposed controller, experiments were done with a UR5 robotic arm. For different situations, experimental results show that the proposed controller responds quickly and reduces the amplitude of the oscillation substantially.

Introduction

Rigidity and linearity make traditional rigid robots precise and strong, but also make them dangerous in human interaction. A promising solution to this problem is to replace the rigid links and actuators with soft, deformable ones.^1–3 With their softness, exceptional flexibility, and low cost, soft robotic grippers are excellent candidates for scenarios in which delicate objects need to be handled.^4–6

Soft grippers are typically constructed of materials or structures that have very low damping coefficients. As a result, challenges arise in high-speed motion: the gripper vibrates due to the large inertial forces produced by the acceleration of the robot arm. These undesirable vibrations can cause unstable or even failed grasping, leading to decreased productivity.⁷ In addition, they damage the structural integrity of the soft gripper, which shortens its usable life span.⁸

To suppress the vibration of soft grippers, additional components are attached to them to generate mechanical damping.^9–11 Li et al. proposed a damper based on passive and active particle damping and found that it can reduce the vibration amplitude and duration effectively.¹² However, particle jamming occupies a large volume, so the layer jamming mechanism, a lightweight structure, has gradually become the trend in robotic applications. For example, Choi et al. presented a sliding linkage-based layer jamming mechanism for controlling the stiffness of wearable robots.¹³ Inspired by this, Zhu et al. presented a fully multimaterial 3D-printed soft gripper whose stiffness is modified by the jamming layer,¹⁴ providing a new idea for the design and manufacture of soft robots. In addition, Wanasinghe et al. developed a layer jamming-based glove and applied it to medical treatment,¹⁵ expanding the range of its applications.

Although high stiffness can suppress vibration well, it reduces the compliance of the soft actuator. Several methods have been proposed to control the stiffness of soft robots. Bern et al. extended an open-loop control method called Soft IK, which allows the deformation shape of a cable-driven soft robot and its stiffness to be controlled simultaneously.¹⁶ However, this method requires a lot of mathematical modeling. To simplify the stiffness control model, Arachchige et al. used empirical data gathered from testing to map input pressure to stiffness.¹⁷ Moreover, Lai et al. introduced a tendon-tensioning method for adaptively controlling the stiffness of the soft robot;¹⁸ when the robot is subjected to an external load, a depth-based closed-loop controller compensates for the stiffness. Although these methods can control the stiffness, they rely on complex control schemes that are time-consuming.

Recently, reinforcement learning has been gradually applied to the field of stiffness control of the robot. For instance, Oikawa et al. used Q-learning to generate the stiffness matrix to modify the trajectory of robots for the execution of assembly tasks that require precise contact with objects without causing damage.¹⁹ To learn high-dimensional stiffness control policy problems, Kim et al. introduced a method called Stiffness Control from Augmented Position control Experiences, which can control stiffness only by position control demonstrations.²⁰ Moreover, Ansari et al. implemented cooperative multiagent reinforcement learning to control the soft manipulator's position and stiffness simultaneously for safe human–robot interaction in the bathing task.²¹

In response to the poor generalization ability and massive parameter adjustment of the traditional control method, this work applies deep reinforcement learning (DRL) to the stiffness control for vibration suppression of the soft gripper in high-speed pick-and-place tasks. The remainder of this article is organized as follows: The Model section describes the physical and mathematical model of the soft robot, the Methodology section provides the formulation of the DRL-based controller, the Experiments and Results section presents experimental results, and the Conclusion and Future Works section concludes the article.

Model

The purpose of this work is to control the stiffness of the soft gripper to suppress the vibration precisely. This not only enables the soft gripper to work in a high-speed environment but also maintains the flexibility of the soft finger. In this work, DRL is chosen to solve this problem. Compared with DRL, conventional controllers are less real-time, resulting in the need to constantly adjust parameters when faced with different tasks. To train the neural network of DRL well, data are obtained by simulations in the environment, which seems to be more reasonable and effective for soft robots than via sensors. Therefore, an approximate model of the soft robotic gripper is built.

System overview

The soft finger is surrounded by a vacuum chamber, and rotation jamming layers (RJLs) are located on both sides of the main body. The soft gripper constructed of two identical soft fingers is mounted on a 3D-printed holder (Fig. 1b). Each finger is mounted at an angle of 30° relative to the vertical axis.

FIG. 1.

(a) Structure of the proposed soft finger. (b) Soft gripper consisting of two soft fingers mounted on the wrist of UR5. (c) The bending state of the soft finger with 50 KPa inflation pressure and the right one applying an additional 30 KPa of vacuum pressure. (d) The bending state of the soft finger under gravity.

Kinematics

In the interest of conciseness, only the two-dimensional (2D) case is considered. As shown in Figure 2, a simplified theoretical model of the soft gripper is constructed. A Cartesian coordinate system $\{G\}$ is established at the midpoint of the holder to calculate the precise positions of the soft gripper's two fingertips. Thus, the opening distance of the soft gripper d (or the width/diameter of the grasped object) can be calculated as follows:

FIG. 2.

The dotted line represents the initial position of the soft gripper, while the solid line represents the current position of the soft gripper. (a) 3D model of the gripper and its main components, together with the simplified model of the soft gripper grasping an object of width d. The red line denotes a three-link rigid robot that is dynamically equivalent to the soft finger. The $i$-th link is actuated by torque $M_{i}$. (b) Kinematic representation of the $i$-th planar CC segment. (c) The state of the soft gripper when the grasp fails. CC, constant curvature.

$d = 2 \{ l - R [ \cos ω - \cos (θ_{2} + θ_{3} - ω) ] \},$ (1)

where $l$, $ω$, and $θ_{2} + θ_{3}$ denote the Euclidean distance between the origins of $\{G_{1}\}$ and $\{G\}$, the angle between the finger's axis and the vertical direction, and the bending angle of the finger, respectively. The angle $ω$ is determined by the holder's structure and is 30° in this work. $R$ is the bending radius of the finger.
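Equation (1) can be evaluated directly. The following is a minimal sketch, not the authors' implementation: the function name and all numerical values except the 30° mounting angle are hypothetical and chosen only for illustration.

```python
import math


def opening_distance(l, R, omega, bend_angle):
    """Opening distance d of the gripper, following Eq. (1).

    l          : distance between the origins of {G_1} and {G}
    R          : bending radius of the finger (same length unit as l)
    omega      : angle between the finger axis and the vertical (rad)
    bend_angle : total bending angle theta_2 + theta_3 (rad)
    """
    return 2.0 * (l - R * (math.cos(omega) - math.cos(bend_angle - omega)))


# Mounting angle stated in the text; the remaining values are hypothetical (mm).
omega = math.radians(30.0)
d_straight = opening_distance(l=40.0, R=60.0, omega=omega, bend_angle=0.0)
d_bent = opening_distance(l=40.0, R=60.0, omega=omega,
                          bend_angle=math.radians(20.0))
```

Note that for a bending angle of zero the two cosine terms cancel, so Eq. (1) reduces to $d = 2l$ regardless of $R$.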

Piecewise constant curvature (PCC)^22–26 is used to establish the kinematic model of the soft finger. A soft finger with $p - 1$ RJL units on each side can be divided into $p$ constant curvature (CC) segments. A group of reference frames $\{G_{0}\}, \{G_{1}\}, \dots, \{G_{p - 1}\}, \{G_{p}\}$ is attached to the ends of the CC segments. The origin of $\{G_{i}\}$ $(i = 1, 2, \dots, p - 1)$ is located at the center of the overlapping area of the RJL unit. $\{G_{0}\}$ is the base frame, whose origin is located at the holder, and $\{G_{p}\}$ is located at the fingertip, as shown in Figure 2a.

Under the CC hypothesis, the reference frames $\{G_{i}\}$ and $\{G_{i - 1}\}$ completely define the configuration of the $i$-th segment. Figure 2b shows the kinematic representation of a single CC segment. The angle $θ_{i}$ between $\{G_{i}\}$ and $\{G_{i - 1}\}$ is defined as the curvature of the $i$-th segment. Thus, the transformation between two consecutive frames can be derived as follows: $T_{i}^{i - 1} (θ_{i}) = \begin{bmatrix} \cos θ_{i} & - \sin θ_{i} & L_{i} \sin \frac{θ_{i}}{2} \\ \sin θ_{i} & \cos θ_{i} & L_{i} \cos \frac{θ_{i}}{2} \\ 0 & 0 & 1 \end{bmatrix},$ (2)

where L_i represents the distance between the origins of $\{G_{i}\}$ and $\{G_{i - 1}\}$ .

As depicted in Figure 2a, the proposed soft finger can be seen as a combination of three segments (two segments with the same curvature and a segment whose curvature is 0). Due to the structural constraints of the gripper's holder, the first segment is attached to the base, resulting in its inability to rotate around the origin of $\{G_{0}\}$ . Reference frames $\{G_{i}\} (i = 0, 1, 2, 3)$ depict the shape of the soft finger. Thus, its kinematics can be defined by a sequence of transformation matrices $T_{1}^{0}$ , $T_{2}^{1}$ , and $T_{3}^{2}$ .
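To make the chain of transformations concrete, the sketch below builds the matrix of Eq. (2) and multiplies $T_{1}^{0}$, $T_{2}^{1}$, and $T_{3}^{2}$ to obtain the fingertip pose in the base frame. It is an illustrative sketch only: the function names and the segment lengths and angles are hypothetical.

```python
import math


def segment_transform(theta, L):
    """Planar homogeneous transform T_i^{i-1}(theta_i) from Eq. (2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, L * math.sin(theta / 2.0)],
        [s, c, L * math.cos(theta / 2.0)],
        [0.0, 0.0, 1.0],
    ]


def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def fingertip_pose(thetas, lengths):
    """Compose T_1^0 T_2^1 T_3^2 (or any number of segments)."""
    T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for theta, L in zip(thetas, lengths):
        T = matmul3(T, segment_transform(theta, L))
    return T


# Hypothetical example: a straight base segment followed by two segments
# with equal bending angle, as described for the proposed finger.
T_tip = fingertip_pose(thetas=(0.0, 0.2, 0.2), lengths=(20.0, 35.0, 35.0))
tip_x, tip_y = T_tip[0][2], T_tip[1][2]
```

The last column of the composed matrix holds the fingertip position expressed in $\{G_{0}\}$.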

Dynamics

In addition to kinematic analysis, dynamic analysis is also important for designing robot control strategies. The dynamic model describes the mathematical relationship between the motion, driving torque, and load of the robot. In the following, $X^{T}$ denotes the transpose of a matrix or vector $X$.

It is always possible to match a generic PCC continuous soft robot with a rigid robot that is dynamically consistent.²⁴ For a single CC segment, the rigid robot matching it always satisfies two constraints: one is that the endpoints of the rigid robot always coincide with the origin of each reference frame on the CC segment, and the other is that under the assumption of a uniform distribution of mass of each CC segment, the rigid robot's center of mass always lies on the same point as the CC segment's center of mass.

According to the method proposed by Della Santina et al.,²³ by considering the overlapping area of the RJL unit of the soft finger as the freely rotating joints of the rigid robot, we can connect the soft finger to a three-link rigid robot by the map described by Eq. (3), which ensures that the endpoints of each CC segment coincide with the reference points of the rigid robot. During the bending motion of the soft finger, the distance that the overlapping area's center point moves is negligible compared with the length of the link. Thus, the length of the link is considered to be a constant. $u (θ) = \begin{bmatrix} u_{1} (θ_{1})^{T} & u_{2} (θ_{2})^{T} & u_{3} (θ_{3})^{T} \end{bmatrix}^{T},$ (3)

where $u_{i} (θ_{i})$ is the segment map, given by Eq. (4): $\begin{cases} u_{1} (θ_{1}) = \begin{bmatrix} 0 & \frac{L_{1}}{2} & \frac{L_{1}}{2} & 0 \end{bmatrix}^{T} \\ u_{2} (θ_{2}) = \begin{bmatrix} \frac{θ_{2}}{2} & \frac{L_{2}}{2} & \frac{L_{2}}{2} & \frac{θ_{2}}{2} \end{bmatrix}^{T} \\ u_{3} (θ_{3}) = \begin{bmatrix} \frac{θ_{3}}{2} & \frac{L_{3}}{2} & \frac{L_{3}}{2} & \frac{θ_{3}}{2} \end{bmatrix}^{T} \end{cases}$ (4)

$L_{i} (i = 1, 2, 3)$ denotes the length of the link of the equivalent three-link rigid robot. It is defined as the distance between a jamming layer's endpoint and the center of the overlapping area of the RJL unit. We present in Figure 2a the equivalent rigid robot for the proposed soft finger, which is indicated by the red line.
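The map of Eqs. (3) and (4) can be written as a short function that stacks the three segment maps into one configuration vector. This is a sketch under the constant-link-length assumption stated above; the function names and the example values are hypothetical.

```python
def segment_map(theta, L):
    """Segment map u_i(theta_i) of Eq. (4): [theta/2, L/2, L/2, theta/2]."""
    return [theta / 2.0, L / 2.0, L / 2.0, theta / 2.0]


def rigid_configuration(theta2, theta3, lengths):
    """Stacked map u(theta) of Eq. (3) for the three-link equivalent robot.

    The first segment is fixed to the holder, so its angular entries are
    zero in Eq. (4); only theta2 and theta3 enter the map.
    """
    L1, L2, L3 = lengths
    return (segment_map(0.0, L1)
            + segment_map(theta2, L2)
            + segment_map(theta3, L3))


# Hypothetical link lengths (mm) and bending angles (rad).
q = rigid_configuration(theta2=0.3, theta3=0.3, lengths=(20.0, 35.0, 35.0))
```

The resulting vector has twelve entries, four per link, in the order used by Eq. (3).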

For simplicity, it is assumed that the moment generated by the inflation pressure is just enough to allow the soft gripper to grasp the object stably. Thus, the moments generated by the grasping force, the friction force between the gripper and the object, the flexural force, and the inflation pressure cancel each other out. Only the frictional moment between jamming layers created by the vacuum pressure, the inertia moment caused by the variable motion of the robotic arm, and the damping moment of the finger itself must be considered.

Inertia moment

Due to the effect of acceleration, the inertia of the grasped object and the soft finger causes vibration during the grasp. The inertia moments of the first and second joints are as follows: $M_{in1} = m_{1} a \frac{L_{1}}{2} \cos β_{1} + m_{2} a \left( \frac{L_{1}}{2} \cos β_{1} + \frac{L_{2}}{2} \cos β_{2} \right),$ (5) $M_{in2} = m_{2} a \frac{L_{2}}{2} \cos β_{2},$ (6)

where $m_{1}$ represents the mass of the first link, while $m_{2}$ is the sum of the mass of the second link and the grasped object. $β_{1}$ and $β_{2}$ represent the angles between the first and second links and the vertical direction, respectively. Also, $a$ is the acceleration of the robotic arm.
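Equations (5) and (6) translate directly into code. The sketch below is illustrative only; the function name and the numerical values are hypothetical, and consistent units are assumed for all inputs.

```python
import math


def inertia_moments(m1, m2, a, L1, L2, beta1, beta2):
    """Inertia moments of the first and second joints, Eqs. (5) and (6).

    m1, m2       : mass of link 1, and mass of link 2 plus the grasped object
    a            : acceleration of the robotic arm
    L1, L2       : link lengths of the equivalent rigid robot
    beta1, beta2 : angles between each link and the vertical (rad)
    """
    arm1 = 0.5 * L1 * math.cos(beta1)
    arm2 = 0.5 * L2 * math.cos(beta2)
    M_in1 = m1 * a * arm1 + m2 * a * (arm1 + arm2)
    M_in2 = m2 * a * arm2
    return M_in1, M_in2


# Hypothetical values: masses in kg, lengths in m, acceleration in m/s^2.
M_in1, M_in2 = inertia_moments(m1=0.02, m2=0.06, a=8.0,
                               L1=0.035, L2=0.035,
                               beta1=math.radians(30.0),
                               beta2=math.radians(40.0))
```

Both moments scale linearly with the arm acceleration $a$, which is why larger accelerations produce larger vibration.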

Frictional moment

The vacuum pressure in the chamber causes frictional moments that are the same for each joint. Figure 3 shows the enlarged schematic of the jamming layer when the soft finger is bending, and the coordinate system is established with point E as the origin. Taking the point H with coordinates $(x, y)$ as an example, the frictional moment here can be written as $dM = F_{f} \sqrt{x^{2} + y^{2}} \, dx \, dy$, where $F_{f}$ is the frictional force. Furthermore, since $∠ADE = ∠GCF = \frac{π}{2} - \frac{φ}{2}$, the value of x can be expressed as $r - r \cos (\frac{π}{2} - \frac{φ}{2})$. In addition, the value of y, which equals the length of $\overline{HD}$, is $\sqrt{r^{2} - [r \cos (\frac{π}{2} - \frac{φ}{2}) + x]^{2}}$.

FIG. 3.

The schematic of the rotational jamming layer unit.

In summary, the frictional moment $M_{f}$ of each joint is twice the integral of $dM$ over the overlapping areas, which is given by $M_{f} = 2 \iint 4 F_{f} \sqrt{x^{2} + y^{2}} \, dx \, dy .$ (7)

The ranges of x and y are $[0, r - r \cos (\frac{π}{2} - \frac{φ}{2})]$ and $[0, \sqrt{r^{2} - [r \cos (\frac{π}{2} - \frac{φ}{2}) + x]^{2}}]$, respectively.

Based on our previous work,²⁷ the frictional force can be expressed as Eq. (8). $F_{f} = n ρ S_{A} P,$ (8)

where n is the number of layers, $ρ$ is the friction coefficient, S_A is the overlapping area of two jamming layers, and P is the vacuum pressure applied to the vacuum chamber.
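The double integral in Eq. (7) has no convenient closed form, but it can be approximated numerically. The sketch below uses a simple midpoint rule over the integration ranges given above and treats $F_{f}$ from Eq. (8) as constant over the region. It is an illustrative sketch: the function names, the grid resolution, and all numerical values are hypothetical.

```python
import math


def friction_force(n_layers, rho, overlap_area, vacuum_pressure):
    """Frictional force F_f = n * rho * S_A * P, Eq. (8)."""
    return n_layers * rho * overlap_area * vacuum_pressure


def frictional_moment(F_f, r, phi, steps=100):
    """Midpoint-rule approximation of M_f in Eq. (7).

    x runs over [0, r - r*cos(pi/2 - phi/2)] and, for each x,
    y runs over [0, sqrt(r^2 - (r*cos(pi/2 - phi/2) + x)^2)].
    """
    c = r * math.cos(math.pi / 2.0 - phi / 2.0)
    x_max = r - c
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        y_max = math.sqrt(max(r * r - (c + x) ** 2, 0.0))
        dy = y_max / steps
        for j in range(steps):
            y = (j + 0.5) * dy
            total += math.sqrt(x * x + y * y) * dx * dy
    return 2.0 * 4.0 * F_f * total


# Hypothetical parameters, for illustration only.
F_f = friction_force(n_layers=4, rho=0.5, overlap_area=1.0e-4,
                     vacuum_pressure=30.0e3)
M_f = frictional_moment(F_f, r=0.01, phi=math.pi / 3.0)
```

Because $F_{f}$ is proportional to the vacuum pressure $P$, the approximated $M_{f}$ is also proportional to $P$, which is the quantity the controller adjusts.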

Damping moment

Because of the nature of the material, the components of a soft finger are subject to damping force during its movement. The damping force is used to eliminate the effect of elastic force and the gravity of the soft finger itself. To avoid the complex formulation of rotational damping dissipation, the damping force term is directly added as the external force.²⁸

The moment generated by the damping forces is $M_{d} = λ w$, where $λ$ is the viscous coefficient, determined by minimizing the deformation differences between the experiment and the simulation under gravity, and $w$ is the angular velocity of the rigid robot's link. Experiments show that the soft finger takes 0.2 s to reach a steady state (Fig. 1d). From this fit, the best-fit parameters $k_{1} = 21.1$ N·mm, $k_{2} = 128.3$ N·mm, and $λ = 0.095$ N·s/m are obtained, where $k_{i}$ $(i = 1, 2)$ are the flexural rigidity coefficients.

Full model

The dynamics of the equivalent rigid robot in compact form can be written as follows: $B (q) \ddot{q} + C (q, \dot{q}) \dot{q} + G (q) = M,$ (9)

where B is the inertia matrix, C is the centrifugal and Coriolis matrix, G is the gravity vector, and M is the external resultant moment.

Then, we integrate the map described by Eq. (3) directly into the dynamics of the system. The map and its time derivatives read as follows: $\begin{cases} q = u (θ) \\ \dot{q} = J (θ) \dot{θ} \\ \ddot{q} = \dot{J} (θ, \dot{θ}) \dot{θ} + J (θ) \ddot{θ} \end{cases},$ (10)

where $J (θ)$ is the Jacobian of $u (θ)$ with respect to $θ$. Substituting Eq. (10) into Eq. (9) yields the dynamics of the soft finger: $\begin{matrix} B (u (θ)) [\dot{J} (θ, \dot{θ}) \dot{θ} + J (θ) \ddot{θ}] \\ + C (u (θ), J (θ) \dot{θ}) J (θ) \dot{θ} + G (u (θ)) = J^{T} (θ) M . \end{matrix}$ (11)

Here, $M = \begin{bmatrix} 0 & M_{1} & M_{2} \end{bmatrix}^{T}$, with $M_{1} = M_{in1} + M_{f} + M_{d}$ and $M_{2} = M_{in2} + M_{f} + M_{d}$.
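As a worked illustration, the Jacobian in Eq. (10) can be written out explicitly from the segment maps of Eq. (4). The expression below is a sketch derived only from Eq. (4) under the constant-link-length assumption stated in the Dynamics section; $\mathbf{0}_{4}$ denotes a $4 \times 1$ zero vector and $j$ is a shorthand introduced here.

```latex
J(\theta) = \frac{\partial u}{\partial \theta}
= \begin{bmatrix}
\mathbf{0}_{4} & \mathbf{0}_{4} & \mathbf{0}_{4} \\
\mathbf{0}_{4} & j & \mathbf{0}_{4} \\
\mathbf{0}_{4} & \mathbf{0}_{4} & j
\end{bmatrix},
\qquad
j = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 & \tfrac{1}{2} \end{bmatrix}^{T},
\qquad
\dot{J}(\theta, \dot{\theta}) = 0 .
```

The first column is zero because $u_{1}$ does not depend on $θ_{1}$. Since $J$ is constant under this assumption, the term containing $\dot{J}$ in Eq. (11) vanishes.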

Methodology

In many practical decision-making problems, the state of the Markov Decision Process is high-dimensional and cannot be solved by traditional RL algorithms. The stiffness control problem in this work is no exception. DRL algorithms incorporate deep learning to solve such problems and develop specialized algorithms that perform well in this setting. In the following, we define the state space $S$ , the action space $A$ , the reward function $ℛ$ , and the training details of the DRL model.

States

The state space $S$ contains all the information about the soft gripper with variable stiffness at a specific moment, and each state is defined as a list $[ϑ_{1}, ϑ_{2}, \dot{ϑ}_{1}, \dot{ϑ}_{2}, d, m_{o}, a]$. As depicted in Figure 2a, $ϑ_{i}$ $(i = 1, 2)$ denotes the angle between the $i$-th link and the $(i - 1)$-th link of the equivalent rigid robot; the two angles have the same magnitude owing to the CC assumption. $\dot{ϑ}_{i}$ is the corresponding angular velocity. $d$ is the opening distance of the soft gripper, which determines the initial value of $ϑ_{i}$ and ranges from 0 to $l + L_{2} \sin \frac{π}{12}$. When $d$ is 0, the gripper is completely closed. When $d$ equals $l + L_{2} \sin \frac{π}{12}$, the third link of the rigid robot is perpendicular to the horizontal, leading to the failure of grasping (Fig. 2c). Furthermore, $m_{o}$ is the mass of the grasped object, ranging from 0 to 200 g, and $a$ is the acceleration of the robotic arm.

In addition, $a$ is set between −10 and 10 m/s² to mimic the actual production environment (the sign indicates the direction of acceleration: negative to the left and positive to the right). The acceleration, mass, and size of the target object affect the magnitude of the inertial force on the soft gripper during the grasping process. In a state with no tendency to move, $\dot{ϑ}_{i} = 0$ and $a = 0$.
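The state definition and its bounds can be collected in a small data structure. The sketch below is one possible representation, not the authors' code: the class and function names, the unit choices, and the example values are hypothetical, and the relation between $d$ and the initial $ϑ_{i}$ is left to the caller because it is not given in closed form here.

```python
import math
from dataclasses import dataclass, astuple

M_O_MAX_G = 200.0   # object mass range from the text: 0 to 200 g
A_MAX = 10.0        # acceleration range from the text: -10 to 10 m/s^2


@dataclass
class GripperState:
    theta1: float    # angle of joint 1 (rad)
    theta2: float    # angle of joint 2 (rad)
    dtheta1: float   # angular velocity of joint 1
    dtheta2: float   # angular velocity of joint 2
    d: float         # opening distance of the gripper
    m_o: float       # mass of the grasped object (g)
    a: float         # acceleration of the robotic arm (m/s^2)


def max_opening(l, L2):
    """Upper bound of d: l + L2 * sin(pi / 12)."""
    return l + L2 * math.sin(math.pi / 12.0)


def make_state(theta, d, m_o, a, l, L2, dtheta=0.0):
    """Build a state with equal joint angles (CC assumption) and check bounds."""
    if not 0.0 <= d <= max_opening(l, L2):
        raise ValueError("opening distance outside [0, l + L2*sin(pi/12)]")
    if not 0.0 <= m_o <= M_O_MAX_G:
        raise ValueError("object mass outside [0, 200] g")
    if not -A_MAX <= a <= A_MAX:
        raise ValueError("acceleration outside [-10, 10] m/s^2")
    return GripperState(theta, theta, dtheta, dtheta, d, m_o, a)


# Hypothetical rest state: zero velocities and zero acceleration.
rest = make_state(theta=0.3, d=50.0, m_o=40.0, a=0.0, l=40.0, L2=45.0)
state_vector = list(astuple(rest))
```

The flattened `state_vector` follows the ordering $[ϑ_{1}, ϑ_{2}, \dot{ϑ}_{1}, \dot{ϑ}_{2}, d, m_{o}, a]$ used in the text.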

Actions

In our DRL setting, the action is the vacuum pressure for each actuator. The action takes a value from −1 to 1, and we map it to the vacuum pressure using a constant multiplier so that the vacuum pressure output by the controller is limited to between 0 and 80 KPa. Above 80 KPa, the friction between the jamming layers changes little with further increases in vacuum pressure because of the structure. Therefore, 80 KPa is set as the upper limit.
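One way to realize such a mapping is sketched below. The text only states that a constant multiplier is used; the affine form (shift, then scale) and the clipping step are assumptions made here so that the output covers exactly the range from 0 to 80 KPa. The function and constant names are hypothetical.

```python
P_MAX_KPA = 80.0  # upper limit of the vacuum pressure stated in the text


def action_to_vacuum_pressure(action):
    """Map a normalized action in [-1, 1] to a vacuum pressure in [0, 80] KPa.

    Assumed form: clip the action, shift it to [0, 2], then scale by a
    constant so that -1 maps to 0 KPa and +1 maps to 80 KPa.
    """
    clipped = max(-1.0, min(1.0, action))
    return 0.5 * (clipped + 1.0) * P_MAX_KPA


pressures = [action_to_vacuum_pressure(a) for a in (-1.0, 0.0, 1.0)]
```

Clipping guards against actions that leave the nominal range, for example when exploration noise is added to the policy output.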

Rewards

The objective of DRL is to learn an optimal policy that has the capability of maximizing the cumulative rewards for its assigned task. The reward quantifies the effect of the action chosen by the optimal policy.

In the pick-and-place task of this work, the reward function, denoted by $ℛ$, is designed to encourage the soft gripper to select a suitable vacuum pressure to suppress the vibration caused by external forces. This means that the joint angle of the equivalent rigid robot should vary only a little while its velocity remains at a low level. Moreover, the resultant external moment acting on the joint should be as small as possible so that the grasping task is completed in high-speed motion with the minimum stiffness of the soft gripper. In addition, when the number of steps exceeds the limit, the episode is ended regardless of the result, a large penalty is given to the agent, and the environment is reset, which encourages the agent to achieve the goal in the minimum number of steps.

In addition, if the third link of the equivalent rigid robot reaches the vertical direction, the grasping task has failed and the same penalty is returned, which helps the variable stiffness soft gripper avoid dropping objects during operation.

Taking all the above factors into consideration, the reward function is described as Eq. (12). The first item is used to motivate the soft gripper to suppress the vibration as precisely as possible, while the second item is responsible for urging the gripper to reach its steady state in the fewest steps.

where $torq_{i}$ denotes the resultant external moment of the $i$-th joint.

Training and testing

Deep Deterministic Policy Gradient (DDPG),²⁹ Proximal Policy Optimization (PPO),³⁰ and Advantage Actor-Critic (A2C)³¹ are each used to train the DRL model. All three are model-free DRL algorithms that are well suited to continuous control problems and have also been applied to the control of soft robots.³²

The training results for three different algorithms are shown in Figure 4a. It can be seen that the rewards of the three DRL methods begin with a noticeable increase, and start to converge at around 1000 episodes. However, the convergence rate and the convergence effect of DDPG are better than the other two algorithms, which implies that it is better able to adapt to the changes in the environment. The task success rate is an important indicator in assessing the learning performance of the DRL model. To evaluate it, we save the training model for testing in a random environment (the acceleration of the robotic arm, the size, and the mass of the grasped object). Figure 4b plots the average grasping success rate for each algorithm after several tests in the simulation.

FIG. 4.

(a) The training process for three different algorithms (A2C, DDPG, PPO). (b) The performance of each algorithm during tests. Performance is the success rate that the neural network can get in the test. The right bars depict the performance of each algorithm after reducing the vacuum pressure by 10 KPa, while the left bars take no action. (c) Average oscillation amplitude of soft finger in the simulation under the control of different algorithms. The line below represents the average oscillation amplitude with no action taken, and the line above represents the oscillation amplitude with the reduction of vacuum pressure. A2C, Advantage Actor-Critic; DDPG, Deep Deterministic Policy Gradient; PPO, Proximal Policy Optimization.

The results show that DDPG has a success rate of more than 90% (note that an action is considered successful in training if the object does not fall during the motion). To illustrate that the negative pressure output by the controller is just sufficient to suppress the vibration of the soft gripper, the output negative pressure is reduced by 10 KPa as a comparison. If the grasping fails in this case, the proposed control strategy is effective. It can be seen from Figure 4b that when the vacuum pressure output is reduced by 10 KPa, the performance drops significantly, mainly due to the low acceleration of the robotic arm. In addition, as can be seen from Figure 4c, under the control of DDPG, the soft finger produces less oscillation than with A2C and PPO in the simulation. This supports the choice of DDPG in this work.

In conclusion, the model trained with DDPG produces better results in the tested situations. Although PPO also performs well, it is an on-policy algorithm that suffers from serious sample inefficiency and requires a huge number of samples to learn. Thus, DDPG is adopted in this work to generate the control strategy in the experiment.

Experiments and Results

An experimental platform was designed and built to verify the effectiveness of the proposed controller. The platform consists of a UR5 robotic arm with a robot controller, a computer system, and a soft gripper attached to the wrist of the manipulator.

Experimental setup

Figure 5 depicts the components and connections between the data processing (software) and the physical setup (hardware) for the experimental implementation. To help the soft gripper grasp different objects, an electropneumatic regulator (ITV2030) was installed between the pump and the inflation chamber to regulate the inflation pressure. The ITV0090 was selected to regulate the vacuum pressure in the chamber because of its fast response time of 0.1 s. A 24 V DC power supply was used to power the pneumatic regulators.

FIG. 5.

(a) The schematic of the hardware connection. (b) Back view of the soft gripper. Four JY61P sensors are mounted on the back of the soft gripper and transmit data to the microcontrol board. (c) The connection between different hardware and software parts in the experimental setup. (d) The schematics of the entire neural network control system framework.

Four six-axis accelerometers (JY61P) whose x-axes were aligned with the fingers' axes were fixed to the gripper using a rubber band. As the fingers bend or vibrate, the sensors move with them, detecting the angle and angular velocity of each joint. These data were transmitted to the host computer to help it select the correct action. Although this fixation method has some impact on the movement of the soft gripper, it does not affect the results of the experiment because the sensors are lightweight.

For collecting the data from accelerometers, a control board (Arduino Nano RP2040 Connect) was placed on the gripper's holder horizontally. Thanks to its integrated accelerometer (LSM6DSOX) and Wi-Fi module, it is capable of reading the acceleration of the robotic arm during motion and then transmitting the data to the host computer for real-time control.

An Arduino Uno R3 was used as the control board to generate the pulse width modulation waves corresponding to the control signal to change the state of the pneumatic regulator. The control signal was sent to ITV2030 through the Arduino Uno board to keep the regulator open, allowing the soft gripper to bend to grasp the object. To make the experimental conditions consistent with the simulation, the grasping pressure is inflated into the soft gripper to grasp the target object. The steps for obtaining the grasping pressure are as follows. First, when grasping the target object, the soft gripper must be inflated with a specific pressure (such as 30 KPa); if the target object can be easily picked up at this time, the inflation pressure should be reduced, and vice versa.

This process of adjusting the inflation pressure should be repeated until a proper pressure value that enables the soft gripper to grasp the object without it falling is found. The control signal generated by the host computer was fed to the electronic vacuum regulator ITV0090.
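The pressure-adjustment procedure described above can be sketched as a simple search. This is an illustrative reading of the procedure, not the authors' code: `holds_object` stands for a manual grasp trial at a given inflation pressure, and the function name, the step size, and the upper bound are hypothetical.

```python
def find_grasping_pressure(holds_object, start_kpa=30.0, step_kpa=1.0,
                           max_kpa=100.0):
    """Search for the lowest tested inflation pressure that holds the object.

    holds_object : callable taking a pressure in KPa and returning True if
                   the object is picked up without falling (one trial).
    If the object is held at the starting pressure, the pressure is reduced
    step by step; otherwise it is increased until the grasp succeeds.
    """
    p = start_kpa
    if holds_object(p):
        # Grasp succeeds: lower the pressure while the grasp still holds.
        while p - step_kpa >= 0.0 and holds_object(p - step_kpa):
            p -= step_kpa
        return p
    # Grasp fails: raise the pressure until the object is held.
    while p + step_kpa <= max_kpa:
        p += step_kpa
        if holds_object(p):
            return p
    raise RuntimeError("no suitable grasping pressure found below max_kpa")


# Stand-in for a physical trial: pretend the object is held from 43 KPa up.
grasp_pressure = find_grasping_pressure(lambda p: p >= 43.0)
```

In practice each call to `holds_object` corresponds to one physical grasp attempt, so a coarser step may be preferable to limit the number of trials.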

To investigate and analyze the practicability and effectiveness of the proposed dynamic model and the DRL controller, multiple grasping experiments were conducted with different objects under a range of settings (e.g., the robotic arm's acceleration and the mass and size of the object). The results of our previous experiments¹⁴ indicate that the speed of the robotic arm can lead to slippage between layers. To ensure consistent experimental results, the maximum moving speed of the robotic arm was set to 1 m/s for each experiment.

As shown in Table 1, the following three different objects were selected as the experiment objects. A ball with 85-mm diameter weighs 40 g, and two cuboid pill boxes with the same width weigh 40 and 80 g, respectively.

Table 1.

The Physical Characteristics of the Grasped Object in Experiment

Grasped object	Mass	Dimensions
Ball (A)	40 g	D = 85 mm
Pill box (B)	40 g	L = 105 mm, W = 70 mm, H = 25 mm
Pill box (C)	80 g	L = 105 mm, W = 70 mm, H = 25 mm
Pear	95 g	H = 120 mm, D = 70 mm
Tomato	22 g	H = 82 mm, D = 70 mm
Red delicious apple	90 g	H = 85 mm, D = 78 mm
Capsicum chinense	69 g	H = 93 mm, D = 83 mm
Green apple	74 g	H = 80 mm, D = 80 mm

Experiment results

In this experiment, the trajectory executed by the robotic arm during the pick-and-place task is a straight line. As described in the Model section, the magnitude of the robotic arm's acceleration determines the inertia force to which the soft gripper is subjected. To fully confirm the performance of the designed controller, each of the three objects was tested at three different accelerations of the robotic arm (4, 6, and 8 m/s²), giving a total of nine sets of grasping experiments. Each set of experiments included three cases: (1) with DDPG; (2) 10 KPa lower than the output of DDPG (if the vacuum pressure output by DDPG is ≤10 KPa, 0 KPa was taken); and (3) without DDPG. For a rotating object, the change in moment affects the angular acceleration, which reflects the rate of change of the angular velocity.

For each group of experiments, the angular velocity of the left finger of the soft gripper was recorded. These data serve as an indicator of the vibration amplitude of the soft finger during the motion. A smaller absolute value means less vibration and therefore a better control effect. Based on these data, we plotted the vibration curves shown in Figure 6, where the dashed line represents the angular acceleration and the solid line represents the angular velocity.
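The three cases can be expressed as a small helper that returns the vacuum pressure applied in each case. This is a sketch for illustration: the function name and the case labels are hypothetical, and taking 0 KPa for the case without DDPG is an assumption made here.

```python
def vacuum_pressure_for_case(case, ddpg_output_kpa):
    """Vacuum pressure (KPa) applied in each of the three experimental cases."""
    if case == "ddpg":
        # Case 1: use the controller output directly.
        return ddpg_output_kpa
    if case == "ddpg_minus_10":
        # Case 2: 10 KPa below the controller output, floored at 0 KPa.
        return max(ddpg_output_kpa - 10.0, 0.0)
    if case == "no_control":
        # Case 3: no controller; assumed here to mean no vacuum pressure.
        return 0.0
    raise ValueError("unknown case: " + str(case))


example = {case: vacuum_pressure_for_case(case, 35.0)
           for case in ("ddpg", "ddpg_minus_10", "no_control")}
```

When the controller output is at or below 10 KPa, cases 2 and 3 apply the same pressure under this assumption.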

FIG. 6.

The angular velocity and angular acceleration of the left soft finger's first joint during experiments. (a–c) Object A with accelerations of 4, 6, and 8 m/s². (d–f) Object B with accelerations of 4, 6, and 8 m/s². (g–i) Object C with accelerations of 4, 6, and 8 m/s².

For object A, the grasping pressure is 30 KPa. Figure 6a–c illustrates how the angular velocity varies when the acceleration of the robotic arm changes. Under the proposed DRL-based controller, the peaks of the angular velocity dropped by 69.18%, 67.33%, and 71.48% compared with the uncontrolled group, proving the effectiveness of the proposed control method. The experimental results showed that the greater the acceleration, the greater the peak value of the angular velocity. This is because the inertial force on the object increases with acceleration. We can also see that when the robotic arm's acceleration is not very large, the difference between the angular velocity in the second case and the third case is not significant, as shown in Figure 6a and b. This is mainly because the vacuum pressure output by DDPG in these runs is close to 10 KPa, so reducing it by 10 KPa leaves almost no vacuum pressure, as in the case without control.

The grasping pressure of object B is 43 KPa on account of its smaller width relative to the diameter of A. The variation of the angular velocity is depicted in Figure 6d–f. The angular velocities without control peaked at 11.66, 55.01, and 84.25°/s when the acceleration of the robotic arm was 4, 6, and 8 m/s², respectively. These peak angular velocities were reduced by 45.97%, 66.93%, and 69.50% with the application of the DRL-based controller, which allowed the soft robotic gripper to achieve a better grasping performance during high-speed motion than in the uncontrolled state.

For object C, the grasping pressure is 55 KPa. Compared with the experiments with objects A and B, the angular velocities increased, where the large weight played a critical role. The peak values of angular velocity without control were 50.78, 77.82, and 123.17°/s for three different acceleration cases. Under the control of the DRL-based controller, the angular velocities were decreased by 73.93%, 79.68%, and 87.16% compared with the uncontrolled case. Also, if we reduced 10 KPa from the output vacuum pressure, the maximum values of the angular velocity increased by 188.82%, 50.92%, and 237.82% compared with the case with control. This result illustrated that the controller can select the appropriate vacuum pressure without losing too much compliance of the soft finger.
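The percentages reported for object C can be converted back into approximate peak values. The short calculation below uses only the numbers quoted above; the resulting peaks are implied by that arithmetic rather than separately reported measurements, and the variable names are hypothetical.

```python
# Reported values for object C at 4, 6, and 8 m/s^2.
uncontrolled = [50.78, 77.82, 123.17]      # peak angular velocity, deg/s
reduction_pct = [73.93, 79.68, 87.16]      # reduction with the controller
increase_pct = [188.82, 50.92, 237.82]     # increase when output is cut by 10 KPa

# Peak with the DRL-based controller, implied by the reported reduction.
controlled = [u * (1.0 - r / 100.0)
              for u, r in zip(uncontrolled, reduction_pct)]

# Peak with the output reduced by 10 KPa, relative to the controlled case.
reduced = [c * (1.0 + i / 100.0)
           for c, i in zip(controlled, increase_pct)]

for u, c, r in zip(uncontrolled, controlled, reduced):
    print(f"uncontrolled {u:7.2f}  controlled {c:6.2f}  reduced-by-10-KPa {r:6.2f}")
```

For all three accelerations, the implied peak with the reduced output lies between the controlled and the uncontrolled values.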

In the above nine sets of experiments, when the vacuum pressure output by the controller was reduced by 10 kPa, the angular velocity still decreased compared with the uncontrolled case, indicating that the vibration of the soft gripper was suppressed to some extent, but not as effectively as with DDPG. The angular acceleration with the DRL-based controller shows the smallest fluctuation among the three cases, indicating the effectiveness of the proposed method. The vacuum pressure output by the DRL-based controller can therefore suppress the vibration well while preserving as much of the soft gripper's flexibility as possible.

It can be seen from Figure 7a and b that, at an acceleration of 8 $m/s^{2}$, the grasped object fell off during the motion without control, whereas it was grasped stably after the DRL-based controller was applied. This shows that the proposed controller suppresses the vibration well even at high acceleration. In fact, as the acceleration of the robotic arm and the mass of the item increase, the effectiveness of the controller becomes more apparent.

FIG. 7.

Photograph sequence of the grasping process with an acceleration of 8 m/s² and a maximum speed of 1 m/s. (a) With object A (Supplementary Video S1). (Ⅰ–Ⅳ) Without control, (Ⅴ–Ⅷ) with DDPG. (b) With object C. (Ⅰ–Ⅳ) Without control, (Ⅴ–Ⅷ) with DDPG.

In addition, fruit and vegetable models were used in the grasping experiments, and each object was grasped five times. Details of the tested objects and their grasping success rates under different accelerations are listed in Table 2. In high-speed motion, the success rate increases markedly after the control strategy is applied compared with the case without control, especially for heavy objects such as the red delicious apple and the pear. This shows the effectiveness of the control method proposed in this article.

Table 2.

The Grasping Success Rate of Objects Under Different Accelerations

Grasped objects	Without control, 4 m/s²	Without control, 6 m/s²	Without control, 8 m/s²	With control, 4 m/s²	With control, 6 m/s²	With control, 8 m/s²
Ball (A)	60%	40%	20%	100%	100%	80%
Pill box (B)	60%	20%	0%	100%	100%	80%
Pill box (C)	60%	0%	0%	80%	80%	60%
Pear	40%	20%	0%	80%	60%	60%
Tomato	80%	40%	20%	100%	100%	80%
Red delicious apple	40%	0%	0%	80%	80%	60%
Capsicum chinense	40%	0%	0%	100%	100%	80%
Green apple	40%	0%	0%	100%	80%	80%
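The trend in Table 2 can be summarized with a short script that averages the success rates over the eight objects for each acceleration. This is only an illustrative aggregation: the unweighted mean over objects is not a figure reported in the original, and the names `TABLE_2` and `mean_success` are introduced here for the example.

```python
# Success rates (%) transcribed from Table 2; each object was grasped five times.
# Each entry maps an object to (without control, with control), with columns
# ordered by arm acceleration: 4, 6, and 8 m/s^2.
TABLE_2 = {
    "Ball (A)": ([60, 40, 20], [100, 100, 80]),
    "Pill box (B)": ([60, 20, 0], [100, 100, 80]),
    "Pill box (C)": ([60, 0, 0], [80, 80, 60]),
    "Pear": ([40, 20, 0], [80, 60, 60]),
    "Tomato": ([80, 40, 20], [100, 100, 80]),
    "Red delicious apple": ([40, 0, 0], [80, 80, 60]),
    "Capsicum chinense": ([40, 0, 0], [100, 100, 80]),
    "Green apple": ([40, 0, 0], [100, 80, 80]),
}
ACCELERATIONS = [4, 6, 8]  # m/s^2


def mean_success(table, with_control):
    """Unweighted mean success rate (%) over all objects, per acceleration."""
    idx = 1 if with_control else 0
    columns = zip(*(rates[idx] for rates in table.values()))
    return [sum(col) / len(table) for col in columns]


without_ctrl = mean_success(TABLE_2, with_control=False)
with_ctrl = mean_success(TABLE_2, with_control=True)

for accel, before, after in zip(ACCELERATIONS, without_ctrl, with_ctrl):
    print(f"{accel} m/s^2: without control {before:.1f}%, with control {after:.1f}%")
```

Averaged this way, the success rate without control falls from 52.5% at 4 m/s² to 5.0% at 8 m/s², while with control it stays at 72.5% or above across the three accelerations.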

Conclusion and Future Works

In this work, a DRL-based stiffness control strategy for a soft gripper operating in different scenarios in 2D space is presented. A dynamic model of a three-link rigid robot equivalent to the soft finger is proposed to capture the deformations and geometric variations of the soft finger during motion, and it is used in designing the DRL-based controller. The damping moments of the soft finger are added to the model to obtain near-realistic dynamic behavior. This enables the soft gripper to change the vacuum pressure in time to respond to changes in the environment, such as changes in the robotic arm's acceleration and in the grasped object.

Experimental results show that the vibration of the soft gripper during motion is successfully suppressed under the DRL-based controller. In this work, the manipulator was commanded to perform planar motion only. Future work will extend this approach to 3D space so that motions such as moving forward or backward can be investigated. In addition, we will consider real-world situations such as the need to avoid obstacles and to complete more complex motion trajectories.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This study is supported by the National Natural Science Foundation of China (52005440) and the Fundamental Research Funds for the Central Universities (D5000220210).

Supplementary Material
