Abstract
Industrial robots are widely deployed to perform pick-and-place tasks at high speeds to minimize manufacturing time and boost productivity. When dealing with delicate or fragile goods, soft robotic grippers are better end effectors than rigid grippers because of their softness and safe interaction. However, high-speed motion causes the soft robotic gripper to vibrate, which can damage the objects or cause grasping to fail. Soft grippers with variable stiffness, which add damping devices, are considered effective in suppressing vibrations, but it is challenging to strike a balance between stiffness and compliance. In this article, a controller based on deep reinforcement learning is proposed to control the stiffness of the soft robotic gripper; it suppresses the vibration accurately with only a minor influence on compliance and softness. The proposed controller is a real-time vibration control strategy that determines its output based on the current operating environment. To demonstrate the effectiveness of the proposed controller, experiments were conducted with a UR5 robotic arm. The experimental results show that, in different situations, the proposed controller responds quickly and reduces the amplitude of the oscillation substantially.
Introduction
Rigidity and linearity make traditional rigid robots precise and strong, but also make them dangerous in human interaction. A promising solution to this problem is to replace the rigid links and actuators with soft, deformable ones.1–3 With their softness, exceptional flexibility, and low cost, soft robotic grippers are excellent candidates for scenarios in which delicate objects need to be handled.4–6
Soft grippers are typically constructed of materials or structures that have very low damping coefficients. As a result, challenges arise during high-speed motion: the gripper vibrates because of the large inertia forces produced by the acceleration of the robot arm. These undesirable vibrations can cause unstable or even failed grasping, leading to decreased productivity. 7 In addition, they damage the structural integrity of the soft gripper, which shortens its usable life span. 8
To suppress the vibration of soft grippers, additional components are attached to them to generate mechanical damping.9–11 Using passive and active particle damping, Li et al. found that their proposed damper can reduce the vibration amplitude and duration effectively. 12 However, particle jamming occupies a large volume. As a lightweight alternative, the layer jamming mechanism has gradually become the trend in robotic applications. For example, Choi et al. presented a sliding linkage-based layer jamming mechanism 13 for controlling the stiffness of wearable robots. Inspired by this, Zhu et al. 14 presented a fully multimaterial 3D-printed soft gripper whose stiffness is modified by the jamming layer, which provided a new idea for the design and manufacture of soft robots. In addition, Wanasinghe et al. developed a layer jamming-based glove and applied it to medical treatment, 15 expanding the range of its applications.
Although high stiffness can suppress vibration well, it reduces the compliance of the soft actuator. Several methods have been proposed to control the stiffness of soft robots. Bern et al. extended an open-loop control method called Soft IK, which allows the deformation shape of a cable-driven soft robot and its stiffness to be controlled simultaneously. 16 However, the method requires a lot of mathematical modeling. To simplify the model of stiffness control, Arachchige et al. used empirical data gathered from testing to map input pressure to stiffness. 17 Moreover, Lai et al. introduced a tendon-tensioning method for adaptively controlling the stiffness of the soft robot. 18 When the robot is subjected to an external load, a depth-based closed-loop controller compensates for the stiffness. Although these methods are able to control the stiffness, their control schemes are complex and time-consuming.
Recently, reinforcement learning has been gradually applied to the field of stiffness control of the robot. For instance, Oikawa et al. used Q-learning to generate the stiffness matrix to modify the trajectory of robots for the execution of assembly tasks that require precise contact with objects without causing damage. 19 To learn high-dimensional stiffness control policy problems, Kim et al. introduced a method called Stiffness Control from Augmented Position control Experiences, which can control stiffness only by position control demonstrations. 20 Moreover, Ansari et al. implemented cooperative multiagent reinforcement learning to control the soft manipulator's position and stiffness simultaneously for safe human–robot interaction in the bathing task. 21
In response to the poor generalization ability and massive parameter adjustment of the traditional control method, this work applies deep reinforcement learning (DRL) to the stiffness control for vibration suppression of the soft gripper in high-speed pick-and-place tasks. The remainder of this article is organized as follows: The Model section describes the physical and mathematical model of the soft robot, the Methodology section provides the formulation of the DRL-based controller, the Experiments and Results section presents experimental results, and the Conclusion and Future Works section concludes the article.
Model
The purpose of this work is to control the stiffness of the soft gripper so as to suppress the vibration precisely. This not only enables the soft gripper to work in a high-speed environment but also maintains the flexibility of the soft finger. In this work, DRL is chosen to solve this problem. Compared with DRL, conventional controllers are less capable of real-time adaptation, so their parameters must be adjusted constantly when faced with different tasks. To train the neural network of DRL well, data are obtained from simulations in the environment, which seems more reasonable and effective for soft robots than collecting data through sensors. Therefore, an approximate model of the soft robotic gripper is built.
System overview
The soft finger is surrounded by a vacuum chamber, and rotation jamming layers (RJLs) are located on both sides of the main body. The soft gripper constructed of two identical soft fingers is mounted on a 3D-printed holder (Fig. 1b). Each finger is mounted at an angle of 30° relative to the vertical axis.

Kinematics
In the interest of conciseness, only the two-dimensional (2D) case is considered. As shown in Figure 2, a simplified theoretical model of the soft gripper is constructed. A Cartesian coordinate system

The dotted line represents the initial position of the soft gripper, while the solid line represents the current position of the soft gripper.
where l,
Piecewise constant curvature (PCC)22–26 is used to establish the kinematic model of the soft finger. Suppose a soft finger with
Under the CC hypothesis, the reference frames
where Li represents the distance between the origins of
As depicted in Figure 2a, the proposed soft finger can be seen as a combination of three segments (two segments with the same curvature and a segment whose curvature is 0). Due to the structural constraints of the gripper's holder, the first segment is attached to the base, resulting in its inability to rotate around the origin of
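As a rough illustration of the planar PCC idea, the following Python sketch chains constant-curvature segments and returns the tip pose. It uses the textbook arc relations for a planar CC segment rather than the article's own equations; the function names, the choice of an initial tangent along the x-axis, and all numerical values are assumptions made for illustration only.

```python
import math


def cc_segment_tip(kappa, length):
    """Tip pose (dx, dy, dtheta) of one planar constant-curvature segment.

    The segment starts at the origin with its tangent along +x (an assumed
    convention). kappa is the curvature in 1/m and length is the arc length in m.
    """
    theta = kappa * length
    if abs(kappa) < 1e-9:
        # Zero curvature: the segment is a straight line.
        return length, 0.0, 0.0
    return math.sin(theta) / kappa, (1.0 - math.cos(theta)) / kappa, theta


def pcc_chain_tip(curvatures, lengths):
    """Compose several CC segments and return the global tip pose (x, y, phi)."""
    x, y, phi = 0.0, 0.0, 0.0
    for kappa, length in zip(curvatures, lengths):
        dx, dy, dtheta = cc_segment_tip(kappa, length)
        # Rotate the local displacement into the global frame, then accumulate.
        x += math.cos(phi) * dx - math.sin(phi) * dy
        y += math.sin(phi) * dx + math.cos(phi) * dy
        phi += dtheta
    return x, y, phi


if __name__ == "__main__":
    # Hypothetical finger: one straight segment and two segments that share
    # the same curvature. The values are placeholders, not measured data.
    print(pcc_chain_tip([0.0, 8.0, 8.0], [0.02, 0.04, 0.04]))
```

Handling the zero-curvature case separately avoids a division by zero for the straight segment.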
Dynamics
In addition to kinematic analysis, dynamic analysis is also important for designing robot control strategies. The dynamic model describes the mathematical relationship between the motion, driving torque, and load of the robot. In the following, for a matrix or vector X, X^T denotes its transpose.
It is always possible to match a generic PCC continuous soft robot with a rigid robot that is dynamically consistent. 24 For a single CC segment, the rigid robot matching it always satisfies two constraints: one is that the endpoints of the rigid robot always coincide with the origin of each reference frame on the CC segment, and the other is that under the assumption of a uniform distribution of mass of each CC segment, the rigid robot's center of mass always lies on the same point as the CC segment's center of mass.
According to the method proposed by Della Santina et al.,23 by considering the overlapping area of the RJL unit of the soft finger as the free rotating joints of the rigid robot, we can connect the soft finger to a three-link rigid robot by the map described by Eq. (3), which ensures that the endpoints of each CC segment coincide with the reference points of the rigid robot. During the bending motion of the soft finger, the distance that the overlapping area's center point moves is negligible compared with the length of the link. Thus, the length of the link is considered to be constant.
where
For simplicity, it is assumed that the moment generated by the inflation pressure is just enough to allow the soft gripper to grasp the object stably. Thus, the moments generated by the grasping force, the friction force between the gripper and the object, the flexural force, and the inflation pressure cancel each other out. Only the frictional moment between jamming layers created by the vacuum pressure, the inertia moment caused by the variable motion of the robotic arm, and the damping moment of the finger itself must be considered.
Inertia moment
Due to the effect of acceleration, the inertia of the grasped object and the soft finger causes the vibration during the grasp. The inertia moment of the first and the second joints is as follows:
where m1 represents the mass of the first link, while m2 is the sum of the mass of the second link and the grasped object.
Frictional moment
The vacuum pressure in the chamber causes frictional moments that are the same for each joint. Figure 3 shows the enlarged schematic graph of the jamming layer when the soft finger is bending, and the coordinate system is established with point E as the origin. Taking the point H whose coordinate is

The schematic of the rotational jamming layer unit.
In summary, the frictional moment Mf of each joint is twice the integral of dM over the overlapping areas, which can be known as follows:
The range of x and y is
Based on our previous work,27 the frictional force can be expressed as Eq. (8).
where n is the number of layers,
Damping moment
Because of the nature of the material, the components of a soft finger are subject to damping force during its movement. The damping force is used to eliminate the effect of elastic force and the gravity of the soft finger itself. To avoid the complex formulation of rotational damping dissipation, the damping force term is directly added as the external force. 28
The formula
Full model
The dynamics of the equivalent rigid robot in compact form can be written as follows:
where B is the inertia matrix, C is the centrifugal and Coriolis matrix, G is the gravity vector, and M is the external resultant moment.
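To make the role of each term concrete, the sketch below solves for the joint accelerations under the assumption that the compact model takes the standard rigid-robot form B q̈ + C q̇ + G = M, with the symbols defined above. The function names and the small linear solver are illustrative choices, and the numbers in the demo are placeholders rather than parameters of the soft finger.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are left untouched.
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def joint_accelerations(B, C, G, M, q_dot):
    """Return q_ddot from the assumed form B q_ddot + C q_dot + G = M."""
    n = len(q_dot)
    coriolis = [sum(C[i][j] * q_dot[j] for j in range(n)) for i in range(n)]
    rhs = [M[i] - coriolis[i] - G[i] for i in range(n)]
    return solve_linear(B, rhs)


if __name__ == "__main__":
    # Placeholder two-joint values, chosen only to exercise the functions.
    B = [[2.0, 0.0], [0.0, 4.0]]
    C = [[0.0, 1.0], [0.0, 0.0]]
    G = [1.0, 0.0]
    M = [5.0, 8.0]
    print(joint_accelerations(B, C, G, M, q_dot=[0.0, 2.0]))
```

In a simulation loop, the returned accelerations would be integrated over a time step to update the joint velocities and angles.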
Then, we integrate the map described by Eq. (3) directly into the dynamics of the system. The time derivatives of it read as follows:
where
Here,
Methodology
In many practical decision-making problems, the state of the Markov Decision Process is high-dimensional and cannot be solved by traditional RL algorithms. The stiffness control problem in this work is no exception. DRL algorithms incorporate deep learning to solve such problems and develop specialized algorithms that perform well in this setting. In the following, we define the state space
States
The state space
In addition, a has been set between −10 and 10
Actions
In our DRL setting, the action is modeled as the vacuum pressure for each actuator. The action takes a value from −1 to 1, and we map it to the vacuum pressure using a constant multiplier to ensure that the vacuum pressure output by the controller is limited to between 0 and 80 kPa. If the vacuum pressure exceeds 80 kPa, the friction between the jamming layers varies little because of the structure, no matter how the pressure changes. Therefore, 80 kPa is set as the upper limit.
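One possible way to realize such a mapping is sketched below in Python. The text only mentions a constant multiplier, so the affine form used here (shift by one, then scale) and the clamping of out-of-range actions are assumptions; the function and constant names are hypothetical.

```python
MAX_VACUUM_KPA = 80.0  # upper limit stated in the text


def action_to_vacuum_kpa(action):
    """Map a policy action in [-1, 1] to a vacuum pressure in [0, 80] kPa.

    Assumed mapping: clamp the action, shift it to [0, 2], then scale so
    that -1 gives 0 kPa and +1 gives the 80 kPa limit.
    """
    clamped = max(-1.0, min(1.0, action))
    return 0.5 * (clamped + 1.0) * MAX_VACUUM_KPA


if __name__ == "__main__":
    for a in (-1.0, 0.0, 0.5, 1.0):
        print(a, action_to_vacuum_kpa(a))
```

Clamping before scaling keeps the commanded pressure within the regulator range even if exploration noise pushes the raw action outside [-1, 1].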
Rewards
The objective of DRL is to learn an optimal policy that has the capability of maximizing the cumulative rewards for its assigned task. The reward quantifies the effect of the action chosen by the optimal policy.
In the pick-and-place task of this work, a reward function, denoted by
In addition, if the third link of the equivalent rigid robot is in the vertical direction, the same penalty will be returned if there is a grasping task failure, which can make the soft gripper with variable stiffness effectively avoid the falling of objects during the operation.
Taking all the above factors into consideration, the reward function is described as Eq. (12). The first item is used to motivate the soft gripper to suppress the vibration as precisely as possible, while the second item is responsible for urging the gripper to reach its steady state in the fewest steps.
where
Training and testing
Deep Deterministic Policy Gradient (DDPG), 29 Proximal Policy Optimization (PPO), 30 and Advantage Actor-Critic (A2C) 31 are each used to train the DRL model. All three are model-free DRL algorithms that are well suited to continuous control problems and have also been applied to the control of soft robots. 32
The training results for three different algorithms are shown in Figure 4a. It can be seen that the rewards of the three DRL methods begin with a noticeable increase, and start to converge at around 1000 episodes. However, the convergence rate and the convergence effect of DDPG are better than the other two algorithms, which implies that it is better able to adapt to the changes in the environment. The task success rate is an important indicator in assessing the learning performance of the DRL model. To evaluate it, we save the training model for testing in a random environment (the acceleration of the robotic arm, the size, and the mass of the grasped object). Figure 4b plots the average grasping success rate for each algorithm after several tests in the simulation.

The results show that DDPG has a success rate of more than 90% (note that an action is considered successful in training if the object does not fall during the motion). To illustrate that the negative pressure output by the controller is just sufficient to suppress the vibration of the soft gripper, the output negative pressure is reduced by 10 kPa as a comparison. If the grasping fails in this case, the proposed control strategy is effective. It can be seen from Figure 4b that when the vacuum pressure output is reduced by 10 kPa, the performance drops significantly, which is attributed mainly to the low acceleration of the robotic arm. In addition, as can be seen from Figure 4c, under the control of DDPG, the soft finger produces less oscillation than under A2C and PPO in the simulation. This supports the use of DDPG in this work.
In conclusion, the model trained with DDPG produces better results across the tested situations. Although PPO also performs very well, it is an on-policy algorithm that suffers from serious sample inefficiency and requires a huge number of samples to learn. Thus, DDPG is adopted in this work to generate the control strategy in the experiment.
Experiments and Results
An experimental platform was designed and built to verify the effectiveness of the proposed controller. The platform consists of a UR5 robotic arm with a robot controller, a computer system, and a soft gripper attached to the wrist of the manipulator.
Experimental setup
Figure 5 depicts the sections and connections between the data processing (software) and the physical setup (hardware) for the experimental implementation. To help the soft gripper grasp different objects, an electropneumatic regulator (ITV2030) was installed between the pump and the inflation chamber to regulate the inflation pressure. The ITV0090 was selected to regulate the vacuum pressure in the chamber because of its fast response time of 0.1 s. A 24 V DC power supply was used to power the pneumatic regulators.

Four six-axis accelerometers (JY61P), whose x-axes were aligned with the fingers' axes, were fixed to the gripper using a rubber band. As the fingers bend or vibrate, the sensors move with them, detecting the angle and angular velocity of each joint. These data were transmitted to the host computer to help it select the correct action. Although this fixation method has some impact on the movement of the soft gripper, it does not affect the results of the experiment because the sensors are lightweight.
For collecting the data from accelerometers, a control board (Arduino Nano RP2040 Connect) was placed on the gripper's holder horizontally. Thanks to its integrated accelerometer (LSM6DSOX) and Wi-Fi module, it is capable of reading the acceleration of the robotic arm during motion and then transmitting the data to the host computer for real-time control.
An Arduino Uno R3 was used as the control board to generate the pulse width modulation waves corresponding to the control signal to change the state of the pneumatic regulator. The control signal was sent to the ITV2030 through the Arduino Uno board to keep the regulator open, allowing the soft gripper to bend to grasp the object. To make the experimental conditions consistent with the simulation, the soft gripper is inflated to the grasping pressure to grasp the target object. The steps for obtaining the grasping pressure are as follows. First, when grasping the target object, the soft gripper is inflated to a specific pressure (such as 30 kPa); if the target object can be easily picked up at this pressure, the inflation pressure should be reduced, and vice versa.
This process of adjusting the inflation pressure should be repeated until a proper pressure value that enables the soft gripper to grasp the object without it falling is found. The control signal generated by the host computer was fed to the electronic vacuum regulator ITV0090.
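The pressure-adjustment procedure described above can be sketched as a simple stepwise search. This is only an illustration in Python: the `holds` callback stands in for a physical grasp trial, and the step size, the search ceiling, and the function name are hypothetical values not taken from the experiments.

```python
def find_grasping_pressure(holds, start_kpa=30.0, step_kpa=1.0, max_kpa=100.0):
    """Search for the lowest tested inflation pressure that still holds the object.

    holds(pressure_kpa) is a hypothetical trial that returns True when the
    gripper picks up the object without dropping it at that pressure.
    """
    pressure = start_kpa
    if holds(pressure):
        # Grasp succeeds: keep lowering while the next lower step still works.
        while pressure - step_kpa > 0.0 and holds(pressure - step_kpa):
            pressure -= step_kpa
        return pressure
    # Grasp fails: raise the pressure step by step until it succeeds.
    while pressure + step_kpa <= max_kpa:
        pressure += step_kpa
        if holds(pressure):
            return pressure
    raise ValueError("no holding pressure found below the search ceiling")


if __name__ == "__main__":
    # Stand-in for a real trial: pretend the object needs at least 43 kPa.
    print(find_grasping_pressure(lambda p: p >= 43.0))
```

Stopping at the lowest holding pressure matches the modeling assumption that the inflation moment is just enough for a stable grasp.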
To investigate and analyze the practicability and effectiveness of the proposed dynamic model and the DRL controller, multiple grasping experiments were conducted with different objects under a range of settings (e.g., the robotic arm's acceleration and the mass and size of the object). The results of our previous experiments 14 indicate that the speed of the robotic arm can lead to slippage between layers. To ensure consistent experimental results, the maximum moving speed of the robotic arm was set to 1 m/s for each experiment.
As shown in Table 1, three different objects were selected for the experiments: a ball with a diameter of 85 mm and a mass of 40 g, and two cuboid pill boxes of the same width with masses of 40 and 80 g, respectively.
The Physical Characteristics of the Grasped Object in Experiment
Experiment results
In this experiment, the trajectory during the pick-and-place task executed by the robotic arm is a straight line. As described in the Model section, the magnitude of the robotic arm's acceleration determines the inertia force to which the soft gripper is subjected. To fully confirm the performance of the controller we designed, for the same object, a total of nine sets of grasping experiments at three different robotic arm's accelerations (4, 6, and 8
For each group of experiments, the angular velocity of the left finger of the soft gripper was recorded. These data serve as an indicator of the vibration amplitude of the soft finger during the motion. A smaller absolute value means less vibration and therefore a better control effect. Based on these data, we plotted the vibration curves shown in Figure 6, where the dashed line represents the angular acceleration and the solid line represents the angular velocity.

For object A, the grasping pressure is 30 kPa. Figure 6a–c illustrates how the angular velocity varies when the acceleration of the robotic arm changes. Under the proposed DRL-based controller, the peaks of the angular velocity dropped significantly, by 69.18%, 67.33%, and 71.48%, compared with the uncontrolled group, proving the effectiveness of the proposed control method. The experimental results showed that the greater the acceleration, the greater the peak value of the angular velocity. This is because the inertial force on the object increases with acceleration. We can also see that when the robotic arm's acceleration is not very large, the difference between the angular velocity in the second case and the third case is not significant, as shown in Figure 6a and b. This is mainly because the vacuum pressure output in the third case is close to 10 kPa.
The grasping pressure of object B is 43 KPa on account of the smaller diameter relative to A. The variation of the angular velocity is depicted in Figure 6d–f. The angular velocities without control peaked at 11.66, 55.01, and 84.25°/s when the acceleration of the robotic arm was 4, 6, and 8
For object C, the grasping pressure is 55 kPa. Compared with the experiments with objects A and B, the angular velocities increased, with the larger weight playing a critical role. The peak values of angular velocity without control were 50.78, 77.82, and 123.17°/s for the three acceleration cases. Under the control of the DRL-based controller, the angular velocities decreased by 73.93%, 79.68%, and 87.16% compared with the uncontrolled case. Also, when we reduced the output vacuum pressure by 10 kPa, the maximum values of the angular velocity increased by 188.82%, 50.92%, and 237.82% compared with the case with control. This result illustrates that the controller can select the appropriate vacuum pressure without losing too much compliance of the soft finger.
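As a worked check of these percentages, the short Python sketch below converts the reported figures for object C into the peak values they imply. Only the uncontrolled peaks and the percentages come from the text; the derived peaks are computed here and were not reported directly, and the function names are illustrative.

```python
def reduced_by(value, percent):
    """Value after a reduction of `percent` percent."""
    return value * (1.0 - percent / 100.0)


def increased_by(value, percent):
    """Value after an increase of `percent` percent."""
    return value * (1.0 + percent / 100.0)


# Figures reported in the text for object C (peak angular velocity in deg/s).
uncontrolled = [50.78, 77.82, 123.17]
reduction_percent = [73.93, 79.68, 87.16]    # with the DRL-based controller
increase_percent = [188.82, 50.92, 237.82]   # output lowered by 10 kPa

# Peaks implied by the percentages (derived, not reported in the text).
controlled = [reduced_by(u, r) for u, r in zip(uncontrolled, reduction_percent)]
lowered = [increased_by(c, i) for c, i in zip(controlled, increase_percent)]

if __name__ == "__main__":
    for u, c, l in zip(uncontrolled, controlled, lowered):
        print(f"uncontrolled {u:.2f}, controlled {c:.2f}, minus 10 kPa {l:.2f}")
```

The implied controlled peaks are about 13.2, 15.8, and 15.8°/s, and the implied peaks with the output lowered by 10 kPa are about 38.2, 23.9, and 53.4°/s, which lie between the controlled and uncontrolled values in every case.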
In the above nine sets of experiments, when we reduced the vacuum pressure output by the controller by 10 kPa, the angular velocity still decreased compared with the uncontrolled case, indicating that the vibration of the soft gripper was somewhat suppressed, but not as effectively as with the unmodified DDPG output. It can be seen that the angular acceleration with the DRL-based controller has the smallest fluctuation among the three cases, indicating the effectiveness of the proposed method. This shows that the vacuum pressure output by the DRL-based controller can suppress the vibration well while preserving as much of the soft gripper's flexibility as possible.
It can be seen from Figure 7a and b that when the acceleration is 8

Photograph sequence of the grasping process with an acceleration of 8 m/s² and a maximum speed of 1 m/s.
In addition, fruit and vegetable models were also used in the grasping experiments, and each object was grasped five times. Details of the tested objects and their grasping success rates under different accelerations are listed in Table 2. It can be seen that, in high-speed motion, the success rate increases significantly after the control strategy is applied compared with the case without control, especially for heavier objects such as the red delicious apple and the pear. This shows the effectiveness of the control method proposed in this article.
The Grasping Success Rate of Objects Under Different Accelerations
Conclusion and Future Works
In this work, a soft gripper stiffness control strategy in different scenarios based on DRL in 2D space is presented. The dynamics of a three-link rigid robot that is equivalent to the soft finger is proposed to capture all deformations and geometric variations of the soft finger during motion and is used in designing a DRL-based controller. The damping moments of the soft finger are added to the model to have near-realistic dynamic behavior. This enables the soft gripper to change the vacuum pressure in time to respond to changes in the environment, such as changes in the robotic arm's acceleration and the grasped object.
Experimental results have shown that the vibration of the soft gripper during movement is successfully suppressed under the control of the DRL-based controller. In this work, the manipulator is commanded to perform planar motion. Future work will extend this approach to 3D space so that motions such as moving forward or backward can be investigated. In addition, we will consider real-world situations such as the need to avoid obstacles and to complete more complex motion trajectories.