Abstract
Soft robotics promises to achieve safe and efficient interactions with the environment by exploiting its inherent compliance and designing control strategies. However, effective control of soft robot–environment interaction remains challenging. The challenges arise from the nonlinearity and complexity of soft robot dynamics, especially when the environment is unknown and uncertainties exist, which makes analytical models difficult to establish. In this study, we propose a learning-based optimal control approach to address these challenges. The approach is an optimized combination of a feedforward controller based on probabilistic model predictive control and a feedback controller based on nonparametric learning methods. It is purely data-driven, requires no prior knowledge of soft robot dynamics or environment structures, and can be easily updated online to adapt to unknown environments. A theoretical analysis of the approach is provided to ensure its stability and convergence. In experiments, the proposed approach enabled a soft robotic manipulator to track target positions and forces when interacting with a manikin in different cases. Moreover, comparisons with other data-driven control methods show that our approach performs better. Overall, this work provides a viable learning-based control approach for soft robot–environment interactions with force/position tracking capability.
Introduction
Safety and performance are important factors for robot–environment interactions. 1 Soft robotics is widely recognized as enabling safe interaction with the environment due to its inherent compliance.2–4 However, its passive compliance also limits the tasks that can be performed in certain ecological niches, so appropriate control strategies should be developed to improve its performance. 5 Most existing studies on soft robot control focus on the soft robot itself without considering the interaction with the environment. 6 This can be attributed to the challenges of controlling soft robot–environment interaction. Soft robotic manipulators have nonlinear and complex dynamics even in free space, the environments they interact with are often unknown, and real systems are subject to uncertainties such as sensor noise. These challenges make classical control methods hard to apply directly because accurate analytical models are difficult to derive without making significant assumptions or simplifications. 7
Alternatively, machine learning considers robots as data systems so that data-driven approaches can be applied.8,9 Machine learning-based control has great advantages in highly nonlinear, nonuniform, and unstructured environments. 10 Therefore, the objective of this study is to develop a learning-based control approach for the soft robot–environment interaction. Specifically, we focus on a typical problem in the soft robot–environment interaction, that is, hybrid position/force control, which is usually required in assistive and surgical robots to perform their intended tasks.11,12 For example, previous studies13,14 designed soft robotic manipulators to assist elderly people in the showering activity. The motivation for a soft robot to assist in showering originates from elderly care. Elderly people usually need assistance from others to perform daily activities due to age-related conditions and loss of abilities.
According to studies on affected daily activities of older adults,15,16 showering is one of the most critical but high-risk tasks. Position control allows the robot to wipe specific areas of the body, and force control allows it to apply an appropriate amount of force for comfortable interaction. In this study, our approach uses this shower-assisting soft robot as an example, but it is not limited to this robot because the approach is purely data-driven and does not rely on specific robot structures.
The concept of the soft robot–environment interaction in this study is shown in Figure 1. Since the focus of this study is to test the force/position control performance of the proposed approach rather than human clinical trials, a manikin is used as the test environment to simulate the human body and a soft robotic manipulator is controlled to wipe the manikin as a simulation of showering. During the soft robot–environment interaction scenarios, the system has perceptive capabilities such as force and camera sensors to provide force and position data. The data are used to train the learning-based control algorithm. The algorithm can control the soft robotic manipulator to track position and force trajectories when wiping the manikin. The sensors in this study are only used to test the approach and should be tailored for real-world applications to avoid privacy issues and reduce costs.

Figure 1. Schematic diagram of the soft robot–environment interaction system. A manikin is used as the test environment and a soft robotic manipulator is controlled to wipe the manikin for simulation of shower assistance. Perception data including force and position are collected to train the controller. The learning-based controller drives the soft robotic manipulator to interact with the manikin through force/position tracking capability. X1, X2, … , X_M are model outputs (position and force in this study). U1, U2, … , U_D are model inputs (robot actuations).
Article contribution
In this study, we propose a learning-based control approach for the soft robot–environment interaction system with force/position tracking capability. Figure 2 shows the general idea of our approach, which is purely data-driven without requiring prior knowledge of the system dynamics. This shows the potential of the approach to be extended to other robotic systems with available data.

Figure 2. Schematic diagram of the proposed control approach. The control action U_t is an optimized combination of feedforward action U_{ff} and feedback action U_{fb} to minimize the tracking error. The feedforward action U_{ff} is computed by probabilistic MPC, considering uncertainty propagation in multistep predictions. The probabilistic model can capture nonlinear, uncertain, and correlated states. Meanwhile, the feedback action U_{fb} is provided by a data-driven feedback controller to improve the control performance. Both the probabilistic model and the feedback controller are updated online, allowing the soft robot to adapt to unknown environments. MPC, model predictive control.
The approach is an optimized combination of a feedforward and a feedback controller with the aim of minimizing the tracking error. On the one hand, the feedforward controller is based on the principle of model predictive control (MPC). A probabilistic model is used to explicitly account for the nonlinearity and uncertainty characteristics of the soft robot–environment interaction system. Meanwhile, the model can capture the correlations among multiple outputs, as force and position are correlated during interactions. Based on the probabilistic model, a feedforward action is computed by MPC considering the uncertainty propagation in multistep predictions. On the other hand, an additional feedback controller is trained by system data to improve control performance. Both the probabilistic model and the feedback controller are updated online, allowing the soft robot to adapt to unknown environments.
In summary, this study contributes to the learning and control strategies of soft robots by:
proposing a learning-based control approach for soft robot–environment interactions, supported by theoretical stability and convergence analysis;
presenting uncertainty propagation of probabilistic models for multistep prediction, which considers output correlations;
validating the control performance through various experimental tests and comparisons with other methods.
State of the art
Learning-based control for soft robots has attracted increasing research interest in the last decade. 7 Previous studies have explored popular deep learning and reinforcement learning methods17–20 for controlling soft robots, which have shown promising results in overcoming the complex dynamics of soft robots. However, these methods typically require large amounts of data to achieve good control performance and lack stability analysis.
In contrast, few effective control strategies have been explored for soft robots to achieve simultaneous position and force control during interactions with the environment. 21 Researchers in Refs.22,23 developed hybrid position/force control methods, while studies in Refs.24,25 used impedance control for continuum/soft robots. However, these studies require analytical models that are difficult to derive for soft robotic manipulators without making significant simplifications. Alternatively, a study in Ref. 26 developed a model-less hybrid position/force control method that learns the Jacobian of a continuum manipulator online. However, this study did not consider system uncertainties. Other studies in Refs.27,28 controlled the position and the force of a soft pneumatic actuator separately rather than simultaneously. These related studies used different soft manipulators from ours. Researchers in Refs.18,29 used the same soft manipulator, but they only controlled its position in free space without interaction with the environment.
Studies related to our approach include Gaussian process (GP) and locally weighted projection regression (LWPR) control methods. Soft robotics researchers in Refs.30,31 used the GP to model the soft actuator's forward kinematics, while researchers in Refs.32,33 used LWPR to learn the inverse kinematics of soft robots. These studies had multiple output dimensions, including either 3D positions or pose stiffness, and typically assumed that each output dimension was independent. In contrast, our study considers correlations between outputs.
Furthermore, our previous work 34 made some initial attempts to control the position and force of a soft robotic arm.
The main differences between this study and our previous work are as follows: (1) This study considers multistep prediction, which predicts a sequence of the model outputs into the future, whereas our previous work used only one-step-ahead model prediction where only a single future point is predicted. This is a significant difference because multistep-ahead prediction considers what is likely to happen in the future steps, and thus, the cost function can be designed to consider not only the immediate cost but also distant future costs. By optimizing the multistep predictive cost function, the robot can adjust its actions in time (even in advance) to improve its performance in tracking reference trajectories. However, the one-step predictive cost function can only consider the immediate cost, causing the robot's actions to be greedy and aggressive.
(2) This study gives the theoretical analysis of stability and convergence, whereas our previous work did not. (3) This study uses an optimized combination of feedforward and feedback controllers, whereas our previous work did not. (4) This study is validated by much richer experiments and detailed comparisons, whereas our previous work was performed on limited demonstrations.
Methods
In this section, we first introduce the design and actuation of a soft robotic manipulator. Afterward, we present key components of the proposed control approach, leaving detailed mathematical derivations to the Supplementary Data for clarity and readability.
Structure of the soft robotic manipulator
In this study, we use a soft robotic manipulator developed at Sant'Anna School of Advanced Studies by Manti and Cianchetti. 13 Since the contribution of this work is the control approach, we give only a brief overview of the manipulator's design structure. Details of the manufacturing process and robot functionalities can be found in Refs.13,14. The soft robotic manipulator is composed of two identical modules, as shown in Figure 3. Each module includes a combination of three pairs of McKibben-based actuators and three cables. The combination of pneumatic actuators and tendons enables shortening, elongation, and bending movements.

Figure 3. Design structure of the soft robotic manipulator that consists of a proximal module and a distal module, driven by pneumatic actuators and cables. The cross-sectional view shows the arrangement of actuators, where r1 = 60 mm, r2 = 30 mm, and r3 = 15 mm. P1 to P6 are for each pair of pneumatic chambers, while C1 to C6 are actuation cables. P1–P3 and C1–C3 are for the proximal module, while P4–P6 and C4–C6 are for the distal module. A layered reinforcement structure (i.e., the yellow disk) is inserted along the entire module to constrain unwanted lateral and torsional movement. The total length of the soft robotic manipulator is 375 mm and the total weight is 220 g.
Feedforward controller
MPC is a popular control framework for developing optimal control policies by minimizing a receding-horizon cost function subject to constraints. 35 In this study, we apply the principle of MPC to design the feedforward controller. In particular, the feedforward control action is acquired by minimizing a finite-horizon cost:
where
We assume that the system dynamics satisfies Markov property, that is, the next state
The above optimization problem raises four major issues that we need to deal with: (1) cost function design; (2) the dynamic model; (3) uncertainty propagation in multistep prediction; and (4) the optimal action solver. The following subsections present how we address these issues.
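The receding-horizon principle behind the feedforward controller can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the one-step model `model` is a hypothetical scalar stand-in for the learned dynamics, the stage cost is a scalar saturating function, the optimizer is a plain finite-difference gradient descent with clamped actions, and uncertainty propagation is omitted. It is not the authors' implementation; only the horizon length of 4 is taken from the experiments reported later.

```python
import math

def stage_cost(x, target, width=1.0):
    """Scalar saturating stage cost in [0, 1) (illustrative form)."""
    return 1.0 - math.exp(-0.5 * (x - target) ** 2 / width ** 2)

def model(x, u):
    """Hypothetical one-step model standing in for the learned dynamics."""
    return x + 0.5 * u

def horizon_cost(x0, actions, target):
    """Sum of stage costs along the states predicted over the horizon."""
    x, total = x0, 0.0
    for u in actions:
        x = model(x, u)
        total += stage_cost(x, target)
    return total

def optimize_actions(x0, target, horizon=4, u_max=1.0, iters=200, lr=0.5, eps=1e-5):
    """Minimize the finite-horizon cost by finite-difference gradient descent."""
    actions = [0.0] * horizon
    for _ in range(iters):
        base = horizon_cost(x0, actions, target)
        grad = []
        for i in range(horizon):
            bumped = actions[:]
            bumped[i] += eps
            grad.append((horizon_cost(x0, bumped, target) - base) / eps)
        # Gradient step followed by clamping to the assumed actuation limits.
        actions = [min(u_max, max(-u_max, u - lr * g)) for u, g in zip(actions, grad)]
    return actions

def run_mpc(x0, target, steps=10):
    """Receding horizon: re-plan at every step and apply only the first action."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        u = optimize_actions(x, target)[0]
        x = model(x, u)
        trajectory.append(x)
    return trajectory

if __name__ == "__main__":
    print(run_mpc(0.0, 1.0))
```

Re-planning at every step is what lets the controller use fresh measurements; only the first action of each optimized sequence is ever applied.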
Cost function
In this study, we design the stage cost as a saturating function because previous studies36,39 have shown that a saturating cost function is better at dealing with state uncertainty and system noise than a commonly used quadratic cost function. The stage cost is given as follows:
In addition, the terminal cost function is designed as follows:
such that we have the following:
Using such a terminal cost makes the finite-horizon cost behave similarly to an infinite-horizon cost. It also reduces the effort needed to tune the horizon length (i.e., T) and avoids the heavy computation of infinite-horizon prediction. Detailed equations of the expected cost function can be found in Appendix A in the Supplementary Data.
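To make the difference between a saturating and a quadratic cost concrete, the following Python sketch uses a scalar saturating cost of the form 1 − exp(−(x − target)² / (2·width²)), which is common in the GP-based control literature. This form, the function names, and the scalar setting are assumptions for illustration; the exact cost used in this study is given in the Supplementary Data. For a Gaussian state, the expectation of this cost has the closed form implemented in `expected_saturating_cost`.

```python
import math

def saturating_cost(x, target, width):
    """Saturating cost: 0 at the target, approaching 1 far from it."""
    return 1.0 - math.exp(-0.5 * (x - target) ** 2 / width ** 2)

def expected_saturating_cost(mean, var, target, width):
    """Closed-form expectation of the saturating cost for x ~ N(mean, var)."""
    scale = 1.0 / math.sqrt(1.0 + var / width ** 2)
    return 1.0 - scale * math.exp(-0.5 * (mean - target) ** 2 / (width ** 2 + var))

def quadratic_cost(x, target):
    """Quadratic cost for comparison; it grows without bound."""
    return (x - target) ** 2

if __name__ == "__main__":
    for x in (1.0, 1.5, 3.0, 10.0):
        print(x, saturating_cost(x, 1.0, 0.5), quadratic_cost(x, 1.0))
```

Because the saturating cost is bounded, a single outlying or highly uncertain prediction cannot dominate the total cost, whereas the quadratic cost penalizes it without limit.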
Dynamic model
We model the system dynamics f using the multitask GP. 40 Previous studies41,42 have shown that an arbitrarily small uniform error bound can be obtained for GP regression models. Researchers in Refs.30,31 also showed good modeling performance by using GP to model soft robotic systems. GP is a distribution over functions rather than specific data and thus can model non-Gaussian data distributions. 43 Besides, an important property of multitask GP is that the joint Gaussian distribution over outputs is not block-diagonal, so that observations of one output can affect the predictions on another output. A correlation matrix is introduced in the multitask GP model to describe the multimodal behavior. The multitask GP model can therefore capture the dynamics of our soft robotic system well.
The GP model is determined by its mean and covariance function. Since we assume no prior knowledge of system dynamics, we take the mean function to be zero, following the popular choice. 43 The covariance function uses a squared exponential kernel for smooth transition. Given a training set
where
The detailed formulas of model learning, and the advantages and limitations of the GP model can be found in Appendix B in the Supplementary Data.
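The effect of a non-block-diagonal joint covariance can be sketched in a few lines of Python. The sketch assumes a zero-mean GP whose covariance between output i at input x and output j at input x' is B[i][j] multiplied by a squared exponential kernel, which is one standard multitask construction; the matrix `B`, the hyperparameter values, and the toy data are hypothetical and are not taken from this study.

```python
import math

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared exponential kernel between two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def multitask_gp_mean(train, query_x, query_task, B, noise_var=1e-4):
    """Posterior mean of a zero-mean multitask GP.

    train: list of (x, task_index, y) observations.
    B: task-correlation matrix (an assumed, fixed hyperparameter here).
    """
    K = [[B[ti][tj] * se_kernel(xi, xj) + (noise_var if i == j else 0.0)
          for j, (xj, tj, _) in enumerate(train)]
         for i, (xi, ti, _) in enumerate(train)]
    alpha = solve(K, [y for _, _, y in train])
    k_star = [B[query_task][tj] * se_kernel(query_x, xj) for xj, tj, _ in train]
    return sum(k * a for k, a in zip(k_star, alpha))

if __name__ == "__main__":
    # Toy data: only output 0 is observed.
    train = [([0.0], 0, 0.0), ([0.5], 0, 0.5), ([1.0], 0, 0.8)]
    correlated = [[1.0, 0.8], [0.8, 1.0]]
    independent = [[1.0, 0.0], [0.0, 1.0]]
    print(multitask_gp_mean(train, [0.5], 1, correlated))
    print(multitask_gp_mean(train, [0.5], 1, independent))
```

With the correlated matrix, observations of output 0 shift the prediction for output 1; with the diagonal matrix, the prediction for the unobserved output stays at the zero prior mean. In practice the correlation matrix would be learned together with the kernel hyperparameters.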
Uncertainty propagation
The cost function involves multistep prediction where the uncertainty propagation should be considered because the predicted state by a GP model is a Gaussian distribution rather than a single point. Suppose that the input v is a Gaussian distribution
Generally, the integral Equation (6) is analytically intractable due to the complexity of nonlinear mapping through a GP. The principle of moment matching 44 can be used to estimate the first- and second-order moments (i.e., mean
Detailed equations of moment matching can be found in Appendix C and its performance is analyzed in Appendix H in the Supplementary Data.
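As a partial illustration of moment matching, the Python sketch below computes only the first moment for a one-dimensional input: the exact expectation of a GP posterior mean built from a squared exponential kernel when the input is Gaussian. The weights `alpha` stand for the usual kernel-weighted training targets and are hypothetical numbers here, as are the centers and hyperparameters; the second moment and the multi-output case treated in the Supplementary Data are not covered.

```python
import math

def se_kernel(a, b, length_scale, signal_var):
    """One-dimensional squared exponential kernel."""
    return signal_var * math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)

def gp_mean(x, centers, alpha, length_scale, signal_var):
    """GP posterior mean at a deterministic input: sum_i alpha_i * k(x, x_i)."""
    return sum(a * se_kernel(x, c, length_scale, signal_var)
               for a, c in zip(alpha, centers))

def expected_gp_mean(mu, var, centers, alpha, length_scale, signal_var):
    """Exact expectation of the posterior mean when the input is N(mu, var).

    Uses the Gaussian integral of the squared exponential kernel:
    E[k(x, c)] = signal_var * l / sqrt(l^2 + var) * exp(-(mu - c)^2 / (2 (l^2 + var))).
    """
    total_var = length_scale ** 2 + var
    scale = signal_var * length_scale / math.sqrt(total_var)
    return sum(a * scale * math.exp(-0.5 * (mu - c) ** 2 / total_var)
               for a, c in zip(alpha, centers))

if __name__ == "__main__":
    centers = [-1.0, 0.0, 1.5]   # hypothetical training inputs
    alpha = [0.4, -0.3, 0.8]     # hypothetical weights
    print(gp_mean(0.3, centers, alpha, 0.7, 1.2))
    print(expected_gp_mean(0.3, 0.25, centers, alpha, 0.7, 1.2))
```

The two printed values differ because input uncertainty smooths the prediction; ignoring it in a multistep rollout would compound this discrepancy at every step.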
Gradient-based optimal solver
Since the state xt is a random variable, the original cost function becomes a stochastic one and we need to optimize the expected cost:
To find the optimal solution of Equation (8), we compute the gradient of the expected cost. Based on the chain rule, the gradient of the stage cost can be expressed as follows:
The gradient of the terminal cost has a similar expression to Equation (9). Gradient-based solvers, such as the fmincon function provided by the MATLAB Optimization Toolbox, can be used to find the optimal action. Meanwhile, a global search algorithm is utilized to avoid local optima. 46 Detailed equations of the gradient can be found in Appendix D in the Supplementary Data.
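The chain-rule structure of the gradient can be sketched in Python for a single prediction step. The sketch assumes a scalar saturating cost with a Gaussian predicted state, a hypothetical linear predictive mean a·x + b·u with fixed predictive variance, and plain gradient descent in place of fmincon and the global search; all names and numbers are illustrative only.

```python
import math

def expected_cost(mean, var, target, width):
    """Expected scalar saturating cost for a state distributed as N(mean, var)."""
    scale = 1.0 / math.sqrt(1.0 + var / width ** 2)
    return 1.0 - scale * math.exp(-0.5 * (mean - target) ** 2 / (width ** 2 + var))

def d_expected_cost_d_mean(mean, var, target, width):
    """Analytic derivative of the expected cost with respect to the predicted mean."""
    scale = 1.0 / math.sqrt(1.0 + var / width ** 2)
    expo = math.exp(-0.5 * (mean - target) ** 2 / (width ** 2 + var))
    return scale * expo * (mean - target) / (width ** 2 + var)

def optimize_action(x, target, a=1.0, b=0.5, var=0.04, width=1.0, lr=1.0, iters=200):
    """Gradient descent on the action u for the assumed model mean = a*x + b*u."""
    u = 0.0
    for _ in range(iters):
        mean = a * x + b * u
        # Chain rule: dE[c]/du = dE[c]/dmean * dmean/du, with dmean/du = b.
        grad = d_expected_cost_d_mean(mean, var, target, width) * b
        u -= lr * grad
    return u

if __name__ == "__main__":
    print(optimize_action(x=0.0, target=1.0))
```

In the multistep case the same rule is applied recursively, since each predicted mean and variance also depends on the previous predicted distribution.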
Feedback controller
The feedforward controller solves a weighted multiobjective optimization problem. However, the solution may not achieve each target perfectly. To compensate for the feedforward control actions, we further build a feedback controller to improve control performance. In this study, the design of the feedback controller is based on the principle of the LWPR method. 47 LWPR is a nonparametric local learning method that handles nonlinear, high-dimensional, and adaptive regression problems well and can be quickly updated with online data. Given the input
Each local model
Control algorithm
Based on the feedforward and feedback controllers, we get the final control actions:
where
where
The GP model and LWPR are updated online to enable adaptation to environmental changes. In particular, a “forgetting” strategy is used to replace the most uninformative data points with fresh data points while keeping the size of the training set constant. 48 We choose to forget the
where K is the same as in Equation (5), which is the matrix of covariance between all data points. The complete procedure of the control algorithm is described in Table 1.
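The idea of a fixed-size training set with forgetting can be sketched in Python as follows. The selection rule used here, dropping the stored point with the highest average kernel similarity to the others, is a hypothetical stand-in chosen for illustration; it is not the criterion of this study, which is defined through the covariance matrix K. The function names and the toy data are likewise assumptions.

```python
import math

def se_kernel(a, b, length_scale=1.0):
    """One-dimensional squared exponential kernel with unit signal variance."""
    return math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)

def redundancy_scores(inputs):
    """Average kernel similarity of each stored input to all other stored inputs."""
    n = len(inputs)
    return [sum(se_kernel(inputs[i], inputs[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def update_dataset(inputs, outputs, new_input, new_output):
    """Replace the most redundant stored sample by the fresh one.

    The data set size stays constant, so the cost of GP inference does not
    grow during online operation. The redundancy rule is an assumed heuristic.
    """
    scores = redundancy_scores(inputs)
    drop = max(range(len(inputs)), key=lambda i: scores[i])
    new_inputs = inputs[:drop] + inputs[drop + 1:] + [new_input]
    new_outputs = outputs[:drop] + outputs[drop + 1:] + [new_output]
    return new_inputs, new_outputs

if __name__ == "__main__":
    xs = [0.0, 1.0, 1.05, 3.0]   # toy inputs; 1.0 and 1.05 are near-duplicates
    ys = [0.1, 0.9, 0.95, 0.2]
    print(update_dataset(xs, ys, 2.0, 0.5))
```

One of the two near-duplicate samples is discarded while the isolated samples are kept, which preserves coverage of the input space at constant memory.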
Table 1. Proposed Learning-Based Control Algorithm for Soft Robot
Stability and convergence analysis
A major concern of learning-based control is how to prove its stability and convergence. 37 This analysis attempts to provide theoretical guarantees for the proposed approach. Input-to-state stability is widely used to study the stability of nonlinear control systems. 49 Since a probabilistic model is used in our approach, we prove that our system is stochastic input-to-state stable (SISS).
Besides system stability, we further analyze the tracking error convergence of the proposed control approach and have the following theorem:
where
Results
This section presents real-world experimental validations of the proposed approach. The objective of our experiment is to test the force/position tracking capability of our control approach during the soft robot–environment interaction and to compare its performance with other methods. The experiments are designed to simulate showering assistance with a focus on wiping by a soft robotic manipulator. Determining the desired position and force targets for wiping the human body in real-life showering scenarios requires user studies, which are beyond the scope of this article. However, this work establishes the robot's ability to track such targets.
The experimental setup is shown in Figure 4. The soft robotic manipulator was placed horizontally rather than in the common vertical installation because a horizontal mount is easier to install in practical settings, such as on the wall of a bathroom, and gives a larger workspace for wiping the human body. Our previous work used vertical installation, but the manipulator could only be operated successfully in a relatively small area. During the experiment, we controlled the tip position along the X and Z axes and the tip force applied in the Y direction (orthogonal force). The orthogonal force was obtained by projecting the force sensor reading onto the Y direction. Thus, the state dimension was 3 (M = 3). Meanwhile, we actively controlled four cables, as required by the experiment and robot configurations, so the control action dimension was 4 (D = 4).

Figure 4. Experimental setup. The soft robotic manipulator is supported by an aluminum frame. One end of the soft robotic manipulator is fixed as the coordinate origin. The Y-axis points to the manikin and the Z-axis points to the ground. The other end of the soft robotic manipulator is covered by a ball-like bath sponge, the center of which is the PoI. A force sensor (ATI Nano17) is embedded in the bath ball to detect contact force when the soft robotic manipulator interacts with the manikin. The air compressor supplies air pressure for pneumatic actuators. The control box contains all electronic components, including proportional valves and Arduino boards. An RGB-D camera (RealSense D435) is fixed at the top of the soft robotic manipulator to detect the spatial position of PoI. The control algorithm is implemented on a personal computer. PoI, point of interest.
Moreover, prediction horizon was 4 (T = 4), convergence rate parameter was 0.7 (
Experimental results
During experimentation, three different interaction cases were designed to test the proposed method on different paths and forces, as well as its adaptivity to different contact surfaces. The experiments were not intended to cover all possible examples in an actual showering process, but to test the controller's force/position tracking capability in several typical cases, which included the following.
Case 1: wiping the manikin's back along a horizontal path with a constant force.
Case 2: wiping the manikin's back along an inclined path with varying forces.
Case 3: wiping the manikin's chest along a horizontal path with a constant force.
Case 1 and Case 2 were designed to test the proposed method on different paths and forces. To test the online learning ability of the proposed approach, we designed Case 3, where the soft robotic manipulator was controlled to interact with both the back and the chest of a manikin. The manikin's back has a rigid, smooth, and flat surface, while its chest has a soft, nonsmooth, and uneven surface. During experimentation, we only collected offline data on the manikin's back and used these data to initialize the model. When testing on the manikin's chest, the model was updated online to help the robot adapt to the chest surface. Tracking targets were set according to the three cases and were feasible within the robot's workspace. In each case test, the model was initialized with the same offline data set and then updated online to adapt to the current task.
The experimental results of Case 1 are shown in Figure 5a–d. Figure 5b shows that the position along the X-coordinate gradually reached the target trajectory. Meanwhile, Figure 5c shows that the Z-coordinate position remained stable within 1 mm error, and Figure 5d shows that the contact force was maintained around 0.5 N within 0.05 N error. For Case 2, the results are shown in Figure 5e–h and demonstrate that both position and force gradually followed the target trajectories. The tracking error was larger before 17 s than afterward, mainly because online learning progressively reduced it. Online learning also helped the soft robotic manipulator adapt to the uneven and nonsmooth surface, as depicted by the experimental results of Case 3 in Figure 5i–l.

Figure 5. Experimental results of the soft robotic manipulator interacting with the manikin in three different cases. Each row is the result of one case.
As the surface structures were previously unknown, the position and force tracking errors were relatively large in the initial phase (before 17 s) when the soft robotic manipulator contacted the manikin's chest. With the dynamic model and feedback controller updated by online data, the soft robotic manipulator could gradually adapt to the chest surface and eventually the tracking error was within 5 mm for position and 0.05 N for force.
The results of all cases demonstrated that the proposed approach could control the soft robotic manipulator to interact with the environment for force/position tracking and adapt online to a previously unknown environment. Supplementary Video S1 also shows the tracking performance of the robot in all three cases.
Comparison with other methods
After showing the experimental results of different cases, we further analyze the performance of our control approach by comparing it with other methods. Researchers in Ref. 36 proposed a probabilistic inference for learning control (PILCO) algorithm that was also based on GP models. The PILCO algorithm is one of the state-of-the-art GP-based learning control methods. It assumed independent GPs to model each output dimension, ignoring correlations between outputs. In this study, we compare our approach with the PILCO algorithm. The PILCO open-source code 36 was utilized to execute the PILCO algorithm, where its cost width was 1, sampling period was 0.1 s, prediction horizon was 1 s, and the number of basis functions was 100.
In addition, we compare the approach of this study with our previous method, 34 where its weight coefficient was 0.1, cost width was 0.01, and feedback gain was 0.1. Moreover, feedforward action only (i.e.,
Comparison experiments were conducted by repeating the three abovementioned cases with the other methods. The comparison results are shown in Figure 6. PILCO had larger overshoots and more obvious oscillations than the other methods during position and force tracking. Our previous method performed better than PILCO and the feedforward-only method, but worse than the proposed approach. The proposed approach had smaller overshoots, shorter response time, and more stable performance than the other methods.

Figure 6. Tracking performance comparison of different methods in the designed three soft robot–environment interaction cases. Each row is the comparison results of one case.
To quantitatively compare the tracking performance of different controllers, we also calculate the root-mean-square error (RMSE) between actual and target states. The RMSE values are listed in Table 2. The error values agree with the qualitative observations from the figures: our approach had the smallest position and force errors (less than half of those of PILCO). Taken together, the figures and the table indicate that our approach achieved better control performance than the other methods.
Table 2. Tracking Error Comparison of Different Controllers by Root-Mean-Square Error
PILCO, probabilistic inference for learning control.
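For reference, the RMSE between an actual and a target trajectory can be computed as in the short Python sketch below. The function name and the sample values are illustrative assumptions and are not measurements from this study.

```python
import math

def rmse(actual, target):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(target) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - t) ** 2 for a, t in zip(actual, target)]
    return math.sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Hypothetical contact-force samples (N) against a constant 0.5 N target.
    measured = [0.42, 0.47, 0.52, 0.49, 0.51]
    desired = [0.5] * len(measured)
    print(rmse(measured, desired))
```

Computing the RMSE separately for each position axis and for the force keeps quantities with different units apart.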
Discussion
We attribute the advantages of our approach over the other three methods to the following three aspects: (1) modeling of the system nonlinearity, uncertainty, and output correlations; (2) multistep uncertainty propagation with carefully designed cost function; and (3) optimized combination of feedforward and feedback controllers and online update strategy. Our previous method only made one-step-ahead model prediction. The one-step predictive cost function can only consider the immediate cost, causing the robot's actions to be greedy and aggressive. However, the approach in this study can make multistep-ahead prediction, which considers what is likely to happen in the future steps, and thus, the cost function can be designed to consider not only the immediate cost but also distant future costs. By optimizing the multistep predictive cost function, the robot can adjust its actions in time (even in advance) to improve its performance in tracking reference trajectories.
As a result, the robot response with multistep prediction is usually more accurate and stable than that with one-step prediction. Moreover, both the proposed approach and our previous method had smaller errors than the feedforward-only method owing to the benefits of feedback controllers. In addition, the proposed approach incorporates output correlations into the modeling and uses optimal feedforward–feedback combinations to achieve better performance than PILCO. We ran simulations to choose suitable PILCO parameters for the comparison. However, PILCO still performed poorly on the real robot. This is mainly because PILCO models each output dimension with an independent GP and ignores correlations between outputs.
In this study, we used the multitask GP to model the system dynamics because of the following advantages: (1) the multitask GP can explicitly consider system nonlinearity, uncertainty, and correlation among outputs; (2) the GP model is data-efficient and can provide accurate predictions with a small amount of data. This is especially useful when working with limited data or when collecting data is expensive or time-consuming. State-of-the-art approaches, such as deep learning 50 or reinforcement learning 51 for robotics, usually require thousands of training samples to perform well; and (3) the GP model has fewer hyperparameters to tune and is less computationally expensive than other uncertainty-aware models such as an ensemble of neural networks. 52
On the other hand, the GP model typically suffers from the curse of dimensionality, and its computational complexity increases with the amount of training data. In this study, we used a “forgetting” strategy to keep a fixed amount of informative data. Other possible solutions are to use sparse GP regression to select inducing data points and to use principal component analysis to reduce dimensions.
Some limitations exist in this study. The target workspace in the experiment was limited by the safe actuation ranges of the soft robot (i.e., without damaging the soft robot). However, our approach does not have explicit restrictions on the target space, as long as the control targets are within the robot operation range. In future work, we can extend the workspace by using a more robust soft robotic manipulator. Although the Experimental Results section only shows performance over short periods of time, our approach can be operated over longer periods of time. To demonstrate the long-term performance of our approach, we controlled the soft robot's end effector to wipe a manikin's back along a square trajectory with constant force for a total duration of 120 s.
The results can be seen in Appendix I in the Supplementary Data, which shows that the position could gradually reach the target trajectory and the contact force could be maintained at 0.5 N with an error of around 0.05 N. In addition, we assume that the person is in a fixed position in this study. If the person is moving dynamically, the approach may not perform as well as when the person is static. Meanwhile, we can optimize the software implementation to reduce code execution time and improve hardware components, such as sensor and actuation speed, to increase the operation frequency. Our future studies will explore how to deal with dynamic interaction problems.
Conclusion
This article proposes a learning-based control approach for soft robot–environment interaction with force/position tracking capability. The approach is data-driven and requires no prior knowledge of the soft robot dynamics and environment structures. It contains a probabilistic model to explicitly account for system nonlinearity, uncertainty, and output correlation issues. Meanwhile, the online learning allows the soft robot to adapt to unknown environments, as validated by experimental results. A theoretical analysis of the approach is provided to ensure stability and convergence. In conclusion, this work contributes to the learning and control strategies of soft robotics and has the potential to be extended to other similar robotic systems.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This work was supported by the National Robotics Program, Singapore, under the Soft and Hybrid Phase 2a project (grant number W2125d0243) and in part by the National Research Foundation, Singapore, under its Medium Sized Centre Programme—Centre for Advanced Robotics Technology Innovation (CARTIN).
References
