Learning-Based Control for Soft Robot–Environment Interaction with Force/Position Tracking Capability

Abstract

Soft robotics promises to achieve safe and efficient interactions with the environment by exploiting its inherent compliance and designing control strategies. However, effective control for the soft robot–environment interaction has been a challenging task. The challenges arise from the nonlinearity and complexity of soft robot dynamics, especially in situations where the environment is unknown and uncertainties exist, making it difficult to establish analytical models. In this study, we propose a learning-based optimal control approach as an attempt to address these challenges, which is an optimized combination of a feedforward controller based on probabilistic model predictive control and a feedback controller based on nonparametric learning methods. The approach is purely data-driven, without prior knowledge of soft robot dynamics and environment structures, and can be easily updated online to adapt to unknown environments. A theoretical analysis of the approach is provided to ensure its stability and convergence. The proposed approach enabled a soft robotic manipulator to track target positions and forces when interacting with a manikin in different cases. Moreover, comparisons with other data-driven control methods show a better performance of our approach. Overall, this work provides a viable learning-based control approach for soft robot–environment interactions with force/position tracking capability.

Introduction

Safety and performance are important factors for robot–environment interactions.¹ Soft robotics is widely recognized as enabling safe interaction with the environment due to its inherent compliance.^2–4 However, its passive compliance also limits the tasks that can be performed in certain ecological niches. Appropriate control strategies should be developed to improve its performance.⁵ Most of the existing studies on soft robot control focus on the soft robot itself without considering the interaction with the environment.⁶ This can be attributed to the challenges of controlling soft robot–environment interaction. Soft robotic manipulators already have nonlinear and complex dynamics, which become even harder to describe when they interact with unknown environments, and real systems are subject to uncertainties such as sensor noise. These challenges make classical control methods difficult to apply directly because accurate analytical models are hard to derive without significant assumptions or simplifications.⁷

Alternatively, machine learning considers robots as data systems so that data-driven approaches can be applied.^8,9 Machine learning-based control has great advantages in highly nonlinear, nonuniform, and unstructured environment situations.¹⁰ Therefore, the objective of this study is to develop a learning-based control approach for the soft robot–environment interaction. Specifically, we focus on a typical problem in the soft robot–environment interaction, that is, hybrid position/force control, which is usually required in assistive and surgical robots to perform their intended tasks.^11,12 For example, previous studies^13,14 designed soft robotic manipulators to assist elderly people in the showering activity. The motivation for a soft robot to assist in showering originates from elderly care. Elderly people usually need assistance from others to perform daily activities due to age-related conditions and loss of abilities.

According to studies on affected daily activities of older adults,^15,16 showering is one of the most critical but high-risk tasks. Position control allows the robot to wipe specific areas of the body, and force control allows it to apply the right amount of force for comfortable interaction. In this study, our approach uses this shower-assisting soft robot as an example, but is not limited to this robot because the approach is purely data-driven and does not rely on specific robot structures.

The concept of the soft robot–environment interaction in this study is shown in Figure 1. Since the focus of this study is to test the force/position control performance of the proposed approach rather than human clinical trials, a manikin is used as the test environment to simulate the human body and a soft robotic manipulator is controlled to wipe the manikin as a simulation of showering. During the soft robot–environment interaction scenarios, the system has perceptive capabilities such as force and camera sensors to provide force and position data. The data are used to train the learning-based control algorithm. The algorithm can control the soft robotic manipulator to track position and force trajectories when wiping the manikin. The sensors in this study are only used to test the approach and should be tailored for real-world applications to avoid privacy issues and reduce costs.

FIG. 1.

Schematic diagram of the soft robot–environment interaction system. A manikin is used as the test environment and a soft robotic manipulator is controlled to wipe the manikin for simulation of shower assistance. Perception data including force and position are collected to train the controller. The learning-based controller drives the soft robotic manipulator to interact with the manikin through force/position tracking capability. X1, X2, … , X_M are model outputs (position and force in this study). U1, U2, … , U_D are model inputs (robot actuations).

Article contribution

In this study, we propose a learning-based control approach for the soft robot–environment interaction system with force/position tracking capability. Figure 2 shows the general idea of our approach, which is purely data-driven without requiring prior knowledge of the system dynamics. This shows the potential of the approach to be extended to other robotic systems with available data.

FIG. 2.

Schematic diagram of the proposed control approach. The control action U_t is an optimized combination of feedforward action U_{ff} and feedback action U_{fb} to minimize the tracking error. The feedforward action U_{ff} is computed by probabilistic MPC, considering uncertainty propagation in multistep predictions. The probabilistic model can capture nonlinear, uncertain, and correlated states. Meanwhile, the feedback action U_{fb} is provided by a data-driven feedback controller to improve the control performance. Both the probabilistic model and the feedback controller are updated online, allowing the soft robot to adapt to unknown environments. MPC, model predictive control.

The approach is an optimized combination of a feedforward and a feedback controller with the aim of minimizing the tracking error. On the one hand, the feedforward controller is based on the principle of model predictive control (MPC). A probabilistic model is used to explicitly account for the nonlinearity and uncertainty characteristics of the soft robot–environment interaction system. Meanwhile, the model can capture the correlations among multiple outputs, as force and position are correlated during interactions. Based on the probabilistic model, a feedforward action is computed by MPC considering the uncertainty propagation in multistep predictions. On the other hand, an additional feedback controller is trained by system data to improve control performance. Both the probabilistic model and the feedback controller are updated online, allowing the soft robot to adapt to unknown environments.

In summary, this study contributes to the learning and control strategies of soft robots by:

proposing a learning-based control approach for soft robot–environment interactions, supported by theoretical stability and convergence analysis;

presenting uncertainty propagation of probabilistic models for multistep prediction, which considers output correlations;

validating the control performance through various experimental tests and comparisons with other methods.

State of the art

Learning-based control for soft robots has attracted increasing research interest in the last decade.⁷ Previous studies have explored popular deep learning and reinforcement learning methods^17–20 for controlling soft robots, which have shown promising results in overcoming the complex dynamics of soft robots. However, these methods typically require large amounts of data to achieve good control performance and lack stability analysis.

In contrast, few effective control strategies have been explored for soft robots to achieve simultaneous position and force control during interactions with the environment.²¹ Researchers in Refs.^22,23 developed hybrid position/force control methods, while studies in Refs.^24,25 used impedance control for continuum/soft robots. However, these studies require analytical models that are difficult to derive for soft robotic manipulators without making significant simplifications. Alternatively, a study in Ref.²⁶ developed a model-less hybrid position/force control method that learns the Jacobian of a continuum manipulator online. However, this study did not consider system uncertainties. Other studies in Refs.^27,28 controlled the position and force of a soft pneumatic actuator separately rather than simultaneously. These related studies used different soft manipulators from ours. Researchers in Refs.^18,29 used the same soft manipulator, but they only controlled positions of the soft manipulator in free space without interacting with the environment.

The studies most closely related to our approach use Gaussian process (GP) and locally weighted projection regression (LWPR) control methods. Soft robotics researchers in Refs.^30,31 used the GP to model the forward kinematics of soft actuators, while researchers in Refs.^32,33 used LWPR to learn inverse kinematics of soft robots. These studies had multiple output dimensions, including either 3D positions or pose stiffness, and typically assumed that each output dimension was independent. In contrast, our study considers correlations between outputs.

Furthermore, our previous work³⁴ made some initial attempts to control the position and force of a soft robotic arm.

The main differences between this study and our previous work are as follows: (1) This study considers multistep prediction, which predicts a sequence of the model outputs into the future, whereas our previous work used only one-step-ahead model prediction where only a single future point is predicted. This is a significant difference because multistep-ahead prediction considers what is likely to happen in the future steps, and thus, the cost function can be designed to consider not only the immediate cost but also distant future costs. By optimizing the multistep predictive cost function, the robot can adjust its actions in time (even in advance) to improve its performance in tracking reference trajectories. However, the one-step predictive cost function can only consider the immediate cost, causing the robot's actions to be greedy and aggressive.

(2) This study gives the theoretical analysis of stability and convergence, whereas our previous work did not. (3) This study uses an optimized combination of feedforward and feedback controllers, whereas our previous work did not. (4) This study is validated by much richer experiments and detailed comparisons, whereas our previous work was performed on limited demonstrations.

Methods

In this section, we first introduce the design and actuation of a soft robotic manipulator. Afterward, we present key components of the proposed control approach, leaving detailed mathematical derivations to the Supplementary Data for clarity and readability.

Structure of the soft robotic manipulator

In this study, we use a soft robotic manipulator developed at Sant'Anna School of Advanced Studies by Manti and Cianchetti.¹³ Since the contribution of this work is the control approach, we give only a brief overview of the manipulator's design structure. Details of the manufacturing process and robot functionalities can be found in Refs.^13,14. The soft robotic manipulator is composed of two identical modules, as shown in Figure 3. Each module includes a combination of three pairs of McKibben-based actuators and three cables. The combination of pneumatic actuators and tendons enables shortening, elongation, and bending movements.

FIG. 3.

Design structure of the soft robotic manipulator that consists of a proximal module and a distal module, driven by pneumatic actuators and cables. The cross-sectional view shows the arrangement of actuators, where r₁ = 60 mm, r₂ = 30 mm, and r₃ = 15 mm. P1 to P6 are for each pair of pneumatic chambers, while C1 to C6 are actuation cables. P1–P3 and C1–C3 are for the proximal module, while P4–P6 and C4–C6 are for the distal module. A layered reinforcement structure (i.e., the yellow disk) is inserted along the entire module to constrain unwanted lateral and torsional movement. The total length of the soft robotic manipulator is 375 mm and the total weight is 220 g.

Feedforward controller

MPC is a popular control framework to develop optimal control policies by minimizing a receding-horizon cost function with constraint considerations.³⁵ In this study, we apply the principle of MPC to design the feedforward controller. In particular, the feedforward control action is acquired by minimizing a finite-horizon cost: $\begin{matrix} U_{t}^{*} = \arg\min_{U_{t}} \sum_{τ = t + 1}^{t + 1 + T} γ^{t} C_{t} (x_{t}) + C_{T} (x_{T}) \\ \text{s.t.} \\ x_{t + 1} = f (x_{t}, u_{t}) + ω_{t} \\ U_{t} = [u_{t}, u_{t + 1}, \cdots, u_{t + T}] \\ u_{\min} \leq u_{t} \leq u_{\max} \end{matrix},$ (1)

where $C_{t} (x_{t})$ is the stage cost, $C_{T} (x_{T})$ is the terminal cost, and $0 < γ < 1$ is a discount factor that weights costs in the distant future less than costs in the immediate future. $u_{t} \in R^{D}$ is the control action (pneumatic pressure and cable length for our soft robot). $x_{t} \in R^{M}$ is the state variable (position and force in this study), and we assume that the state is fully observable. $ω_{t} \in R^{M}$ is Gaussian system noise. $T$ is the prediction horizon. $U_{t} = [u_{t}, u_{t + 1}, \cdots, u_{t + T}]$ is the action sequence over the finite horizon T. $U_{t}^{*}$ is the optimal action sequence. $f : R^{M} \times R^{D} \to R^{M}$ is a multi-input multioutput model that expresses the robot behavior. $x_{t + 1} = f (x_{t}, u_{t}) + ω_{t}$ is a general discrete-time modeling principle for robot control tasks, as shown in previous studies.^36,37

We assume that the system dynamics satisfy the Markov property, that is, the next state $x_{t + 1}$ at time instance t + 1 depends only on the current state x_t and action u_t at time instance t. This is a general assumption for modeling and solving robot control tasks under uncertainty.³⁸ Following the principle of MPC, only the control input at the first time step of the optimized sequence is actually executed.

There are four major issues in the above optimization problem that we need to deal with: (1) cost function design; (2) the dynamic model; (3) uncertainty propagation in multistep prediction; and (4) the optimal action solver. The following subsections present how we address these issues.

Cost function

In this study, we design the stage cost as a saturating function because previous studies^36,39 have shown that a saturating cost function is better at dealing with state uncertainty and system noise than a commonly used quadratic cost function. The stage cost is given as follows: $C_{t} (x_{t}) = 1 - exp (- \frac{1}{2} {(x_{t} - x_{t}^{*})}^{T} Λ^{- 1} (x_{t} - x_{t}^{*})),$ (2)

where $x_{t}^{*}$ is the target state and $Λ \in R^{M \times M}$ is a diagonal positive-definite state weight matrix. The elements of $Λ$ can be chosen to assign different importance to each state dimension.
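As a minimal illustration of Equation (2), the following Python sketch evaluates the saturating stage cost for a deterministic state. The function name `saturating_cost` and the example numbers are illustrative assumptions, not taken from the authors' implementation; NumPy is assumed to be available.

```python
import numpy as np

def saturating_cost(x, x_target, Lam):
    """Stage cost of Equation (2): 1 - exp(-0.5 * e^T Lam^{-1} e), with e = x - x_target."""
    e = np.asarray(x, dtype=float) - np.asarray(x_target, dtype=float)
    # Solve Lam * z = e instead of forming the inverse explicitly.
    quad = e @ np.linalg.solve(np.asarray(Lam, dtype=float), e)
    return 1.0 - np.exp(-0.5 * quad)

# Hypothetical 3D state (two positions and one force) with Lam = I.
Lam = np.eye(3)
target = np.array([0.0, 0.0, 0.5])
cost_at_target = saturating_cost(target, target, Lam)
cost_far_away = saturating_cost(target + 100.0, target, Lam)
```

The cost is zero at the target and saturates toward 1 for large errors, whereas a quadratic cost would grow without bound.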

The terminal cost function, in turn, is designed as follows: $C_{T} (x_{T}) = 1 + \frac{γ^{T}}{1 - γ} - exp (- \frac{1}{2} {(x_{T} - x_{T}^{*})}^{T} Λ^{- 1} (x_{T} - x_{T}^{*})),$ (3)

such that we have the following: $\sum_{t = 0}^{T - 1} γ^{t} C_{t} (x_{t}) + C_{T} (x_{T}) \geq \sum_{t = 0}^{\infty} γ^{t} C_{t} (x_{t}) .$ (4)

Using such a terminal cost makes the finite-horizon cost perform similarly to an infinite-horizon cost. It also avoids extensive tuning of the horizon parameter T and the heavy computation of infinite-horizon prediction. Detailed equations of the expected cost function can be found in Appendix A in the Supplementary Data.
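The inequality in Equation (4) holds because every stage cost in Equation (2) lies in [0, 1), so the discarded tail $\sum_{t \geq T} γ^{t} C_{t} (x_{t})$ is at most $γ^{T} / (1 - γ)$, which is exactly the constant added in Equation (3). The sketch below checks this numerically on a random error sequence. The scalar state, the unit weight, the helper names, and the truncation of the infinite sum at 2000 steps are assumptions made only for illustration.

```python
import numpy as np

def stage_cost(err):
    """Scalar version of Equation (2) with unit weight."""
    return 1.0 - np.exp(-0.5 * err**2)

def terminal_cost(err, gamma, T):
    """Scalar version of Equation (3)."""
    return 1.0 + gamma**T / (1.0 - gamma) - np.exp(-0.5 * err**2)

gamma, T = 0.9, 4                            # values reported in the experiments
rng = np.random.default_rng(0)
errors = rng.normal(scale=3.0, size=2000)    # arbitrary tracking errors x_t - x_t^*

discounts = gamma ** np.arange(errors.size)
# Left-hand side of Equation (4): T stage costs plus the terminal cost.
finite_horizon = (np.sum(discounts[:T] * stage_cost(errors[:T]))
                  + terminal_cost(errors[T], gamma, T))
# Right-hand side of Equation (4), truncated; 0.9**2000 is negligible.
infinite_horizon = np.sum(discounts * stage_cost(errors))
```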

Dynamic model

We model the system dynamics f using the multitask GP.⁴⁰ Previous studies^41,42 have shown that an arbitrarily small uniform error bound can be obtained for GP regression models. Researchers in Refs.^30,31 also showed good modeling performance by using GP to model soft robotic systems. GP is a distribution over functions rather than specific data and thus can model non-Gaussian data distribution.⁴³ Besides, an important property of multitask GP is that the joint Gaussian distribution over outputs is not block-diagonal, so that observations of one output can affect the predictions on another output. A correlation matrix is introduced in the multitask GP model to describe the multimodal behavior. The multitask GP model can well capture the dynamics of our soft robotic system.

The GP model is determined by its mean and covariance function. Since we assume no prior knowledge of system dynamics, we take the mean function to be zero, following the popular choice.⁴³ The covariance function uses a squared exponential kernel for smooth transition. Given a training set $D$ that contains N data points, the GP model's predicted state $x_{t + 1}$ for a deterministic input $v_{t} = [x_{t}, u_{t}]$ is Gaussian distributed with the mean $μ_{t}$ and variance $Σ_{t}$ as follows: $\begin{matrix} P (x_{t + 1} | v_{t}, D) = N (x_{t + 1} | μ_{t}, Σ_{t}) \\ μ_{t} = {(K^{f} ⨂ k^{*})}^{T} K^{- 1} Y_{N} \\ Σ_{t} = (K^{f} ⨂ k^{* *}) - {(K^{f} ⨂ k^{*})}^{T} K^{- 1} (K^{f} ⨂ k^{*}) \end{matrix},$ (5)

where $⨂$ denotes the Kronecker product, K^f is the output correlation matrix, K is the matrix of covariance between all pairs of training points, $k^{*}$ is the vector of covariance between the input and all the training points, $Y_{N}$ is the training data, and $k^{* *}$ is the covariance of input point itself.
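A minimal NumPy sketch of the prediction in Equation (5) is given below. It is not the authors' implementation: it assumes that K has the Kronecker form $K^{f} ⨂ K^{x}$ plus a small noise term, that the training outputs are stacked output by output, and that the kernel hyperparameters and $K^{f}$ are already known. The names `se_kernel`, `multitask_gp_predict`, `lengthscale`, and `noise` are hypothetical.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0):
    """Squared exponential kernel between rows of A (n, d) and B (m, d)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def multitask_gp_predict(V, Y, v_star, Kf, lengthscale=1.0, noise=1e-4):
    """Predictive mean (M,) and covariance (M, M) at v_star.

    V: (N, d) training inputs, Y: (N, M) training outputs,
    Kf: (M, M) output correlation matrix (assumed given).
    """
    N, M = Y.shape
    Kx = se_kernel(V, V, lengthscale)
    # Assumed structure of K: Kronecker product plus observation noise.
    K = np.kron(Kf, Kx) + noise * np.eye(M * N)
    # Stack outputs task by task so the ordering matches kron(Kf, Kx).
    y = Y.T.reshape(-1)
    k_star = se_kernel(V, v_star[None, :], lengthscale)             # (N, 1)
    k_ss = se_kernel(v_star[None, :], v_star[None, :], lengthscale)  # (1, 1)
    cross = np.kron(Kf, k_star)                                      # (M*N, M)
    mean = cross.T @ np.linalg.solve(K, y)
    cov = np.kron(Kf, k_ss) - cross.T @ np.linalg.solve(K, cross)
    return mean, cov
```

Because `Kf` has nonzero off-diagonal entries, the predictive covariance is not diagonal, which is how observations of one output influence predictions of another.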

The detailed formulas of model learning, and the advantages and limitations of the GP model can be found in Appendix B in the Supplementary Data.

Uncertainty propagation

The cost function involves multistep prediction where the uncertainty propagation should be considered because the predicted state by a GP model is a Gaussian distribution rather than a single point. Suppose that the input v is a Gaussian distribution $P (v | η, S) = N (v | η, S)$ . Then the predicted state is obtained by integrating over the input distribution as follows: $P (x) = \int P (x | v, D) P (v | η, S) d v .$ (6)

Generally, the integral in Equation (6) is analytically intractable because of the nonlinear mapping through a GP. The principle of moment matching⁴⁴ can be used to estimate the first- and second-order moments (i.e., mean $m (x)$ and variance $V (x)$ ) of $P (x)$ . Previous studies^36,45 have dealt with uncertainty propagation by assuming output independence, while our work considers the output correlations. The mean $m (x)$ and variance $V (x)$ are computed by the law of iterated expectations and conditional variances, respectively. $\begin{matrix} m (x) = E [E [x | v]] = E [μ (v)] \\ V (x) = E [\mathrm{Var} (x | v)] + \mathrm{Var} (E [x | v]) \\ = E [Σ (v)] + E [μ (v) μ {(v)}^{T}] - m (x) m {(x)}^{T} \end{matrix} .$ (7)

Detailed equations of moment matching can be found in Appendix C and its performance is analyzed in Appendix H in the Supplementary Data.
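To make the two identities in Equation (7) concrete, the sketch below estimates $m (x)$ and $V (x)$ by Monte Carlo sampling of the input distribution. Sampling is used here only as a simple stand-in for the analytic moment matching of Appendix C. The function name `propagate_moments` and the callables `mu_fn` and `sigma_fn` (returning the predictive mean and covariance for a given input) are hypothetical.

```python
import numpy as np

def propagate_moments(mu_fn, sigma_fn, eta, S, n_samples=20000, seed=0):
    """Estimate m(x) and V(x) of Equation (7) by sampling v ~ N(eta, S)."""
    rng = np.random.default_rng(seed)
    v = rng.multivariate_normal(eta, S, size=n_samples)
    mus = np.array([mu_fn(vi) for vi in v])          # (n, M) predictive means
    sigmas = np.array([sigma_fn(vi) for vi in v])    # (n, M, M) predictive covariances
    m = mus.mean(axis=0)                             # E[mu(v)]
    second_moment = np.einsum('ni,nj->ij', mus, mus) / n_samples  # E[mu mu^T]
    V = sigmas.mean(axis=0) + second_moment - np.outer(m, m)
    return m, V
```

For a linear mean $μ (v) = A v$ with constant covariance $Σ_{0}$, the exact result is $m = A η$ and $V = Σ_{0} + A S A^{T}$, which provides a convenient sanity check for the estimate.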

Gradient-based optimal solver

Since the state x_t is a random variable, the original cost function becomes a stochastic one and we need to optimize the expected cost: $U^{*} = {arg min}_{U} \sum_{t = 0}^{T - 1} γ^{t} E [C_{t} (x_{t})] + E [C_{T} (x_{T})] .$ (8)
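For a Gaussian state with mean $μ_{t}$ and covariance $Σ_{t}$, the expectation of a saturating cost of the form in Equation (2) has a standard closed form, $E [C_{t} (x_{t})] = 1 - |I + Σ_{t} Λ^{- 1}|^{- 1 / 2} exp (- \frac{1}{2} {(μ_{t} - x_{t}^{*})}^{T} {(Σ_{t} + Λ)}^{- 1} (μ_{t} - x_{t}^{*}))$. The sketch below evaluates this expression. It is offered as an illustration and may differ in form from the expressions in Appendix A, which are not reproduced here; the function name is an assumption.

```python
import numpy as np

def expected_saturating_cost(mu, Sigma, x_target, Lam):
    """Closed-form E[1 - exp(-0.5 e^T Lam^{-1} e)] for x ~ N(mu, Sigma)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    e = mu - np.asarray(x_target, dtype=float)
    M = mu.size
    # Normalization term |I + Sigma Lam^{-1}|^{-1/2}.
    scale = 1.0 / np.sqrt(np.linalg.det(np.eye(M) + Sigma @ np.linalg.inv(Lam)))
    # Quadratic form with the combined covariance (Sigma + Lam).
    quad = e @ np.linalg.solve(Sigma + Lam, e)
    return 1.0 - scale * np.exp(-0.5 * quad)
```

With zero covariance the expression reduces to the deterministic cost of Equation (2); with larger state uncertainty the expected cost rises even when the mean is on target.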

To find the optimal solution of Equation (8), we compute the gradient of the expected cost. Based on the chain rule, the gradient of the stage cost can be expressed as follows: $\begin{matrix} \frac{\partial}{\partial u_{t}} \sum_{t = 0}^{T - 1} γ^{t} E [C_{t} (x_{t})] = \sum_{t = 0}^{T - 1} γ^{t} \frac{\partial E [C_{t} (x_{t})]}{\partial u_{t}} \\ \frac{\partial E [C_{t} (x_{t})]}{\partial u_{t}} = \frac{\partial E [C_{t} (x_{t})]}{\partial μ_{t}} \frac{\partial μ_{t}}{\partial u_{t}} + \frac{\partial E [C_{t} (x_{t})]}{\partial Σ_{t}} \frac{\partial Σ_{t}}{\partial u_{t}} \end{matrix} .$ (9)

The gradient of the terminal cost has a similar expression to Equation (9). Gradient-based methods, such as the fmincon function provided by the MATLAB optimization toolbox, can be used to find the optimal action. Meanwhile, a global search algorithm is used to avoid local optima.⁴⁶ Detailed equations of the gradient can be found in Appendix D in the Supplementary Data.
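The combination of a bounded gradient-based search with a global strategy can be sketched as follows. This is a simplified stand-in, not the fmincon-based solver used in this study: it uses projected gradient descent with finite-difference gradients instead of the analytic gradients of Equation (9), and random restarts instead of the global search algorithm of Ref.⁴⁶ All function names, the step size, and the toy cost are assumptions.

```python
import numpy as np

def numerical_gradient(cost, u, eps=1e-6):
    """Central finite-difference gradient of a scalar cost."""
    g = np.zeros_like(u)
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        g[i] = (cost(u + step) - cost(u - step)) / (2.0 * eps)
    return g

def multistart_projected_descent(cost, u_min, u_max, n_starts=20,
                                 n_iters=200, lr=0.05, seed=0):
    """Projected gradient descent from several random starts within the bounds."""
    rng = np.random.default_rng(seed)
    best_u, best_c = None, np.inf
    for _ in range(n_starts):
        u = rng.uniform(u_min, u_max)
        for _ in range(n_iters):
            # Gradient step followed by projection onto [u_min, u_max].
            u = np.clip(u - lr * numerical_gradient(cost, u), u_min, u_max)
        c = cost(u)
        if c < best_c:
            best_u, best_c = u, c
    return best_u, best_c

# Toy cost with two local minima; the global one lies near u = -1.04.
toy_cost = lambda u: (u[0]**2 - 1.0)**2 + 0.3 * u[0]
u_best, c_best = multistart_projected_descent(
    toy_cost, np.array([-2.0]), np.array([2.0]))
```

A single descent started in the right-hand basin would stop at the local minimum near u = 1; the restarts are what allow the lower minimum to be found.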

Feedback controller

The feedforward controller solves a weighted multiobjective optimization problem. However, the solution may not achieve each target perfectly. To compensate for the feedforward control actions, we further build a feedback controller to improve control performance. In this study, the design of the feedback controller is based on the principle of the LWPR method.⁴⁷ LWPR is a nonparametric local learning method that handles nonlinear, high-dimensional, and adaptive regression problems well and can be quickly updated with online data. Given the input $z_{t} = [x_{t}^{*} - x_{t}, x_{t}, u_{t}]$ , the feedback action is as follows: $u_{t}^{fb} = \frac{\sum_{r} w_{r} (z_{t}) Ψ_{r} (z_{t})}{\sum_{r} w_{r} (z_{t})} .$ (10)

Each local model $Ψ_{r}$ is weighted by a local kernel w_r and the predicted output is the normalized weighted mean of all local models. The number r of local models is adapted automatically. Detailed formulas of the feedback controller can be found in Appendix E in the Supplementary Data.
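The prediction step of Equation (10) can be sketched as below. The sketch assumes Gaussian receptive fields and local linear models with fixed parameters, which is the usual LWPR form; the partial least squares projections and the automatic creation and adaptation of local models are omitted. The function and argument names are hypothetical and do not correspond to any LWPR library interface.

```python
import numpy as np

def lwpr_predict(z, centers, metrics, offsets, slopes):
    """Normalized weighted mean of local linear models, as in Equation (10).

    centers[r]: (d,) center of local model r
    metrics[r]: (d, d) positive-definite distance metric of its kernel
    offsets[r]: (D,) local prediction at the center
    slopes[r]:  (D, d) local linear slope
    """
    z = np.asarray(z, dtype=float)
    numerator = 0.0
    denominator = 0.0
    for c, Dm, b0, B in zip(centers, metrics, offsets, slopes):
        d = z - c
        w = np.exp(-0.5 * d @ Dm @ d)    # local kernel weight w_r(z)
        psi = b0 + B @ d                 # local model output Psi_r(z)
        numerator = numerator + w * psi
        denominator = denominator + w
    return numerator / denominator
```

For a query halfway between two local models with equal kernels, the output is the average of the two local predictions.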

Control algorithm

Based on the feedforward and feedback controllers, we get the final control actions: $u_{t} = u_{t}^{f f} + G^{*} u_{t}^{f b},$ (11)

where $G^{*} \in R^{D \times D}$ is a diagonal matrix and an optimal feedback gain, obtained by minimizing the predicted tracking error: $\begin{matrix} G^{*} = \arg\min_{G} E [∥ x_{t}^{p} - x_{t}^{*} ∥] \\ P (x_{t}^{p} | u_{t}, x_{t - 1}) = N (μ_{t}, Σ_{t}) \\ ∥ μ_{t} - x_{t}^{*} ∥ \leq λ ∥ x_{t - 1} - x_{t - 1}^{*} ∥ \\ u_{\min} \leq u_{t}^{ff} + G u_{t}^{fb} \leq u_{\max} \end{matrix},$ (12)

where $x_{t}^{p}$ is the model predicted state, and $0 < λ < 1$ is a convergence rate parameter. The optimization problem in Equation (12) is solved numerically for the feedback gain G using the fmincon function in MATLAB. The effect of $λ$ on control performance is analyzed in Appendix H in the Supplementary Data.
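The structure of Equations (11) and (12) can be illustrated with a deliberately simplified search. The sketch below assumes a single scalar gain g (i.e., G = gI), replaces the expected error norm with the error of the predictive mean, and uses a grid search in place of fmincon. The names `select_feedback_gain` and `predict_mean` are hypothetical; `predict_mean` stands for any function that returns the model's predicted mean state for a given action.

```python
import numpy as np

def select_feedback_gain(u_ff, u_fb, predict_mean, x_target, prev_error_norm,
                         u_min, u_max, lam=0.7, candidates=None):
    """Pick the scalar gain that minimizes the predicted error under Equation (12)."""
    if candidates is None:
        candidates = np.linspace(0.0, 2.0, 41)
    best_g, best_err = None, np.inf
    for g in candidates:
        u = u_ff + g * u_fb                       # Equation (11) with G = g * I
        if np.any(u < u_min) or np.any(u > u_max):
            continue                              # actuation limits violated
        err = np.linalg.norm(predict_mean(u) - x_target)
        if err > lam * prev_error_norm:
            continue                              # contraction constraint violated
        if err < best_err:
            best_g, best_err = g, err
    return best_g, best_err                       # (None, inf) if nothing is feasible
```

When no candidate satisfies both constraints the function returns `None`, and a fallback (for example, applying the feedforward action alone) would have to be chosen by the user.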

The GP model and LWPR are updated online to enable adaptation to environmental changes. In particular, a “forgetting” strategy is used to replace the most uninformative data point with a fresh data point while keeping the size of the training data constant.⁴⁸ We choose to forget the data point $i_{*}$ that is the most similar to other data points in terms of the covariance: $i_{*} = \arg\min_{i} \sum_{j} K [i, j],$ (13)

where K is the same as in Equation (5), that is, the matrix of covariance between all data points. The complete procedure of the control algorithm is described in Table 1.

Table 1.

Proposed Learning-Based Control Algorithm for Soft Robot

1: Initialize: Apply random control actions and collect initial data set

2: Train dynamic model by multitask Gaussian process (GP) using the initial data set

3: Train feedback controller by locally weighted projection regression (LWPR) using the initial data set

4: Repeat:

5: Give control target

6: Compute the feedforward action by the multitask GP-based model predictive controller

7: Compute feedback action by the LWPR controller

8: Apply final control action that is an optimized combination of feedforward and feedback actions

9: Collect state-action data pair and update the GP model and LWPR

10: Until the target ends
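The repeated part of Table 1 (steps 4 to 10) can be sketched as a loop in which every component is passed in as a function. This is a structural outline only: the callables `plant`, `feedforward`, `feedback`, `combine`, and `update` are hypothetical placeholders for the robot, the GP-based MPC, the LWPR controller, the gain optimization of Equation (12), and the online update, and passing the feedforward action to the feedback controller is an assumption. The toy scalar example at the end is not a model of the soft robot.

```python
def run_control_loop(targets, x0, plant, feedforward, feedback, combine, update):
    """Steps 4-10 of Table 1 with every component supplied as a callable."""
    x, log = x0, []
    for x_target in targets:                     # step 5: give control target
        u_ff = feedforward(x, x_target)          # step 6: feedforward action
        u_fb = feedback(x, x_target, u_ff)       # step 7: feedback action
        u = combine(u_ff, u_fb, x, x_target)     # step 8: combined action
        x_next = plant(x, u)                     # apply the action to the system
        update(x, u, x_next)                     # step 9: online update
        log.append((x, u, x_next))
        x = x_next
    return log                                   # step 10: targets exhausted

# Toy scalar integrator used only to exercise the loop.
log = run_control_loop(
    targets=[1.0] * 5, x0=0.0,
    plant=lambda x, u: x + u,
    feedforward=lambda x, xt: 0.5 * (xt - x),
    feedback=lambda x, xt, u_ff: 0.25 * (xt - x),
    combine=lambda u_ff, u_fb, x, xt: u_ff + u_fb,
    update=lambda x, u, x_next: None,
)
```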

Stability and convergence analysis

A major concern of learning-based control is how to prove its stability and convergence.³⁷ This analysis attempts to provide theoretical guarantees for the proposed approach. Input-to-state stability is widely used to study the stability of nonlinear control systems.⁴⁹ Since a probabilistic model is used in our approach, we prove that our system is stochastic input-to-state stable (SISS).

Theorem 1. There exists an SISS-Lyapunov function such that the system state x_t is bounded by the input u with a high probability $U \in (0, 1)$ , that is, $P \{∥ x_{t} ∥ < β (∥ x_{0} ∥, t) + φ (∥ u ∥)\} \geq 1 - U,$ (14)

where $φ (∥ u ∥)$ is a function of class $K$, that is, $φ (∥ u ∥)$ is strictly increasing with respect to $∥ u ∥$ and $φ (0) = 0$ . $β (∥ x_{0} ∥, t)$ is of class $K$ in the first argument $∥ x_{0} ∥$ for each fixed $t \geq 0$ , and $β (∥ x_{0} ∥, t)$ decreases to 0 as $t \to + \infty$ for each fixed $∥ x_{0} ∥$ . The proof of Theorem 1 can be found in Appendix F in the Supplementary Data.

Besides system stability, we further analyze the tracking error convergence of the proposed control approach and have the following theorem:

Theorem 2. The tracking error can converge to an error ball with probability at least $1 - δ$ , that is, $P \{\lim_{t \to \infty} ∥ x_{t} - x^{*} ∥ \leq \frac{ξ (δ)}{1 - λ}\} \geq 1 - δ,$ (15)

where $x^{*}$ is the target state, $ξ (δ)$ is a function of probability $δ$ , and $λ$ is the convergence rate in Equation (12). The proof of Theorem 2 can be found in Appendix G in the Supplementary Data.
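One way to read the radius in Equation (15) is through a scalar recursion. Suppose the tracking error satisfies a per-step bound of the form $e_{t + 1} \leq λ e_{t} + ξ$; this form is an assumption suggested by the contraction constraint in Equation (12), not a restatement of the proof in Appendix G. Iterating the worst case gives $e_{t} = λ^{t} e_{0} + ξ (1 - λ^{t}) / (1 - λ)$, which tends to $ξ / (1 - λ)$. The sketch below iterates this recursion; the value of `xi` is hypothetical.

```python
def error_bound_sequence(e0, lam, xi, steps):
    """Iterate e_{t+1} = lam * e_t + xi, the worst case of the assumed bound."""
    errors = [e0]
    for _ in range(steps):
        errors.append(lam * errors[-1] + xi)
    return errors

lam, xi = 0.7, 0.3          # lam as in the experiments; xi is a hypothetical bound
errors = error_bound_sequence(e0=10.0, lam=lam, xi=xi, steps=100)
radius = xi / (1.0 - lam)   # error-ball radius appearing in Equation (15)
```

A smaller $λ$ shrinks the radius and speeds up convergence, at the price of a tighter constraint in Equation (12).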

Results

This section presents real-world experimental validations of the proposed approach. The objective of our experiments is to test the force/position tracking capability of our control approach during the soft robot–environment interaction and to compare its performance with other methods. The experiments are designed to simulate showering assistance with a focus on wiping by a soft robotic manipulator. Determining the desired position and force targets for wiping the human body in real-life showering scenarios, that is, what constitutes good wiping, requires user studies, which are beyond the scope of this article. Nevertheless, this work establishes the robot's ability to track such targets.

The experimental setup is shown in Figure 4. The soft robotic manipulator was mounted horizontally instead of in the more common vertical configuration because horizontal mounting is easier to realize in practical settings, such as on the wall of a bathroom, and provides a larger workspace for wiping the human body. Our previous work used vertical installation but could operate successfully only in a relatively small area. During the experiment, we controlled the tip position in the X and Z directions and the tip force applied in the Y direction (orthogonal force). The orthogonal force was measured by projecting the force sensor value onto the Y direction. Thus, the state dimension was 3 (M = 3). Meanwhile, we actively controlled the four cables required by the experiment and robot configurations, which means the control action dimension was 4 (D = 4).

FIG. 4.

Experimental setup. The soft robotic manipulator is supported by an aluminum frame. One end of the soft robotic manipulator is fixed as the coordinate origin. The Y-axis points to the manikin and the Z-axis points to the ground. The other end of the soft robotic manipulator is covered by a ball-like bath sponge, the center of which is the PoI. A force sensor (ATI Nano17) is embedded in the bath ball to detect contact force when the soft robotic manipulator interacts with the manikin. The air compressor supplies air pressure for pneumatic actuators. The control box contains all electronic components, including proportional valves and Arduino boards. An RGB-D camera (RealSense D435) is fixed at the top of the soft robotic manipulator to detect the spatial position of PoI. The control algorithm is implemented on a personal computer. PoI, point of interest.

Moreover, the prediction horizon was 4 (T = 4), the convergence rate parameter was 0.7 ( $λ$ = 0.7), the discount factor was 0.9 ( $γ$ = 0.9), and the state weight was an identity matrix ( $Λ$ = I). One hundred offline data points were collected by providing random control actions for the soft robotic manipulator to interact with the manikin's back. The number of data points was kept constant by using the online update strategy in Equation (13); this number was an empirical trade-off between computational complexity and model accuracy for our robot. Detailed descriptions of the software and hardware setup are provided in Appendix I in the Supplementary Data.

Experimental results

During experimentation, three different interaction cases were designed to test the proposed method on different paths and forces, as well as its adaptivity to different contact surfaces. The experiments were not intended to cover all possible examples in an actual showering process, but to test the controller's force/position tracking capability in several typical cases, which included the following.

Case 1: wiping the manikin's back along a horizontal path with a constant force.

Case 2: wiping the manikin's back along an inclined path with varying forces.

Case 3: wiping the manikin's chest along a horizontal path with a constant force.

Case 1 and Case 2 were designed to test the proposed method on different paths and forces. To test the online learning ability of the proposed approach, we designed Case 3 where the soft robotic manipulator was controlled to interact with the back and chest of a manikin. The manikin's back has a rigid, smooth, and flat surface, while its chest has a soft, unsmooth, and uneven surface. During experimentation, we only collected offline data of the manikin's back. We used these offline data to initialize the model. When testing on the manikin's chest, the model was updated online to help the robot adapt to the chest surface. Tracking targets were set according to the three cases and were feasible within the robot's workspace. In each case test, the model was initialized with the same offline data set and then updated online to adapt to the current task.

The experimental results of Case 1 are shown in Figure 5a–d. Figure 5b shows that the position along the X-coordinate gradually reached the target trajectory. Meanwhile, Figure 5c shows that the Z-coordinate position remained stable within 1 mm error, and Figure 5d shows that the contact force was maintained at around 0.5 N within 0.05 N error. The results of Case 2 are shown in Figure 5e–h and demonstrate that both position and force gradually followed the target trajectories. The tracking error was larger before 17 s than afterward, mainly because online learning reduced the tracking error over time. Online learning also helped the soft robotic manipulator adapt to the uneven and nonsmooth surface, as depicted by the experimental results of Case 3 in Figure 5i–l.

FIG. 5.

Experimental results of the soft robotic manipulator interacting with the manikin in three different cases. Each row is the result of one case. (a–d) The results of Case 1 where the soft robot is controlled to wipe the manikin's back along a horizontal path with a constant force. (e–h) The results of Case 2 where the soft robotic manipulator is controlled to wipe the manikin's back along an inclined path with varying forces. (i–l) The results of Case 3 where the soft robotic manipulator is controlled to wipe the manikin's chest along a horizontal path with a constant force.

As the surface structures were previously unknown, the position and force tracking errors were relatively large in the initial phase (before 17 s) when the soft robotic manipulator contacted the manikin's chest. With the dynamic model and feedback controller updated by online data, the soft robotic manipulator gradually adapted to the chest surface, and eventually the tracking error was within 5 mm for position and 0.05 N for force.

The results of all cases demonstrated that the proposed approach could control the soft robotic manipulator to interact with the environment for force/position tracking and adapt online to a previously unknown environment. Supplementary Video S1 also shows the tracking performance of the robot in all three cases.

Comparison with other methods

After showing the experimental results of the different cases, we further analyze the performance of our control approach by comparing it with other methods. The probabilistic inference for learning control (PILCO) algorithm,³⁶ which is also based on GP models, is one of the state-of-the-art GP-based learning control methods. It uses independent GPs to model each output dimension and therefore ignores correlations between outputs. In this study, we compare our approach with the PILCO algorithm. The open-source PILCO code³⁶ was used to run the algorithm, with a cost width of 1, a sampling period of 0.1 s, a prediction horizon of 1 s, and 100 basis functions.

In addition, we compare the approach of this study with our previous method,³⁴ for which the weight coefficient was 0.1, the cost width was 0.01, and the feedback gain was 0.1. Moreover, a feedforward-only controller (i.e., $u_{t} = u_{t}^{ff}$) based on a multitask GP was used to assess the benefit of the additional feedback action. All methods were initially trained using the same offline data set and run on the same software and hardware platform.

Comparison experiments repeated the three abovementioned cases using the other methods. The comparison results are shown in Figure 6. PILCO showed larger overshoots and more pronounced oscillations than the other methods during position and force tracking. Our previous method performed better than PILCO and the feedforward-only controller, but worse than the proposed approach. The proposed approach had smaller overshoots, shorter response times, and more stable performance than the other methods.

FIG. 6.

Tracking performance comparison of different methods in the designed three soft robot–environment interaction cases. Each row is the comparison results of one case. (a–c) Tracking performance comparison of Case 1 where the soft robot is controlled to wipe the manikin's back along a horizontal path with a constant force. (d–f) Performance comparisons of Case 2 where the soft robotic manipulator is controlled to wipe the manikin's back along an inclined path with varying forces. (g–i) Performance comparisons of Case 3 where the soft robotic manipulator is controlled to wipe the manikin's chest along a horizontal path with a constant force.

To quantitatively compare the tracking performance of the different controllers, we also calculated the root-mean-square error (RMSE) between the actual and target states. The RMSE values are listed in Table 2. They agree with the visual impression from the figures: our approach had the smallest position and force errors in all entries except the Case 1 X position, where our previous method was marginally lower (4.4173 mm vs. 4.5773 mm), and its errors were less than half of those of PILCO in all entries except the Z position and force of Case 3. Combining the results in the figures and the table, we observed that our approach achieved better control performance than the other methods.

Table 2.

Tracking Error Comparison of Different Controllers by Root Mean Square

Controller	Case 1 X (mm)	Case 1 Z (mm)	Case 1 F (N)	Case 2 X (mm)	Case 2 Z (mm)	Case 2 F (N)	Case 3 X (mm)	Case 3 Z (mm)	Case 3 F (N)
Proposed approach	4.5773	0.6807	0.0211	4.7703	1.6675	0.0590	5.5127	1.2097	0.0379
Our previous method	4.4173	1.2297	0.0539	6.0243	2.5851	0.0814	9.2499	1.6820	0.0399
Feedforward only	7.9572	1.4816	0.0750	7.7633	3.2535	0.1063	11.8990	1.8479	0.0491
PILCO	13.3471	1.8280	0.0743	9.8575	6.3707	0.1277	16.3715	2.1584	0.0670

PILCO, probabilistic inference for learning control.
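
The RMSE metric and the comparisons implied by Table 2 can be reproduced with a short script. The sketch below is illustrative and is not the authors' evaluation code: the function name `rmse`, the dictionary layout, and the column labels are assumptions, and only the numbers copied from Table 2 come from the article.

```python
import math

def rmse(actual, target):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(target):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - t) ** 2 for a, t in zip(actual, target)) / len(actual))

# RMSE values copied from Table 2, ordered as
# Case 1 (X, Z, F), Case 2 (X, Z, F), Case 3 (X, Z, F).
TABLE2 = {
    "Proposed approach":   [4.5773, 0.6807, 0.0211, 4.7703, 1.6675, 0.0590, 5.5127, 1.2097, 0.0379],
    "Our previous method": [4.4173, 1.2297, 0.0539, 6.0243, 2.5851, 0.0814, 9.2499, 1.6820, 0.0399],
    "Feedforward only":    [7.9572, 1.4816, 0.0750, 7.7633, 3.2535, 0.1063, 11.8990, 1.8479, 0.0491],
    "PILCO":               [13.3471, 1.8280, 0.0743, 9.8575, 6.3707, 0.1277, 16.3715, 2.1584, 0.0670],
}
COLUMNS = [f"Case {c} {q}" for c in (1, 2, 3) for q in ("X", "Z", "F")]

# Ratio of the proposed approach's RMSE to PILCO's, per column.
ratio_to_pilco = {
    col: p / q
    for col, p, q in zip(COLUMNS, TABLE2["Proposed approach"], TABLE2["PILCO"])
}

# Controller with the smallest RMSE in each column.
best = {
    col: min(TABLE2, key=lambda name: TABLE2[name][i])
    for i, col in enumerate(COLUMNS)
}

if __name__ == "__main__":
    for col in COLUMNS:
        print(f"{col}: best = {best[col]}, proposed/PILCO = {ratio_to_pilco[col]:.2f}")
```

Running the script lists, for each column of Table 2, which controller has the lowest RMSE and how the proposed approach compares with PILCO.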

Discussion

We attribute the advantages of our approach over the other three methods to three aspects: (1) modeling of the system nonlinearity, uncertainty, and output correlations; (2) multistep uncertainty propagation with a carefully designed cost function; and (3) an optimized combination of feedforward and feedback controllers together with an online update strategy. Our previous method made only one-step-ahead model predictions. A one-step predictive cost function can consider only the immediate cost, which makes the robot's actions greedy and aggressive. In contrast, the approach in this study makes multistep-ahead predictions that account for what is likely to happen in future steps, so the cost function can be designed to consider not only the immediate cost but also more distant future costs. By optimizing the multistep predictive cost function, the robot can adjust its actions in time (even in advance) to improve its performance in tracking reference trajectories.

As a result, the robot response with multistep prediction is usually more accurate and stable than that with one-step prediction. Moreover, both the proposed approach and our previous method had smaller errors than the feedforward-only method owing to the benefits of feedback control. In addition, the proposed approach incorporates output correlations into the model and uses an optimized feedforward–feedback combination to achieve better performance than PILCO. We ran simulations to choose suitable PILCO parameters for the comparison; nevertheless, PILCO still performed poorly on the real robot. This is mainly because PILCO uses independent GPs to model each output dimension and thus ignores correlations among the outputs.
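
The contrast between a one-step and a multistep predictive cost can be illustrated with a small sketch. It assumes a saturating cost of the form 1 − exp(−d²/(2w²)) with cost width w, evaluated in expectation over a Gaussian prediction; the exact cost function of this study is not restated in this section, so the form used here, the function names `expected_saturating_cost` and `horizon_cost`, and the toy predicted trajectory are all hypothetical.

```python
import math

def expected_saturating_cost(mean, var, target, width):
    """Expected value of 1 - exp(-(x - target)^2 / (2 * width^2)) for x ~ N(mean, var).

    With var = 0 this reduces to the deterministic saturating cost.
    """
    total_var = width ** 2 + var
    scale = width / math.sqrt(total_var)
    return 1.0 - scale * math.exp(-((mean - target) ** 2) / (2.0 * total_var))

def horizon_cost(pred_means, pred_vars, targets, width):
    """Sum of expected costs over a prediction horizon (multistep cost)."""
    return sum(
        expected_saturating_cost(m, v, t, width)
        for m, v, t in zip(pred_means, pred_vars, targets)
    )

# Toy predicted trajectory (hypothetical numbers): the mean approaches the
# target while the predictive variance grows with each propagated step.
pred_means = [0.30, 0.42, 0.48, 0.50]
pred_vars = [0.001, 0.002, 0.004, 0.008]
targets = [0.50, 0.50, 0.50, 0.50]
width = 0.1

# One-step cost looks only at the first prediction; the multistep cost
# accumulates the expected cost over the whole horizon.
one_step = horizon_cost(pred_means[:1], pred_vars[:1], targets[:1], width)
multi_step = horizon_cost(pred_means, pred_vars, targets, width)
```

In this form, a prediction whose mean sits exactly on the target still incurs a positive expected cost when its variance is nonzero, which is one way propagated uncertainty can enter the objective that the controller optimizes.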

In this study, we used the multitask GP to model the system dynamics because of the following advantages: (1) the multitask GP can explicitly consider system nonlinearity, uncertainty, and correlation among outputs; (2) the GP model is data-efficient and can provide accurate predictions with a small amount of data, which is especially useful when data are limited or when collecting data is expensive or time-consuming, whereas state-of-the-art approaches such as deep learning⁵⁰ or reinforcement learning⁵¹ for robotics usually require thousands of training samples to perform well; and (3) the GP model has fewer hyperparameters to tune and is less computationally expensive than other uncertainty-aware models such as an ensemble of neural networks.⁵²
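
How output correlations can enter a multitask GP is sketched below using the intrinsic coregionalization form, in which the joint covariance over all outputs is the Kronecker product of a task covariance matrix and an input kernel. This is a minimal illustration and not necessarily the exact model of this study: the squared-exponential kernel, the matrices `K_corr` and `K_indep`, the helper names, and all numerical values are assumptions chosen for the example.

```python
import math

def rbf_kernel(xs, lengthscale=1.0):
    """Squared-exponential kernel matrix for a list of scalar inputs."""
    return [[math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in xs] for a in xs]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

xs = [0.0, 0.5, 1.0]                 # three toy inputs
K_x = rbf_kernel(xs)                 # input covariance (3 x 3)
K_corr = [[1.0, 0.8], [0.8, 1.0]]    # two correlated outputs (assumed values)
K_indep = [[1.0, 0.0], [0.0, 1.0]]   # diagonal: two independent GPs

cov_corr = kron(K_corr, K_x)         # joint covariance (6 x 6)
cov_indep = kron(K_indep, K_x)

n = len(xs)
# Off-diagonal block: covariance between output 1 and output 2.
cross_corr = [row[n:] for row in cov_corr[:n]]
cross_indep = [row[n:] for row in cov_indep[:n]]
```

With a diagonal task matrix the cross-output block is zero, so an observation of one output carries no information about the other; a nonzero off-diagonal entry lets the model share information between outputs such as position and force.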

On the other hand, the GP model typically suffers from the curse of dimensionality, and its computational complexity increases with the amount of training data. In this study, we used a “forgetting” strategy to keep a fixed amount of informative data. Other possible solutions are to use sparse GP regression, which selects inducing data points, and principal component analysis, which reduces the dimensionality.
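
A fixed-size training set of this kind can be sketched as a bounded buffer. The example assumes the simplest possible rule, dropping the oldest sample once capacity is reached; the criterion actually used in this study to decide which data are informative is not described in this section, and the class name `ForgettingBuffer` and its capacity are hypothetical.

```python
from collections import deque

class ForgettingBuffer:
    """Keeps at most `capacity` (input, output) training pairs for online model updates."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = deque(maxlen=capacity)

    def add(self, x, y):
        # deque(maxlen=...) discards the oldest pair automatically when full.
        self._data.append((x, y))

    def training_set(self):
        """Return inputs and outputs as two lists, oldest first."""
        xs = [x for x, _ in self._data]
        ys = [y for _, y in self._data]
        return xs, ys

    def __len__(self):
        return len(self._data)

buffer = ForgettingBuffer(capacity=3)
for step in range(5):
    buffer.add(x=step, y=step * 0.1)

xs, ys = buffer.training_set()
print(xs)  # [2, 3, 4]
```

Because the number of stored points stays constant, the cost of exact GP inference, which grows cubically with the number of training points, remains bounded during online updates.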

Some limitations exist in this study. The target workspace in the experiment was limited by the safe actuation ranges of the soft robot (i.e., without damaging the soft robot). However, our approach does not have explicit restrictions on the target space, as long as the control targets are within the robot operation range. In future work, we can extend the workspace by using a more robust soft robotic manipulator. Although the Experimental Results section only shows performance over short periods of time, our approach can be operated over longer periods of time. To demonstrate the long-term performance of our approach, we controlled the soft robot's end effector to wipe a manikin's back along a square trajectory with constant force for a total duration of 120 s.

The results can be seen in Appendix I in the Supplementary Data, which shows that the position gradually reached the target trajectory and the contact force was maintained at 0.5 N with an error of around 0.05 N. Another limitation is that we assume the person is in a fixed position in this study. If the person is moving, the approach may not perform as well as when the person is static. To address this, we can optimize the software to reduce code execution time and improve hardware components, such as sensor and actuation speed, to increase the operation frequency. Our future studies will explore how to deal with dynamic interaction problems.

Conclusion

This article proposes a learning-based control approach for soft robot–environment interaction with force/position tracking capability. The approach is data-driven and requires no prior knowledge of the soft robot dynamics and environment structures. It contains a probabilistic model to explicitly account for system nonlinearity, uncertainty, and output correlation issues. Meanwhile, the online learning allows the soft robot to adapt to unknown environments, as validated by experimental results. A theoretical analysis of the approach is provided to ensure stability and convergence. In conclusion, this work contributes to the learning and control strategies of soft robotics and has the potential to be extended to other similar robotic systems.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This work was supported by the National Robotics Program, Singapore, under the Soft and Hybrid Phase 2a project (grant number W2125d0243) and in part by the National Research Foundation, Singapore, under its Medium Sized Centre Programme—Centre for Advanced Robotics Technology Innovation (CARTIN).

Supplementary Material

References

1. Vukobratovic. Dynamics and Robust Control of Robot-Environment Interaction. Vol. 2. World Scientific: Singapore; 2009.
2. Rus, Tolley. Design, fabrication and control of soft robots. Nature, 2015; 521(7553):467–475.
3. Laschi, Mazzolai, Cianchetti. Soft robotics: Technologies and systems pushing the boundaries of robot abilities. Sci Robot, 2016; 1(1):eaah3690.
4. Xie, Yuan, Liu, et al. A proprioceptive soft tentacle gripper based on crosswise stretchable sensors. IEEE/ASME Trans Mechatronics, 2020; 25(4):1841–1850.
5. Thuruthel, Ansari, Falotico, et al. Control strategies for soft robotic manipulators: A survey. Soft Robot, 2018; 5(2):149–163.
6. Della Santina, Duriez, Rus. Model-based control of soft robots: A survey of the state of the art and open challenges. IEEE Control Syst Mag, 2023; 43(3):30–65.
7. Laschi, Thuruthel, Iida, et al. Learning-based control strategies for soft robots: Theory, achievements, and future challenges. IEEE Control Syst Mag, 2023; 43(3):100–113.
8. Chin, Hellebrekers, Majidi. Machine learning for soft robotic sensing and control. Adv Intell Syst, 2020; 2(6):1900171.
9. Kim, Kim, et al. Review of machine learning methods in soft robotics. PLoS One, 2021; 16(2):e0246102.
10. […]. A survey for machine learning-based control of continuum robots. Front Robot AI, 2021; 8:280–293.
11. […], et al. Biomedical applications of soft robotics. Nat Rev Mater, 2018; 3(6):143–153.
12. […]. Soft robotics in minimally invasive surgery. Soft Robot, 2019; 6(4):423–443.
13. […], et al. Soft Assistive Robot for Personal Care of Elderly People. In: 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob). IEEE: Singapore; 2016; pp. 833–838.
14. […], et al. Towards the development of a soft manipulator as an assistive robot for personal care of elderly people. Int J Adv Robot Syst, 2017; 14(2):1729881416687132.
15. […]. Towards Robotic Assisted Hygienic Services: Concept for Assisting and Automating Daily Activities in the Bathroom. In: Proceedings of the International Symposium on Automation and Robotics in Construction, Netherlands; Vol. 29; 2012; pp. 1–5.
16. […], et al. Measuring the activities of daily living: Comparisons across national surveys. J Gerontol, 1990; 45(6):S229–S237.
17. […], et al. Deep reinforcement learning for soft, flexible robots: Brief review with impending challenges. Robotics, 2019; 8(1):4.
18. […], et al. Model-based reinforcement learning for closed-loop dynamic control of soft robotic manipulators. IEEE Trans Robot, 2018; 35(1):124–134.
19. […], et al. SofaGym: An open platform for reinforcement learning based on soft robot simulations. Soft Robot, 2023; 10(2):410–430.
20. […], et al. Toward Effective Soft Robot Control Via Reinforcement Learning. In: Intelligent Robotics and Applications: 10th International Conference, ICIRA 2017, Wuhan, China, August 16–18, 2017, Proceedings, Part I 10. Springer; 2017; pp. 173–184.
21. Wang, Chortos. Control strategies for soft robot systems. Adv Intell Syst, 2022; 4(5):2100165.
22. Bajo, Simaan. Hybrid motion/force control of multi-backbone continuum robots. Int J Robot Res, 2016; 35(4):422–434.
23. […], et al. Hybrid vision/force control of soft robot based on a deformation model. IEEE Trans Control Syst Technol, 2019; 29(2):661–671.
24. […], et al. Model-based dynamic feedback control of a planar soft robot: Trajectory tracking and interaction with the environment. Int J Robot Res, 2020; 39(4):490–513.
25. […]. Adaptive variable impedance control for a modular soft robot manipulator in configuration space. Meccanica, 2022; 57:1–15.
26. Yip, Camarillo. Model-less hybrid position/force control: A minimalist approach for continuum manipulators in unknown, constrained environments. IEEE Robot Autom Lett, 2016; 1(2):844–851.
27. […], et al. Position and force control of a soft pneumatic actuator. Soft Robot, 2020; 7(5):550–563.
28. […], et al. Improving Soft Pneumatic Actuator Fingers Through Integration of Soft Sensors, Position and Force Control, and Rigid Fingernails. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE: Stockholm, Sweden; 2016; pp. 5024–5031.
29. […]. Soft dagger: Sample-efficient imitation learning for control of soft robots. Sensors, 2023; 23(19):8278.
30. […]. Learning-Based Position and Stiffness Feedforward Control of Antagonistic Soft Pneumatic Actuators Using Gaussian Processes. In: IEEE International Conference on Soft Robotics (RoboSoft). IEEE: Singapore; 2023; pp. 1–7.
31. […]. Gaussian Process Dynamics Models for Soft Robots with Shape Memory Actuators. In: IEEE International Conference on Soft Robotics (RoboSoft). IEEE: Yale University, USA; 2021; pp. 191–198.
32. […], et al. Nonparametric online learning control for soft continuum robot: An enabling technique for effective endoscopic navigation. Soft Robot, 2017; 4(4):324–337.
33. […], Lee, Tang, et al. Localized online learning-based control of a soft redundant manipulator under variable loading. Adv Robot, 2018; 32(21):1168–1183.
34. […], et al. Learning-based approach for a soft assistive robotic arm to achieve simultaneous position and force control. IEEE Robot Autom Lett, 2022; 7(3):8315–8322.
35. […], et al. Nonlinear Model Predictive Control. Springer: New York City; 2017.
36. […]. Gaussian processes for data-efficient learning in robotics and control. IEEE Trans Pattern Anal Mach Intell, 2013; 37(2):408–423.
37. […], et al. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annu Rev Contr Robot Auton Syst, 2022; 5:411–444.
38. […]. Partially observable Markov decision processes in robotics: A survey. IEEE Trans Robot, 2022; 39(1):21–40.
39. […], et al. A probabilistic model-based online learning optimal control algorithm for soft pneumatic actuators. IEEE Robot Autom Lett, 2020; 5(2):1437–1444.
40. […]. Multi-task Gaussian process prediction. Adv Neural Inf Process Syst, 2007; 20:153–160.
41. […]. Uniform error bounds for Gaussian process regression with application to safe control. Adv Neural Inf Process Syst, 2019; 32:1–11.
42. […], et al. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans Inf Theory, 2012; 58(5):3250–3265.
43. Williams, Rasmussen. Gaussian Processes for Machine Learning. Vol. 2. MIT Press: Cambridge, MA; 2006.
44. […]. Prediction at an Uncertain Input for Gaussian Processes and Relevance Vector Machines-Application to Multiple-Step Ahead Time-Series Forecasting. In: Informatics and Mathematical Modelling. Technical University of Denmark: Denmark; 2003.
45. […], et al. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Adv Neural Inf Process Syst, 2018; 31.
46. […], et al. Scatter search and local NLP solvers: A multistart framework for global optimization. INFORMS J Comput, 2007; 19(3):328–340.
47. Vijayakumar, Schaal. Locally Weighted Projection Regression: An O(n) Algorithm for Incremental Real Time Learning in High Dimensional Space. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000). Vol. 1. Morgan Kaufmann; 2000; pp. 288–293.
48. […], Sun, Pan. Three-dimensional deformable object manipulation using fast online Gaussian process regression. IEEE Robot Autom Lett, 2018; 3(2):979–986.
49. Sontag, Wang. On characterizations of the input-to-state stability property. Syst Contr Lett, 1995; 24(5):351–359.
50. […], et al. Deep learning in robotics: Survey on model structures and training strategies. IEEE Trans Syst Man Cybern Syst, 2020; 51(1):266–279.
51. […]. Reinforcement learning in robotics: A survey. Int J Robot Res, 2013; 32(11):1238–1274.
52. […], et al. Bridging active exploration and uncertainty-aware deployment using probabilistic ensemble neural network dynamics. In: Robotics: Science and Systems 2023 Conference; Daegu, Republic of Korea; pp. 1–14.
