Abstract
The twisted and coiled polymer muscle (TCPM) has two major benefits: low weight and low cost. Therefore, this new type of actuator is increasingly used in robotic applications where these benefits are relevant. Closed-loop control of these muscles, however, requires additional sensors that add weight and cost, negating the muscles' intrinsic benefits. Self-sensing enables feedback without added sensors. In this article, we investigate the feasibility of using self-sensing in closed-loop control of a Joule-heated muscle. We use a hardware module that is capable of driving the muscle, and simultaneously providing sensor measurements based on inductance. A mathematical model relates the measurements to the deflection. In combination with a simple force model, we can estimate both deflection and force, and control either of them. For a muscle that operates within deflections of [10, 30] mm and forces of [0.32, 0.51] N, our self-sensing method exhibited a 95% confidence interval of 2.14 mm around a mean estimation error of −0.27 mm and 29.0 mN around a mean estimation error of 7.5 mN, for the estimation of, respectively, deflection and force. We conclude that self-sensing in closed-loop control of Joule-heated TCPMs is feasible and may facilitate further deployment of such actuators in applications where low cost and weight are critical.
Introduction
The recently developed actuation principle represented by the twisted and coiled polymer muscle (TCPM) has a number of benefits that make it interesting for application in soft robotics. 1 Two major benefits are its low weight and low cost. The working principle of this actuator is based on the thermal torsion effect. 2 Twisting a fiber with a substructure highly aligned in the direction of the fiber, such as polymer chains or carbon nanotubes, results in a helically aligned substructure. Radial expansion of the twisted fiber and entropic contraction of the helical substructure generate a torque in the opposite direction of the twist. In nylon, both effects can be induced through heating. These torsional actuators become linear actuators through coiling.2,3
Of the varieties of the TCPM, the thermally activated Joule-heated nylon muscle receives the most attention. This specific type already has a wide range of applications: robotic fingers,4–6 joints,7–9 orthoses,10,11 and complete robots,12,13 as well as muscles embedded in a silicone manipulator, 14 a silicone skin for robotic facial expressions, 15 and a self-adjusting sports bra. 16
Systems that benefit most from TCPMs are typically lightweight and inexpensive, and they should function in versatile environments. However, most TCPM control schemes rely either on added sensors to enable feedback control4–7,17–20 or on predictable circumstances to enable feedforward control. 21 Added sensors increase weight and cost, negating two major benefits of these actuators. Accurate feedforward control requires a controlled environment, which limits its usability in real-life applications. One way to enjoy the benefits of TCPMs without the drawbacks of added sensors or complex models is through self-sensing. This means that a system determines its state through the interpretation of input-signal behavior, use of special input signals, or connecting additional electrical leads to existing hardware. 22 Self-sensing in TCPMs will provide an inexpensive and light-weight way to implement feedback.
TCPMs with Joule heating possess self-sensing capabilities, as demonstrated in our previous work. 23 There, we showed the potential to use both the resistance and the inductance of heating wires for self-sensing purposes. Besides our work, three studies on sensing in TCPMs focus on modeling the resistance of coated nylon muscles.24–26 Two of these works use auto-coiled muscles.24,25 The first work contributes a phenomenological approach to derive a sensing model. 24 It relates the resistance of a coated fiber to geometric changes during stretching of the coil. However, this approach does not include actuation, and it therefore cannot be applied as a self-sensing model. The second work contributes an analysis of the resistance when actuating the muscle. 25 The authors found nonlinearities in the resistance, which they attributed to coil windings making contact with each other. The third study uses mandrel-coiled muscles embedded in a silicone manipulator. 26 The authors use the muscles purely as sensors, instead of actuators, and propose a fourth-order polynomial fit as a measurement model. Although these contributions demonstrate the capability for self-sensing, none use self-sensing to close the feedback loop.
In this article, we close the feedback loop via self-sensing. We first identify and validate parameters for two models: one model to estimate deflection via the muscle's inductance, and another model to estimate force, whereby the model inputs are power and estimated deflection. Second, with the models applied, we implement a feedback loop through self-sensing, and perform simple control tasks, as illustrated by Figure 1.

Impression of a self-sensing muscle. A control signal P is used to both drive the muscle to generate the force F and measure the inductance L of the Joule-heating wire. Based on the measurement and the previous control input, the self-sensing and control module estimates the force
We start with an explanation of the methods. The subsequent section contains the experimental validation of our methods. Next, we present the results of the experiments. Finally, we discuss our work and provide conclusions.
Self-Sensing and Control Methods
We first describe the hardware that combines actuation and sensing. Next, we introduce the models used for self-sensing of deflection and estimation of force, as well as their online implementations. Finally, we introduce the control method.
Combined actuation and sensing
Although several ways exist to activate the TCPM, we choose Joule heating by means of a constantan resistance wire. Joule heating has the benefit that it can be used for self-sensing. 23 In this article, we make use of hardware that realizes this principle. 27 The so-called Muscle Drive (MD) drives the TCPM by applying a Pulse Width Modulated (PWM) signal with a controlled duty cycle D. The electrical response of the TCPM during the off time of a signal period relates to inductance. Based on this response, the MD determines a measure of inductance L called decay time. 27
Self-sensing model
In our previous work, we introduced a self-sensing model to estimate deflection x, force F, and temperature, when measuring both inductance and resistance.23
In this article, we first use the actuation power P to estimate the contribution of temperature to force FT. Next, we use L to determine x and velocity

Block diagram for estimation and control. The gray dashed rectangle contains the functionality of the Muscle Drive (MD). Within the MD, the switch indicates that either the deflection estimate
For the estimation of FT, we disregard the heating time of the resistance wire and assume it heats the fiber homogeneously. We do not measure temperature independently, and we want to use a minimal set of fitted parameters. Therefore, rather than using temperature, we directly relate input power P to the contribution of temperature to force FT. A first-order model describes the relationship between P, FT and its derivative with respect to time
where κP and κc represent the coefficient of conductive heating and convective cooling, respectively. Since FT represents the contribution of temperature to force, κP includes a factor modeling the influence of temperature on force and a factor to correct for power dissipated by the wire directly to the air. We find P by:
where Ub is the voltage at the connectors of the drive when
The model for computing deflection is taken directly from our previous work.23 It relates L to x and temperature T by:
with λx, λl, λT, and λo as fitted parameters. In contrast to our previous work,23 we use a constantan resistance wire that exhibits almost constant resistance regardless of temperature. We can therefore neglect the influence of temperature on the actuation and measurement signal. We further neglect the potential influences of temperature on inductance that do not also influence deflection. Omitting temperature from Equation (3) and rewriting the equation to act as a self-sensing model results in:
As a force model we combine the Standard Linear Solid (SLS) model for the mechanical behavior, 28 with a contribution by temperature in parallel, as shown in Figure 3. This makes the force model:
in which Fo represents a force offset, and for which the contribution by Fl is governed by:
with stiffnesses k1 and k2, and damping c. These three parameters, in addition to Fo, are fitted parameters.

Representation of the force model used for the muscles: the Standard Linear Solid model, 28 with a contribution by temperature in parallel.
Estimator implementation
FT and Fl can be found by transferring their respective models to discrete time. However, filtering is required to process deflection measurements into usable estimates, and we need to estimate
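As an illustration of transferring the temperature-force model to discrete time, the following minimal sketch assumes that Equation (1) has the first-order form dFT/dt = κP·P − κc·FT, which is our reading of the description and not a reproduction of the original equation. It applies an exact zero-order-hold discretization. The parameter values, the sample time, and the function name are hypothetical.

```python
import math


def step_thermal_force(f_t, power, kappa_p, kappa_c, dt):
    """Advance F_T by one sample.

    Assumes dF_T/dt = kappa_p * P - kappa_c * F_T (assumed form of the
    first-order model) with P held constant during the sample.
    """
    decay = math.exp(-kappa_c * dt)
    steady_state = kappa_p * power / kappa_c
    return decay * f_t + (1.0 - decay) * steady_state


# Hypothetical values, chosen for illustration only.
KAPPA_P = 0.02   # N/(W s), assumed
KAPPA_C = 0.1    # 1/s, assumed
DT = 1.0 / 16.0  # s, assumed sample time

f_t = 0.0
for _ in range(16 * 200):  # 200 s of constant power
    f_t = step_thermal_force(f_t, 2.0, KAPPA_P, KAPPA_C, DT)
print(round(f_t, 6))
```

With constant power, the estimate settles at κP·P/κc, which is the steady state implied by the assumed model form.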
Control design
To keep control simple, we choose to use proportional-integral-derivative (PID) control with anti-windup via back calculation to deal with the actuation-signal limits.29 The control law to find the desired actuation signal
with
with the error e and ė its derivative with respect to time. Control parameters Kp, Td, and Ti represent the proportional gain, and the derivative and integral time constants, respectively. We saturate
with
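The control equations are not reproduced above, so the following is only a generic sketch of a discrete PID controller with back-calculation anti-windup, not necessarily the exact law used in this article. The class name, the tracking time constant `tt`, the forward-Euler integration, and all numeric values are assumptions.

```python
class PidBackCalc:
    """Textbook PID with back-calculation anti-windup (illustrative sketch)."""

    def __init__(self, kp, ti, td, tt, u_min, u_max, dt):
        self.kp, self.ti, self.td, self.tt = kp, ti, td, tt
        self.u_min, self.u_max, self.dt = u_min, u_max, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        # Backward-difference derivative; zero on the first call.
        if self.prev_error is None:
            d_error = 0.0
        else:
            d_error = (error - self.prev_error) / self.dt
        self.prev_error = error

        u_raw = self.kp * (error + self.td * d_error) + self.integral
        u_sat = min(max(u_raw, self.u_min), self.u_max)

        # Back calculation: feed the saturation excess back into the
        # integrator so that it stops winding up while saturated.
        self.integral += self.dt * (
            self.kp / self.ti * error + (u_sat - u_raw) / self.tt
        )
        return u_sat


# Hypothetical gains and limits, for illustration only.
pid = PidBackCalc(kp=1.0, ti=1.0, td=0.0, tt=0.5, u_min=0.0, u_max=1.0, dt=0.01)
outputs = [pid.update(5.0) for _ in range(1000)]
print(outputs[-1], round(pid.integral, 3))
```

In this example the output stays at the upper limit while the integrator settles at a bounded value; without the back-calculation term it would keep growing for as long as the error persists.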
Stability analysis
Stability analysis requires knowledge of the full system: the physical actuator, its controller, and the load. However, for the method in this article we do not make assumptions regarding the behavior of the load. In other words, we do not know the behavior of the blocks representing the Universal Testing Machine (UTM) and the physical muscle in Figure 2 for arbitrary cases. This means that we cannot analyze stability for the full system. However, we can analyze the stability of the control loop within the gray dashed rectangle representing the MD, by assuming a constant x, and hence a constant L. This case represents force control with a constant deflection. In this case, closed-loop control is reduced to the interaction between the temperature model in Equation (1) and the control law in Equation (7). A potential source of instability is the saturation in Equation (9). Separating the nonlinearity from the dynamics allows for stability analysis via describing functions.30 To that end, we determine the transfer function from P to
where s represents the Laplace variable. We can analyze the stability of this system via the describing-function method. 30 Given a properly tuned controller and positive parameters, this system is stable.
Experimental Methods
In this section, we first describe the experimental setup, followed by the construction method and limits of the muscle. We then explain the signal construction for identification, training, and warming up, followed by the control tasks. Then, we explain the experimental protocol. Lastly, we describe how we processed the data.
Experimental setup
The MD applies the PWM signal and measures L. To cope with artifacts of the device that result in spikes and predictable variations in the measurements, we apply a 2-sample moving-average filter and a 15-sample median filter. We use a UTM with a load cell to apply and measure deflection and force. The UTM is a Mark10 ESM303 that has a resolution of 0.02 mm. The load cell of the UTM is a Mark10 M5-05 Force Gauge that has a resolution of 0.5 mN. We control both the UTM and the MD with custom Python code running on a laptop. A perspex duct surrounding the TCPM and a GELID Silent 12 120 mm fan directed at the TCPM, with 10 V applied, ensure controlled airflow. Figure 4 illustrates this setup.
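The measurement filtering described above can be sketched as a small streaming filter. The order of the two stages (moving average first, median second) and the class name are assumptions; the text specifies only the window lengths.

```python
from collections import deque
from statistics import median


class DecayTimeFilter:
    """2-sample moving average followed by a 15-sample median (assumed order)."""

    def __init__(self, mean_len=2, median_len=15):
        self.mean_buf = deque(maxlen=mean_len)
        self.median_buf = deque(maxlen=median_len)

    def update(self, sample):
        # Stage 1: moving average over the most recent raw samples.
        self.mean_buf.append(sample)
        averaged = sum(self.mean_buf) / len(self.mean_buf)
        # Stage 2: median over the most recent averaged samples.
        self.median_buf.append(averaged)
        return median(self.median_buf)


# Synthetic data: a constant measurement with a single spike.
raw = [10.0] * 40
raw[20] = 100.0
filt = DecayTimeFilter()
filtered = [filt.update(x) for x in raw]
print(max(filtered))
```

In this synthetic case the spike spreads over two averaged samples, which the 15-sample median then rejects entirely.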

Overall setup, with the UTM and the MD in
Muscle construction and limits
For construction of the TCPM we use the method described in our previous work 23 : We align the precursor fiber and resistance wire, with a load suspended at one end, blocking rotation, and a rotary motor at the other. We twist the line until it just starts to coil upon itself. Complete coiling can be achieved either by letting the whole fiber coil upon itself or by wrapping it around a mandrel. We choose the latter, for it increases the sensitivity of inductance to muscle deflection. Annealing finishes the muscle. The endings of the resistance wire connected to the electrical leads are shaped such that when the TCPM is under tension, their influence on the force measurement is minimal. The relevant specifications for construction are shown in Table 1.
Muscle Construction Specifications
To obtain repeatable actuation behavior we had to train the muscle. 23 In addition, in pilot experiments we found that trained muscles that had been inactive for a while needed a warming up to regain that same behavior. Therefore, we included a warming-up phase each time we started an experiment and when we continued an experiment after a pause in the protocol.
Through pilot experiments we determined the following limits of deflection and power. To be sure to have overcome the preload knee and avoid nonlinear behavior due to touching coils,25,31 we choose
Signal construction
In training, warming up, identification, and validation, we excited the muscle by letting the MD apply a power, and the UTM apply a deflection. We used two signal types: a multi-sine signal m and a random-step signal g.
We constructed the multi-sine signal with N components as:
with a0 the signal offset, ai the amplitude of the ith component, fi its frequency, and ϕi its phase. In construction, we determine the phases as:
where ϕ0 is a pseudo-randomly chosen phase offset. This construction method avoids high peaks.32 We took equal amplitudes, with the signal scaled such that it fit the deflection and power limits, respectively. The frequency interval from which we took the N equally spaced frequencies was
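The multi-sine construction can be sketched as follows. Because the phase equation and the frequency interval are not reproduced here, the sketch assumes Schroeder-type phases, ϕi = ϕ0 − πi(i − 1)/N, and a sine basis; the frequency interval, N, and the function names are hypothetical. The scaling maps the sampled extremes onto the given limits, which is one possible reading of "scaled such that it fit the limits".

```python
import math
import random


def schroeder_phases(n, phi0):
    """Low-crest-factor phases (assumed Schroeder form) for n components."""
    return [phi0 - math.pi * i * (i - 1) / n for i in range(1, n + 1)]


def multisine(t, offset, amplitudes, freqs, phases):
    """Evaluate m(t) = a0 + sum_i a_i * sin(2*pi*f_i*t + phi_i)."""
    return offset + sum(
        a * math.sin(2.0 * math.pi * f * t + p)
        for a, f, p in zip(amplitudes, freqs, phases)
    )


def scaled_multisine(times, n, f_min, f_max, lower, upper, phi0):
    """Equal-amplitude multi-sine, rescaled to span [lower, upper]; needs n >= 2."""
    freqs = [f_min + k * (f_max - f_min) / (n - 1) for k in range(n)]
    phases = schroeder_phases(n, phi0)
    raw = [multisine(t, 0.0, [1.0] * n, freqs, phases) for t in times]
    lo, hi = min(raw), max(raw)
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in raw]


# Hypothetical settings: 5 components between 0.01 and 0.2 Hz, 100 s at 16 Hz,
# scaled to the deflection range of 10 to 30 mm.
phi0 = random.Random(0).uniform(0.0, 2.0 * math.pi)
times = [k / 16.0 for k in range(16 * 100)]
deflection_ref = scaled_multisine(times, 5, 0.01, 0.2, 10.0, 30.0, phi0)
print(min(deflection_ref), max(deflection_ref))
```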
We constructed the random-step signal with H steps as:
with h representing the Heaviside step function, b0 the signal offset, bi the amplitude for each step, and τi the step times. We determined the step times with a random generator, following the construction of step times for generalized binary noise.33 Given a certain process time constant τp and sampling frequency fs, for each sample time, the probability p the signal switches is:
such that the average time between switching was half the process time constant. Via pilot experiments, we determined the approximate time constants for deflection and power to be, respectively,
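One expression consistent with the stated property is p = 2/(τp·fs): with an independent switching decision at every sample, the expected number of samples between switches is 1/p, so the expected time between switches is 1/(p·fs) = τp/2. The sketch below uses this expression. Drawing each new level uniformly between the limits is an assumption, as are the function names and the numeric values.

```python
import random


def switch_probability(tau_p, fs):
    """Per-sample switching probability giving a mean dwell time of tau_p / 2."""
    return 2.0 / (tau_p * fs)


def random_step_signal(n_samples, tau_p, fs, lower, upper, rng):
    """Piecewise-constant signal with random switch times.

    New levels are drawn uniformly from [lower, upper] (assumption).
    """
    p = switch_probability(tau_p, fs)
    level = rng.uniform(lower, upper)
    signal = []
    for _ in range(n_samples):
        if rng.random() < p:
            level = rng.uniform(lower, upper)
        signal.append(level)
    return signal


# Hypothetical time constant of 4 s at a 16 Hz sample rate, 120 s long.
rng = random.Random(1)
steps = random_step_signal(16 * 120, 4.0, 16.0, 10.0, 30.0, rng)
print(switch_probability(4.0, 16.0), len(steps))
```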
Control tasks
We performed several control tasks to quantify the self-sensing performance and the closed-loop control performance of the muscle. We had the muscle perform both force and deflection control. Both consisted of step responses to determine control behavior, and tracking sinusoid references to find the bandwidth of the actuator. The step references contained seven steps, spread over the respective ranges of
As part of the control tasks, we implemented a calibration sequence for deflection measurements and force estimates. The calibration provided two offsets, compensating for unmodeled effects, and disturbances happening in between identification and control. For calibration of the deflection measurements, the UTM held a deflection of 20 mm. The difference between the deflection estimate and the actual deflection, averaged over 10 s, gave the calibration offset for the deflection measurements. For calibration of the force estimates, the UTM held a force of 0.40 N, whereas the MD controlled the deflection. The difference between the force estimate and the actual force, averaged over 30 s, gave the calibration offset for the force estimates.
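The calibration step amounts to averaging the estimation error over the hold period and subtracting that average from later estimates. A minimal sketch, with hypothetical function names and made-up sample values:

```python
def calibration_offset(estimates, references):
    """Mean difference between estimates and reference values over a hold period."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("need two equally long, non-empty sequences")
    return sum(e - r for e, r in zip(estimates, references)) / len(estimates)


def apply_calibration(estimate, offset):
    """Correct a later estimate with the stored calibration offset."""
    return estimate - offset


# Made-up deflection estimates (mm) while the UTM holds 20 mm.
held_estimates = [20.5, 20.3, 20.4]
offset_mm = calibration_offset(held_estimates, [20.0] * len(held_estimates))
print(round(offset_mm, 3), round(apply_calibration(25.4, offset_mm), 3))
```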
Experimental protocol
For training, we first suspended the untrained TCPM and set the load cell to zero. We then attached the bottom of the TCPM to the UTM, and we set the position of the UTM, such that the TCPM just started to be under tension. At this point, we set the deflection of the UTM to zero. Then, we turned on the fan and the MD, and we started the training. We excited deflection and power for 600 s, using a multi-sine signal for both.
The identification was initiated in the same way as training. Before gathering identification data, we gave the TCPM a warming up by means of a multi-sine on deflection and power, lasting for 250 s. For identification, we subsequently applied a multi-sine, and a random-step signal on both deflection and power, both lasting 200 s. For validation of the identification, we used a multi-sine for 100 s, followed by a random-step signal for 120 s, applied to both deflection and power. Directly after gathering identification data and preceding the control tasks, we identified the model parameters as described in the next paragraph. During this time, the TCPM was still suspended in the UTM.
The control tasks were preceded by a warm-up of the TCPM by means of a multi-sine for 380 s, and a random-step signal for 200 s, applied to both deflection and power. After the warm-up, we calibrated the deflection measurements and force estimates. Next, we started the force-control tasks. After completion, we recalibrated the deflection measurements and force estimates, to correct for numerical drift or low-frequency effects that were not included in the models. We then continued the experiment with the deflection-control tasks.
Data processing
The data acquired by the UTM and the MD had their own respective time stamps. Using those, we aligned and re-sampled both UTM and MD data to 16 Hz.
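The alignment step can be illustrated by resampling each time-stamped series onto a shared 16 Hz grid. The text does not state the interpolation method, so linear interpolation is an assumption here, as is the function name. Time stamps are assumed to be strictly increasing.

```python
import math
from bisect import bisect_right


def resample(times, values, rate_hz, t_start, t_end):
    """Linearly interpolate (times, values) onto a uniform grid.

    Values outside the recorded interval are held at the first/last sample.
    """
    n = int(math.floor((t_end - t_start) * rate_hz + 1e-9)) + 1
    grid = [t_start + k / rate_hz for k in range(n)]
    out = []
    for t in grid:
        j = bisect_right(times, t)
        if j == 0:
            out.append(values[0])
        elif j == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return grid, out


# Made-up series sampled at 1 Hz, resampled to 16 Hz.
grid, resampled = resample([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], 16.0, 0.0, 2.0)
print(len(grid), resampled[8])
```

Applying the same grid to the UTM and the MD data, over the time window both devices cover, yields sample-aligned series.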
To identify the six parameters for Equations (1), (5), and (6), we minimized the squared error between the measured and estimated force response. We obtained the estimated force response by running a simulation of the dynamical system, with the re-sampled power and deflection as input. With MATLAB's genetic-algorithm optimization, we first obtained a solution close to the global minimum. Subsequently, with MATLAB's nonlinear least-squares optimization, via the Levenberg-Marquardt algorithm, we refined this solution to the minimum. We found the three parameters for Equation (4) in a similar fashion, minimizing the squared error between estimated and applied deflection.
For analysis of the models, we first calculated the root mean square error (RMSE) to quantify the estimation error of deflection and force. Second, we assessed the quality of the fit via the R2 value, given by:
where yi are the n data points with
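Both quality measures can be computed directly from the measured and estimated series. The sketch below assumes the common definition R2 = 1 − Σ(yi − ŷi)²/Σ(yi − ȳ)², with ŷi the estimates and ȳ the mean of the measurements; the equation itself is not reproduced above, so this definition and the function names are assumptions.

```python
import math


def rmse(measured, estimated):
    """Root mean square error between measurements and estimates."""
    n = len(measured)
    return math.sqrt(sum((y - e) ** 2 for y, e in zip(measured, estimated)) / n)


def r_squared(measured, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - e) ** 2 for y, e in zip(measured, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


# Made-up example data.
y_meas = [1.0, 2.0, 3.0, 4.0]
y_est = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_meas, y_est), 4), round(r_squared(y_meas, y_est), 4))
```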
To take a closer look at the performance and limitations of control, we calculated the rise times of the step responses. In addition, to determine the bandwidth of the actuator, we fit the amplitude, phase, and offset of a sinusoid with a given frequency to the respective responses to the last two periods of the sinusoid reference. We approximated the bandwidth by determining the −3 dB point via linear interpolation of the resulting magnitudes.
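The bandwidth estimate can be sketched in two steps: a linear least-squares fit of a sinusoid with known frequency, and a linear interpolation to the −3 dB crossing. Interpolating the magnitude in decibels against frequency on a linear axis is an assumption, as are the function names and the example numbers.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_sinusoid(times, values, freq_hz):
    """Fit y = A*sin(wt) + B*cos(wt) + C; return (amplitude, phase, offset)."""
    w = 2.0 * math.pi * freq_hz
    basis = [(math.sin(w * t), math.cos(w * t), 1.0) for t in times]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * v for b, v in zip(basis, values)) for i in range(3)]
    a, b, offset = solve_3x3(normal, rhs)
    return math.hypot(a, b), math.atan2(b, a), offset


def minus_3db_frequency(freqs, gains_db):
    """Linearly interpolate the first downward crossing of -3 dB, or None."""
    for k in range(1, len(freqs)):
        g0, g1 = gains_db[k - 1], gains_db[k]
        if g0 >= -3.0 > g1:
            frac = (-3.0 - g0) / (g1 - g0)
            return freqs[k - 1] + frac * (freqs[k] - freqs[k - 1])
    return None


# Made-up response: two periods of a 0.05 Hz sinusoid sampled at 16 Hz.
ts = [k / 16.0 for k in range(640)]
ys = [3.0 + 2.0 * math.sin(2.0 * math.pi * 0.05 * t + 0.5) for t in ts]
amp, phase, off = fit_sinusoid(ts, ys, 0.05)
print(round(amp, 6), round(phase, 6), round(off, 6))
```

The gain at each frequency would follow as 20·log10 of the fitted response amplitude divided by the reference amplitude, after which `minus_3db_frequency` locates the crossing.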
Results
Figure 5 shows the time series of the identification and validation experiment. Table 2 gives the fitted parameters for Equations (1), (4), (5), and (6). Table 3 shows the quality of the fit and the estimation error resulting from these parameters.

Time series of the identification and validation. The top figure shows the applied power. The middle figure shows the applied deflection in black and the fit deflection estimate in red. The bottom figure shows the measured force in black and the fit force in red. In all figures, the black vertical line shows the separation of identification and validation data. Color images are available online.
Fitted Parameters for Measuring Deflection and Estimating Force. The Unit at *Proportionally Relates to μHmm. The Unit at **Proportionally Relates to μH.
Fit Quality Measures for Deflection and Force, for Data Regarding Fitting, Validation, and Control
RMSE, root mean square error.
Figure 6 highlights the online estimation of deflection and force, by directly comparing the estimates with the true values. We achieved 95% confidence intervals of, respectively, 2.14 mm around a mean error of −0.27 mm for deflection estimation and 29.0 mN around a mean error of 7.5 mN for force estimation. Figure 7 shows the resulting time series of the control experiment. Herein, Figure 7a and b show the step responses during deflection and force control, respectively. Figure 7c and d show four representative periods of the respective sine sweeps. In Figure 8, we show the frequency responses of the sine sweeps during deflection control and during force control. The step responses during deflection control had rise times between 4.2 and 14.1 s, and during force control they had rise times between 2.1 and 5.1 s. Both ranges had outliers at 20 s, indicating that the response did not reach the reference value. We found the bandwidth for deflection control to be ∼1/25 Hz, and for force control to be ∼1/18 Hz.

Estimation data during, respectively, deflection control

Time series data regarding the control experiment. The top figures show the step responses with, respectively, deflection control

Frequency response data of the sine sweeps, with deflection control in black and force control in red. The cross markers indicate the measured response. The dashed lines indicate the linear interpolation between these points. This shows that the −3 dB point for deflection control lies at approximately 1/25 Hz, and for force control it lies at approximately 1/18 Hz. Color images are available online.
Discussion
Our method and implementation of self-sensing resulted in a 95% confidence interval of 2.14 mm around a mean error of −0.27 mm for estimation of deflection and of 29.0 mN around a mean error of 7.5 mN for estimation of force. Combined with our control implementation, we achieved a bandwidth of approximately 1/25 Hz for deflection control and approximately 1/18 Hz for force control.
The RMSE and 95% confidence interval we achieved for estimation of deflection were sufficient for feedback control. From these results, we conclude that our measurement model in Equation (4) includes the most important effects. Still, tailoring the hardware to the range of inductance of this specific muscle would likely improve the measurements. In addition, we needed an averaging filter and a rather strong median filter to avoid spikes in the data. These artifacts should be taken care of in a new version of the hardware. Further, in the measurement model, we neglected the potential influence of the applied control signal and the influence of temperature. The former requires additional research, in combination with developments in hardware. The latter requires a measurement of temperature, for example via resistance, as in our previous work. 23
The presented implementation for force estimation also captures the most important effects, and it allows for feedback control. However, it does need improvement of both precision and accuracy. The force estimates in Figure 7b and Figure 7d show underestimation at the bottom edge of the achievable force interval, when the control signal is at the lower saturation limit. This indicates that the experimental procedure to find the Joule-heating parameters might underestimate the contribution by convective cooling. Moreover, the peaks in deflection measurements propagate in the force estimate. This explains the peaks in Figure 7b. In additional future work, we aim at quantifying the repeatability of the behavior of the muscles, both within and between muscles. We included a warming-up phase in the experimental protocol, to ensure repeatable behavior. The muscle seems to have a relaxation effect with a low time constant. Endurance tests will reveal this time constant. Subsequent modeling thereof allows for omission of the warming up.
Figure 7a and b illustrate the response of the muscle to step inputs on the reference during, respectively, deflection and force control. The rise times vary from 2.1 to 14.1 s, excluding outliers at 20 s. The control action gets saturated for the majority of the step responses.
Figure 8 shows a limited bandwidth, whereas a high bandwidth is beneficial for robotic applications. TCPMs inherently suffer from this issue, because in practice heating and cooling are slow processes. However, these actuators are suitable for tasks that do not require a high bandwidth. For example, in compliant structures they can slowly change the configuration or stiffness, or apply pre-tension. Further, there are possibilities to increase the bandwidth reported in this study by optimizing material properties, the activation principle, muscle configurations, and control methods. For example, we recommend using smaller-diameter fibers or a suitable configuration of several muscles, such as an antagonistic setup.19,35 In addition, we see opportunities for improving the implementation of the activation principle by expanding the control action space. For example, active cooling stimulates muscle expansion. 36 Changing the cooling medium from air to liquid improves the performance as well.17,35,37 Moreover, when the application of the actuator is known, a feedforward signal could improve the control performance.
A drawback of the TCPM is the poor scalability when considering a single muscle. Using a structure of TCPMs to perform as one actuator increases the scalability and versatility.1,38 However, closely packing the muscle might lead to interaction of actuation and sensing. In future work, we will investigate these potential disturbances for self-sensing and actuation in muscle structures, and methods to cope with those disturbances.
Conclusion
In this study, we aimed at strengthening the position of TCPMs as a feasible actuator in inexpensive and lightweight control systems. To that end, we closed the feedback loop of a controlled TCPM via self-sensing. We estimated both the deflection and force, using the applied power and self-sensing measurements of deflection as input. Subsequently, this allowed us to control either deflection or force. We achieved a 95% confidence interval of 2.14 mm around a mean estimation error of −0.27 mm and of 29.0 mN around a mean estimation error of 7.5 mN for, respectively, deflection and force. This work validated the sensing model used, and it laid the foundation for further research and hardware development. It demonstrated the increased potential of TCPMs as actuators in inexpensive and lightweight control systems.
Footnotes
Acknowledgments
The authors would like to thank Michael Fritschi for sharing the hardware that enables self-sensing, and for deliberation on how to get the most out of it. The authors would also like to thank Ron van Ostayen and Just Herder for their consultation regarding modeling and experiment design.
Author Disclosure Statement
No competing financial interests exist.
