Abstract
In this work, the output-feedback fault-tolerant tracking control problem for an underactuated autonomous underwater vehicle (AUV) with actuator faults is investigated. First, an output-feedback error tracking system is constructed based on the theoretical model of the underactuated AUV with actuator faults. Then, an adaptive dynamic programming (ADP) based fault-tolerant controller is developed. In the proposed control scheme, a neural-network observer is designed to approximate the system states under actuator faults. An online policy iteration algorithm with a critic network and an action network is designed to improve the tracking accuracy. Based on the Lyapunov stability theorem, the stability of the error tracking system under the proposed controller is guaranteed. Finally, simulation results show that the underactuated AUV achieves better tracking performance than with a single-critic-network ADP method.
Keywords
Introduction
Tracking control is a complex motion control problem for an underactuated autonomous underwater vehicle (AUV) operating in an unknown underwater environment [1, 2]. Traditionally, the tracking control problems of underactuated AUVs without actuator faults have been solved through a variety of control schemes [3–7]. However, actuators are critical components of an underactuated AUV, and actuator faults may degrade its performance [8, 9], which makes trajectory-tracking control more difficult. These difficulties motivate this work.
Notations and variables used in this paper
To maintain system stability and acceptable tracking accuracy, many fault-tolerant control strategies have been developed for AUVs with actuator faults, such as adaptive control [10], robust control [11] and backstepping [12]. In this work, adaptive dynamic programming (ADP) is introduced to solve the output-feedback fault-tolerant tracking control problem for an underactuated AUV with actuator faults.
Compared with the above control methods [10–12], the ADP algorithm has better adaptive and self-learning abilities. An actor-critic-network-based constrained generalized policy iteration framework was proposed to solve the nonlinear non-affine optimal control problem in [13]. An event-triggered ADP control scheme was designed for distributed formation control of multi-UAV systems in [14]. An online ADP algorithm was proposed to solve the robust tracking control problem for uncertain nonlinear systems in [15]. A data-based policy iteration algorithm was designed to solve the output-feedback optimal control problem for uncertain linear systems in [16]. An event-driven ADP scheme was proposed to solve the output tracking control problem for nonlinear systems in [17]. In [18], time delays were considered and heuristic dynamic programming (HDP) was designed to solve the tracking control problem for a class of nonlinear systems. An ADP-based tracking control scheme was designed for a coal gasification system in [19]. ADP algorithms were designed for tracking control with unknown system dynamics in [20, 21].
Motivated by the aforementioned discussion, an action-critic-network-based ADP control scheme with a neural network observer is proposed for the output-feedback fault-tolerant tracking control of an underactuated AUV. The main contributions of this work can be summarized as follows. Compared with existing output-feedback tracking control methods for underactuated AUVs with actuator faults, the proposed tracking control scheme employs ADP, which seeks a near-optimal control strategy in order to maintain stability and tracking accuracy under actuator faults. Different from the compared methods [22, 24], the proposed control scheme introduces a discount coefficient into the performance index to account for the nonlinearity and complexity of the underactuated AUV. Critic and action neural networks are employed, and an online policy iteration algorithm and weight update laws are designed. A neural network observer is designed to approximate the actuator faults.
The rest of the paper is organized as follows. In Section 2, the output-feedback error tracking system is constructed and the problem formulation is described. In Section 3, the fault-tolerant ADP tracking controller with a neural network observer is designed. In Section 4, simulation examples are provided to demonstrate the effectiveness of the proposed method. The conclusion is drawn in Section 5.
Theoretical model of underactuated AUV
Two coordinate systems are employed in the theoretical model of the underactuated AUV, as shown in Fig. 1. The theoretical model of the underactuated AUV without actuator faults is given as

Fig. 1. AUV coordinate systems.
where η = [x, y, z, φ, θ, ψ]ᵀ and ξ = [u, v, w, p, q, r]ᵀ.
The kinematics of underactuated AUV is given as
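As an illustration of the kinematic relation between η and ξ, the following minimal sketch assumes the standard 6-DOF marine-vehicle form η̇ = J(η)ξ with a ZYX Euler-angle convention. The function names `kinematic_transform` and `eta_dot` are hypothetical, and the matrix layout is the textbook one, which may differ in detail from the J used in this work.

```python
import math


def kinematic_transform(eta):
    """Build a 6x6 matrix J(eta) mapping body-frame velocities to earth-frame rates.

    Assumes eta = [x, y, z, phi, theta, psi] and a ZYX Euler convention.
    The mapping is singular at theta = +/- 90 degrees.
    """
    phi, theta, psi = eta[3], eta[4], eta[5]
    c_phi, s_phi = math.cos(phi), math.sin(phi)
    c_th, s_th = math.cos(theta), math.sin(theta)
    c_psi, s_psi = math.cos(psi), math.sin(psi)
    t_th = s_th / c_th

    # Rotation of linear velocities from the body frame to the earth frame.
    rot = [
        [c_psi * c_th, -s_psi * c_phi + c_psi * s_th * s_phi, s_psi * s_phi + c_psi * c_phi * s_th],
        [s_psi * c_th, c_psi * c_phi + s_phi * s_th * s_psi, -c_psi * s_phi + s_th * s_psi * c_phi],
        [-s_th, c_th * s_phi, c_th * c_phi],
    ]
    # Transformation of body angular rates to Euler-angle rates.
    ang = [
        [1.0, s_phi * t_th, c_phi * t_th],
        [0.0, c_phi, -s_phi],
        [0.0, s_phi / c_th, c_phi / c_th],
    ]

    jac = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            jac[i][j] = rot[i][j]
            jac[i + 3][j + 3] = ang[i][j]
    return jac


def eta_dot(eta, xi):
    """Return the earth-frame rate vector J(eta) * xi."""
    jac = kinematic_transform(eta)
    return [sum(jac[i][j] * xi[j] for j in range(6)) for i in range(6)]
```

For example, with zero attitude the mapping reduces to the identity, and with a yaw angle of 90 degrees a pure surge velocity u appears as motion along the earth-frame y axis.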
Combining (1) with (2), we can get
The desired trajectory is given as
The error vectors are defined as
Then, substituting (4) and (5) into (3), the output-feedback error tracking system without actuator faults is given as
We define the error vector
The error tracking system with actuator faults is given as
The performance index function is defined as
Based on optimal control theory, the performance index function (9) is a Lyapunov function and satisfies
Then, the Hamiltonian function is defined as
The optimal cost function is defined as
The optimal cost function (12) satisfies the HJB equation, then
The optimal control is expressed as
The policy iteration (PI) scheme is designed as shown in Algorithm 1.
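To make the alternation between policy evaluation and policy improvement concrete, the sketch below applies policy iteration to a scalar linear stand-in system ẋ = a·x + b·u with a discounted quadratic cost ∫ e^(−γt)(q·x² + r·u²) dt. This is not the AUV error system (8) and not a transcription of Algorithm 1; the system, the cost weights and the names `policy_iteration` and `discounted_riccati_root` are assumptions chosen so that the result can be checked against a closed-form solution.

```python
import math


def policy_iteration(a, b, q, r, gamma, k0, iters=20):
    """Discounted policy iteration for x_dot = a*x + b*u with u = -k*x.

    Cost (assumed form): integral of exp(-gamma*t) * (q*x^2 + r*u^2) dt.
    Returns the value coefficient p (V = p*x^2), the gain k, and the p history.
    """
    k = k0
    history = []
    p = None
    for _ in range(iters):
        closed_loop = a - b * k
        if gamma - 2.0 * closed_loop <= 0.0:
            raise ValueError("policy is not admissible: discounted cost is unbounded")
        # Policy evaluation: p solves (2*closed_loop - gamma)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (gamma - 2.0 * closed_loop)
        history.append(p)
        # Policy improvement: minimizing the Hamiltonian gives u = -(b*p/r)*x.
        k = b * p / r
    return p, k, history


def discounted_riccati_root(a, b, q, r, gamma):
    """Positive root of (b^2/r)*p^2 + (gamma - 2a)*p - q = 0, the fixed point of the iteration."""
    s = 2.0 * a - gamma
    return r * (s + math.sqrt(s * s + 4.0 * b * b * q / r)) / (2.0 * b * b)


# The discount 0.3 mirrors the value of gamma listed in the simulation settings;
# its role as the discount coefficient is assumed here.
p_final, k_final, p_history = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.3, k0=2.0)
p_star = discounted_riccati_root(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.3)
```

Starting from an admissible gain, the value coefficient decreases monotonically toward the closed-form root, which is the behaviour a policy iteration scheme relies on.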
Problem transformation
The structural diagram of neural network observer based fault-tolerant ADP control scheme is shown in Fig. 2.

Fig. 2. Structural diagram of the neural network observer based fault-tolerant ADP control scheme.
For the error tracking system (8), we develop a radial basis function (RBF) neural network to approximate the actuator faults.
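The following minimal sketch shows how an RBF network with Gaussian basis functions can approximate a fault signal as a weighted sum of features. It fits the scalar fault f = 0.1μ used in the second simulation example by batch gradient descent; the centers, width, learning rate and the offline fitting procedure are illustrative assumptions and stand in for the online adaptive weight law of the observer.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF feature vector for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def rbf_output(weights, x, centers, width):
    """Network output: weighted sum of the RBF features."""
    return sum(w * p for w, p in zip(weights, rbf_features(x, centers, width)))


def mean_squared_error(weights, samples, targets, centers, width):
    """Average squared approximation error over the sample set."""
    errors = [rbf_output(weights, x, centers, width) - y for x, y in zip(samples, targets)]
    return sum(e * e for e in errors) / len(errors)


def fit_weights(samples, targets, centers, width, lr=0.2, epochs=200):
    """Batch gradient descent on the mean squared error, starting from zero weights."""
    weights = [0.0] * len(centers)
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(centers)
        for x, y in zip(samples, targets):
            phi = rbf_features(x, centers, width)
            err = sum(w * p for w, p in zip(weights, phi)) - y
            for i, p in enumerate(phi):
                grad[i] += err * p / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical setup: nine centers on [-2, 2], control samples on [-1.5, 1.5].
centers = [-2.0 + 0.5 * i for i in range(9)]
samples = [-1.5 + 0.1 * i for i in range(31)]
targets = [0.1 * mu for mu in samples]  # fault model f = 0.1*mu from the second example

mse_before = mean_squared_error([0.0] * len(centers), samples, targets, centers, 0.5)
fitted = fit_weights(samples, targets, centers, 0.5)
mse_after = mean_squared_error(fitted, samples, targets, centers, 0.5)
```

Because the output is linear in the weights, the fitting problem is a quadratic one, which is also what makes Lyapunov-based adaptive weight laws tractable for this kind of observer.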
Substituting equation (15) into the error tracking system (8), we obtain
Then the neural-network fault observer is designed as
The weight vector
Combining (16) with (17), we can get
Substituting (19) into the time derivative of (20), we can get
We can conclude that
The ADP controller consists of a critic neural network and an action neural network. The critic neural network is utilized to approximate
The derivative of the cost function V3 (x, μ) is given as
Substituting (23) into (10), we can obtain
Then the Hamiltonian function can be expressed as
Then, V3 (x, μ) is approximated as
The derivative of
Then, the approximate Hamiltonian function can be expressed as
Given any admissible control policy μ, it is desired to select
The weight update law for the critic neural network is given as
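As a rough illustration of how a critic weight update of this kind behaves, the sketch below assumes a normalized gradient-descent step on the squared residual of the approximate Hamiltonian, a form commonly used in the ADP literature. The regressor `sigma` (the part of the Hamiltonian that multiplies the critic weights), the scalar `reward` (the utility term) and the function name `critic_step` are hypothetical, and the step is not claimed to match the update law of this work term by term.

```python
def critic_step(weights, sigma, reward, lr):
    """One normalized gradient-descent step on 0.5 * e^2, with e = reward + weights . sigma.

    weights: current critic weight estimate (list of floats)
    sigma:   regressor multiplying the weights in the approximate Hamiltonian (assumed given)
    reward:  utility term of the Hamiltonian (assumed given)
    lr:      learning rate, taken in (0, 4) so that the residual cannot grow
    Returns the updated weights and the residual before the update.
    """
    residual = reward + sum(w * s for w, s in zip(weights, sigma))
    normalizer = (1.0 + sum(s * s for s in sigma)) ** 2
    new_weights = [w - lr * s * residual / normalizer for w, s in zip(weights, sigma)]
    return new_weights, residual


def hamiltonian_residual(weights, sigma, reward):
    """Residual of the approximate Hamiltonian for fixed sigma and reward."""
    return reward + sum(w * s for w, s in zip(weights, sigma))
```

For a fixed regressor, one step scales the residual by 1 − lr·σᵀσ/(1 + σᵀσ)², so the normalization keeps the step bounded even when the regressor is large.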
The approximate weight error of critic neural network is defined as
Then, the time derivative of V4 is
Hence,
The optimal control μ* is approximated by the action neural network as
Because the ideal weight
The approximate feedback error used for training action neural network is defined as the difference between the feedback control input applied to the error tracking system (8) and the optimal control μ* as
The action neural network is designed to minimize the objective function
The weight updating law for the action neural network is given as follows
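To illustrate the idea of training the action network on the feedback error, the sketch below assumes a linear-in-weights action network, the objective 0.5·eᵀe with e the difference between the network output and a target control, and a plain gradient-descent step. The names `actor_step` and `actor_output`, the feature vector `phi` and the target `mu_target` are hypothetical; in the scheme described here the target would come from the critic-based approximation of μ*.

```python
def actor_output(weights, phi):
    """Control estimate: weights^T * phi, with weights stored as rows per feature."""
    n_controls = len(weights[0])
    return [sum(weights[i][j] * phi[i] for i in range(len(phi))) for j in range(n_controls)]


def actor_step(weights, phi, mu_target, lr):
    """One gradient-descent step on 0.5 * ||actor_output - mu_target||^2.

    weights:   action-network weights, shape (n_features, n_controls)
    phi:       feature vector of the current state (assumed given)
    mu_target: control the network should reproduce (assumed given)
    lr:        learning rate; the error shrinks when lr * (phi . phi) < 2
    Returns the updated weights and the feedback error before the update.
    """
    mu_hat = actor_output(weights, phi)
    error = [mu_hat[j] - mu_target[j] for j in range(len(mu_target))]
    # The gradient of the objective with respect to weights[i][j] is phi[i] * error[j].
    new_weights = [
        [weights[i][j] - lr * phi[i] * error[j] for j in range(len(mu_target))]
        for i in range(len(phi))
    ]
    return new_weights, error
```

With fixed features, each step multiplies the feedback error by 1 − lr·φᵀφ, which shows directly how the learning rate has to be matched to the feature magnitude.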
According to (14), (23) and (34), we have
The approximate weight error of action neural network is defined as
Then, the time derivative of V5 is
Hence,
Then, the time derivative of V6 is
According to (11) and (13), (44) can be transformed as
Hence,
To show the effectiveness of the proposed ADP-based fault-tolerant tracking control, two simulation examples are performed in this section and compared with the single-critic-network-based ADP method [23, 24]. According to the kinematic and dynamic model of the underactuated AUV (1), under the conditions η(4) = 0 and ξ(4) = 0, the matrices M, C(ξ), D(ξ), g(η) and J are given as follows.
Given f = 0, ϱ0 = 0.1, ϱ1 = 0.02, ϱ2 = 0.04, γ = 0.3, B = 1822.25, β = 0.15 and τ_d = [500, 0, 0, 0, 200, 10]ᵀ, the simulation results, compared with those of the existing method [23, 24], are given as follows.
Figures 3 and 4 show, respectively, the tracking errors of the desired position and attitude and the tracking errors of the desired velocity, compared with the existing method [23, 24]. The tracking trajectory is shown in Fig. 5. The method proposed in this work achieves almost the same results as the existing method [23, 24]. Figures 3 and 4 show that the error tracking system (8) is bounded and stable. The absolute values of the position and attitude tracking errors do not exceed the threshold value 0.2, and the absolute values of the velocity tracking errors do not exceed the threshold value 0.05. Figure 5 shows that the deviation between the desired trajectory and the simulated trajectory obtained with the proposed method is no larger than 0.1 m.

Fig. 3. Tracking error of desired position and attitude.

Fig. 4. Tracking error of desired velocity.

Fig. 5. AUV trajectory.
In the second simulation example, we use the parameter values of the first example except that f = 0.1μ. The simulation results, compared with those of the existing method [23, 24], are given as follows.
Figures 6 and 7 show the tracking errors of the desired position and attitude and the tracking errors of the desired velocity. Figure 7 shows that jitter occurs with the existing method [23, 24] from 0 s to 15 s. The method proposed in this work achieves better results and reduces the jitter effectively.

Fig. 6. Tracking error of desired position and attitude.

Fig. 7. Tracking error of desired velocity.
Figures 8 and 9 show the actuator faults estimated by the RBF neural network when f = 0.1μ. Under actuator faults, jitter appears in the fault estimate obtained with the existing method [23, 24], and the jitter grows as the actuator faults become larger.

Fig. 8. Estimated actuator faults based on the RBF neural network with the method proposed in this work.

Figure 10 shows the tracking trajectories with f = 0.1μ. The simulation results show that the deviation between the desired trajectory and the simulated trajectory obtained with the proposed method is no more than 0.3 m. The trajectory obtained with the proposed method is closer to the desired trajectory.

Fig. 10. AUV trajectory.
In this work, the error tracking system with actuator faults (8) has been constructed so that action-critic-network-based ADP can be applied to the output-feedback fault-tolerant tracking control problem for an underactuated AUV with actuator faults. Furthermore, an online policy iteration algorithm has been designed to improve the tracking accuracy, which effectively reduces the impact of jitter. The stability of the error tracking system of the underactuated AUV (8) is guaranteed by Lyapunov stability theory. Finally, simulations have been performed. When actuator faults occur, jitter appears with the existing method [22, 24]. The simulation results show better performance than the existing method [23, 24].
Future research will concentrate on improving tracking accuracy and stability for fully coupled nonaffine AUVs with complex disturbances and limitations. To this end, online deep reinforcement learning will be considered in future studies.
Footnotes
Declarations
Funding: There is no funding to support this work.
Conflicts of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.
Authors’ contributions: G. Che designed the control method, performed the simulation experiments and wrote the manuscript. Z. Yu designed the control method and analyzed the stability of the system.
