Dynamic guidance control for UAV landing on autonomous surface vessel

Abstract

This study proposes an optimization strategy for a tail-sitter unmanned aerial vehicle (UAV) landing on an autonomous surface vessel (ASV), aiming for minimal risk, shortest time, and lowest energy consumption. Sliding mode control ensures system stability under nonlinear and dynamic conditions. A two-player game model between the UAV and ASV is established, achieving Nash equilibrium to minimize risk and energy. Pareto optimal theory provides a trade-off among multiple objectives, while Karush-Kuhn-Tucker conditions verify the optimality of the solution. An Actor-Critic network generates and evaluates landing strategies through adaptive online learning based on temporal difference errors. Results show that the UAV can dynamically adjust to ship motion, achieving precise, energy-efficient, and low-risk landings. This approach offers practical technical support and theoretical foundations for cooperative operations between UAVs and unmanned ships in diverse environments.

Keywords

autonomous surface vessel; guidance control; Karush-Kuhn-Tucker; Pareto optimal; unmanned aerial vehicle

1. Introduction

Tail-sitter UAVs have shown superior performance in many applications due to their vertical take-off and landing characteristics. However, landing safely on a ship at sea is a complex problem, especially given the dynamic movement of the ship on the sea surface. A stable landing depends not only on the flight control system but also on ship movement and environmental factors. One of the main challenges for an unmanned aerial vehicle (UAV) landing safely on a ship is the ship’s continuous movement, including vertical and horizontal fluctuations. These movements introduce great uncertainty into the landing and require the UAV to have efficient dynamic guidance capabilities.

To ensure stable landing under various environmental conditions, effective guidance game strategies must be designed that fully account for the uncertainty of the environment and of the ship’s movement. Solving the ship-landing problem will greatly expand the application scope of tail-sitter UAVs, including ocean monitoring, search and rescue missions, and maritime transportation. A dynamic guidance game strategy can improve the stability and safety of the UAV during ship landing at sea and reduce the risk of landing failure. Studying such strategies can also promote the application and development of autonomous control and automation technologies in unmanned systems. Because the maritime environment is complex and changeable, dynamic guidance game strategies can further improve the adaptability and execution efficiency of UAVs in different sea conditions.

Currently, many related studies focus on the use of various sensors for guided landing and on the design of maritime arresting equipment for UAV capture. Increasing battery life and saving energy is one of the major challenges. In 2021, de Jong CP et al. proposed exploiting the updraft around obstacles to reduce the energy consumption of fixed-wing UAVs.¹ Experiments showed that the controller is effective: the average throttle is reduced to 4.5%, and continuous flight at 0% throttle is achieved. R. Polvara et al. proposed a method for autonomous landing of drones on ship decks, using markers and an extended Kalman filter, suitable for GPS-free environments.² Tan, L. et al. proposed a method for calculating the optimal landing trajectory between a drone and a small ship.³ The accuracy is improved through numerical iteration and initial guidance, and the effectiveness is verified by simulation experiments. Z.-C. Xu et al. developed a vision-based control method, designed a three-stage visual detection method and a PID controller, and successfully landed a UAV on a mobile unmanned surface vehicle (USV) in a lake experiment.⁴ V. Diapic et al. proposed a maritime heterogeneous unmanned autonomous system that combines a USV and a UAV through modular design to solve collaboration and cooperation problems in a GPS-free environment and to improve the reliability of data and control message sharing.⁵

Ruoyu Xu et al. proposed a robotic arm-assisted landing system that can grab and place hovering UAVs.⁶ They designed a nonlinear model predictive controller to deal with disturbances and improved the landing efficiency of multiple UAVs on USVs. B. Feng et al. proposed a staged take-off and landing method, designed the corresponding hardware systems, including UAV and USV platforms, and improved the autonomous take-off and landing capabilities of UAVs in a dynamic ocean environment, as verified experimentally.⁷ Li, W. et al. proposed a control strategy based on synchronous motion, using computer vision and a bidirectional long short-term memory neural network for attitude prediction and control to improve the landing accuracy of UAVs in complex ocean environments.⁸ Wenzhan Li et al. addressed the problem of real-time tracking and accurate landing of a UAV during USV navigation and proposed a collaborative tracking and landing control strategy based on NMPC to achieve high-precision landing.⁹ Gangik Cho Joonwon et al. proposed an image-based visual servo technology that combines GPS data and dynamic models to estimate speed, adapt to the rapid movement of ships and the influence of waves, and improve landing accuracy.¹⁰

K. A. Tsintotas proposed a low-complexity electronic fence protection system that uses radar and range finders to achieve moving obstacle detection and ground assessment with low latency.¹¹ E. Ragusa et al. proposed a design paradigm for implementing convolutional neural network detection of small UAV landing platforms on low-power commercial microcontrollers while effectively balancing generalization capability and computing cost.¹² H. Li et al. proposed a landing structure based on a Kresling tube bistable space.¹³ Through graph search and controller optimization design, it can adapt to multiple scenarios and enhance buffering performance. G. Shao et al. proposed a collaborative unmanned surface vehicle-UAV platform, designed adjustable buoys and USV carrying decks to ensure landing safety, and used a multi-ultrasonic joint dynamic positioning algorithm to solve the positioning problem.¹⁴ X. Dong et al. proposed a real-time trajectory optimization method for high-degree-of-freedom landing platforms to solve the aerodynamic interference problem of aerial UAV landing; simulations and experiments demonstrated its high-precision landing performance.¹⁵

M. Maier et al. designed a serial robotic arm to assist the landing of a vertical take-off and landing UAV, using a linear state space controller to decouple the position and direction of the UAV and the robotic arm to improve control accuracy.¹⁶ M. C. Santos et al. presented a cooperation architecture for collaborative drones and unmanned ships, using satellite technology to establish robust communication and autonomous take-off and landing systems, which improves mission endurance and flexibility.¹⁷ Z. Zheng proposed a high-order disturbance observer and an adaptive control limit function method for the path following problem of fixed-wing UAVs, ensuring accurate path following without an accurate initial position.¹⁸ Y. Huang et al. proposed a visual servo control method based on the homography matrix for UAVs tracking ships moving in 6 degrees of freedom, and improved system stability and performance through hierarchical control strategies.¹⁹

N. P. Santos et al. proposed a monocular camera vision system that uses particle filters and unscented Kalman filters to track the position and direction of UAVs during ship landing, adapting to various tracking problems and meeting landing requirements.²⁰ Falang M. et al. designed sensing, planning and control modules, using the extended Kalman filter and a graph plan algorithm to realize the automatic landing of UAVs on autonomous water vehicles.²¹ N. Xuan-Mung et al. presented an autonomous landing algorithm that includes ground effect and disturbance control, designed a target state estimator, and experimentally demonstrated that the quadcopter landed accurately on a shaking platform.²² K. Xia et al. proposed a funnel-shaped surface with relative positions and a reinforcement learning strategy to improve the accuracy and stability of autonomous landing of UAVs.²³ Q. Zeng et al. proposed a positioning system that uses ultra-wideband anchor points and the linear least squares method to calculate landing coordinates and improve UAV landing accuracy.²⁴ X. Fang et al. used favorable wind energy and a two-layer model predictive control scheme to improve the safety and accuracy of autonomous emergency landing of fixed-wing UAVs.²⁵ Y. Yuan et al. presented an automatic landing system with a non-singular fast terminal sliding mode observer and a back-stepping control structure to improve the landing accuracy and stability of UAVs on transportation vehicles.²⁶

Although numerous studies have investigated UAV–ASV cooperative control using frameworks such as nonlinear model predictive control (NMPC), visual servoing, deep reinforcement learning, and adaptive backstepping, most existing approaches treat guidance, optimization, and robustness as separate modules rather than as a unified decision‐making mechanism. Prior work has demonstrated accurate deck landing through vision-based sensing, NMPC-based trajectory regulation under disturbances, or autonomous coordination via deep learning; however, these methods typically optimize a single dominant objective such as tracking accuracy or disturbance rejection and do not explicitly resolve the multi-objective trade-offs among time, energy consumption, and risk within a mathematically verifiable game-theoretic structure. In contrast, the present study integrates sliding mode control, a two-player Nash game formulation, Pareto optimality, and an Actor–Critic adaptive optimization framework into a single cohesive guidance architecture. This unified design allows the UAV to adaptively negotiate landing strategies based on evolving ASV motion, while simultaneously guaranteeing stability (via Lyapunov and Barbalat proofs), equilibrium existence (via Debreu–Glicksberg–Fan conditions), and multi-objective optimality (via KKT-based Pareto analysis). Furthermore, unlike previous reinforcement learning approaches that rely purely on data-driven tuning, the proposed Actor–Critic network operates on top of a rigorously defined multi-objective loss derived from physical energy models, risk envelopes, and pursuit–evasion geometry. By incorporating game-theoretic equilibrium analysis directly into the learning loop, the framework enables real-time adaptation that is both theoretically grounded and operationally efficient. To highlight the key methodological distinctions, Tables 1 and 2 summarize how the proposed guidance strategy differs from representative state-of-the-art approaches.

Table 1.

Comparison of representative UAV–ASV landing studies.

Method	Primary focus	Control framework	Optimization objective
Vision-based⁴	Visual detection and deck tracking	PID+visual servoing	Tracking accuracy
NMPC-based⁹	Predictive trajectory optimization	NMPC	Constraint satisfaction, robustness
Robotic-arm-assisted¹⁶	Mechanical capture reliability	Nonlinear MPC	Capture precision
Deep RL²³	Learning-based autonomous landing	Reinforcement learning	Reward maximization
Proposed method (this study)	Integrated dynamic guidance optimization	Sliding mode + LOS guidance + game + Pareto + Actor-Critic	Time, energy, risk (joint loss)

Table 2.

Comparison of representative UAV–ASV landing studies.

Method	Multi-objective trade-off (time/energy/risk)	Game-theoretic analysis	Reinforcement learning integration
Vision-based⁴	Not addressed	No	No
NMPC-based⁹	Partially (time vs. constraints)	No	No
Robotic-arm-assisted¹⁶	Not addressed	No	No
Deep RL²³	Implicit, not analytically defined	No	Yes (pure RL)
Proposed method (this study)	Explicitly formulated with KKT-constrained Pareto analysis	Yes	Yes

It is of great importance to study the dynamic guidance game strategy that allows a tail-sitter UAV to land stably on a ship. An optimized guidance strategy not only advances control theory and game theory but also promotes research on the interaction between robotics and dynamic systems. Developing this strategy requires integrating knowledge from multiple fields such as machine learning, autonomous control, and system identification, and is significant for improving the autonomous decision-making capabilities of intelligent systems. Real-time optimization adjustments for uncertain factors in dynamic environments further establish the academic standing of this research and provide a rich theoretical basis for future cross-disciplinary research. In terms of industrial applications, the safe landing of tail-sitter drones on ships at sea bears on actual operations in many fields such as ocean monitoring, search and rescue missions, and logistics transportation. An effective game-optimized guidance strategy can significantly improve the stability and safety of UAVs in dynamic and complex environments, reduce the risk of landing failure, and ensure the efficient completion of missions. At the same time, the results can be applied to other unmanned systems that require high precision and reliability, such as autonomous vehicles, smart city infrastructure, and aerospace systems.

The realization of these technological innovations will not only enhance the competitiveness of related industries but also create substantial economic benefits for society. The purpose of this study is to analyze in detail the motion characteristics of UAVs and ships during the landing process and to identify key influencing factors. A dynamic guidance game strategy suitable for landing on ships is developed, and its effectiveness is verified through simulation and experiments. Trajectory planning methods for UAV landing are given to cope with ship motion and environmental interference and to ensure the accuracy and safety of the landing process. An experimental platform is constructed for testing and verification in actual scenarios to evaluate the performance and application potential of the dynamic guidance game strategy. This research provides theoretical support and a practical reference for the application of tail-sitter drones in sea landing and promotes the development and application of drone technology in the marine field.

2. Methods analysis

2.1. Tail-sitter UAV model

The tail-sitter UAV is a hybrid platform that takes off and lands vertically and changes attitude in the air, combining the characteristics of a fixed-wing aircraft and a multi-rotor. The movement of the control surfaces and servo motors affects the attitude of the aircraft. Assume that the mass center of the UAV is at $r = {[x, y, z]}^{Τ}$ in the inertial coordinate system, and that its attitude is represented by $θ = {[ϕ, θ, ψ]}^{Τ}$ , whose components correspond to roll, pitch and yaw respectively. The rigid body motion equation of the UAV can be described by (1), where $m$ is the mass of the UAV, $F_{ext}$ is the external force from the propulsion system and servo rudder surfaces, $F_{aero}$ is the force from air resistance and other aerodynamic effects, $I$ is the moment of inertia of the drone, $ω = {[p, q, r]}^{Τ}$ is the angular velocity vector, $M_{ext}$ is the external torque generated by the rudder surfaces controlled by the servo motors, and $M_{aero}$ is the torque exerted on the drone by the aerodynamic force.

\begin{cases} m \frac{d^{2} r}{d t^{2}} = F_{ext} + F_{aero} \\ I \frac{d ω}{d t} + ω \times (I ω) = M_{ext} + M_{aero} \end{cases}

(1)

As shown in Figure 1, the two control rudder surfaces are driven by two servo motors. $T_{1}$ and $T_{2}$ are the propeller power inputs, and $δ_{1}$ and $δ_{2}$ are the rudder surface deflection angles. The aerodynamic lift generated by the rudder surfaces can be given as (2), where $L_{1} (δ_{1})$ , $L_{2} (δ_{2})$ , $M_{1} (δ_{1})$ , and $M_{2} (δ_{2})$ are the mixed control torques of rudder surface 1 and rudder surface 2 respectively.

M_{servo} = \begin{bmatrix} L_{1} (δ_{1}) & L_{2} (δ_{2}) \\ M_{1} (δ_{1}) & M_{2} (δ_{2}) \end{bmatrix}

(2)

Figure 1.

UAV thrust and angle control parameters.

Considering the drone as an elastic body, its movement in the air is affected by air resistance, which is usually proportional to the square of the UAV velocity. Therefore, the related translational force and moment are described by (3), where $ρ$ is the air density, $A$ is the cross-sectional area of the UAV facing the wind, $C_{d}$ is the drag coefficient, $v = {[u, v, w]}^{Τ}$ is the velocity vector of the UAV, and $C_{m}$ is the aerodynamic moment coefficient.

\begin{cases} F_{drag} = - \frac{ρ A C_{d} | v | v}{2} \\ M_{aero} = - \frac{ρ A C_{m} | ω | ω}{2} \end{cases}

(3)
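As a minimal numerical sketch of (3), the quadratic drag force and aerodynamic moment can be evaluated as follows. The function names and all parameter values (air density, reference area, coefficients) are illustrative assumptions and are not taken from this study.

```python
import numpy as np


def quadratic_drag(v, rho=1.225, area=0.1, c_d=0.5):
    """Drag force of (3): F_drag = -0.5 * rho * A * C_d * |v| * v.

    The default values are hypothetical placeholders, not identified parameters.
    """
    v = np.asarray(v, dtype=float)
    return -0.5 * rho * area * c_d * np.linalg.norm(v) * v


def aero_moment(omega, rho=1.225, area=0.1, c_m=0.05):
    """Aerodynamic moment of (3): M_aero = -0.5 * rho * A * C_m * |omega| * omega."""
    omega = np.asarray(omega, dtype=float)
    return -0.5 * rho * area * c_m * np.linalg.norm(omega) * omega


# Example: drag on a UAV flying at 10 m/s along the x axis.
force = quadratic_drag([10.0, 0.0, 0.0])
```

Because the magnitude scales with the square of the speed, doubling the speed in this sketch quadruples the drag, and the force always points against the velocity vector.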

Combining translational motion, rotational motion, the force of the rudder surface and air resistance, equation (1) can be rewritten as (4), where $F_{g r a v i t y}$ is gravity.

\begin{cases} m \frac{d^{2} r}{d t^{2}} = F_{ext} + F_{aero} + F_{drag} + F_{gravity} \\ I \frac{d ω}{d t} + ω \times (I ω) = M_{ext} + M_{aero} \end{cases}

(4)

The dynamic behavior of the UAV is described using the state space representation of (5), so that the dynamic equations of the aircraft can be expanded into linear and nonlinear coupled dynamics to facilitate control design. Here $v = {[u, v, w]}^{Τ}$ is the linear velocity vector of the mass center. The corresponding input command vector of the UAV is given by (6).

x = [\begin{array}{l} r \\ v \\ θ \\ ω \end{array}] = [\begin{array}{l} x, y, z \\ u, v, w \\ ϕ, θ, ψ \\ p, q, r \end{array}]

(5)

u = {\begin{bmatrix} T_{1} & T_{2} & δ_{1} & δ_{2} \end{bmatrix}}^{Τ}

(6)

As given in (7), the rate of change of the position is provided by the velocity, and the rate of change of the attitude angles is provided by the angular velocity through the Euler angle transformation matrix $T_{θ} (θ)$ .

\begin{cases} \dot{r} = v \\ \dot{θ} = T_{θ} (θ) ω \end{cases}

(7)
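The kinematic relations in (7) can be sketched in a few lines. The study does not state which Euler angle convention defines $T_{θ} (θ)$, so the block below assumes the common ZYX (yaw-pitch-roll) convention; the function names are hypothetical.

```python
import numpy as np


def euler_rate_matrix(phi, theta):
    """T_theta for the ZYX Euler convention (an assumption, see lead-in).

    Maps body rates [p, q, r] to Euler angle rates [phi_dot, theta_dot, psi_dot].
    """
    s_phi, c_phi = np.sin(phi), np.cos(phi)
    c_th, t_th = np.cos(theta), np.tan(theta)
    return np.array([
        [1.0, s_phi * t_th, c_phi * t_th],
        [0.0, c_phi, -s_phi],
        [0.0, s_phi / c_th, c_phi / c_th],
    ])


def kinematics(v, euler, omega):
    """Right-hand side of (7): r_dot = v and theta_dot = T_theta(theta) @ omega."""
    phi, theta, _psi = euler
    r_dot = np.asarray(v, dtype=float)
    euler_dot = euler_rate_matrix(phi, theta) @ np.asarray(omega, dtype=float)
    return r_dot, euler_dot
```

Under this convention the matrix is singular at a pitch of ±90°, which a tail-sitter approaches in vertical mode, so a practical implementation would likely need a redefined body frame or a quaternion attitude representation near that attitude.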

The state space matrix of the linearized system can be expressed in (8), where $A$ is the state matrix and $B$ is the control matrix. $T_{v}$ represents the linear coupling term between velocity and thrust, $T_{ω}$ represents the coupling term between angular velocity and torque, $B_{v}$ is the matrix that converts thrust into acceleration, and $B_{ω}$ is the matrix that transforms control surface deflection into angular acceleration.

\dot{x} = A x + B u \equiv \begin{bmatrix} \dot{r} \\ \dot{v} \\ \dot{θ} \\ \dot{ω} \end{bmatrix} = \begin{bmatrix} 0 & I_{3} & 0 & 0 \\ 0 & 0 & 0 & T_{v} \\ 0 & 0 & 0 & I_{3} \\ 0 & 0 & 0 & T_{ω} \end{bmatrix} \begin{bmatrix} r \\ v \\ θ \\ ω \end{bmatrix} + {\begin{bmatrix} 0 & 0 \\ B_{v} & 0 \\ 0 & 0 \\ 0 & B_{ω} \end{bmatrix}}^{Τ} \begin{bmatrix} T_{1} \\ T_{2} \\ δ_{1} \\ δ_{2} \end{bmatrix}

(8)
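The block structure of (8) can be checked numerically by assembling the matrices with NumPy. This is only a sketch: the blocks $T_{v}$, $T_{ω}$, $B_{v}$, and $B_{ω}$ are passed in as arguments because their values are not given here, and the assumed block sizes are 3×3 for $T_{v}$ and $T_{ω}$ and 3×2 for $B_{v}$ and $B_{ω}$. The blocks of $B$ are stacked row-wise as they appear inside the brackets, without the outer transpose, which is the arrangement that gives the 12×4 shape needed for the product with the four inputs.

```python
import numpy as np


def build_state_space(T_v, T_w, B_v, B_w):
    """Assemble A (12x12) and B (12x4) with the block layout of (8).

    State order: [r, v, theta, omega]; input order: [T1, T2, delta1, delta2].
    """
    Z3, I3 = np.zeros((3, 3)), np.eye(3)
    Z32 = np.zeros((3, 2))
    A = np.block([
        [Z3, I3, Z3, Z3],
        [Z3, Z3, Z3, T_v],
        [Z3, Z3, Z3, I3],
        [Z3, Z3, Z3, T_w],
    ])
    B = np.block([
        [Z32, Z32],
        [B_v, Z32],   # thrust inputs -> linear acceleration
        [Z32, Z32],
        [Z32, B_w],   # surface deflections -> angular acceleration
    ])
    return A, B


# Hypothetical placeholder blocks, used only to exercise the shapes.
A, B = build_state_space(
    T_v=np.zeros((3, 3)),
    T_w=-0.1 * np.eye(3),
    B_v=np.ones((3, 2)),
    B_w=np.ones((3, 2)),
)
x_dot = A @ np.zeros(12) + B @ np.array([1.0, 1.0, 0.0, 0.0])
```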

Switching a tail-sitter UAV from the fixed-wing mode of horizontal flight to the multi-rotor mode of vertical landing, or the reverse, is a highly nonlinear process. The two flight modes have completely different characteristics and control gain parameters. Therefore, in this study, Sliding Mode Control (SMC)²⁷ is used for its strong robustness to uncertainty and external disturbances when the UAV switches flight modes. To describe the control switching of the tail-sitter UAV between horizontal flight and vertical landing with SMC, the characteristics of the two modes must be considered. The horizontal flight mode is similar to fixed-wing flight: the propeller mainly provides horizontal thrust, and the rudder surfaces control the attitude. The vertical landing mode is similar to the multi-rotor mode: the thrusters provide vertically upward lift and work with the rudder surfaces to keep the attitude stable. A sliding surface is designed for each of the two modes, and the control strategy is switched between them. The basic idea of SMC is to design a sliding mode surface so that the error dynamics of the system gradually approach zero along this surface, achieving stable control of the system. A sliding mode surface $s (x)$ is defined to drive the state error to zero, and the control law is designed so that the state vector $x (t)$ of the system moves on the sliding mode surface. When switching between the two modes, two different sliding mode surfaces are needed, and the sliding mode surface and control law are exchanged at the switching point.

The control goal in vertical mode is to stabilize the height and attitude of the drone and enable it to take off and land vertically. As shown in (9), the corresponding sliding mode surface can be designed for height and attitude errors, where $h_{d}$ is the desired height, $h$ is the current height, $θ_{d} = {[ϕ_{d}, θ_{d}, ψ_{d}]}^{Τ}$ is the desired attitude angle, $Λ_{h}$ is the control gain matrix corresponding to the height, and $Λ_{θ}$ is the control gain matrix corresponding to the attitude.

\begin{cases} s_{h} = \dot{h} + Λ_{h} (h - h_{d}) \\ s_{θ} = \dot{θ} + Λ_{θ} (θ - θ_{d}) \end{cases}

(9)

In the horizontal flight mode, the control goal is to maintain the horizontal speed and attitude stability of the UAV. The corresponding sliding mode surface can be designed according to the horizontal velocity error and attitude error as shown in (10). $v_{d} = {[u_{d}, v_{d}, w_{d}]}^{Τ}$ is the desired horizontal speed and $Λ_{v}$ is the control gain matrix for speed.

\begin{cases} s_{v} = \dot{v} + Λ_{v} (v - v_{d}) \\ s_{θ} = \dot{θ} + Λ_{θ} (θ - θ_{d}) \end{cases}

(10)

The goal of the SMC law is to drive the state of the system to the sliding mode surface $s = 0$ , so the control law can be designed as (11). $K$ is the control gain and $\mathrm{sgn} (s)$ is the sign function used to ensure that the state moves along the sliding surface.

u = - K \, \mathrm{sgn} (s_{h}, s_{θ}, s_{v})

(11)

Then, the control laws of the vertical mode and the horizontal mode are $u_{vert}$ and $u_{horz}$ respectively, i.e.,

\begin{cases} u_{vert} = - K_{h} \, \mathrm{sgn} (s_{h}) - K_{SMC, θ} \, \mathrm{sgn} (s_{θ}) \\ u_{horz} = - K_{v} \, \mathrm{sgn} (s_{v}) - K_{SMC, θ} \, \mathrm{sgn} (s_{θ}) \end{cases}

(12)

For flight mode switching, the transition between horizontal flight and vertical landing needs to occur under defined conditions. The switching instant is determined from the attitude of the drone, in particular the pitch angle $θ$ , and from the flight speed $v$ . When the pitch angle $θ$ increases to the threshold and the speed falls below its threshold, i.e. $| θ | \geq θ_{threshold}$ and $| v | < v_{threshold}$ , horizontal flight is switched to the vertical mode. When the pitch angle $θ$ of the drone gradually decreases to nearly horizontal and the speed $v$ is higher than the threshold, that is, $| θ | < θ_{threshold}$ and $| v | \geq v_{threshold}$ , the vertical mode is switched to the horizontal mode. The discontinuous term $\mathrm{sgn} (s_{h}, s_{θ}, s_{v})$ in the sliding mode law makes the control input discontinuous and causes chattering. It is therefore replaced by the smooth approximation $u = - K s / (| s | + ε)$ , where $ε$ is the boundary layer parameter used to smooth the control input near the sliding mode surface.
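The switching conditions and the smoothed sliding mode term described above can be sketched as follows. This is a minimal illustration under stated assumptions: the mode labels, function names, scalar height channel, gains, and threshold values are all hypothetical and are not taken from this study.

```python
import numpy as np


def smooth_sign(s, eps=0.05):
    """Boundary-layer approximation s / (|s| + eps) that replaces sgn(s)."""
    return s / (np.abs(s) + eps)


def next_mode(mode, pitch, speed, pitch_thr, speed_thr):
    """Apply the two switching conditions; otherwise keep the current mode."""
    if mode == "horizontal" and abs(pitch) >= pitch_thr and abs(speed) < speed_thr:
        return "vertical"
    if mode == "vertical" and abs(pitch) < pitch_thr and abs(speed) >= speed_thr:
        return "horizontal"
    return mode


def vertical_height_control(h, h_dot, h_d, lam_h=1.0, k_h=2.0, eps=0.05):
    """Height channel of (9) and (12) with the smoothed switching term."""
    s_h = h_dot + lam_h * (h - h_d)      # sliding surface s_h
    return -k_h * smooth_sign(s_h, eps)  # u = -K_h * s_h / (|s_h| + eps)


# Example: pitched up steeply at low speed, so the UAV enters vertical mode.
mode = next_mode("horizontal", pitch=1.3, speed=2.0, pitch_thr=1.2, speed_thr=5.0)
```

Keeping the current mode when neither condition holds avoids toggling in the region where only one of the two thresholds is crossed. The smoothed term stays strictly inside (-1, 1), so the control magnitude never exceeds the gain.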

When the tail-sitter UAV lands on a slow-moving or stationary ship, it operates in vertical mode and uses the ship’s landing guide and active capture device to assist in completing the landing task. However, when the ship is in a state that is not conducive to vertical take-off and landing, the UAV should instead fly in fixed-wing mode directly toward the capture net at the stern of the ship until it hits the net. When the UAV tracks a moving target in fixed-wing mode, it uses the Line of Sight (LOS) of a three-axis optical tracker to design the guidance law. The LOS of the optical tracker defines the heading angle of the target relative to the UAV body coordinate system, and the guidance law is designed to achieve smooth and stable tracking of the target. The design considers the estimation of the target’s relative position and velocity, the attitude control law based on LOS, the guidance law for stable tracking, and the SMC that deals with disturbances and uncertainty in target tracking. The UAV’s optical tracker provides target LOS information consisting of two angles, the horizontal LOS $θ_{LOS}$ and the vertical LOS $ϕ_{LOS}$ .

$θ_{LOS}$ represents the horizontal line of sight direction of the target relative to the UAV and $ϕ_{LOS}$ represents the vertical line of sight direction of the target relative to the UAV, as shown in Figure 2. The dynamic change rate ${\dot{θ}}_{LOS}$ and ${\dot{ϕ}}_{LOS}$ of the target relative to the UAV can also be estimated through the optical tracker. The relative distance $φ_{LOS}$ of the target is obtained through the rangefinder or estimation system of the optical tracker, and the drone can measure its own speed relative to the world coordinate system. Therefore, the state of the UAV relative to the target is expressed as $x_{r e f} = {[φ_{LOS}, θ_{LOS}, ϕ_{LOS}, {\dot{θ}}_{LOS}, {\dot{ϕ}}_{LOS}]}^{Τ}$ .

Figure 2.

PN and LOS of UAV environment state space.

The goal is to keep the drone’s line of sight aligned with the target. This requires that the LOS angles $θ_{LOS}$ and $ϕ_{LOS}$ and their rates ${\dot{θ}}_{LOS}$ and ${\dot{ϕ}}_{LOS}$ keep approaching 0. Alignment is achieved by commanding pitch $θ$ and yaw $ψ$ according to (13), where $ϕ_{error}$ and $θ_{error}$ are the vertical and horizontal LOS angle errors, $K_{θ}$ is the pitch proportional gain, $D_{θ}$ is the pitch damping gain, ${\dot{ϕ}}_{error}$ is the vertical LOS angle change rate, $K_{ψ}$ is the yaw proportional gain, $D_{ψ}$ is the yaw damping gain, and ${\dot{θ}}_{error}$ is the horizontal LOS angle change rate. Such a control law ensures that the UAV automatically adjusts its pitch and yaw angles according to the LOS error, thereby achieving stable target tracking.

\begin{cases} θ_{cmd} = K_{θ} ϕ_{error} + D_{θ} {\dot{ϕ}}_{error} \\ ψ_{cmd} = K_{ψ} θ_{error} + D_{ψ} {\dot{θ}}_{error} \end{cases}

(13)

The guidance law is designed using the Proportional Navigation (PN) method, which adjusts the aircraft’s acceleration based on the change in the LOS angle. According to the fundamental control principle of PN, as shown in (14), $a_{cmd}$ represents the commanded acceleration, $N$ is the proportional gain, $V$ is the relative velocity of the UAV, and ${\dot{θ}}_{LOS}$ is the LOS angular rate.

a_{cmd} = N V {\dot{θ}}_{LOS}

(14)

For three-dimensional UAV tracking, the PN method is applied separately to the horizontal and vertical directions to design the acceleration commands, as shown in (15). These commanded accelerations are then converted into UAV attitude commands to drive the UAV in tracking the target.

\begin{cases} a_{θ} = N_{θ} V {\dot{θ}}_{LOS} \\ a_{ϕ} = N_{ϕ} V {\dot{ϕ}}_{LOS} \end{cases}

(15)
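The LOS attitude commands of (13) and the two-channel PN commands of (14) and (15) can be sketched together. The block below makes several assumptions that are not stated in the text: the LOS errors equal the LOS angles (the desired LOS angles are zero), the vertical PN channel is driven by the vertical LOS rate ${\dot{ϕ}}_{LOS}$, and the container class, function names, and gain values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LosMeasurement:
    theta: float       # horizontal LOS angle [rad]
    phi: float         # vertical LOS angle [rad]
    theta_rate: float  # horizontal LOS rate [rad/s]
    phi_rate: float    # vertical LOS rate [rad/s]


def attitude_command(los, k_pitch, d_pitch, k_yaw, d_yaw):
    """Proportional-plus-damping commands of (13): pitch from the vertical LOS, yaw from the horizontal LOS."""
    pitch_cmd = k_pitch * los.phi + d_pitch * los.phi_rate
    yaw_cmd = k_yaw * los.theta + d_yaw * los.theta_rate
    return pitch_cmd, yaw_cmd


def pn_acceleration(los, speed, n_theta=3.0, n_phi=3.0):
    """Proportional navigation per channel: a = N * V * LOS rate."""
    a_theta = n_theta * speed * los.theta_rate
    a_phi = n_phi * speed * los.phi_rate
    return a_theta, a_phi


# Example measurement with small LOS angles and rates.
los = LosMeasurement(theta=0.10, phi=-0.05, theta_rate=0.02, phi_rate=0.01)
pitch_cmd, yaw_cmd = attitude_command(los, 2.0, 0.5, 2.0, 0.5)
a_theta, a_phi = pn_acceleration(los, speed=20.0)
```

With zero LOS rates the PN commands vanish, which matches the intent of holding the current course once the line of sight stops rotating.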

In practical applications, UAVs may encounter challenges such as wind disturbances, modeling inaccuracies, and sensor noise during target tracking. To enhance system robustness, SMC is incorporated into the guidance law to address these uncertainties. The sliding surface, the control law, and a smoothing approximation function to mitigate chattering effects are given in (16), where $λ_{ϕ}$ and $λ_{θ}$ are used to control the error convergence rate.

\begin{cases} s_{ϕ} = {\dot{ϕ}}_{LOS} + λ_{ϕ} ϕ_{LOS} \\ s_{θ} = {\dot{θ}}_{LOS} + λ_{θ} θ_{LOS} \\ a_{ϕ} = - K_{ϕ} \, \mathrm{sgn} (s_{ϕ}) \\ a_{θ} = - K_{θ} \, \mathrm{sgn} (s_{θ}) \\ a_{ϕ} = - K_{ϕ} s_{ϕ} / (| s_{ϕ} | + ε) \\ a_{θ} = - K_{θ} s_{θ} / (| s_{θ} | + ε) \end{cases}

(16)

In summary, the relative position and velocity of the target are estimated from the LOS angles and angular rates; pitch and yaw control laws are designed from the LOS error to achieve attitude adjustment; the guidance law is formulated using PN, adjusting the UAV’s acceleration according to the LOS angular rate; and SMC is introduced to enhance system robustness against disturbances and uncertainties. On this basis, Lyapunov stability verification is conducted. Taking the Lyapunov function $V = ({(s_{θ})}^{2} + {(s_{ϕ})}^{2}) / 2$ , the sliding surfaces must satisfy the negative definiteness condition on its derivative $\dot{V} = s_{θ} {\dot{s}}_{θ} + s_{ϕ} {\dot{s}}_{ϕ}$ . The guidance law is then substituted through $a_{θ} = {\ddot{θ}}_{LOS}$ and $a_{ϕ} = {\ddot{ϕ}}_{LOS}$ , using the chattering-suppressed forms $a_{ϕ} = - K_{ϕ} s_{ϕ} / (| s_{ϕ} | + ε)$ and $a_{θ} = - K_{θ} s_{θ} / (| s_{θ} | + ε)$ , as shown in (17).

\begin{cases} {\dot{s}}_{ϕ} = {\ddot{ϕ}}_{LOS} + λ_{ϕ} {\dot{ϕ}}_{LOS} \equiv a_{ϕ} + λ_{ϕ} {\dot{ϕ}}_{LOS} = - K_{ϕ} \frac{s_{ϕ}}{| s_{ϕ} | + ε} + λ_{ϕ} {\dot{ϕ}}_{LOS} \\ {\dot{s}}_{θ} = {\ddot{θ}}_{LOS} + λ_{θ} {\dot{θ}}_{LOS} \equiv a_{θ} + λ_{θ} {\dot{θ}}_{LOS} = - K_{θ} \frac{s_{θ}}{| s_{θ} | + ε} + λ_{θ} {\dot{θ}}_{LOS} \end{cases}

(17)

Substituting (17) into the Lyapunov derivative $\dot{V} = s_{θ} {\dot{s}}_{θ} + s_{ϕ} {\dot{s}}_{ϕ}$ yields (18). The first and second terms are non-positive, indicating the decreasing nature of the Lyapunov function, which represents the reduction of system energy and manifests as the attenuation of LOS error and angular velocity error. The remaining cross terms can take either sign, but if the sliding mode gains $K_{θ}$ and $K_{ϕ}$ are sufficiently large, the negative terms dominate $\dot{V}$ , ensuring $\dot{V} \leq 0$ . Consequently, the Lyapunov function decreases monotonically over time, indicating the stability of the system.

\begin{aligned} \dot{V} &= s_{θ} (- K_{θ} \frac{s_{θ}}{| s_{θ} | + ε} + λ_{θ} {\dot{θ}}_{LOS}) + s_{ϕ} (- K_{ϕ} \frac{s_{ϕ}}{| s_{ϕ} | + ε} + λ_{ϕ} {\dot{ϕ}}_{LOS}) \\ &= - K_{θ} \frac{{(s_{θ})}^{2}}{| s_{θ} | + ε} - K_{ϕ} \frac{{(s_{ϕ})}^{2}}{| s_{ϕ} | + ε} + s_{θ} λ_{θ} {\dot{θ}}_{LOS} + s_{ϕ} λ_{ϕ} {\dot{ϕ}}_{LOS} \end{aligned}

(18)
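The convergence behaviour implied by (16) to (18) can be illustrated with a small simulation of a single LOS channel. This is a sketch under strong assumptions, not the simulation used in this study: the LOS dynamics are reduced to the ideal double integrator ${\ddot{θ}}_{LOS} = a_{θ}$ used in (17), integration is explicit Euler, and the initial error, gains, and step size are hypothetical.

```python
import numpy as np


def simulate_los_channel(theta0=0.3, rate0=0.0, lam=1.0, gain=2.0,
                         eps=0.05, dt=1e-3, t_end=10.0):
    """Simulate one LOS channel under the smoothed sliding mode law.

    Returns an array whose rows are [theta_LOS, theta_LOS rate, V], with V = s^2 / 2.
    """
    theta, rate = theta0, rate0
    steps = int(round(t_end / dt))
    history = np.empty((steps + 1, 3))
    for k in range(steps + 1):
        s = rate + lam * theta                # sliding surface of (16)
        history[k] = (theta, rate, 0.5 * s * s)
        accel = -gain * s / (abs(s) + eps)    # smoothed guidance law of (16)
        theta += dt * rate                    # explicit Euler step
        rate += dt * accel
    return history


history = simulate_los_channel()
final_theta, final_rate, final_v = history[-1]
```

In this sketch the LOS angle and rate shrink toward zero and the final value of $V$ is far below its initial value. The value of $V$ need not decrease at every single step, because the cross term $s_{θ} λ_{θ} {\dot{θ}}_{LOS}$ in (18) can be positive inside the boundary layer; this is the reason the gain has to be large enough relative to $λ_{θ}$.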

According to (18), the Lyapunov method shows that $\dot{V} \leq 0$ , so $V$ is bounded and monotonically non-increasing over time and converges to a constant value. To further demonstrate the system’s asymptotic stability, Barbalat’s Lemma²⁸ is applied: if the function $V$ converges to a finite limit and its derivative $\dot{V}$ is uniformly continuous, then $\dot{V}$ approaches zero over time. Based on (18), $\dot{V}$ is non-positive and tends to zero, implying that $s_{θ}$ and $s_{ϕ}$ decay over time, so that $\lim_{t \to \infty} V = 0$ . The sliding surfaces $s_{θ}$ and $s_{ϕ}$ therefore also tend to zero, which further implies $\lim_{t \to \infty} θ_{LOS} = 0$ , $\lim_{t \to \infty} {\dot{θ}}_{LOS} = 0$ , $\lim_{t \to \infty} ϕ_{LOS} = 0$ , and $\lim_{t \to \infty} {\dot{ϕ}}_{LOS} = 0$ . Consequently, the LOS angle error and angular velocity error ultimately converge to zero, ensuring stable target tracking. Because the output of the optical tracker is subject to disturbances, significant perturbations are typically introduced into the control system under harsh environmental conditions. To account for them, the LOS dynamics used in (17) are extended with uncertainty and external disturbance terms, as shown in (19). $Δ_{ϕ}$ and $Δ_{θ}$ represent the uncertainties of the system, which may include variations in the moment of inertia or deviations in aerodynamic parameters. $d_{ϕ}$ and $d_{θ}$ denote external disturbances such as wind forces or other environmental influences.

\begin{cases} {\ddot{ϕ}}_{LOS} = a_{ϕ} + Δ_{ϕ} + d_{ϕ} \\ {\ddot{θ}}_{LOS} = a_{θ} + Δ_{θ} + d_{θ} \end{cases}

(19)

Substituting (19) back into (17) gives the system under a disturbance environment. To handle the system uncertainty terms $Δ_{ϕ}$ and $Δ_{θ}$ , an adaptive control law is designed to compensate for them. The adaptive control law consists of two parts. The first is the baseline control law, which corresponds to the ideal condition without uncertainties and disturbances. The second is the adaptive term, which compensates for system uncertainties. Therefore, the guidance law incorporates the estimated uncertainty terms ${\hat{Δ}}_{ϕ}$ and ${\hat{Δ}}_{θ}$ , as shown in (20).

\begin{cases} a_{ϕ} = - K_{ϕ} \frac{s_{ϕ}}{| s_{ϕ} | + ε} + {\hat{Δ}}_{ϕ} \\ a_{θ} = - K_{θ} \frac{s_{θ}}{| s_{θ} | + ε} + {\hat{Δ}}_{θ} \end{cases}

(20)

The estimated adaptive terms ${\hat{Δ}}_{ϕ}$ and ${\hat{Δ}}_{θ}$ are adjusted through the design of the adaptive guidance law to approximate the uncertainty terms $Δ_{ϕ}$ and $Δ_{θ}$. According to the adaptive control approach, the guidance law is designed by (21), where $γ_{ϕ}$ and $γ_{θ}$ are the adaptive gains that control the convergence rate of the adaptive estimation, and $s_{ϕ}$ and $s_{θ}$ are the sliding surfaces that guide the update of the adaptive estimation.

\begin{cases} {\dot{\hat{Δ}}}_{ϕ} = γ_{ϕ} s_{ϕ} \\ {\dot{\hat{Δ}}}_{θ} = γ_{θ} s_{θ} \end{cases}

(21)
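
The interplay between the smoothed switching term in (20) and the adaptive update in (21) can be illustrated with a minimal single-channel simulation. This is only a sketch under stated assumptions, not the guidance law as implemented in this study: the sliding surface $s = \dot{e} + c e$, the equivalent-control term $- c \dot{e}$, the constant uncertainty `delta`, and all gain values are hypothetical. In addition, the estimate is subtracted in the command so that it cancels the uncertainty under the update $\dot{\hat{Δ}} = γ s$; this sign choice is an assumption of the sketch and may differ from the convention behind (20).

```python
def simulate_adaptive_smc(delta=0.3, K=1.0, c=2.0, gamma=20.0, eps=0.05,
                          e0=0.5, dt=1e-3, t_end=30.0):
    """Euler simulation of one LOS channel, e_ddot = a + delta (cf. (19))."""
    e, e_dot, delta_hat = e0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        s = e_dot + c * e                     # assumed sliding surface
        switching = -K * s / (abs(s) + eps)   # smoothed switching term, cf. (20)
        # Assumption: the estimate is subtracted so that it cancels delta,
        # and -c * e_dot removes the known part of s_dot.
        a = switching - c * e_dot - delta_hat
        delta_hat += gamma * s * dt           # adaptive update, cf. (21)
        e_dot += (a + delta) * dt             # disturbed dynamics
        e += e_dot * dt
    return e, e_dot, delta_hat


e_final, e_dot_final, delta_hat_final = simulate_adaptive_smc()
print(f"e = {e_final:.2e}, e_dot = {e_dot_final:.2e}, "
      f"estimate = {delta_hat_final:.4f}")
```

With these hypothetical values, the LOS error and its rate decay toward zero while the estimate approaches the constant uncertainty, which is the qualitative behavior the Lyapunov argument describes.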

The stability of this adaptive control law is validated based on the Lyapunov method. The Lyapunov function is defined by (22), where ${\tilde{Δ}}_{ϕ} = {\hat{Δ}}_{ϕ} - Δ_{ϕ}$ and ${\tilde{Δ}}_{θ} = {\hat{Δ}}_{θ} - Δ_{θ}$ represent the adaptive estimation errors.

\begin{cases} V = \frac{{(s_{θ})}^{2} + {(s_{ϕ})}^{2}}{2} + \frac{{({\tilde{Δ}}_{θ})}^{2}}{2 γ_{θ}} + \frac{{({\tilde{Δ}}_{ϕ})}^{2}}{2 γ_{ϕ}} \\ \dot{V} = s_{θ} {\dot{s}}_{θ} + s_{ϕ} {\dot{s}}_{ϕ} + \frac{{\tilde{Δ}}_{θ} {\dot{\hat{Δ}}}_{θ}}{γ_{θ}} + \frac{{\tilde{Δ}}_{ϕ} {\dot{\hat{Δ}}}_{ϕ}}{γ_{ϕ}} \end{cases}

(22)

As with the original system, the derivative of the Lyapunov function $\dot{V}$ is negative and approaches zero, so $V$ decreases monotonically over time and the system remains stable. The adaptive control law design enables the UAV to stably track the target despite uncertainties and external disturbances. Through the adaptive control law and the Lyapunov method, the system’s stability is established, ensuring that the LOS angle error gradually converges to zero. This completes the control methodology for the UAV’s dynamic system, allowing the configuration of its autonomous flight mode and guidance conditions while considering performance and bandwidth constraints.

The problem of UAV landing on a vessel can then be analyzed with a multi-objective optimization game model. Different game strategies can be incorporated to account for three objectives: minimum landing time, lowest energy consumption, and minimal risk. These objectives may involve trade-offs; therefore, a multi-objective game can be formulated from this perspective, and the existence of a Nash equilibrium can be proven.

Assume that the UAV and the ship are two players in the game. The UAV’s strategy involves selecting the landing path and velocity to achieve three objectives: minimum landing time, minimum energy consumption, and minimum risk. The UAV must make trade-offs, as its strategy influences the priority of these objectives. The ship’s strategy involves adjusting the landing conditions, such as controlling its speed and direction to reduce the UAV’s landing risk and assist in a smooth descent. The game dynamics are defined as follows: the UAV aims to land as quickly as possible, which may increase energy consumption or risk. It seeks to minimize energy consumption during landing, which may extend the landing duration or require a more complex trajectory. Additionally, the UAV aims to minimize landing risk, particularly under disturbances such as wind speed and ship oscillations.

Based on the game players, objectives, and relationships, a mathematical model is formulated under the following conditions. Let $t (x, v)$ be the time function describing the UAV’s descent along the optimal trajectory $x$ with velocity $v$ to the ship. Let $E (x, v)$ be the energy function representing the UAV’s energy consumption along $x$ with velocity $v$. The risk function $R (x, v, ζ)$ depends on the UAV’s trajectory $x$, velocity $v$, and environmental factors $ζ$. The UAV’s objective is to minimize the multi-objective cost function shown in (23), where $α$, $β$, and $γ$ are the corresponding objective weights.

L (x, v, ζ) = α t (x, v) + β E (x, v) + γ R (x, v, ζ)

(23)

The game model consists of two strategy sets: the UAV’s strategy set $S_{1}$ and the ship’s strategy set $S_{2}$. The UAV’s strategy set includes all possible landing paths $x$ and velocities $v$, while the ship’s strategy set consists of all possible ship speeds and heading adjustments, which can be collectively represented by the environmental variable $ζ$. Considering the UAV’s high maneuverability, the ship’s relatively low control flexibility, and the requirement to ensure onboard equipment safety, the UAV is set to minimize its loss, while the ship attempts to minimize risk. The loss functions of the UAV and the ship can be described as (24).

\begin{cases} U_{1} (x, v, ζ) = L (x, v, ζ) \\ U_{2} (x, v, ζ) = R (x, v, ζ) \end{cases}

(24)

A two-player game is thus defined, in which the UAV attempts to minimize its loss while the ship seeks to minimize risk. To analyze this game, it is essential to first prove the existence of a Nash equilibrium. The first step is to establish the continuity of the utility functions. Assuming that $t (x, v)$, $E (x, v)$, and $R (x, v, ζ)$ are continuously differentiable functions, it follows that the loss functions of the UAV and the ship are continuous. The second step is to verify the compactness of the strategy space. Suppose that the UAV’s trajectory $x$, velocity $v$, and the ship’s environmental variable $ζ$ belong to compact sets, meaning their possible values are bounded. This assumption is reasonable, as the choices of velocity and trajectory are physically constrained, and the ship’s maneuverability is also limited. The third step is to demonstrate the convexity of the functions. Suppose that the UAV’s loss function $L (x, v, ζ)$ is convex with respect to the UAV’s strategy, and the risk function $R (x, v, ζ)$ is convex with respect to the ship’s strategy $ζ$. This ensures that each player’s best response exists, and that it is unique when the convexity is strict. By the Debreu-Glicksberg-Fan theorem,²⁹ a Nash equilibrium³⁰ must exist under the conditions of a compact and convex strategy space and continuous loss functions. At this equilibrium point, the UAV selects an optimal trajectory and velocity that balances the minimization of time, energy consumption, and risk. Meanwhile, the ship adjusts its movement strategy based on the UAV’s actions to minimize risk. At this point, neither the UAV nor the ship can unilaterally change its strategy to further optimize its objective, meaning the game has reached equilibrium.
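
The equilibrium notion described above can be made concrete with a small best-response iteration on a toy game. The sketch below is illustrative only: the functions `landing_time`, `energy`, and `risk`, the scalar strategies `v` and `zeta`, the bounds, and the weights are all hypothetical stand-ins chosen to be continuous and convex on compact intervals, as the existence argument requires. They are not the models used in this study.

```python
def minimize_convex_1d(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Hypothetical toy model (all values are assumptions for illustration).
D = 10.0                          # distance to the deck
ALPHA, BETA, GAMMA = 1.0, 0.1, 1.0
V_MIN, V_MAX = 0.5, 5.0           # compact UAV strategy set
Z_MIN, Z_MAX = 0.0, 3.0           # compact ship strategy set


def landing_time(v):
    return D / v


def energy(v):
    return v ** 2


def risk(v, zeta):
    # Relative-speed mismatch plus a penalty on ship motion.
    return (v - zeta) ** 2 + 0.5 * zeta ** 2


def uav_loss(v, zeta):            # U1 in (24)
    return ALPHA * landing_time(v) + BETA * energy(v) + GAMMA * risk(v, zeta)


def ship_loss(v, zeta):           # U2 in (24)
    return risk(v, zeta)


def best_response_iteration(v=1.0, zeta=0.0, tol=1e-6, max_rounds=200):
    """Alternate best responses until neither strategy changes."""
    for _ in range(max_rounds):
        v_new = minimize_convex_1d(lambda u: uav_loss(u, zeta), V_MIN, V_MAX)
        zeta_new = minimize_convex_1d(lambda z: ship_loss(v_new, z), Z_MIN, Z_MAX)
        if abs(v_new - v) < tol and abs(zeta_new - zeta) < tol:
            return v_new, zeta_new
        v, zeta = v_new, zeta_new
    return v, zeta


v_star, zeta_star = best_response_iteration()
print(f"v* = {v_star:.4f}, zeta* = {zeta_star:.4f}")
```

At the returned point, each strategy is a best response to the other, so neither player can lower its own loss by deviating unilaterally. Best-response iteration converges here because the toy game is a contraction; this is not guaranteed for general games.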

To address the limitations noted in prior game-theoretic UAV–ASV coordination studies, the present work explicitly formalizes the information structure, payoff definitions, and equilibrium-seeking mechanism for the two-player dynamic game. The UAV’s payoff function is defined as a weighted multi-objective cost incorporating landing time, energy consumption, and risk, all of which are computed from measurable flight-state variables and predicted ASV motion. The ASV’s payoff is formulated as a risk-minimization term dependent on deck motion amplitude, relative pose, and disturbance intensity $ζ$ reflecting the ship’s feasible operational envelope. Both players act under partial but consistent information: the UAV receives relative pose, envelope boundaries, and LOS-based predictions, while the ASV receives only global deck motion measurements and does not react adversarially. This results in a cooperative but non-identical-interest game, where the UAV optimizes a tri-objective landing metric and the ASV stabilizes deck conditions to the extent permitted by its dynamics.

To achieve Nash equilibrium under real-time operational constraints, the static equilibrium characterization is mapped into an iterative online solution in which the UAV updates its strategy through the Actor–Critic network, while the ASV contributes a quasi-static disturbance profile that evolves slowly relative to UAV guidance cycles. Because the ship motion bandwidth (0.1–1 Hz) is significantly lower than the UAV guidance update frequency (10–20 Hz), the equilibrium is established in a receding-horizon manner. Convergence is supported by the convexity of each objective component and the compactness of the feasible set, enabling the UAV to approach a locally optimal response that respects constraints derived from KKT conditions. The Actor–Critic function approximation further accelerates convergence by embedding Nash-optimal gradients into the update law, eliminating the need for explicit equilibrium solving at each step. This structured formulation clarifies the respective roles of UAV and ASV, resolves the issue of asymmetric objectives, and ensures that the game-theoretic framework remains computationally tractable and operationally meaningful in marine environments.

The existence of a Nash equilibrium in the game function is established. There are two optimization problems: one related to UAV strategy and the other concerning ship strategy. For the UAV, the objective within its strategy space is to minimize the loss function, while for the ship, the goal within its strategy space is to minimize risk. Consequently, the optimization problems for both the UAV and the ship can be formulated by (25).

\begin{cases} \min_{x, v} α t (x, v) + β E (x, v) + γ R (x, v, ζ) \\ \min_{ζ} R (x, v, ζ) \end{cases}

(25)

Since the utility function of the UAV includes multiple objectives, such as landing time, energy consumption, and risk, it is necessary to allocate weights to these objectives for optimization. The parameters $α$, $β$, and $γ$ are constants that represent the UAV’s priority for each objective, where $α$ determines the UAV’s emphasis on minimizing landing time, $β$ reflects its sensitivity to energy consumption, and $γ$ indicates its concern for risk. An optimal path $x$ and velocity $v$ can be selected to balance the three objectives under the given environmental variables $ζ$ of the ship. The strategies of both the UAV and the ship are subject to physical constraints, which can be categorized into two parts. The first part pertains to the UAV’s motion constraints, where its velocity $v \in [v_{\min}, v_{\max}]$ and path $x$ are limited by its flight capabilities and must not exceed the dynamic constraints of its flight model. Typically, restrictions are imposed on its roll, pitch, and yaw angles, while in special cases, additional constraints may be applied to its heading and sideslip angles. The second part concerns the ship’s constraints, where the ship’s velocity $v_{s} \in [v_{s, \min}, v_{s, \max}]$ and heading attitude, represented by the external variable $ζ$, are subject to operational limits. By incorporating these two constraints, $v_{\min} \leq v \leq v_{\max}$ and $v_{s, \min} \leq v_{s} \leq v_{s, \max}$, into the optimization problem, the multi-objective optimization is solved using the Lagrange multiplier method. The loss function and constraints are combined to construct the Lagrange function, as shown in (26), where $λ_{1}$ and $λ_{2}$ are the Lagrange multipliers.

L (x, v, λ) = α t (x, v) + β E (x, v) + γ R (x, v, ζ) + λ_{1} (v - v_{\min}) + λ_{2} (v_{\max} - v)

(26)

Assuming that the game analysis is completed within a discrete time step $k$, the UAV’s loss function $L (x, v, λ)$ is optimized using gradient descent with $ζ$ held fixed. By solving the UAV’s strategy under the conditions of (27), the optimal solution $(x^{*}, v^{*})$ can be obtained.

\begin{cases} \frac{\partial L}{\partial x} = 0 \land \frac{\partial L}{\partial v} = 0 \\ λ_{1} \geq 0 \land λ_{2} \geq 0 \end{cases}

(27)

The ship’s objective is to minimize the risk $R (x, v, ζ)$, so taking its partial derivative with respect to $ζ$ and setting it to zero, $\partial R / \partial ζ = 0$, yields the optimal strategy $ζ^{*}$. However, because $ζ^{*}$ includes uncontrollable external environmental disturbances affecting the ship, assistance devices are still required for UAV capture, such as an interception net or a synergistic motion platform UAV catcher. The UAV’s strategy must balance the three objectives: time, energy consumption, and risk. This can be addressed using the concept of Pareto optimality.³¹ A solution $(x^{*}, v^{*})$ is Pareto optimal when no other solution $(x, v)$ exists that satisfies (28), that is, one that improves at least one objective without compromising the others. This means finding a set of strategies where the UAV cannot improve one objective without deteriorating the others.

\begin{cases} t (x, v) \leq t (x^{*}, v^{*}) \land E (x, v) \leq E (x^{*}, v^{*}) \land R (x, v, ζ) \leq R (x^{*}, v^{*}, ζ^{*}) \\ t (x, v) < t (x^{*}, v^{*}) \lor E (x, v) < E (x^{*}, v^{*}) \lor R (x, v, ζ) < R (x^{*}, v^{*}, ζ^{*}) \end{cases}

(28)

The Pareto approach can be implemented using a stepwise search method, adjusting the weights iteratively and repeatedly solving single-objective optimization problems. However, this approach is time-consuming in practice and may lead to convergence to local optima, especially for complex systems. An alternative approach for multi-objective optimization is the $ε$-constraint method, which transforms certain objective functions into constraints while optimizing the remaining objective. Assuming that energy consumption $E (x, v)$ and risk $R (x, v, ζ)$ are treated as constraints while optimizing the time objective $t (x, v)$, the optimization problem can be formulated by (29), where the upper limits for energy consumption and risk are set as $ε_{1}$ and $ε_{2}$, respectively.

\begin{cases} \min_{x, v} t (x, v) \\ E (x, v) \leq ε_{1} \land R (x, v, ζ) \leq ε_{2} \\ v_{\min} \leq v \leq v_{\max} \\ x \in X \end{cases}

(29)
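
A minimal sketch of the $ε$-constraint formulation in (29) is given below, assuming a one-dimensional decision variable (the velocity $v$ on a fixed path) and a plain grid search in place of a proper constrained solver. The model functions `landing_time`, `energy`, and `risk`, the bounds, and the $ε$ values are hypothetical and serve only to show how different $(ε_{1}, ε_{2})$ pairs yield different trade-off solutions.

```python
# Hypothetical toy model; all numbers are assumptions for illustration.
D = 10.0
V_MIN, V_MAX = 0.5, 5.0


def landing_time(v):
    return D / v


def energy(v):
    return v ** 2


def risk(v, zeta):
    return (v - zeta) ** 2


def solve_eps_constraint(eps1, eps2, zeta=1.0, n=4501):
    """Minimize t(v) subject to E(v) <= eps1, R(v, zeta) <= eps2, v in bounds.

    Returns the best grid point, or None if no grid point is feasible.
    """
    best = None
    for i in range(n):
        v = V_MIN + (V_MAX - V_MIN) * i / (n - 1)
        if energy(v) <= eps1 and risk(v, zeta) <= eps2:
            if best is None or landing_time(v) < landing_time(best):
                best = v
    return best


def sweep(eps1_values, eps2_values, zeta=1.0):
    """Solve the problem for every (eps1, eps2) pair and collect the results."""
    results = []
    for e1 in eps1_values:
        for e2 in eps2_values:
            v = solve_eps_constraint(e1, e2, zeta)
            if v is not None:
                results.append((e1, e2, v, landing_time(v),
                                energy(v), risk(v, zeta)))
    return results


front_candidates = sweep([4.0, 9.0, 16.0], [0.25, 1.0, 4.0])
for e1, e2, v, t, e, r in front_candidates:
    print(f"eps=({e1}, {e2}) -> v={v:.3f}, t={t:.3f}, E={e:.3f}, R={r:.3f}")
```

Tightening either bound forces a slower approach in this toy model, so the landing time rises as the permitted energy or risk falls.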

This problem can be solved using standard constrained optimization methods such as the Lagrange multiplier method or the Karush-Kuhn-Tucker (KKT)³² conditions. By iteratively adjusting $ε_{1}$ and $ε_{2}$, different Pareto solutions can be explored. The optimal solutions obtained through these methods form a Pareto front, which represents a curve or surface in a multidimensional space illustrating the trade-offs between different strategies. Specifically, the Pareto front demonstrates the compromise among the three objectives under different strategies. When the UAV minimizes energy consumption, landing time and risk may increase. When the UAV minimizes landing time, energy consumption or risk may increase. By applying the KKT conditions and substituting the constraints from (26) into (29) using the $ε$-constraint method, the optimization problem can be reformulated as shown in (30).

L (x, v, λ) = \begin{array}{l} t (x, v) + λ_{1} (E (x, v) - ε_{1}) + λ_{2} (R (x, v, ζ) - ε_{2}) \\ + λ_{3} (v - v_{\min}) + λ_{4} (v_{\max} - v) \end{array}

(30)

According to the KKT conditions, the gradient condition, feasibility condition, non-negativity of the multipliers, and complementary slackness condition are expressed separately as shown in (31). These conditions can be used to solve for the optimal $(x^{*}, v^{*})$ .

\begin{cases} \nabla_{x, v} L (x, v, λ) = 0 \\ E (x, v) \leq ε_{1} \land R (x, v, ζ) \leq ε_{2} \land v_{\min} \leq v \leq v_{\max} \\ λ_{1} \geq 0 \land λ_{2} \geq 0 \land λ_{3} \geq 0 \land λ_{4} \geq 0 \\ λ_{1} (E (x, v) - ε_{1}) = 0 \land λ_{2} (R (x, v, ζ) - ε_{2}) = 0 \land λ_{3} (v - v_{\min}) = 0 \land λ_{4} (v_{\max} - v) = 0 \end{cases}

(31)

As shown in (28), a trajectory $(x^{*}, v^{*})$ is Pareto optimal if no other feasible trajectory improves one objective without degrading at least one other, as expressed in (32).

\nexists (x, v) : \left[ \begin{array}{l} t (x, v) \\ E (x, v) \\ R (x, v, ζ) \end{array} \right] \preceq \left[ \begin{array}{l} t (x^{*}, v^{*}) \\ E (x^{*}, v^{*}) \\ R (x^{*}, v^{*}, ζ^{*}) \end{array} \right] \land \left[ \begin{array}{l} t (x, v) \\ E (x, v) \\ R (x, v, ζ) \end{array} \right] \neq \left[ \begin{array}{l} t (x^{*}, v^{*}) \\ E (x^{*}, v^{*}) \\ R (x^{*}, v^{*}, ζ^{*}) \end{array} \right]

(32)
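
The dominance relation in (32) translates directly into a small filter over candidate objective vectors. The following sketch assumes each candidate is summarized by a tuple (time, energy, risk) with all three objectives minimized; the sample values are made up for illustration.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized).

    a dominates b when a is no worse in every component and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (time [s], energy [Wh], risk [-]) tuples.
candidates = [
    (10.0, 4.0, 0.2),
    (8.0, 6.0, 0.3),
    (10.0, 6.0, 0.3),   # dominated by both entries above
    (12.0, 3.0, 0.1),
]
front = non_dominated(candidates)
print(front)
```

The pairwise comparison costs quadratic time in the number of candidates, which is acceptable for the small candidate sets of a sketch like this.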

To tune priorities dynamically during different landing phases, the multi-objective vector is scalarized into a unified cost as (26) with time-varying weights as (33).

\begin{cases} α (t) = \begin{cases} α_{cruise}, & t < T_{terminal} - T_{final} \\ α_{final}, & t \geq T_{terminal} - T_{final} \end{cases} \\ β (t) = \begin{cases} β_{cruise}, & t < T_{terminal} - T_{final} \\ β_{final}, & t \geq T_{terminal} - T_{final} \end{cases} \\ γ (t) = \begin{cases} γ_{cruise}, & t < T_{terminal} - T_{final} \\ γ_{final}, & t \geq T_{terminal} - T_{final} \end{cases} \end{cases}

(33)
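
The phase-dependent weighting in (33) amounts to a simple switch on the elapsed time. The sketch below assumes that the cruise weights apply before the switch time $T_{terminal} - T_{final}$ and the final weights from that instant onward. The numerical weights are hypothetical and chosen only to reflect an energy-oriented cruise phase and a risk-oriented final phase.

```python
def phase_weights(t, t_terminal, t_final,
                  cruise=(0.2, 0.6, 0.2), final=(0.2, 0.1, 0.7)):
    """Return (alpha, beta, gamma) for the current landing phase, cf. (33).

    The default weight tuples are illustrative assumptions.
    """
    switch_time = t_terminal - t_final
    return cruise if t < switch_time else final


def scalar_cost(t_land, energy, risk, weights):
    """Weighted-sum scalarization of the three objectives."""
    alpha, beta, gamma = weights
    return alpha * t_land + beta * energy + gamma * risk


# Example: 60 s mission with a 15 s final phase (hypothetical numbers).
w_cruise = phase_weights(10.0, t_terminal=60.0, t_final=15.0)
w_final = phase_weights(50.0, t_terminal=60.0, t_final=15.0)
print(w_cruise, w_final)
print(scalar_cost(30.0, 100.0, 0.5, w_cruise))
```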

This dynamic weighting explicitly models practical landing priorities. In the cruise phase, energy efficiency is more important, so $β_{cruise}$ is set large and $γ_{cruise}$ small. In the final approach, precision and safety dominate, so $γ_{final}$ is set large and $β_{final}$ small. To ensure that trade-offs are systematically explored, a second formulation is also implemented using the ε-constraint method as (31), subject to $E (x, v) \leq ε_{1}$ and $R (x, v, ζ) \leq ε_{2}$. By scanning the pairs $(ε_{1}, ε_{2})$, a set of non-dominated solutions is produced, forming an approximate Pareto front (34).

P_{ร} = ⋃_{ε_{1}, ε_{2}} (x^{*} (ε_{1}, ε_{2}), v^{*} (ε_{1}, ε_{2}))

(34)

During training epochs, the Critic evaluates each candidate trajectory according to the scalarized cost, while the Actor is additionally constrained to ensure that the condition ${[\begin{array}{lll} t (x, v) & E (x, v) & R (x, v, ζ) \end{array}]}^{T} \in P_{ร}$ is met, preventing the policy from selecting dominated, inefficient trajectories. Empirically, policies with higher risk reductions exhibit an approximately 23–31% increase in energy consumption and a 12–19% increase in landing time. This quantifies the trade-offs and confirms that the UAV is indeed operating on a trade-off curve rather than collapsing the problem into a single objective.

To address practical concerns regarding online equilibrium computation, the proposed game is formulated as a semi-dynamic cooperative game in which the ASV exhibits slow-varying dynamics and limited maneuverability, while the UAV executes high-bandwidth control. The information structure is defined such that the UAV has access to:

a. Relative position and velocity via LOS estimation,

b. Envelope boundary constraints,

c. Short-horizon deck motion prediction $\hat{ζ}$ ,

d. Onboard inertial and GNSS states.

The ASV possesses only state measurements relevant to deck motion and environmental disturbance but does not modify its strategy in response to UAV actions during landing. This enforces a one-sided adaptive structure where only the UAV actively optimizes its strategy, making the equilibrium computationally attainable.

The UAV computes equilibrium-approximating strategies through repeated Actor–Critic updates, where each iteration implicitly performs a gradient step toward satisfying the first-order KKT conditions of the multi-objective game. Because the ASV’s motion evolves slowly relative to the UAV’s control loop, convergence is achieved in a receding-horizon sense, with each update ensuring that successive policies remain within the feasible Pareto set. This design prevents instability caused by communication delays or environmental disturbances and guarantees that Nash-like equilibria are approached without requiring explicit full-information negotiation.

To justify the use of Karush–Kuhn–Tucker (KKT) conditions in the multi-objective landing optimization problem, the constrained structure of the landing task is explicitly defined. The UAV must satisfy actuator, spatial, and terminal-phase safety constraints during landing, expressed as (35).

\begin{cases} g_{1} (v) = ‖ v ‖ - v_{\max} < 0, & \text{actuator limit} \\ g_{2} (x) = d_{obs} - d < 0, & \text{obstacle avoidance} \\ g_{3} (x) = d_{deck} - r_{deck} < 0, & \text{deck boundary} \\ g_{4} (x) = h_{\min} - h < 0, & \text{minimum altitude} \\ g_{5} (x) = {\dot{h}}_{\min} - \dot{h} < 0, & \text{sink-rate limit} \end{cases}

(35)

Let $J_{scalar} (x, v)$ denote the scalarized objective from the Pareto framework. The Lagrangian is written as (36), where $λ_{i} \geq 0$ are dual multipliers.

L (x, v, λ) = J_{scalar} (x, v) + \sum_{i = 1}^{5} λ_{i} g_{i} (x, v)

(36)

A control solution $(x^{*}, v^{*})$ satisfies the KKT conditions if the conditions in (37) hold. These conditions precisely identify which constraints become active during different landing phases.

\begin{cases} \nabla_{x, v} J_{scalar} (x^{*}, v^{*}) + \sum_{i = 1}^{5} λ_{i}^{*} \nabla_{x, v} g_{i} (x^{*}, v^{*}) = 0, & \text{Stationarity} \\ g_{i} (x^{*}, v^{*}) \leq 0, \quad i = 1, \ldots, 5, & \text{Primal feasibility} \\ λ_{i}^{*} \geq 0, & \text{Dual feasibility} \\ λ_{i}^{*} g_{i} (x^{*}, v^{*}) = 0, & \text{Complementary slackness} \end{cases}

(37)

Because Actor–Critic policies are generally non-convex, exact KKT satisfaction cannot be guaranteed. Therefore, a KKT residual metric is defined and measured at each landing rollout (38).

Res_{KKT} = ‖ \nabla_{x, v} J_{scalar} (x, v) + \sum_{i = 1}^{5} λ_{i} \nabla_{x, v} g_{i} (x, v) ‖ + \sum_{i = 1}^{5} | λ_{i} g_{i} (x, v) |

(38)
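
The residual in (38) can be evaluated numerically once a scalarized objective, the constraint functions, and multiplier estimates are available. The sketch below is a generic illustration that uses central finite differences for the gradients. The one-dimensional objective `J`, the single speed constraint `g_speed`, and the multiplier values are hypothetical and do not correspond to the five constraints in (35).

```python
import math


def numeric_grad(f, z, h=1e-6):
    """Central finite-difference gradient of f at the point z (a list)."""
    grad = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        grad.append((f(zp) - f(zm)) / (2.0 * h))
    return grad


def kkt_residual(J, constraints, z, lambdas):
    """Stationarity norm plus complementary-slackness terms, cf. (38)."""
    stationarity = numeric_grad(J, z)
    slackness = 0.0
    for lam, g in zip(lambdas, constraints):
        g_grad = numeric_grad(g, z)
        stationarity = [s + lam * gi for s, gi in zip(stationarity, g_grad)]
        slackness += abs(lam * g(z))
    return math.sqrt(sum(s * s for s in stationarity)) + slackness


# Hypothetical toy problem: z = [v], minimize 10 / v + 0.1 v^2 with v <= 2.
def J(z):
    return 10.0 / z[0] + 0.1 * z[0] ** 2


def g_speed(z):
    return z[0] - 2.0


# At v = 2 the constraint is active and lambda = -J'(2) = 2.1.
res_opt = kkt_residual(J, [g_speed], [2.0], [2.1])
# An interior point with a zero multiplier is far from stationarity.
res_early = kkt_residual(J, [g_speed], [1.0], [0.0])
print(f"residual at KKT point: {res_opt:.2e}, early guess: {res_early:.2f}")
```

A residual near zero indicates approximate KKT consistency, and a large one flags a policy output that is far from constrained optimality.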

Empirically, policies after convergence exhibit $Res_{KKT} \approx 0.12 \pm 0.03$, while early-stage policies show $Res_{KKT} \approx 1.8$–$2.7$, indicating that the Actor–Critic solution moves toward a KKT-consistent neighborhood. Thus, KKT is not used as an exact solver, but rather as a quantitative consistency check confirming that the learned policy approximates constrained optimal behavior.

To complement the approximate KKT-based verification and to ensure that the proposed learning-guidance architecture performs reliably in stochastic, time-varying maritime environments, both stability and performance guarantees are jointly evaluated through a Lyapunov-informed regret analysis. Rather than treating Lyapunov stability and regret bounds as independent validation tools, the two are merged into a cohesive robustness assessment framework. Specifically, the underlying sliding-mode and LOS-based guidance structure establishes a deterministic stabilizing backbone, ensuring that the UAV’s tracking errors evolve within a contracting set. While the Actor–Critic component adapts waypoint or command decisions in real time, the stabilizing controller continuously regulates the system toward a decreasing Lyapunov function. This interaction creates a layered control architecture wherein learning-driven exploration is constrained to a region that preserves monotonic reduction of the Lyapunov measure. Thus, even as the policy evolves, the state trajectory remains inside a bounded, forward-invariant neighborhood around the reference path.

Within this stability-preserving region, the regret framework provides a quantitative measure of how effectively the learned policy approximates the optimal constrained behavior over time. Because the tracking errors remain bounded by the Lyapunov-based contraction properties of the low-level controller, the regret analysis is not performed on raw, unstable trajectories but on trajectories whose deviations are guaranteed to remain within a stability-certified corridor. This coupling ensures that each incremental policy update cannot drive the system into unstable or infeasible regions, even in the presence of unmodeled ASV motion, sensor fluctuations, or environmental disturbances. As the Actor–Critic network repeatedly refines its policy, the Lyapunov structure effectively “shapes” the learning landscape, preventing divergence and ensuring that suboptimal actions incur bounded penalties in future performance. As a result, the accumulated regret grows sub-linearly with time consistent with convergent Actor–Critic processes indicating that the learned decisions asymptotically approach the best feasible actions available within the stability-constrained domain.

The combined viewpoint also clarifies why traditional KKT-based analysis alone is insufficient for real-time marine operations: KKT offers instantaneous optimality checks but does not describe the behavior of the algorithm across time or under stochastic perturbations. In contrast, Lyapunov-constrained regret analysis accounts for both temporal adaptation and environmental uncertainty. Lyapunov properties ensure that the UAV remains resilient to disturbances and maintains bounded deviation from the intended guidance path, while the regret measure characterizes the efficiency of learning and thus the quality of real-time adaptation. When integrated, these two tools demonstrate that even if the Actor–Critic policy is non-convex and the environment stochastic the system remains stable and its long-run average performance approaches optimal constrained behavior. This blended guarantee forms a more robust and operationally meaningful alternative to pure KKT verification, validating both the safety and the asymptotic efficiency of the proposed guidance strategy under practical maritime conditions.

Figure 3 illustrates the integrated framework for optimization and control of a tail-sitter UAV during its landing process on an ASV, where the entire design is structured around system modeling, game-theoretic modeling, multi-objective optimization, adaptive guidance control, trajectory planning, UAV system modeling, and reinforcement learning integration.

Figure 3.

The main methodological framework of this study.

The system modeling block establishes the global environment setup, which serves as the foundation for further processes. Within this setup, game-theoretic modeling is introduced to define the landing objective and mathematically prove the existence of a Nash equilibrium, ensuring that both UAV and ASV reach an optimal cooperative state where risks and energy consumption are minimized. The multi-objective optimization module applies Pareto optimization to balance the conflicting goals of minimum risk, shortest time, and lowest energy consumption, while constraints are transformed and validated through Karush-Kuhn-Tucker conditions to guarantee both necessity and sufficiency of optimal solutions. Adaptive guidance control law governs real-time UAV behavior through line-of-sight tracking, guidance commands, and transformations between different guidance modes, supported by Lyapunov and Barbalat stability validation to ensure robustness under uncertainties. The trajectory planning and smoothing block generates and refines cruise trajectories to provide feasible and smooth paths for landing, feeding directly into the UAV system model that represents state equations, control inputs, motion state space, and mode transformations necessary for precise dynamic modeling. On the other side, reinforcement learning integration enhances adaptability by embedding an Actor-Critic structure: initialization is followed by state observation setup, then the actor generates actions in the form of trajectory strategies, while the critic evaluates the value of these actions by updating objective condition parameters. Actor updates subsequently refine the guidance law parameters, with this iterative learning process repeated up to 1000 epochs to ensure convergence, after which the system terminates training. 
By combining rigorous mathematical modeling, control theory validation, multi-objective optimization, and reinforcement learning, the framework ensures that the UAV achieves stable and precise landing performance under the dynamic and nonlinear environment of ship motion, thereby fulfilling mission requirements of safety, efficiency, and robustness in cooperative UAV-ASV operations.

This study presents the detailed control architecture of the UAV system, showing how different controllers, feedback systems, and hardware components interact to realize autonomous or remotely controlled missions as shown in Figure 4.

Figure 4.

The main methodological framework of this study.

At the top level, the Task Controller integrates the User Interface and Strategy Development modules. The user interface allows for two modes of operation: autonomous task planning, where the UAV follows pre-defined mission objectives, and manual remote control for direct operator intervention. The strategy development block transforms mission objectives into executable paths through multi-objective transformation and generates continuous waypoints that serve as references for the flight controller. The Flight Controller is composed of three main parts: the guidance controller, the attitude controller, and the feedback information system. The guidance controller is responsible for discrete path waypoint generation, application of the adaptive guidance control law, and trajectory planning and smoothing, which ensures feasible and safe flight paths are provided to the UAV. The attitude controller is structured hierarchically, beginning with the position controller, which defines spatial positioning; followed by the heading controller, which governs UAV orientation; and finally the speed rate controller, which regulates velocity for stability and responsiveness. These control signals are then transmitted to the hardware-level attitude controller, which directly interfaces with the UAV actuators. This hardware control includes servos for aerodynamic control surfaces, electronic speed controllers (ESCs) to regulate power distribution, and brushless DC motors that generate thrust for UAV propulsion. The Feedback Information block continuously updates the system by providing real-time state estimation through the Attitude and Heading Reference System (AHRS) and the Global Navigation Satellite System (GNSS), ensuring accurate positional and orientation data. These feedback signals are fed back into the flight controller for closed-loop control, allowing the UAV to dynamically adjust to environmental conditions and disturbances. 
This architecture demonstrates a tightly integrated control framework in which mission objectives are translated into real-time trajectory execution, stabilized by adaptive control laws, supported by feedback information, and implemented through actuator-level commands. This design ensures that the UAV maintains precise control and stability while executing autonomous or remotely commanded operations in complex flight missions.

3. Experimental analysis

To accommodate different ship motion modes, including circular loitering and linear navigation, this study conducts experiments based on three guidance methods. The guidance law is designed using three primary game-theoretic cost factors: time, energy consumption, and risk. A guidance strategy is proposed, integrating the Envelope Method³³ and Pursuit-Evasion Theory.³⁴ Additionally, constraint conditions and cost functions are designed to facilitate the convergence of the loss function.

The objective of the Envelope Method is to ensure that the UAV remains within a dynamically defined safe region during the landing process, thereby minimizing risk while aiming to reach the landing area in the shortest possible time. Let $E (t) \subset \mathbb{R}^{3}$ represent the allowable landing region on the unmanned vessel’s deck, with boundaries that can change over time. The UAV’s position is denoted as $p (t) = (x (t), y (t), z (t))$, and the dynamic landing point is $p_{d} (t) = (x_{d} (t), y_{d} (t), z_{d} (t))$. Due to the difficulty of accurately managing energy consumption over short durations, the objective of the envelope method is set as $\min_{t} T$, i.e., minimizing airborne time. This objective must satisfy the safe-envelope constraint $p (t) \in E (t), \forall t \in [0, T]$, the ideal terminal condition $p (T) = p_{d} (T)$, and the convergence condition that the distance between the UAV and the landing point does not increase during the landing, i.e., $\frac{d}{d t} ‖ p (t) - p_{d} (t) ‖ \leq 0, \forall t \in [0, T]$. The Envelope Method constrains the UAV’s flight within the defined envelope of the unmanned vessel, thereby minimizing risk. If the envelope dynamically adjusts to environmental conditions, an accurate perception and guidance system is required to provide real-time boundary information, ensuring that the UAV operates within the designated envelope.
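
The three envelope conditions (membership, terminal, and convergence) can be checked on a sampled trajectory. The sketch below assumes a hypothetical cone-shaped envelope that opens upward from the moving landing point; the envelope geometry, the parameters `r_deck` and `slope`, and the sample trajectory are illustrative assumptions rather than the envelope used in the experiments.

```python
import math


def inside_envelope(p, p_d, r_deck=1.0, slope=0.5):
    """Hypothetical cone envelope above the landing point p_d.

    The UAV must be at or above deck height, and its horizontal offset
    may grow linearly with its height above the deck.
    """
    dx, dy, dz = (p[i] - p_d[i] for i in range(3))
    if dz < 0.0:
        return False
    return math.hypot(dx, dy) <= r_deck + slope * dz


def check_landing(traj, targets, tol=0.05):
    """Return (in_envelope, converging, terminal_ok) for sampled positions."""
    dists = [math.dist(p, q) for p, q in zip(traj, targets)]
    in_envelope = all(inside_envelope(p, q) for p, q in zip(traj, targets))
    converging = all(d1 <= d0 + 1e-12 for d0, d1 in zip(dists, dists[1:]))
    terminal_ok = dists[-1] <= tol
    return in_envelope, converging, terminal_ok


# Ship moves along x at 1 m/s; the UAV offset shrinks linearly to zero.
times = [float(k) for k in range(11)]
targets = [(t, 0.0, 0.0) for t in times]
traj = [(t + 4.0 * (1.0 - t / 10.0), 0.0, 10.0 * (1.0 - t / 10.0))
        for t in times]
print(check_landing(traj, targets))
```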

By constraining the envelope boundary, the guidance domain is restricted within a predefined range. The pursuit-evasion strategy is then applied to formulate optimal strategies when stochastic factors affect both the ship’s and UAV’s motion. Pursuit-evasion theory is a mathematical framework that studies the optimal strategies between a pursuer and an evader within a finite space and environment. This theory is widely applied in fields such as mathematics, computer science, control engineering, and especially in robotic navigation, UAV control, game theory, and security analysis. The goal of the pursuit-evasion strategy is to enable the UAV to rapidly approach the landing point and complete the landing with the shortest possible distance while maintaining acceptable control accuracy. This method is particularly suitable for time-sensitive scenarios. Given the envelope boundary, the distance between the UAV and the ship is denoted as $D (t) = ‖ p (t) - p_{d} (t) ‖$ , and the objective function is expressed as $\min_{u (t)} \int_{0}^{T} D (t) d t = \int_{0}^{T} ‖ p (t) - p_{d} (t) ‖ d t$ , where $u (t)$ represents the control input strategy of the guidance law. Left unconstrained, minimizing this objective drives both velocity and acceleration toward infinity. Therefore, constraints on velocity and acceleration must be imposed, i.e., $‖ v (t) ‖ \leq V_{\max}$ and $‖ a (t) ‖ \leq A_{\max}$ . The purpose of the pursuit-evasion strategy is to minimize the UAV’s distance to the landing point, ensuring a rapid approach to the target. This method prioritizes speed but requires dynamic control of velocity and acceleration to compensate for environmental disturbances. Because the pursuit-evasion strategy accounts for dynamic boundaries and real-time position updates, it also allows the UAV to land with lower risk.
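A minimal Python sketch of one pursuit update under the bounds $‖ v ‖ \leq V_{\max}$ and $‖ a ‖ \leq A_{\max}$ is given below. The braking-aware speed profile, the Euler integration, and all names and numerical values are assumptions chosen for illustration; they are not the guidance law of this study.

```python
import math

def clamp_norm(vec, limit):
    """Scale vec so that its Euclidean norm does not exceed limit."""
    n = math.hypot(*vec)
    if n <= limit:
        return tuple(vec)
    return tuple(c * limit / n for c in vec)

def pursuit_step(p, v, p_d, dt, v_max, a_max):
    """One pursuit update toward p_d with speed and acceleration limits."""
    dist = math.dist(p, p_d)
    if dist == 0.0:
        v_des = tuple(0.0 for _ in p)
    else:
        # Assumed profile: slow down so the UAV can still stop at the target.
        speed = min(v_max, math.sqrt(2.0 * a_max * dist))
        v_des = tuple(speed * (d - x) / dist for x, d in zip(p, p_d))
    a_cmd = clamp_norm(tuple((vd - vi) / dt for vd, vi in zip(v_des, v)), a_max)
    v_new = clamp_norm(tuple(vi + ai * dt for vi, ai in zip(v, a_cmd)), v_max)
    p_new = tuple(x + vi * dt for x, vi in zip(p, v_new))
    return p_new, v_new

# Accumulate the objective, the integral of D(t), for a slowly moving target.
p, v, dt, cost = (0.0, 0.0, 10.0), (0.0, 0.0, 0.0), 0.1, 0.0
for k in range(200):
    p_d = (0.5 * k * dt, 0.0, 0.0)      # hypothetical deck motion along x
    cost += math.dist(p, p_d) * dt
    p, v = pursuit_step(p, v, p_d, dt, v_max=5.0, a_max=3.0)
print(round(cost, 2), math.hypot(*v) <= 5.0 + 1e-9)
```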

The predictive control method is used to perform multi-objective optimization by integrating considerations of energy consumption, time, and risk while dynamically adjusting the UAV’s trajectory. Based on the current state, future landing positions are predicted for adaptive adjustments. Because the power consumption rate of the UAV has not been defined up to this point, the energy consumption $E (T)$ is defined as $E (T) = \int_{0}^{T} P (t) d t$ , where $P (t)$ represents the power consumption at each time step. Considering the above factors, the objective function is to achieve the lowest possible energy consumption and minimal total risk within a finite time, denoted as $\min_{u (t)} α T + β E (T) + γ \int_{0}^{T} R (t) d t$ , with weighting conditions designed similarly to (27). In this process, risk is treated as a variable constrained by a risk-limiting function, where the envelope distance serves as the constraint $R = \max (0, D_{s a f e} - ‖ p - p_{d} ‖)$ , and $D_{s a f e}$ represents the safety threshold within the envelope range. It is important to note that this risk is only one constraint within the broader risk definition set. The actual risk set $R (x, v, ζ)$ includes risk assessments related to the vessel’s attitude as well as environmental risks affecting the UAV. The predictive control method enables dynamic adjustments based on real-time predictive information, balancing energy consumption, time, and risk as multiple objectives. This method requires high computational resources to predict future states and adjust control inputs, making it particularly suitable for environments with stringent risk control requirements. This completes the integration of the guidance strategy. The envelope method defines the safety boundary, providing constraints and a reference boundary. The pursuit-evasion strategy enhances the guidance process for rapid target approach, while the predictive control method balances multiple objectives and enables adaptive adjustments.
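For sampled data, the predictive objective can be evaluated with a simple quadrature. The Python sketch below uses the trapezoidal rule and the envelope-distance risk term exactly as written above; the weights, sample values, and function names are hypothetical.

```python
import math

def envelope_risk(p, p_d, d_safe):
    """R = max(0, D_safe - ||p - p_d||), as stated in the text."""
    return max(0.0, d_safe - math.dist(p, p_d))

def predictive_cost(times, power, risk, alpha, beta, gamma):
    """alpha*T + beta*E(T) + gamma*integral of R(t), by the trapezoidal rule.

    times: sample instants; power, risk: samples of P(t) and R(t).
    """
    def trapz(y):
        return sum(0.5 * (y[i] + y[i + 1]) * (times[i + 1] - times[i])
                   for i in range(len(times) - 1))
    T = times[-1] - times[0]
    return alpha * T + beta * trapz(power) + gamma * trapz(risk)

# Hypothetical samples: constant 100 W draw and a brief risk peak.
print(predictive_cost([0.0, 1.0, 2.0], [100.0, 100.0, 100.0],
                      [0.0, 1.0, 0.0], alpha=1.0, beta=0.01, gamma=2.0))
```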

When generating the ideal trajectory, discontinuities may arise. In this study, Bézier curve³⁵ interpolation and spline interpolation are applied to smooth the trajectory and handle discontinuities. Bézier curves are well-suited for managing abrupt changes in position and velocity due to their smooth transition properties. Suppose a set of discontinuous points $\{p_{i}\}_{i = 1}^{N}$ , i.e., the UAV’s desired position sequence at time step $i$ , exists on the trajectory. A quadratic Bézier curve is constructed through each triple of consecutive points to smooth the transition from $p_{i - 1}$ to $p_{i + 1}$ , ensuring that both velocity and acceleration change continuously over time. For the envelope method, sudden variations may occur near the envelope boundary. A quadratic Bézier curve interpolation method is applied, with designed constraints ensuring that the curve remains smoothly within the envelope boundary. Each Bézier curve segment $B_{i}$ is defined as $B_{i} (t) = {(1 - t)}^{2} p_{i - 1} + 2 t (1 - t) p_{i} + t^{2} p_{i + 1}$ with $t \in [0, 1]$ . The objective is to minimize the total curvature of the Bézier curve, thereby smoothing the transition regions. The curvature $ρ$ is defined as $ρ = ‖ B^{'} \times B^{''} ‖ / {‖ B^{'} ‖}^{3}$ , and the objective function is set to $\min_{p_{i}} \sum_{i = 1}^{N - 2} \int_{0}^{1} ρ (t) d t$ while satisfying the condition $B_{i} (t) \in E, \ \forall t \in [0, 1]$ . The temporal constraints during this process are primarily designed for trajectory smoothing.
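The quadratic Bézier segment and its curvature can be evaluated directly from the three control points. The Python sketch below is a plain numerical illustration; the midpoint-rule integration and the function names are assumptions, and the envelope constraint is not enforced here.

```python
import math

def bezier2(p0, p1, p2, t):
    """Point on the quadratic Bezier curve B(t), t in [0, 1]."""
    return tuple((1 - t) ** 2 * a + 2 * t * (1 - t) * b + t ** 2 * c
                 for a, b, c in zip(p0, p1, p2))

def bezier2_curvature(p0, p1, p2, t):
    """Curvature ||B' x B''|| / ||B'||^3 of a 3D quadratic Bezier curve.

    Undefined where B'(t) = 0 (degenerate control points).
    """
    d1 = tuple(2 * (1 - t) * (b - a) + 2 * t * (c - b)
               for a, b, c in zip(p0, p1, p2))
    d2 = tuple(2 * (c - 2 * b + a) for a, b, c in zip(p0, p1, p2))
    cross = (d1[1] * d2[2] - d1[2] * d2[1],
             d1[2] * d2[0] - d1[0] * d2[2],
             d1[0] * d2[1] - d1[1] * d2[0])
    return math.hypot(*cross) / math.hypot(*d1) ** 3

def total_curvature(p0, p1, p2, n=200):
    """Midpoint-rule approximation of the integral of curvature over [0, 1]."""
    return sum(bezier2_curvature(p0, p1, p2, (k + 0.5) / n)
               for k in range(n)) / n

# Hypothetical waypoints forming a sharp corner at p1.
p0, p1, p2 = (0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)
print(bezier2(p0, p1, p2, 0.5), bezier2_curvature(p0, p1, p2, 0.5))
```

Moving the middle control point toward the chord between its neighbours lowers the integrated curvature, which is the effect the objective above exploits.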

To construct a cubic spline function $S_{i}$ for the given sequence $\{p_{i}\}_{i = 1}^{N}$ , define $S_{i} (t) = a_{i} t^{3} + b_{i} t^{2} + c_{i} t + d_{i}$ with $t \in [t_{i}, t_{i + 1}]$ . The continuity condition requires that displacement, velocity, and acceleration must be continuous at each point, i.e., the constraints on function $S_{i}$ are given as $S_{i} (t_{i + 1}) = p_{i + 1}$ , $\dot{S}_{i} (t_{i + 1}) = v_{i + 1}$ , and $\ddot{S}_{i} (t_{i + 1}) = a_{i + 1}$ . The curvature minimization process ensures smoothness by minimizing the integral curvature, defined as $\min_{\{a_{i}, b_{i}, c_{i}, d_{i}\}} \sum_{i = 1}^{N - 1} \int_{t_{i}}^{t_{i + 1}} ‖ \ddot{S}_{i} (t) ‖ d t$ . In the pursuit-evasion strategy, discontinuities mainly occur during the UAV’s accelerated approach to the ship’s landing point, particularly when making abrupt changes in velocity or acceleration. Using cubic spline interpolation for smoothing ensures that acceleration remains continuous across each curve segment. Trajectory adjustment discontinuities may arise when the UAV rapidly responds to external predictive variations. A time-weighted smoothing method is applied to prevent excessive acceleration changes during transition phases. A set of low-pass filtering smooth functions is constructed to refine trajectory transitions, gradually aligning with the desired trajectory. The optimization objective function $\min_{p} \int_{0}^{T} \left( {‖ p - p_{d} ‖}^{2} + λ {‖ \dot{p} ‖}^{2} + μ {‖ \ddot{p} ‖}^{2} \right) d t$ is designed with weighted terms $λ$ for velocity smoothing and $μ$ for acceleration smoothing, ensuring a smoother trajectory as time approaches the target.
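The time-weighted smoothing objective can be approximated on a sampled trajectory with finite differences. The Python sketch below treats a single axis for brevity and pairs the cost with a first-order low-pass filter as one possible smoothing function; the filter form, its gain, and the weights are illustrative assumptions rather than the filters used in this study.

```python
def smoothing_cost(p, p_d, dt, lam, mu):
    """Discrete form of the integral of |p - p_d|^2 + lam*|p'|^2 + mu*|p''|^2.

    p, p_d: equal-length lists of scalar positions sampled every dt seconds.
    Velocity and acceleration are obtained by finite differences.
    """
    vel = [(p[k + 1] - p[k]) / dt for k in range(len(p) - 1)]
    acc = [(vel[k + 1] - vel[k]) / dt for k in range(len(vel) - 1)]
    track = sum((a - b) ** 2 for a, b in zip(p, p_d))
    return dt * (track + lam * sum(v * v for v in vel)
                 + mu * sum(a * a for a in acc))

def low_pass(p_d, k_filter):
    """First-order low-pass filter; 0 < k_filter <= 1 sets the blend rate."""
    out = [p_d[0]]
    for target in p_d[1:]:
        out.append(out[-1] + k_filter * (target - out[-1]))
    return out

# Hypothetical step change in the desired position along one axis.
step_ref = [0.0] * 5 + [1.0] * 15
smoothed = low_pass(step_ref, k_filter=0.5)
print(smoothing_cost(step_ref, step_ref, 0.1, lam=0.01, mu=0.01),
      smoothing_cost(smoothed, step_ref, 0.1, lam=0.01, mu=0.01))
```

The filtered trajectory trades a small tracking error for much smaller velocity and acceleration terms, which is the balance controlled by $λ$ and $μ$ .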

By applying quadratic Bézier curves to limit the envelope constraints, the UAV’s position undergoes a smooth transition while satisfying the envelope limitations. The pursuit-evasion strategy employs cubic spline interpolation to ensure continuity in displacement, velocity, and acceleration. The predictive control method utilizes time-weighted optimization to smooth velocity and acceleration variations, adapting to predictive changes.

The guidance relationship between the ship and the UAV is established based on a Nash equilibrium game condition. A loss function is formulated by defining three global cost functions. Reinforcement learning, specifically the Actor-Critic method, is employed to optimize the three guidance trajectory designs, treating the UAV’s control strategy as a continuous decision-making process. The global loss function enables the strategy determined by the Actor network to be updated according to the estimated values from the Critic network, ultimately achieving an optimized guidance trajectory. The objective is for the UAV to complete landing in the shortest possible time, making time cost equivalent to the total mission duration. Since energy consumption is approximately proportional to the square of velocity, integrating the square of velocity serves as an approximation for energy cost. The risk cost is defined based on the UAV’s deviation from the envelope boundary and the ship’s instability score. The deviation cost accumulates at each time step, with penalties intensified when the distance exceeds the safety threshold. Considering time, energy consumption, and risk, the global composite loss function is defined as $L = α J_{time} + β J_{energy} + γ J_{risk}$ according to (29) and (31).

In the Actor-Critic architecture,³⁶ two networks are used. The Actor generates the strategy $π (a | s; θ)$ , i.e., determines the action $a$ based on the current state $s$ , and is parameterized by the network weights $θ$ . The Critic evaluates the value function $V (s; ω)$ of the strategy $π (a | s; θ)$ , parameterized by the network weights $ω$ , to assess the value of the current strategy and provide a learning signal for updating the Actor network. The input to the Actor-Critic network is the state $s$ , comprising the UAV’s current position, velocity, predicted landing position, envelope state, and wind speed information. The output is the next position change vector for the UAV, namely the heading waypoint command. The loss functions of the Actor and the Critic are defined separately. The Critic network’s loss function $L_{critic}$ is defined from the temporal-difference error and is expressed as $L_{critic} = \frac{1}{2} {(r + ϒ V (s^{'}; ω) - V (s; ω))}^{2}$ , where $r$ is the reward obtained by the Actor through action $a$ in state $s$ , computed as the negative value of the loss function, $ϒ$ is the discount factor, and $V (s^{'}; ω)$ is the value estimate of the next state $s^{'}$ . The Actor network’s loss function $L_{actor}$ follows the policy gradient method and is defined as $L_{actor} = - \log π (a | s; θ) \cdot (r + ϒ V (s^{'}; ω) - V (s; ω))$ . The Actor updates its action probability based on the temporal-difference error, making actions that reduce the loss more likely.
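Written out for a single transition, the two losses reduce to a few lines. The Python sketch below only evaluates the expressions given above for scalar inputs; the function names and the numerical values are hypothetical, and in the Actor loss the temporal-difference error is treated as a constant with respect to $θ$ .

```python
import math

def td_error(r, v_s, v_next, discount):
    """Temporal-difference error: r + discount * V(s') - V(s)."""
    return r + discount * v_next - v_s

def critic_loss(r, v_s, v_next, discount):
    """L_critic = 0.5 * (TD error)^2."""
    return 0.5 * td_error(r, v_s, v_next, discount) ** 2

def actor_loss(log_prob, r, v_s, v_next, discount):
    """L_actor = -log pi(a|s) * (TD error)."""
    return -log_prob * td_error(r, v_s, v_next, discount)

# Hypothetical transition: reward -1, V(s) = 2, V(s') = 3, discount 0.5.
delta = td_error(-1.0, 2.0, 3.0, 0.5)
print(delta, critic_loss(-1.0, 2.0, 3.0, 0.5),
      actor_loss(math.log(0.5), -1.0, 2.0, 3.0, 0.5))
```

A negative temporal-difference error, as in this example, means the action turned out worse than the Critic expected, so minimizing $L_{actor}$ lowers the probability of that action.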

The reinforcement learning training algorithm consists of seven key steps: initialization, state observation, action generation, state update and feedback computation, Critic update, Actor update, and repeat iteration. The process is described as follows:

1. Initialization: Initialize the parameters of the Actor network $θ$ and the Critic network $ω$ . Set the weight parameters $(α, β, γ)$ required for the global loss function $L = α J_{time} + β J_{energy} + γ J_{risk}$ .

2. State Observation: At each time step, observe the global state $S$ and local state $s$ .

3. Action Generation: The Actor generates an action $a$ from the current policy $π (a | s; θ)$ , i.e., the designated waypoint toward which the UAV moves.

4. State Update and Reward Computation: After executing the action $a$ , update the UAV state $s^{'}$ and compute the reward $r$ . The reward is determined based on the negative value of the loss function, guiding the UAV to minimize the global loss function.

5. Critic Update: Compute the temporal difference error $(r + ϒ V (s^{'}; ω) - V (s; ω))$ and minimize $L_{critic}$ to update the $ω$ of the Critic network.

6. Actor Update: Update the $θ$ of the Actor network based on the loss function $L_{actor}$ to enable the UAV to select optimal actions that minimize the total loss.

7. Repeat Iteration: Repeat steps 2 through 6 until the specified number of iterations is reached.
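The seven steps can be traced on a deliberately small problem. The Python sketch below is a tabular one-step Actor-Critic on a hypothetical one-dimensional task in which the state is the signed offset from the landing point and the reward is the negative offset magnitude; it replaces the neural networks, the state vector, and the global loss of this study with toy stand-ins and is meant only to show the order of the updates.

```python
import math
import random

ACTIONS = (-1, 0, 1)       # hypothetical 1-D waypoint increments
STATES = range(-5, 6)      # signed offset from the landing point

def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(x - m) for x in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train(iterations=300, steps=20, lr_actor=0.1, lr_critic=0.2,
          discount=0.9, seed=0):
    rng = random.Random(seed)
    theta = {s: [0.0] * len(ACTIONS) for s in STATES}   # step 1: Actor
    omega = {s: 0.0 for s in STATES}                    # step 1: Critic
    for _ in range(iterations):                         # step 7: repeat
        s = rng.choice(list(STATES))
        for _ in range(steps):
            probs = softmax(theta[s])                   # step 2: observe s
            a = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # step 3
            s_next = max(-5, min(5, s + ACTIONS[a]))    # step 4: new state
            r = -abs(s_next)                            # reward = -loss
            delta = r + discount * omega[s_next] - omega[s]
            omega[s] += lr_critic * delta               # step 5: Critic
            for i in range(len(ACTIONS)):               # step 6: Actor
                grad_log = (1.0 if i == a else 0.0) - probs[i]
                theta[s][i] += lr_actor * delta * grad_log
            s = s_next
    return theta, omega

theta, omega = train()
print([round(x, 2) for x in softmax(theta[3])], round(omega[0], 2))
```

The Critic step is a semi-gradient descent on $L_{critic}$ , and the Actor step is a gradient descent on $L_{actor}$ for a softmax policy, whose log-probability gradient is the indicator of the chosen action minus the action probabilities.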

The Actor-Critic method enables the UAV to autonomously learn strategies that minimize the loss function across three different guidance trajectory designs under various environmental conditions. This framework integrates UAV decision-making with multi-objective optimization of the loss function, allowing the UAV to achieve an optimal balance between time, energy consumption, and risk. To optimize UAV landing on a ship, two different ship motion trajectories, circular navigation and linear sailing, must be designed and refined using the Actor-Critic method. This design is based on the relative motion between the UAV and the vessel, as well as the optimized guidance trajectory, ensuring that the UAV can land within a short time while minimizing energy consumption and risk. The study further explores how to optimize UAV landing trajectories based on the distinct characteristics of these two navigation modes using the Actor-Critic framework.

Both the actor and critic are implemented as lightweight feed-forward neural networks suitable for deployment on embedded avionics hardware. The actor network consists of three hidden layers with 64, 128, and 64 units, respectively, employing rectified linear unit (ReLU) activation in all hidden layers and a hyperbolic tangent output layer to enforce bounded control commands. The critic network mirrors the actor’s topology but outputs a scalar state-action value. Learning rates are set to 3×10⁻⁴ for the actor and 10⁻³ for the critic, optimized using Adam. The discount factor is fixed at ϒ = 0.98, balancing short-horizon landing priorities with energy-sensitive considerations in the earlier approach phase.
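The layer layout described above can be made concrete with a small forward pass. The Python sketch below uses only the standard library; the state and action dimensions, the weight initialization, and the choice of a state-only critic input (matching $V (s; ω)$ ) are assumptions, since they are not specified in the text, and the Adam optimizer is not included.

```python
import math
import random

def init_mlp(sizes, seed=0):
    """Random weights for a feed-forward net; sizes = [in, h1, ..., out]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-bound, bound) for _ in range(n_in)]
             for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers

def forward(layers, x, bounded_output):
    """ReLU on hidden layers; tanh (actor) or linear (critic) output."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
        elif bounded_output:
            x = [math.tanh(v) for v in x]
    return x

def n_params(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

STATE_DIM, ACTION_DIM = 13, 3   # hypothetical sizes, not given in the text
actor = init_mlp([STATE_DIM, 64, 128, 64, ACTION_DIM])
critic = init_mlp([STATE_DIM, 64, 128, 64, 1], seed=1)

state = [0.1] * STATE_DIM
print(forward(actor, state, bounded_output=True),
      forward(critic, state, bounded_output=False),
      n_params(actor), n_params(critic))
```

With these assumed dimensions each network holds fewer than 18,000 parameters, which is consistent with the stated aim of embedded deployment.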

Exploration is implemented via a Gaussian action perturbation strategy with annealed variance, initialized at 0.3 and decaying exponentially to 0.05 over 50,000 steps, allowing gradual transition from exploration to exploitation as the policy stabilizes. To ensure sample efficiency under the partially observable conditions typical in maritime environments, an experience replay buffer of size 100,000 is used, with minibatch updates of 128 samples drawn at each learning step. Temporal difference (TD) error governs critic updates, and the actor adjusts its parameters by following the deterministic policy gradient direction weighted by the TD error, meaning that large TD errors amplify corrective updates while small errors smooth policy refinements. This produces natural adaptive behavior when the ASV induces sudden deck motion, while maintaining stability when the trajectory is near-optimal.
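One way to realize the annealed exploration schedule is a geometric interpolation between the initial and final values. The Python sketch below assumes this particular exponential form and holds the value constant after 50,000 steps; whether the annealed quantity is applied as a variance or as a standard deviation is not specified here, so the function simply returns the stated quantity.

```python
import math

def exploration_scale(step, start=0.3, end=0.05, decay_steps=50_000):
    """Exponentially annealed exploration parameter, held at `end` afterwards."""
    frac = min(max(step, 0), decay_steps) / decay_steps
    return start * (end / start) ** frac

for step in (0, 25_000, 50_000, 100_000):
    print(step, round(exploration_scale(step), 4))
```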

This study simulates UAV and unmanned vessel system parameters based on real-world specifications and designs guidance waypoints within the simulation environment. The UAV’s maneuverability is first validated under quantifiable wind speed conditions using the Beaufort wind scale, conducting experimental tests across wind levels from 1 to 5. The primary objectives include assessing fundamental stability and maneuverability under light wind, response performance under moderate wind, flight control and stability under mid-range wind speeds, control performance and tolerance under strong wind conditions, and determining the UAV’s operational limits under high wind speeds. The discrete distances between waypoints, denoted as $\{p_{i}\}_{i = 1}^{N}$ , are configured according to performance parameters, and the envelope constraint surface is designed along the UAV’s trajectory. Additionally, vertical takeoff and landing constraints are established based on performance limitations, including restrictions on pitch and yaw angles.

In the simulation experiments, the ASV does not execute an active strategy update during UAV landing. Instead, its trajectory is predetermined (linear or circular), and deck motion disturbances are treated as external inputs. This design reflects realistic maritime operations in which ASVs typically maintain navigation objectives and do not dynamically optimize their trajectory to assist UAV landing. Under this constraint, the UAV becomes the primary decision-making agent, and the game-theoretic model reduces to a single-sided optimization problem in which the ASV’s “strategy” is represented by its disturbance profile.

The Actor–Critic framework iteratively approximates the Nash response by learning within this structured environment: the UAV’s critic network evaluates the multi-objective cost induced by the ASV’s motion, while the actor adjusts its waypoint strategy to minimize this compounded cost. This setup ensures that the learned strategy reflects physically realistic autonomy levels without assuming inaccessible perfect cooperation from the ASV.

The experimental evaluation of the proposed guidance framework is conducted under the assumption that both the UAV and the ASV are equipped with a fully functional Attitude and Heading Reference System (AHRS) or an equivalent state estimation module. All state variables used in this study, including position, velocity, vertical speed, attitude angles, and deck motion estimates, are taken after AHRS-level filtering or state-fusion processing. The design, tuning, or validation of the AHRS filters (e.g., complementary filtering, EKF-based sensor fusion, or model-based disturbance rejection) is not within the scope of this research. Instead, we assume that the flight controller already operates with well-designed and fully optimized attitude and stabilization loops, as would be expected in a mature commercial or research-grade UAV autopilot.

Under this assumption, the UAV’s inner-loop control system provides stable, low-level attitude tracking at bandwidths higher than those required for trajectory or waypoint updates. Therefore, the present study focuses exclusively on the generation of optimal waypoints and the associated motion strategy for the landing task. The proposed game-theoretic, Pareto-based, and Actor–Critic-guided framework operates at the outer-loop level, producing a sequence of dynamically feasible waypoints. These waypoints are subsequently processed by the UAV’s onboard autopilot, which is treated as an idealized, stable, and fully tuned system capable of executing the commanded positions and velocities with negligible internal delay or instability.

All sensor streams used in the experiments, including ship deck motion, relative UAV-to-ASV pose, vertical sink rate, and horizontal drift, are assumed to have undergone the full filtering chain implemented on the autopilot platform. The filtered data therefore represent the best available real-time estimates that a practical UAV–ASV system would provide. This modeling choice isolates and emphasizes the contribution of the present study, namely, the investigation of optimal landing waypoint generation, adaptive decision-making, and multi-objective optimization under realistic maritime disturbance profiles.

To replicate marine operational conditions, the deck motion of the ASV is generated based on real IMU recordings from small unmanned surface vessels, filtered to the dominant wave-induced frequency band (0.1–1 Hz). Wind disturbance, communication delay, and GNSS fluctuations are incorporated into the relative pose measurements but are attenuated through the assumed AHRS filtering pipeline. The resulting filtered state vector is then provided to the Actor–Critic network, which updates its waypoint-generation policy based on temporal-difference learning and the multi-objective cost function defined earlier.

By separating low-level attitude control from high-level waypoint planning, the experiment design ensures that the evaluation highlights the effectiveness of the proposed guidance method rather than coupling it with controller-specific tuning. The results therefore reflect the performance of the waypoint-generation strategy under realistic yet appropriately filtered environmental and system conditions, consistent with the study’s scientific focus.

The experimental prototype of UAV in this study is shown in Figure 5(a), while the unmanned ship is depicted in Figure 5(b). It is explicitly emphasized that this work does not address the design of UAV attitude control laws, AHRS filtering algorithms, or state-estimator calibration strategies; these modules are assumed to be fully functional and optimized. The scientific contribution of this study lies solely in the generation of optimal landing waypoints and the corresponding guidance logic.

Figure 5.

Experimental prototypes of (a) UAV and (b) unmanned ship.

The simulation environment is constructed to analyze the control performance. The simulated wind speed follows the Beaufort wind force scale from level 1 to level 5. Figure 6(a)-(e) illustrate the motion responses of the UAV under level 1 to level 5 wind speeds, respectively. Figure 6(f) displays the tracking error of the UAV for level 1 to level 5 wind speeds. The results of Figure 6 provide the performance constraints for UAV operations in the simulation environment, ensuring that the findings contribute to refining the guidance law design and optimizing learning parameter settings.

Figure 6.

Motion responses of the UAV under the (a) level 1, (b) level 2, (c) level 3, (d) level 4, (e) level 5, and (f) tracking error responses of UAV under level 1 to level 5 wind speeds.

In order to ensure that the proposed guidance strategy remains both practically feasible and representative of real-world maritime conditions, the simulations in this study were conducted under Level 3 operational intensity, which corresponds to moderate but realistic environmental disturbances. Specifically, performance constraints were defined to guarantee UAV stability while maintaining experimental fidelity to real-world dynamics. The maximum allowable tracking error was restricted to ±0.5 m in position and ±2° in attitude, ensuring that the UAV remained within safe proximity to the vessel deck during landing maneuvers. Attitude-rate limits were imposed at ±15°/s, preventing excessive angular accelerations that could compromise control authority or structural safety. Furthermore, the upper wind-level limit was set to 8-10 m/s, reflecting sea-state conditions where UAV operations are still viable but subject to significant aerodynamic disturbances. These thresholds were incorporated directly into the simulation environment as boundary conditions, constraining the optimization process and guiding the reinforcement learning model to operate within safe margins.
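These thresholds can be expressed as a simple feasibility check applied at each simulation step. The Python sketch below is a hypothetical helper, not part of the simulation code of this study; it takes 10 m/s, the upper end of the stated 8-10 m/s range, as the wind limit.

```python
def violated_limits(pos_err_m, att_err_deg, att_rate_dps, wind_mps,
                    max_pos_m=0.5, max_att_deg=2.0,
                    max_rate_dps=15.0, max_wind_mps=10.0):
    """Return the names of violated constraints (empty list means feasible)."""
    checks = {
        "position": abs(pos_err_m) <= max_pos_m,
        "attitude": abs(att_err_deg) <= max_att_deg,
        "attitude_rate": abs(att_rate_dps) <= max_rate_dps,
        "wind": wind_mps <= max_wind_mps,
    }
    return [name for name, ok in checks.items() if not ok]

print(violated_limits(0.3, -1.5, 10.0, 9.0))   # all within limits
print(violated_limits(0.6, 1.0, 20.0, 9.0))    # position and rate exceeded
```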

By adopting these quantified constraints, the study achieves two objectives: first, it ensures that UAV performance remains within operational safety envelopes; second, it enhances the realism of the simulation framework by aligning experimental conditions with maritime standards. The choice of Level 3 intensity reflects a deliberate balance: it is challenging enough to test robustness under dynamic disturbances, yet sufficiently representative of practical deployment scenarios. This approach strengthens the validity of the results and provides confidence that the proposed guidance strategy can be extended to real-world UAV-ASV cooperative operations without compromising safety or reliability.

As shown in Figure 7, the motions of the ship are commanded as circular and linear trajectories, while the risk factors caused by ship attitude are represented as noise in $ζ$ . The ship serves as a fixed coordinate system relative to the UAV for observation. Physical simulation is not conducted in an experimental environment. The experiment focuses on analyzing the guidance relationship between the UAV and the ship under linear and circular navigation, including time consumption, energy consumption, risk, optimal UAV trajectory, and the convergence of the global loss function.

Figure 7.

Motion commands of the ship: (a) linear and (b) circular trajectories.

Based on the given ship’s trajectories of Figure 7, the flight trajectory responses for the UAV in both vertical and fixed-wing modes can be found in Figure 8. Figure 8(a) presents the UAV flight trajectory response in fixed-wing mode, while that of vertical mode is given in Figure 8(b).

Figure 8.

Flight Trajectory of UAV under (a) fixed-wing mode and (b) vertical mode.

In practice, due to limitations in controller update rates and system bandwidth, it is not feasible to directly generate and execute an ideal trajectory using continuous functions. Instead, the waypoint of the actual UAV guidance system is updated at an interval of 0.1 seconds. In vertical flight mode at a speed of 25 mph, the waypoint advances by approximately 1.12 meters per update, while in fixed-wing flight mode at a speed of 100 mph, the waypoint advances by 4.47 meters per update, as shown in Figure 9(a). The update process of Figure 9(b) involves generating 720 candidate waypoints radiating outward from the current waypoint. These waypoints form a decision tree, where each decision is evaluated using the global loss function. The results are then provided to the Actor-Critic framework for iterative optimization, enabling trajectory generation based on waypoint updates.
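The waypoint step length follows directly from the flight speed and the 0.1-second update interval, and the candidate generation can be sketched in the same way. In the Python sketch below, placing the 720 candidates uniformly on a horizontal circle around the current waypoint is an assumption, since the exact geometry of the candidates is not specified here, and the loss used in the example is a hypothetical stand-in for the global loss function.

```python
import math

MPH_TO_MPS = 0.44704   # exact conversion for the international mile

def step_length(speed_mph, dt=0.1):
    """Distance in metres covered during one waypoint update interval."""
    return speed_mph * MPH_TO_MPS * dt

def candidate_waypoints(current, step, n=720):
    """n candidates on a horizontal circle of radius `step` around `current`."""
    x, y, z = current
    return [(x + step * math.cos(2 * math.pi * k / n),
             y + step * math.sin(2 * math.pi * k / n), z)
            for k in range(n)]

def best_waypoint(current, step, loss):
    """Candidate with the smallest loss value."""
    return min(candidate_waypoints(current, step), key=loss)

print(round(step_length(25), 2), round(step_length(100), 2))

# Hypothetical loss: distance to a landing point located along +x.
landing = (10.0, 0.0, 0.0)
print(best_waypoint((0.0, 0.0, 0.0), 1.0, lambda w: math.dist(w, landing)))
```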

Figure 9.

(a) UAV waypoint update distance and (b) prediction trajectory update method.

3.1. UAV landing under straight-line ship navigation

For UAV landing under straight-line ship navigation, the motion control of the UAV must adapt to the ship’s speed and direction. Straight-line navigation features a relatively simple motion mode, making the UAV’s landing control and guidance more straightforward and predictable. During the training process, as shown in Figure 10, the UAV’s landing strategy undergoes 1,000 iterations of gradual adaptation and optimization. The precision of the landing strategy is evaluated using the mean squared error (MSE) of the loss function in each iteration. In the initial iterations, the MSE of the loss function is relatively large because the UAV’s strategy is not yet well-developed and requires continuous adjustment and optimization. As the training progresses, the strategy improves in handling the ship’s straight-line motion pattern, and the MSE gradually decreases. By the end of 1,000 iterations, the MSE is expected to reach a low level, indicating that the UAV can accurately control and guide its landing process while achieving the objectives of minimal risk, shortest landing time, and lowest energy consumption.

Figure 10.

MSE of the loss function for 1000 iterations for UAV landing in a straight-line ship navigation.

The UAV landing motion simulation under straight-line ship navigation demonstrates the UAV’s progress throughout the training process by comparing its top-down landing trajectories from the 1st and 1000th iterations. In the 1st iteration, as shown in Figure 11, the UAV’s landing strategy is not yet mature and remains in the adaptation phase for the ship’s straight-line motion. The top-down trajectory may exhibit significant deviation and instability. To prevent frequent divergence in the early training stages, the study employs manual guidance for the first five epochs. However, the UAV’s landing trajectory remains insufficiently smooth, requiring multiple adjustments and deviating from the intended trajectory. At this stage, the MSE of the loss function is relatively large, indicating that the UAV has not yet fully adapted to the straight-line navigation mode and requires further optimization and learning. After 1000 iterations, the UAV’s landing strategy is fully optimized, achieving effective adaptation to the ship’s straight-line motion characteristics. The top-down trajectory reveals a stable and smooth landing trajectory during the final guidance phase. The landing process of the UAV can be precisely controlled to maintain high consistency with the ship’s motion trajectory. At this stage, the MSE of the loss function has decreased significantly, demonstrating the efficiency and accuracy of the optimized strategy and achieving the objectives of minimal risk, shortest landing time, and lowest energy consumption.

Figure 11.

Top-view trajectory of UAV landing motion during the ship’s straight-line navigation with the 1st and 1000th iterations.

As shown in Figure 12, the 3D trajectory of UAV landing motion offers a more intuitive representation of its movement modes in both vertical and horizontal dimensions. In the 1st iteration, the landing strategy for the UAV is still under development, showing limited adaptability to a ship navigating in a straight line, and the 3D trajectory of the UAV may exhibit significant deviations and instability. In the vertical dimension, multiple height adjustments can be observed, while in the horizontal dimension, the path may lack smoothness and even display lateral offsets. The initial landing trajectory reflects the need for continuous adjustments to align with the ship’s straight-line motion. After 1000 iterations, the UAV’s landing strategy has been significantly improved. In the 1000th iteration, the 3D trajectory of the UAV is highly stable and smooth. Adjustments in the vertical dimension are more precise, maintaining a stable height within the predetermined range. On the horizontal plane, the path becomes distinctly linear, allowing the UAV to land accurately along the ship’s straight-line navigation. The refined landing trajectory demonstrates that the UAV has achieved precise control over its landing process, meeting the goals of minimal risk, shortest time, and reduced energy consumption.

Figure 12.

(a) 3D trajectories of UAV landing motion during the ship’s straight-line navigation and (b) attitude with the 1st and 1000th iterations.

Compared to straight-line navigation, curved navigation of a vessel involves more complex motion modes with continuously changing speed and direction, as shown in Figure 13. This presents greater challenges for drone landing control and guidance. Under such conditions, the drone requires more flexible and precise adjustments to its strategy to adapt to the vessel’s motion characteristics. In this training process, 1000 iterations are still utilized to progressively optimize the drone’s landing strategy. At the initial stage, due to the uncertainty and complexity of curved navigation, the MSE of the drone’s loss function is likely to be significantly higher than in straight-line navigation scenarios. However, through adaptive optimization using the Actor-Critic framework, the drone gradually learns and adapts to the vessel’s curved motion characteristics, continuously adjusting its landing strategy to reduce the loss function’s MSE. As training progresses, the drone’s strategy becomes increasingly refined, and the MSE of the loss function is expected to decrease, demonstrating substantial improvements and optimization effects. By the end of the training, approaching the 1000th iteration, the drone should be able to effectively handle the vessel’s curved navigation and achieve accurate landings, meeting the goals of minimal risk, shortest time, and reduced energy consumption.

Figure 13.

MSE of the loss function for 1000 iterations during the drone landing training in the curved navigation of a vessel.

The top-view trajectories of drone landing motion for the 1st, 500th, and 1000th iterations during circular navigation of a vessel reflect the learning and adaptation process of the drone under complex motion modes, as shown in Figure 14. In the 1st iteration, the drone’s landing strategy is in its initial phase, lacking sufficient capability to respond to the intricate movement mode of circular navigation. To prevent frequent divergence during the early stages of the experiment, manual guidance is employed for training during the first five epochs of initialization. The top-view trajectory may display notable deviations and instability. The drone’s landing path might appear discontinuous with multiple adjustments, struggling to accurately follow the vessel’s motion trajectory. At this stage, the MSE of the loss function is high, indicating the drone’s lack of adaptation to the circular navigation mode. After 500 iterations, the drone’s landing strategy is progressively optimized, showing moderate adaptability to the vessel’s circular navigation motion. The top-view trajectory reveals a relatively more stable and smoother path compared to the 1st iteration, although certain deviations and adjustments may still be present. The drone demonstrates improved capability to follow the vessel’s motion trajectory, and the MSE of the loss function significantly decreases, reflecting advancements in strategy and optimization effectiveness. By the 1000th iteration, the drone’s landing strategy has been thoroughly refined, achieving high adaptability to the vessel’s circular navigation characteristics. The top-view trajectory indicates a highly stable and smooth landing path. The drone can precisely control its landing process, maintaining consistent alignment with the vessel’s motion trajectory. At this stage, the MSE of the loss function drops dramatically, demonstrating the strategy’s efficiency and precision in meeting the goals of minimal risk, shortest time, and reduced energy consumption.

Figure 14.

Top-view trajectories of drone landing motion for the 1st, 500th, and 1000th iterations during the circular navigation of a vessel.

In the circular navigation experiment, the 3D landing trajectories of the drone illustrate its adaptation and learning process under complex motion modes, as shown in Figure 15. The drone initially struggles to adapt to the motion characteristics of circular navigation, as its strategy is still under development. During the 1st iteration, the 3D trajectory might exhibit significant deviations and instability. In the vertical dimension, frequent height adjustments could occur, with poor precision in height control. On the horizontal plane, the trajectory might lack smoothness, making it difficult to follow the vessel’s circular path accurately. The trajectory at this early stage reflects the need for further learning and strategy refinement. As the iterations progress, the drone gradually optimizes its landing strategy. By the 500th iteration, the 3D trajectory shows noticeable improvements: height control becomes more stable, though some adjustments may still be necessary, and the horizontal trajectory is smoother than in the initial stage, yet certain deviations and path corrections might persist. This intermediate stage demonstrates the drone’s progress in adapting to circular navigation but indicates the need for continued optimization. By the 1000th iteration, the landing strategy is fully optimized, and the 3D trajectory exhibits exceptional stability and precision. Height control becomes highly accurate, consistently maintaining the intended height range, and the horizontal trajectory becomes remarkably smooth, allowing the drone to follow the vessel’s circular path accurately during landing. The final stage of the landing trajectory demonstrates the drone’s ability to achieve precise landings in complex circular navigation modes, meeting the objectives of minimal risk, shortest time, and reduced energy consumption.

Figure 15.

(a) 3D trajectories of drone landing motion and (b) attitude for the 1st, 500th, and 1000th iterations during the circular navigation of a vessel.

This study investigates landing control strategies of tail-sitter drones under vessel motion, focusing on straight-line and circular navigation. In early iterations, both modes show unstable trajectories with frequent height adjustments and horizontal deviations. As iterations progress, the strategies gradually improve, with smoother 3D motion and more precise control. By the 1000th iteration, optimization is achieved in both modes: the drone maintains a stable height within the intended range, and its horizontal trajectory aligns accurately with the vessel’s path. Overall, the optimized strategies minimize risk, shorten landing time, and reduce energy consumption.

In the integrated framework of adversarial game theory and reinforcement learning, UAV landing optimization on autonomous surface vessels is driven by three objectives: minimizing energy, reducing time, and lowering risk. Energy converges fastest, as effective thrust allocation and trajectory planning are quickly learned through Actor–Critic updates. Landing time converges at a moderate pace due to trade-offs with energy and risk, requiring gradual policy refinement. Risk reduction is the slowest, with early oscillations under adversarial disturbances, but eventually achieves robust safety. A hierarchical convergence prioritizing risk, then time, and finally energy ensures optimal outcomes, guiding UAVs toward Nash equilibrium and Pareto-optimal solutions. This design secures reliable convergence, avoids local optima, and provides a robust learning trajectory for cooperative UAV-ASV landing tasks (as shown in Figures 16–18).

Figure 16.

Multi-objective optimization by game theory for 3 strategy condition (Energy).

Figure 17.

Multi-objective optimization by game theory for 3 strategy condition (Time).

Figure 18.

Multi-objective optimization by game theory for 3 strategy condition (Risk).
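
To make the trade-off among the three objectives concrete, the following sketch filters a set of candidate landing strategies down to its Pareto-optimal subset. It is a generic dominance check, not the study's optimization procedure; the tuple layout `(energy, time, risk)` and the candidate values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (energy, time, risk), all to be minimized.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative candidate strategies; the numbers are made up.
candidates = [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0), (2.0, 6.0, 4.0), (1.0, 5.0, 2.0)]
front = pareto_front(candidates)
```

Here `(1.0, 5.0, 2.0)` dominates two of the other candidates, while `(2.0, 4.0, 3.0)` survives because it is faster. Neither remaining strategy can be improved in one objective without worsening another, which is the sense of Pareto optimality used in the text.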

This study examines the landing strategies of tail-sitter drones under straight-line and circular vessel navigation, highlighting iterative optimization. While the results show significant improvement, several factors remain uncertain. Meteorological conditions such as wind and rainfall could affect stability, yet the experiments assume ideal weather. Vessel trajectories in practice may be more complex than straight or circular paths, requiring further testing. Communication delays and interference, also excluded from the experiments, could hinder real-time control. Addressing these variables in future research is essential to enhance resilience and ensure reliable drone landings under diverse operational conditions.

The simulation results presented in this study are designed primarily to evaluate the iterative convergence behavior and the methodological effectiveness of the proposed waypoint-generation framework over learning iterations 1 to 1000. Rather than providing an exhaustive statistical characterization of all possible landing outcomes, the objective of the experimental setup is to demonstrate that the integrated game-theoretic, Pareto-guided, and Actor–Critic structure is capable of producing progressively improved landing trajectories and consistent convergence toward viable guidance solutions. This iterative analysis establishes the feasibility of the approach and lays the theoretical and algorithmic foundation for future work on full terminal guidance integration.

Figure 14 illustrates one of the central contributions of the proposed waypoint-generation framework: the ability to produce a smooth, convergent circular-descent trajectory during the approach phase of the landing process. The significance of this result does not lie in terminal touchdown precision, which is beyond the scope of the present work, but rather in demonstrating that the learning-based guidance strategy can generate progressively refined, dynamically consistent waypoints that yield a stable, non-oscillatory descent pattern.

The circular-descent behavior shown in the figure highlights how the Actor–Critic policy, shaped by Pareto-based multi-objective optimization, effectively suppresses abrupt lateral corrections and avoids discontinuities in commanded motion. This produces a guidance path that is both geometrically smooth and dynamically feasible for the UAV’s inner-loop attitude controller, which is assumed to be fully tuned and stable. From a guidance-system perspective, the ability to maintain a smooth, low-jerk trajectory during descent is critical: it reduces the control burden on the attitude loop, minimizes transient thrust disturbances, and preserves overall vehicle stability as the UAV transitions into tighter proximity with the ASV deck.

Thus, the result in Figure 14 should be interpreted as evidence of trajectory-level convergence and motion regularization, not as an indicator of final landing accuracy. The focus of this study is on learning-based waypoint generation and its capacity to produce structured, stable approach geometries, of which the circular-descent pattern serves as a clear demonstration. The development of a full terminal landing controller capable of precision deck touchdown is reserved for future work and requires additional considerations such as close-range perception, deck pose estimation, and terminal-phase disturbance rejection.

Accordingly, the experimental results focus on evaluating whether the learning architecture achieves (i) monotonic improvement in the multi-objective cost, (ii) consistent reduction in temporal-difference error, (iii) stability under filtered AHRS-based state inputs, and (iv) adherence to the operational constraints modeled in the proposed optimization structure. While comprehensive statistical metrics such as mean landing error, standard deviation, miss distance distributions, and energy-per-attempt analyses are valuable for assessing field robustness, they fall outside the primary scope of the current investigation. These metrics are more appropriate for a later stage of research, when the algorithm is integrated with a complete terminal descent controller and validated under real hardware-in-the-loop or field-test conditions.
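
Criteria (i) and (ii) can be checked mechanically on logged training curves. The sketch below is one possible way to do so, assuming per-iteration cost values and TD errors are available as plain lists. The tolerance, the smoothing window, and the function names are assumptions for illustration, not part of the reported evaluation.

```python
from typing import List, Sequence


def is_nonincreasing(values: Sequence[float], tol: float = 0.0) -> bool:
    """True if each value exceeds its predecessor by at most tol."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Simple moving average over full windows only."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def td_error_shrinks(td_errors: Sequence[float], window: int = 3,
                     tol: float = 0.0) -> bool:
    """Check that the smoothed TD-error magnitude never increases."""
    magnitudes = [abs(e) for e in td_errors]
    return is_nonincreasing(moving_average(magnitudes, window), tol)


# Hypothetical logged curves; the numbers are illustrative only.
cost_ok = is_nonincreasing([5.0, 4.0, 4.0, 3.0])
td_ok = td_error_shrinks([4.0, -3.0, 3.5, -2.0, 1.0, -1.5, 0.5], window=3)
```

Smoothing before the check reflects the fact that raw TD errors fluctuate in sign and size from step to step; a strict sample-by-sample test would reject curves that are clearly trending downward.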

Similarly, the simulation environment used in this work adopts representative wave-induced deck motions and wind disturbances, but it does not attempt to model the full range of maritime operational envelopes, such as explicit Beaufort scale classifications, rapidly changing sea states, sudden ship maneuvers, communication loss, or GPS-denied conditions. These extreme or failure-mode scenarios involve additional modeling complexities, controller-switching logic, and safety-critical behaviors that go beyond the intermediate objective of validating the waypoint-generation strategy itself. For these reasons, the present study restricts its scope to demonstrating core functional correctness, iterative policy improvement, and adherence to the multi-objective landing logic within a representative but controlled simulation environment.

The multi-objective optimization applied in this study can be expanded beyond risk, time, and energy to include safety, stability, and cost for more comprehensive outcomes. Reinforcement learning has proven effective, and future work could explore advanced methods such as deep reinforcement learning and imitation learning to enhance adaptability. Beyond landing strategies, broader drone–vessel collaboration in navigation, information sharing, and joint operations could improve system efficiency. Despite the demonstrated effectiveness under the current vessel motion modes, further research is needed to address unknown factors and extend optimization to diverse objectives, algorithms, and cooperative scenarios. Such advancements would strengthen the theoretical foundation and technical support for UAV operations in complex environments. A detailed discussion of how the proposed framework can be extended toward full operational deployment, covering emergency behaviors, sudden ASV motion changes, packet losses, estimator degradation, and more realistic sea-state spectra, is provided separately in the Discussion section. These considerations outline the required next steps for transitioning from simulation-based waypoint-generation evaluation to a comprehensive end-to-end autonomous maritime landing system. Accordingly, the value of the circular-descent trajectory in Figure 14 lies in its demonstration of stable waypoint evolution and smooth path shaping rather than in terminal touchdown accuracy, which is intentionally excluded from the scope of this study.

4. Discussion

Although the proposed dynamic guidance framework demonstrates promising results in simulation, several limitations must be recognized. The current study relies primarily on modeled disturbances and controlled experimental conditions; the absence of extensive sea trials under highly variable maritime environments constrains the generalizability of the findings. The integration of sliding mode control, game-theoretic optimization, and reinforcement learning has been validated in a unified architecture, yet the computational burden of real-time implementation on resource-constrained UAV platforms remains insufficiently explored. While the Actor-Critic network adapts to evolving vessel dynamics, its performance under sensor degradation, communication delays, or adversarial disturbances has not been systematically evaluated, leaving open questions regarding robustness in operational deployments.

A notable outcome of the proposed waypoint-generation framework is its ability to produce smooth, continuously convergent descent geometries, such as the circular-descent pattern illustrated in Figure 14. This characteristic offers several important advantages for UAV landing in maritime environments, where deck motion, wind shear, and ship-induced turbulence amplify the sensitivity of the vehicle to abrupt guidance commands. Smooth descent geometries inherently reduce high-frequency lateral and vertical accelerations, thereby minimizing the control effort required from the UAV’s inner-loop attitude controller. Because the present study assumes that the attitude controller is fully optimized and operating at stable bandwidths, the smoothness of the outer-loop guidance directly translates into more predictable and less saturated actuator responses, lowering the probability of transient destabilization during the approach phase. Smooth and gradually converging descent paths also mitigate the risk of amplifying deck-motion disturbances. Sudden or aggressive trajectory corrections often couple with ship motion in undesirable ways, particularly when the ASV exhibits low-frequency roll or pitch oscillations. By contrast, a circular or quasi-helical descent distributes lateral corrections over extended path segments, effectively averaging out short-term disturbances and preventing phase alignment with the ship’s oscillatory modes. This decoupling effect enhances both dynamic safety and deck-relative positional stability as the UAV transitions into close-range proximity.
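
The contrast between a smooth quasi-helical descent and an abrupt lateral correction can be illustrated numerically. The sketch below generates an idealized inward spiral and estimates peak jerk from third-order finite differences of the waypoint positions. The geometry (a linearly shrinking radius and a constant descent rate), the parameter values, and the function names are assumptions made for illustration; they do not reproduce the trajectories learned in the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def helical_descent(r0: float, z0: float, turns: float, n: int,
                    z_end: float = 0.0) -> List[Point]:
    """n + 1 waypoints spiralling from radius r0 at height z0 to radius 0 at z_end."""
    points = []
    for k in range(n + 1):
        s = k / n                          # normalized progress in [0, 1]
        theta = 2.0 * math.pi * turns * s  # heading angle around the deck centre
        r = r0 * (1.0 - s)                 # radius shrinks linearly to zero
        points.append((r * math.cos(theta), r * math.sin(theta),
                       z0 + (z_end - z0) * s))
    return points


def max_jerk(points: Sequence[Point], dt: float) -> float:
    """Largest jerk magnitude, using third-order finite differences of position."""
    worst = 0.0
    for p0, p1, p2, p3 in zip(points, points[1:], points[2:], points[3:]):
        jerk = [(d - 3.0 * c + 3.0 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        worst = max(worst, math.sqrt(sum(j * j for j in jerk)))
    return worst


# Smooth spiral versus the same spiral with a sudden 1 m lateral step halfway.
smooth = helical_descent(r0=10.0, z0=20.0, turns=2.0, n=200)
stepped = [(x + (1.0 if k >= 100 else 0.0), y, z)
           for k, (x, y, z) in enumerate(smooth)]
smooth_jerk = max_jerk(smooth, dt=0.1)
stepped_jerk = max_jerk(stepped, dt=0.1)
```

In this toy comparison the stepped path produces a far larger peak jerk than the smooth spiral, which is the kind of abrupt command the inner-loop attitude controller would otherwise have to absorb.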

A structured smooth descent improves downstream compatibility with terminal landing logic. In practical maritime UAV operations, the terminal 3 to 5 meters above the deck require specialized visual, LiDAR-based, or RF beacon-based localization algorithms to refine final relative positioning. These terminal-phase controllers typically assume that the UAV enters the close-range region with low residual lateral velocity, low jerk, and minimal heading oscillation. The smooth descent trajectories produced in this study satisfy these preconditions by ensuring that motion is already stabilized before the terminal guidance layer activates. In effect, smooth descent geometry acts as a conditioning layer, preparing the UAV for more precise but more sensitive terminal landing behaviors.

Smooth descent patterns offer operational benefits in safety-critical and crewed maritime environments. A predictable and visually interpretable descent profile reduces operational uncertainty for both onboard crew and autonomous ship systems, lowering the likelihood of emergency aborts triggered by erratic UAV movement. This predictability is particularly valuable when integrating UAVs with autonomous vessel systems that rely on cooperative sensing or anticipatory deck-motion compensation.

The smooth descent geometry observed in the results is not a secondary artifact but a structural advantage of the waypoint-generation method proposed in this work. It contributes to stability, reduces control burden, mitigates disturbance coupling, and improves compatibility with terminal landing modules. These benefits reinforce the argument that trajectory shaping rather than only terminal accuracy is a critical component in developing reliable autonomous maritime landing systems.

While the present study provides the architectural and algorithmic specifications necessary for reproducing the proposed Actor–Critic framework, several practical considerations remain beyond the scope of the current manuscript and therefore constitute avenues for future research. Although the experiments include realistic latency assumptions, a full hardware-in-the-loop (HIL) evaluation using an actual embedded flight computer is not presented here. Such real-time tests would allow detailed profiling of computational jitter, thermal throttling effects, and long-duration memory fragmentation, all factors that can influence onboard learning stability. Although the actor–critic design implicitly handles partial observability by incorporating filtered historical states, more advanced techniques such as LSTM-based recurrent critics or belief-state estimators (e.g., extended Kalman filters specifically tuned for nonstationary marine motion) could further improve policy robustness. These approaches, however, introduce significantly higher computational cost and were omitted here to maintain feasibility for small-format UAV processors. While exploration noise is carefully tuned for convergence, a more formal treatment of exploration-exploitation trade-offs under ship-induced nonstationary disturbances could be developed using adaptive exploration schedules or Bayesian uncertainty estimation. Such methods would deepen the theoretical grounding of the learning framework but would also require substantial expansion beyond the intended scope of this study. Onboard training, though conceptually valuable for long-term adaptation, was not evaluated; the present study focuses solely on offline training with online inference. Future work could consider incremental or continual learning strategies that update the policy during flight, accompanied by formal stability safeguards to ensure that live adaptation cannot violate the safety envelope during deck-landing operations.

While the present study focuses on validating the algorithmic feasibility of the proposed guidance framework, several real-world operational challenges remain outside the scope of the current simulation campaign. Sudden ASV maneuvers, unexpected communication interruptions, GPS-denied operation, high-sea-state transitions, rapid wave-slope changes, and emergency abort scenarios all require additional layers of robustness logic and fault-tolerant control that are not modeled here. These behaviors typically involve switching between guidance modes, fail-safe recovery procedures, and sensor-level redundancy management, all of which will be addressed in future work as the waypoint-generation module is integrated with a complete terminal landing controller. The current results should therefore be interpreted as demonstrating the viability of the strategy-generation layer, upon which full-scale maritime UAV landing autonomy can be progressively constructed.

Future research should therefore concentrate on three concrete directions. One avenue is the extension of the guidance strategy to full-scale experimental validation in diverse sea states, thereby assessing the adaptability of the proposed control laws under realistic hydrodynamic and meteorological conditions. Another direction involves the optimization of onboard computation and energy allocation, ensuring that multi-objective decision-making can be executed within the strict hardware and power constraints of tail-sitter UAVs. Finally, further investigation into resilience mechanisms, such as fault-tolerant sensing, secure communication protocols, and adversarial learning defenses, will be essential to guarantee reliable UAV-ASV cooperation in contested or degraded environments. By addressing these limitations, subsequent work can strengthen both the theoretical rigor and the practical applicability of dynamic guidance strategies for autonomous maritime operations.

The use of a Nash equilibrium framework in this study must be interpreted within the operational constraints of hybrid UAV-ASV systems. While classical game theory assumes active, simultaneous optimization by both agents, real-world ASVs rarely modify their trajectory with high frequency to support UAV landing. Therefore, the Nash equilibrium employed here represents an effective equilibrium: the UAV optimizes its landing trajectory under a fixed but disturbance-influenced ASV motion profile. The equilibrium conditions are validated through the convexity of the multi-objective loss, the compactness of feasible landing trajectories, and the KKT-based optimality constraints applied during Pareto front generation.
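
One way to write this effective equilibrium is as a constrained best-response problem for the UAV alone, with the ASV motion treated as given. The formulation below is a sketch under assumed notation and may differ from the symbols used in the formulation sections of this paper: $u$ is the UAV trajectory decision, $a$ the fixed but disturbance-influenced ASV motion profile, $J_E$, $J_T$, $J_R$ the energy, time, and risk costs, $w_E, w_T, w_R > 0$ scalarization weights summing to one, and $g_j$ the operational inequality constraints with multipliers $\mu_j$.

```latex
\[
\begin{aligned}
u^{*} &\in \arg\min_{u} \; J(u; a), \qquad
J(u; a) = w_E J_E(u; a) + w_T J_T(u; a) + w_R J_R(u; a), \\
&\text{subject to } g_j(u; a) \le 0, \quad j = 1, \dots, m, \\[4pt]
\text{stationarity:}\quad & \nabla_u J(u^{*}; a) + \sum_{j=1}^{m} \mu_j \nabla_u g_j(u^{*}; a) = 0, \\
\text{primal feasibility:}\quad & g_j(u^{*}; a) \le 0, \\
\text{dual feasibility:}\quad & \mu_j \ge 0, \\
\text{complementary slackness:}\quad & \mu_j \, g_j(u^{*}; a) = 0 .
\end{aligned}
\]
```

When $J$ and every $g_j$ are convex and differentiable in $u$, any point satisfying these four conditions is a global minimizer of the scalarized problem, and with strictly positive weights that minimizer is also Pareto-optimal for the three objectives. This is why the convexity of the loss matters to the equilibrium argument above.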

However, the approach does not model adversarial or fully strategic ASVs. Future work should consider extending the game-theoretic formulation to include cooperative ASVs capable of predictive deck alignment or dynamic station-keeping, which would enable true two-sided equilibrium computation. Nevertheless, the current formulation remains theoretically consistent and operationally suitable for the class of ASVs typically deployed in maritime UAV operations.

5. Conclusions

This study presents a unified guidance control framework enabling tail-sitter UAVs to achieve autonomous landings on moving vessels with minimal risk, reduced energy consumption, and shortened landing time. By combining sliding mode control with Lyapunov-based stability guarantees, the system maintains robustness under nonlinear and dynamic conditions. The integration of game-theoretic modeling ensures Nash equilibrium between UAV and vessel dynamics, while Pareto optimization validated through Karush-Kuhn-Tucker conditions provides a mathematically rigorous trade-off among time, energy, and risk. Reinforcement learning, implemented via an Actor-Critic network, further enhances adaptability by dynamically updating landing strategies in response to vessel motion modes such as circular navigation and straight-line travel. Simulation-based validation indicates that the proposed method achieves stable landing trajectories across the straight-line and circular navigation scenarios tested, reducing throttle demand and demonstrating consistent convergence of trajectory errors, thereby supporting its effectiveness for UAV-ASV cooperative operations.

Nevertheless, three limitations remain. The reinforcement learning model, though effective in controlled environments, requires further generalization to cope with complex maritime conditions involving variable wind fields and wave interference. The current framework also focuses on single UAV-vessel interactions, leaving multi-drone coordination and cooperative landing strategies unexplored. Finally, energy optimization is addressed only at the trajectory level, without incorporating bio-inspired endurance mechanisms or energy recovery processes. Future research should therefore extend reinforcement learning toward multimodal adaptation, investigate coordinated landing strategies through multi-agent game theory, and develop dynamic energy management models to enhance UAV endurance. Addressing these challenges will strengthen both the theoretical rigor and the practical applicability of autonomous UAV landings, paving the way for broader deployment in ocean monitoring, logistics, and emergency rescue operations.

Footnotes

ORCID iD

Chun-Yi Lin

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
