Abstract
This study proposes an optimization strategy for a tail-sitter unmanned aerial vehicle (UAV) landing on an autonomous surface vessel (ASV), aiming for minimal risk, shortest time, and lowest energy consumption. Sliding mode control ensures system stability under nonlinear and dynamic conditions. A two-player game model between the UAV and ASV is established, achieving Nash equilibrium to minimize risk and energy. Pareto optimal theory provides a trade-off among multiple objectives, while Karush-Kuhn-Tucker conditions verify the optimality of the solution. An Actor-Critic network generates and evaluates landing strategies through adaptive online learning based on temporal difference errors. Results show that the UAV can dynamically adjust to ship motion, achieving precise, energy-efficient, and low-risk landings. This approach offers practical technical support and theoretical foundations for cooperative operations between UAVs and unmanned ships in diverse environments.
1. Introduction
Tail-sitter UAVs have shown superior performance in many applications owing to their vertical take-off and landing capability. However, landing safely on a ship at sea is a complex problem, especially given the dynamic motion of the ship on the sea surface. A stable landing depends not only on the flight control system but also on ship movement and environmental factors. One of the main challenges for an unmanned aerial vehicle (UAV) landing on a ship is the ship’s continuous movement, including vertical and horizontal fluctuations. These movements introduce considerable uncertainty into the landing and require the UAV to have efficient dynamic guidance capabilities.
To ensure stable landing under various environmental conditions, effective guidance game strategies must be designed that fully account for the uncertainty of the environment and of the ship motion. Solving the ship-landing problem would greatly expand the application scope of tail-sitter UAVs, including ocean monitoring, search and rescue missions, and maritime transportation. A well-designed dynamic guidance game strategy can improve the stability and safety of a UAV during ship landing at sea and reduce the risk of landing failure. Studying such strategies can also promote the application and development of autonomous control and automation technologies in unmanned systems. Because the maritime environment is complex and changeable, dynamic guidance game strategies can also improve the adaptability and execution efficiency of UAVs in different sea conditions.
Currently, many related studies focus on the use of various sensors for guided landing and on the design of maritime arresting equipment for UAV capture. Extending battery life and saving energy is one of the major challenges. In 2021, de Jong CP et al. proposed exploiting the updraft around obstacles to reduce the energy consumption of fixed-wing UAVs. 1 Experiments showed that the controller is effective: the average throttle is reduced to 4.5%, and continuous flight at 0% throttle is achieved. R. Polvara et al. proposed a method for autonomous landing of drones on ship decks using markers and an extended Kalman filter, suitable for GPS-free environments. 2 Tan, L. et al. proposed a method for calculating the optimal landing trajectory between a drone and a small ship. 3 Accuracy is improved through numerical iteration and initial guidance, and effectiveness is verified by simulation experiments. Z.-C. Xu et al. developed a vision-based control method with a three-stage visual detection method and a PID controller, and successfully landed a UAV on a mobile unmanned surface vehicle (USV) in a lake experiment. 4 V. Diapic et al. proposed a maritime heterogeneous unmanned autonomous system that combines a USV and a UAV through modular design to solve collaboration and cooperation problems in a GPS-free environment and improve the reliability of data and control message sharing. 5
Ruoyu Xu et al. proposed a robotic arm-assisted landing system that can grab and place hovering UAVs. 6 They designed a nonlinear model predictive controller to deal with disturbances and improved the landing efficiency of multiple UAVs on USVs. B. Feng et al. proposed a staged take-off and landing method, designed the corresponding hardware systems, including UAV and USV platforms, and experimentally verified improved autonomous take-off and landing capabilities of UAVs in a dynamic ocean environment. 7 Li, W. et al. proposed a control strategy based on synchronous motion, using computer vision and a bidirectional long short-term memory neural network for attitude prediction and control to improve the landing accuracy of UAVs in complex ocean environments. 8 Wenzhan Li et al. addressed real-time tracking and accurate landing of a UAV during USV navigation, proposing a collaborative tracking and landing control strategy based on nonlinear model predictive control (NMPC) to achieve high-precision landing. 9 Gangik Cho Joonwon et al. proposed an image-based visual servo technique that combines GPS data and dynamic models to estimate speed, adapt to the rapid movement of ships and the influence of waves, and improve landing accuracy. 10
K. A. Tsintotas proposed a low-complexity electronic fence protection system that uses radar and range finders to achieve moving obstacle detection and ground assessment with low latency. 11 E. Ragusa et al. proposed a design paradigm for implementing convolutional neural network detection of small UAV landing platforms on low-power commercial microcontrollers, effectively balancing generalization capability and computing cost. 12 H. Li et al. presented a landing structure based on Kresling tube bistable space. 13 Through graph search and controller optimization design, it can adapt to multiple scenarios and enhance buffering performance. G. Shao et al. proposed a collaborative unmanned surface vehicle-UAV platform, designed adjustable buoys and USV carrying decks to ensure landing safety, and used a multi-ultrasonic joint dynamic positioning algorithm to solve the positioning problem. 14 For high-degree-of-freedom landing platforms, X. Dong et al. proposed a real-time trajectory optimization method to solve the aerodynamic interference problem of aerial UAV landing; simulations and experiments demonstrated its high-precision landing performance. 15
M. Maier et al. designed a series robotic arm to assist the landing of a vertical take-off and landing UAV, using a linear state space controller to decouple the position and direction of the UAV and the robotic arm to improve control accuracy. 16 M. C. Santos et al. presented a cooperation architecture for collaborative drones and unmanned ships, using satellite technology to establish robust communication and autonomous take-off and landing systems, which improves mission endurance and flexibility. 17 For the path-following problem of fixed-wing UAVs, Z. Zheng proposed a high-order disturbance observer and an adaptive control limit function method to ensure accurate path following without an accurate initial position. 18 Y. Huang et al. proposed a visual servo control method based on the homography matrix for UAV tracking of ships moving in 6 degrees of freedom, and improved system stability and performance through hierarchical control strategies. 19
N. P. Santos et al. proposed a monocular camera vision system that uses particle filters and unscented Kalman filters to track the position and direction of UAVs during ship landing, adapting to various tracking problems and meeting landing requirements. 20 Falang M. et al. designed sensing, planning, and control modules, using the extended Kalman filter and a graph plan algorithm to realize the automatic landing of UAVs on autonomous water vehicles. 21 N. Xuan-Mung et al. presented an autonomous landing algorithm that accounts for ground effect and disturbance control, designed a target state estimator, and experimentally showed that a quadcopter landed accurately on a shaking platform. 22 K. Xia et al. proposed a funnel-shaped surface with relative positions and a reinforcement learning strategy to improve the accuracy and stability of autonomous landing of UAVs. 23 Q. Zeng et al. proposed a positioning system that uses ultra-wideband anchor points and the linear least squares method to calculate landing coordinates and improve UAV landing accuracy. 24 X. Fang et al. used favorable wind energy and a two-layer model predictive control scheme to improve the safety and accuracy of autonomous emergency landing of fixed-wing UAVs. 25 Y. Yuan et al. presented an automatic landing system with a non-singular fast terminal sliding mode observer and a back-stepping control structure to improve the landing accuracy and stability of UAVs on transportation vehicles. 26
Comparison of representative UAV–ASV landing studies.
Studying the dynamic guidance game strategy that allows a tail-sitter UAV to land stably on a ship is of great importance. An optimized guidance strategy not only advances control theory and game theory but also promotes research on the interaction between robotics and dynamic systems. Developing this strategy requires integrating knowledge from multiple fields, such as machine learning, autonomous control, and system identification, and is significant for improving the autonomous decision-making capabilities of intelligent systems. Real-time optimization under uncertain factors in dynamic environments further establishes the academic relevance of this research and provides a theoretical basis for future interdisciplinary work. In terms of industrial applications, the safe landing of tail-sitter UAVs on ships at sea is relevant to practical operations in fields such as ocean monitoring, search and rescue missions, and logistics transportation. An effective game-optimized guidance strategy can significantly improve the stability and safety of UAVs in dynamic and complex environments, reduce the risk of landing failure, and ensure the efficient completion of missions. The results can also be applied to other unmanned systems that require high precision and reliability, such as autonomous vehicles, smart city infrastructure, and aerospace systems.
These technological innovations can enhance the competitiveness of related industries and create economic benefits for society. The purpose of this study is to analyze in detail the motion characteristics of UAVs and ships during the landing process in order to identify the key influencing factors. A dynamic guidance game strategy suitable for ship landing is developed, and its effectiveness is verified through simulation and experiments. Trajectory planning methods for UAV landing are given to cope with ship motion and environmental interference and to ensure the accuracy and safety of the landing process. An experimental platform is constructed for testing and verification in actual scenarios to evaluate the performance and application potential of the dynamic guidance game strategy. This research provides theoretical support and a practical reference for the application of tail-sitter UAVs in sea landing and promotes the development and application of UAV technology in the marine field.
2. Methods analysis
2.1. Tail-sitter UAV model
The tail-sitter UAV is a hybrid model that takes off and lands vertically and changes attitude in the air. It has the characteristics of an aircraft and a multi-rotor. The movement of the control surface and servo motor can affect the attitude of the aircraft. Assume that the mass center of the UAV is
As shown in Figure 1, the two control rudder surfaces are driven by two servo motors.
UAV thrust and angle control parameters.

Considering the drone is an elastic body, its movement in the air can be affected by air resistance. Air resistance is usually proportional to the square of the UAV velocity. Therefore, the related translation force and moment is described by (3), where
Combining translational motion, rotational motion, the force of the rudder surface and air resistance, equation (1) can be rewritten as (4), where
To describe the dynamic behavior of the UAV using a state space representation of (5), the dynamic equations of the aircraft can be expanded into linear and nonlinear coupled dynamics to facilitate control design.
From (7), the rate of change of the attitude heading angle is provided by the velocity and the angular velocity incorporating the Euler angle transformation matrix
The state space matrix of the linearized system can be expressed in (8), where
A tail-sitter UAV switching from the fixed-wing mode of horizontal flight to the multi-rotor mode of vertical landing, or the reverse, is a highly nonlinear process. These two flight modes correspond to completely different characteristics and control gain parameters. Therefore, in this study, Sliding Mode Control (SMC) 27 is used to give the UAV system strong robustness against uncertainty and external disturbances when switching between flight modes. To describe the control switching of the tail-sitter UAV between horizontal flight and vertical landing with SMC, the characteristics of the two modes must be considered. The horizontal flight mode is similar to fixed-wing flight: the propeller mainly provides horizontal thrust, and the rudder surfaces control the attitude. The vertical landing mode is similar to the multi-rotor mode: the thrusters provide vertical lift and work with the rudder surfaces to maintain attitude stability. SMC designs a sliding surface for each of these two modes and switches the control strategy between them. The basic idea of SMC is to design a sliding mode surface along which the error dynamics of the system gradually approach zero, thereby achieving stable control of the system. Define a sliding mode surface of
The control goal in vertical mode is to stabilize the height and attitude of the drone and enable it to take off and land vertically. As shown in (9), the corresponding sliding mode surface can be designed for height and attitude errors, where
In the horizontal flight mode, the control goal is to maintain the horizontal speed and attitude stability of the UAV. The corresponding sliding mode surface can be designed according to the horizontal velocity error and attitude error as shown in (10).
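The sliding-surface idea can be illustrated with a minimal sketch. It assumes a first-order surface s = ė + λe on a single error channel with double-integrator dynamics ë = u + d, and a tanh-smoothed switching term; the surface form, the gains `lam`, `k`, the boundary-layer width `phi`, and the disturbance are all illustrative assumptions, not the expressions or values of (9) and (10).

```python
import math

def sliding_surface(e, e_dot, lam):
    """Assumed first-order sliding surface s = e_dot + lam * e."""
    return e_dot + lam * e

def smc_control(e, e_dot, lam, k, phi):
    """Equivalent term plus tanh-smoothed switching term for e_ddot = u + d."""
    s = sliding_surface(e, e_dot, lam)
    return -lam * e_dot - k * math.tanh(s / phi)

def simulate(e0, e_dot0, lam=2.0, k=5.0, phi=0.05,
             dt=0.001, steps=10000, disturbance=0.5):
    """Integrate the error dynamics for steps * dt seconds (semi-implicit Euler)."""
    e, e_dot = e0, e_dot0
    for i in range(steps):
        u = smc_control(e, e_dot, lam, k, phi)
        # Bounded sinusoidal disturbance whose amplitude stays below the gain k.
        d = disturbance * math.sin(2.0 * i * dt)
        e_dot += (u + d) * dt
        e += e_dot * dt
    return e, e_dot

# Hypothetical initial altitude error of 1 m with zero error rate.
final_e, final_e_dot = simulate(e0=1.0, e_dot0=0.0)
```

With these assumed gains, the surface variable obeys ṡ = -k tanh(s/φ) + d, so the state first reaches a thin boundary layer around s = 0 and the error then decays at the rate set by λ.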
The goal of the SMC law is to make the state of the system tend to the sliding mode surface
For the flight mode switching in the control system, the transition between horizontal flight and vertical landing needs to occur under certain conditions. Determine when to switch the control mode based on the attitude angle of the drone, especially the pitch angle
When the tail-sitter UAV lands on a slow-moving or stationary ship, it operates in vertical mode and uses the ship’s landing guide and active capture device to assist in completing the landing task. However, when the ship is in a state that is not conducive to vertical take-off and landing, the UAV should fly in fixed-wing mode directly toward the capture net at the stern of the ship until it hits the net. When the UAV tracks a moving target in fixed-wing mode, it uses the Line of Sight (LOS) of a three-axis optical tracker to design the guidance law. The LOS of the optical tracker defines the heading angle of the target relative to the UAV body coordinate system, and the guidance law achieves smooth and stable tracking of the target. The design considers the estimation of the target’s relative position and velocity, the LOS-based attitude control law, the guidance law for stable tracking, and SMC to deal with disturbance and uncertainty in target tracking. The UAV’s optical tracker provides target LOS information comprising two angles, horizontal LOS
PN and LOS of UAV environment state space.
The goal is to keep the drone’s line of sight aligned with the target. This requires that the LOS angle
The guidance law is designed using the Proportional Navigation (PN) method, which adjusts the aircraft’s acceleration based on the change in the LOS angle. According to the fundamental control principle of PN, as shown in (14),
For three-dimensional UAV tracking, the PN method is applied separately to the horizontal and vertical directions to design the acceleration commands, as shown in (15). These commanded accelerations are then converted into UAV attitude commands to drive the UAV in tracking the target.
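As a minimal sketch of this per-channel use of PN, assuming the classical form a = N · V_c · λ̇ (navigation gain N, closing speed V_c, LOS rate λ̇) and planar relative kinematics in each channel, the commanded accelerations could be computed as follows. The function names, the gain, and the numerical relative states are hypothetical and are not taken from (14) or (15).

```python
import math

def los_rate_and_closing_speed(rel_pos, rel_vel):
    """Planar LOS rate and closing speed from target-minus-UAV position/velocity."""
    x, y = rel_pos
    vx, vy = rel_vel
    r2 = x * x + y * y
    los_rate = (x * vy - y * vx) / r2                    # d/dt of atan2(y, x)
    closing_speed = -(x * vx + y * vy) / math.sqrt(r2)   # negative range rate
    return los_rate, closing_speed

def pn_acceleration(nav_gain, closing_speed, los_rate):
    """Classical PN command a = N * Vc * lambda_dot (assumed form)."""
    return nav_gain * closing_speed * los_rate

def pn_3d(nav_gain, rel_h, vel_h, rel_v, vel_v):
    """Apply PN independently to the horizontal and vertical LOS channels."""
    rate_h, vc_h = los_rate_and_closing_speed(rel_h, vel_h)
    rate_v, vc_v = los_rate_and_closing_speed(rel_v, vel_v)
    return (pn_acceleration(nav_gain, vc_h, rate_h),
            pn_acceleration(nav_gain, vc_v, rate_v))

# Hypothetical relative states (metres, metres per second) for each channel.
a_h, a_v = pn_3d(3.0,
                 rel_h=(100.0, 0.0), vel_h=(-10.0, 2.0),
                 rel_v=(100.0, -20.0), vel_v=(-10.0, 1.0))
```

In the horizontal channel of this example the LOS rate is 0.02 rad/s and the closing speed is 10 m/s, so the command is about 0.6 m/s²; the vertical command is negative because the LOS is rotating downward.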
In practical applications, UAVs may encounter challenges such as wind disturbances, modeling inaccuracies, and sensor noise during target tracking. To enhance system robustness, SMC is incorporated into the guidance law to address these uncertainties. The sliding surface, the control law, and a smoothing approximation function to mitigate chattering effects are given in (16), where
The relative position and velocity of the target are estimated based on the LOS angle and angular velocity. Pitch and yaw control laws are designed based on the LOS error to achieve attitude adjustment. The guidance law is formulated using PN, adjusting the UAV’s acceleration according to the LOS angular velocity. SMC is introduced to enhance system robustness against disturbances and uncertainties. On this basis, Lyapunov stability verification is conducted. The sliding surface is defined according to the Lyapunov function’s negative definiteness condition, i.e.,
Substituting (17) into the Lyapunov function
According to (18), the Lyapunov method proves that
Substituting (19) back into (17), the system under a disturbance environment can be obtained. Considering the system uncertainty terms of
The stability of this adaptive control law is validated based on the Lyapunov method. The Lyapunov function is defined by (22), where
The original system similarly demonstrates that the derivative of the Lyapunov function
Assume that UAV and the ship are two players in the game. The UAV’s strategy involves selecting the landing path and velocity to achieve three objectives: minimum landing time, minimum energy consumption, and minimum risk. The UAV must make trade-offs, as its strategy influences the priority of these objectives. The ship’s strategy involves adjusting the landing conditions, such as controlling its speed and direction to reduce the UAV’s landing risk and assist in a smooth descent. The game dynamics are defined as follows: the UAV aims to land as quickly as possible, which may increase energy consumption or risk. It seeks to minimize energy consumption during landing, which may extend the landing duration or require a more complex trajectory. Additionally, the UAV aims to minimize landing risk, particularly under disturbances such as wind speed and ship oscillations.
Based on the game players, objectives, and relationships, a mathematical model is formulated under the following conditions. Let
The game model consists of two strategy sets: the UAV’s strategy set
A two-player zero-sum game is given, where the UAV attempts to minimize its loss while the ship seeks to minimize risk. To analyze this game, it is essential to first prove the existence of a Nash equilibrium. The first step is to establish the continuity of the utility functions. Assuming that
To address the limitations noted in prior game-theoretic UAV–ASV coordination studies, the present work explicitly formalizes the information structure, payoff definitions, and equilibrium-seeking mechanism for the two-player dynamic game. The UAV’s payoff function is defined as a weighted multi-objective cost incorporating landing time, energy consumption, and risk, all of which are computed from measurable flight-state variables and predicted ASV motion. The ASV’s payoff is formulated as a risk-minimization term dependent on deck motion amplitude, relative pose, and disturbance intensity
To achieve Nash equilibrium under real-time operational constraints, the static equilibrium characterization is mapped into an iterative online solution in which the UAV updates its strategy through the Actor–Critic network, while the ASV contributes a quasi-static disturbance profile that evolves slowly relative to UAV guidance cycles. Because the ship motion bandwidth (0.1–1 Hz) is significantly lower than the UAV guidance update frequency (10–20 Hz), the equilibrium is established in a receding-horizon manner. Convergence is supported by the convexity of each objective component and the compactness of the feasible set, enabling the UAV to approach a locally optimal response that respects constraints derived from KKT conditions. The Actor–Critic function approximation further accelerates convergence by embedding Nash-optimal gradients into the update law, eliminating the need for explicit equilibrium solving at each step. This structured formulation clarifies the respective roles of UAV and ASV, resolves the issue of asymmetric objectives, and ensures that the game-theoretic framework remains computationally tractable and operationally meaningful in marine environments.
The existence of a Nash equilibrium in the game function is established. There are two optimization problems: one related to UAV strategy and the other concerning ship strategy. For the UAV, the objective within its strategy space is to minimize the loss function, while for the ship, the goal within its strategy space is to minimize risk. Consequently, the optimization problems for both the UAV and the ship can be formulated by (25).
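To make the notion of a pair of coupled optimization problems concrete, the following sketch runs a best-response iteration on a toy game. The quadratic costs, the scalar strategies `u` (UAV) and `v` (ship), and the interval strategy sets are purely illustrative assumptions chosen so that the equilibrium can be checked by hand; they are not the utility functions of (25).

```python
def clip(x, lo, hi):
    """Project x onto the compact interval [lo, hi]."""
    return max(lo, min(hi, x))

def uav_best_response(v):
    # argmin over u in [0, 2] of the toy cost (u - 1)^2 + 0.5 * (u - v)^2
    return clip((2.0 + v) / 3.0, 0.0, 2.0)

def ship_best_response(u):
    # argmin over v in [0, 1] of the toy cost v^2 + 0.5 * (v - u)^2
    return clip(u / 3.0, 0.0, 1.0)

def best_response_iteration(u0, v0, tol=1e-10, max_iter=200):
    """Alternate best responses until neither player changes its strategy."""
    u, v = u0, v0
    for _ in range(max_iter):
        u_new = uav_best_response(v)
        v_new = ship_best_response(u_new)
        if abs(u_new - u) < tol and abs(v_new - v) < tol:
            return u_new, v_new
        u, v = u_new, v_new
    return u, v

u_star, v_star = best_response_iteration(2.0, 1.0)
```

For these toy costs the fixed point is u = 0.75, v = 0.25: at that pair neither player can lower its own cost by deviating unilaterally, which is the defining property of a Nash equilibrium. The iteration converges here because each best-response map is a contraction; that property is specific to this example.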
Since the utility function of the UAV includes multiple objectives, such as landing time, energy consumption, and risk, it is necessary to allocate weights to these objectives for optimization. The parameters
Considering that the game analysis assumption is completed within a discrete time step
The ship’s objective is to minimize the risk
The Pareto approach can be implemented using a stepwise search method, adjusting weight iteratively and repeatedly solving single-objective optimization problems. However, this approach is time-consuming in practice and may lead to convergence to local optima, especially for complex systems. An alternative approach for multi-objective optimization is the
This problem can be solved using standard constrained optimization methods such as the Lagrange multiplier method or the Karush-Kuhn-Tucker (KKT) 32 conditions. Based on iteratively adjusting and converging
According to the KKT conditions, the gradient condition, feasibility condition, non-negativity of the multipliers, and complementary slackness condition are expressed separately as shown in (31). These conditions can be used to solve for the optimal
As shown in (28), a trajectory
To tune priorities dynamically during different landing phases, the multi-objective vector is scalarized into a unified cost as (26) with time-varying weights as (33).
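A minimal sketch of such a phase-dependent scalarization is given here. It assumes a normalized landing progress variable in [0, 1] and a linear blend in which energy dominates in cruise and risk dominates near touchdown; the schedule, the numerical weights, and the names `phase_weights` and `scalarized_cost` are hypothetical and are not the weights of (33).

```python
def phase_weights(progress):
    """Return (w_time, w_energy, w_risk) for progress in [0, 1].

    progress = 0 is cruise and progress = 1 is touchdown (assumed schedule).
    """
    p = min(1.0, max(0.0, progress))
    w_energy = 0.6 * (1.0 - p) + 0.1 * p   # energy matters most in cruise
    w_risk = 0.2 * (1.0 - p) + 0.7 * p     # risk matters most near the deck
    w_time = 1.0 - w_energy - w_risk       # weights always sum to one
    return w_time, w_energy, w_risk

def scalarized_cost(time_cost, energy_cost, risk_cost, progress):
    """Weighted-sum scalarization of the three landing objectives."""
    w_t, w_e, w_r = phase_weights(progress)
    return w_t * time_cost + w_e * energy_cost + w_r * risk_cost

cruise = phase_weights(0.0)
terminal = phase_weights(1.0)
```

Keeping the weights on the unit simplex means that each progress value selects one point on the weighted-sum trade-off curve, so shifting the weights during descent moves the selected compromise without changing the underlying objectives.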
This dynamic weighting explicitly models practical landing priorities. In cruise phase energy efficiency is more important setting large
During training epochs, the Critic evaluates each candidate trajectory according to the scalarized cost. But the Actor is additionally constrained to ensure that condition
To address practical concerns regarding online equilibrium computation, the proposed game is formulated as a semi-dynamic cooperative game in which the ASV exhibits slow-varying dynamics and limited maneuverability, while the UAV executes high-bandwidth control. The information structure is defined such that the UAV has access to: (a) relative position and velocity via LOS estimation, (b) envelope boundary constraints, (c) short-horizon deck motion prediction, and (d) onboard inertial and GNSS states.
The ASV possesses only state measurements relevant to deck motion and environmental disturbance but does not modify its strategy in response to UAV actions during landing. This enforces a one-sided adaptive structure where only the UAV actively optimizes its strategy, making the equilibrium computationally attainable.
The UAV computes equilibrium-approximating strategies through repeated Actor–Critic updates, where each iteration implicitly performs a gradient step toward satisfying the first-order KKT conditions of the multi-objective game. Because the ASV’s motion evolves slowly relative to the UAV’s control loop, convergence is achieved in a receding-horizon sense, with each update ensuring that successive policies remain within the feasible Pareto set. This design prevents instability caused by communication delays or environmental disturbances and guarantees that Nash-like equilibria are approached without requiring explicit full-information negotiation.
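A single Actor–Critic update driven by the temporal difference error can be sketched as follows. The sketch assumes a linear critic V(s) = w · φ(s), a generic policy-gradient actor step, and a reward equal to the negative scalarized cost; the feature vectors, learning rates, and function names are hypothetical, and the networks used in this study are not reproduced here.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def td_error(reward, gamma, v_next, v_curr):
    """One-step temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_curr

def actor_critic_step(theta, w, feats, next_feats, grad_log_pi,
                      reward, gamma=0.9, alpha_actor=0.01, alpha_critic=0.1):
    """Update critic weights w and actor parameters theta from one transition."""
    delta = td_error(reward, gamma, dot(w, next_feats), dot(w, feats))
    # Critic: semi-gradient TD(0) step on the linear value function.
    w_new = [wi + alpha_critic * delta * fi for wi, fi in zip(w, feats)]
    # Actor: policy-gradient step, using the TD error as the advantage estimate.
    theta_new = [ti + alpha_actor * delta * gi
                 for ti, gi in zip(theta, grad_log_pi)]
    return theta_new, w_new, delta

# Hypothetical transition with reward = -(scalarized landing cost) = -1.
theta_new, w_new, delta = actor_critic_step(
    theta=[0.5], w=[0.0, 0.0],
    feats=[1.0, 0.0], next_feats=[0.0, 1.0],
    grad_log_pi=[2.0], reward=-1.0)
```

In this example the critic initially predicts zero value everywhere, so the TD error equals the reward (-1); the critic lowers its estimate for the visited state and the actor moves away from the action that produced the cost.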
To justify the use of Karush–Kuhn–Tucker (KKT) conditions in the multi-objective landing optimization problem, the constrained structure of the landing task is explicitly defined. The UAV must satisfy actuator, spatial, and terminal-phase safety constraints during landing, expressed as (35).
Let the scalarized objective from the Pareto framework. The Lagrangian is written as (36) where
A control solution ACC satisfies the KKT conditions if the conditions in (37) hold. These conditions precisely identify which constraints become active during different landing phases.
Because Actor–Critic policies are generally non-convex, exact KKT satisfaction cannot be guaranteed. Therefore, a KKT residual metric is defined and measured at each landing rollout (38).
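One plausible way to compute such a residual is sketched here, assuming it sums the stationarity norm, the primal and dual feasibility violations, and the complementary slackness gap for inequality constraints g_i(u) ≤ 0. This composition and the toy problem are assumptions for illustration; the exact metric of (38) is not reproduced.

```python
import math

def kkt_residual(grad_f, grads_g, g_values, multipliers):
    """Aggregate KKT violation for min f(u) subject to g_i(u) <= 0.

    grad_f: gradient of f at u; grads_g: list of constraint gradients;
    g_values: constraint values g_i(u); multipliers: candidate mu_i.
    """
    n = len(grad_f)
    stationarity = [
        grad_f[j] + sum(mu * gg[j] for mu, gg in zip(multipliers, grads_g))
        for j in range(n)
    ]
    r_stat = math.sqrt(sum(s * s for s in stationarity))
    r_primal = sum(max(0.0, g) for g in g_values)        # g_i(u) <= 0
    r_dual = sum(max(0.0, -mu) for mu in multipliers)    # mu_i >= 0
    r_comp = sum(abs(mu * g) for mu, g in zip(multipliers, g_values))
    return r_stat + r_primal + r_dual + r_comp

# Toy problem: minimize (u - 2)^2 subject to u - 1 <= 0.
# The optimum is u = 1 with multiplier mu = 2.
res_opt = kkt_residual([2.0 * (1.0 - 2.0)], [[1.0]], [1.0 - 1.0], [2.0])
res_bad = kkt_residual([2.0 * (1.5 - 2.0)], [[1.0]], [1.5 - 1.0], [0.0])
```

The residual vanishes at the constrained optimum and is 1.5 at the infeasible point u = 1.5 with a zero multiplier (stationarity 1.0 plus constraint violation 0.5), so it can serve as a scalar indicator of how far a policy output is from satisfying the KKT conditions.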
Empirically, policies after convergence exhibit
To complement the approximate KKT-based verification and to ensure that the proposed learning-guidance architecture performs reliably in stochastic, time-varying maritime environments, both stability and performance guarantees are jointly evaluated through a Lyapunov-informed regret analysis. Rather than treating Lyapunov stability and regret bounds as independent validation tools, the two are merged into a cohesive robustness assessment framework. Specifically, the underlying sliding-mode and LOS-based guidance structure establishes a deterministic stabilizing backbone, ensuring that the UAV’s tracking errors evolve within a contracting set. While the Actor–Critic component adapts waypoint or command decisions in real time, the stabilizing controller continuously regulates the system toward a decreasing Lyapunov function. This interaction creates a layered control architecture wherein learning-driven exploration is constrained to a region that preserves monotonic reduction of the Lyapunov measure. Thus, even as the policy evolves, the state trajectory remains inside a bounded, forward-invariant neighborhood around the reference path.
Within this stability-preserving region, the regret framework provides a quantitative measure of how effectively the learned policy approximates the optimal constrained behavior over time. Because the tracking errors remain bounded by the Lyapunov-based contraction properties of the low-level controller, the regret analysis is not performed on raw, unstable trajectories but on trajectories whose deviations are guaranteed to remain within a stability-certified corridor. This coupling ensures that each incremental policy update cannot drive the system into unstable or infeasible regions, even in the presence of unmodeled ASV motion, sensor fluctuations, or environmental disturbances. As the Actor–Critic network repeatedly refines its policy, the Lyapunov structure effectively “shapes” the learning landscape, preventing divergence and ensuring that suboptimal actions incur bounded penalties in future performance. As a result, the accumulated regret grows sub-linearly with time, consistent with convergent Actor–Critic processes, indicating that the learned decisions asymptotically approach the best feasible actions available within the stability-constrained domain.
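The meaning of sub-linear regret can be illustrated numerically. The sketch assumes regret is the running sum of the gap between the incurred cost and the best feasible cost, and it uses a synthetic cost sequence whose gap decays as 1/√t; the sequence is made up for illustration and is not data from this study.

```python
import math

def cumulative_regret(costs, optimal_cost):
    """Running sum of (incurred cost - best feasible cost) per step."""
    total = 0.0
    history = []
    for c in costs:
        total += c - optimal_cost
        history.append(total)
    return history

# Synthetic per-step costs that approach the optimum 1.0 as 1 / sqrt(t).
costs = [1.0 + 1.0 / math.sqrt(t) for t in range(1, 1001)]
regret = cumulative_regret(costs, optimal_cost=1.0)

avg_regret_100 = regret[99] / 100      # average regret after 100 steps
avg_regret_1000 = regret[999] / 1000   # average regret after 1000 steps
```

The cumulative regret keeps growing, but roughly like √T, so the average regret per step shrinks as T increases; this shrinking average is what sub-linear growth means in practice.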
The combined viewpoint also clarifies why traditional KKT-based analysis alone is insufficient for real-time marine operations: KKT offers instantaneous optimality checks but does not describe the behavior of the algorithm across time or under stochastic perturbations. In contrast, Lyapunov-constrained regret analysis accounts for both temporal adaptation and environmental uncertainty. Lyapunov properties ensure that the UAV remains resilient to disturbances and maintains bounded deviation from the intended guidance path, while the regret measure characterizes the efficiency of learning and thus the quality of real-time adaptation. When integrated, these two tools demonstrate that, even if the Actor–Critic policy is non-convex and the environment is stochastic, the system remains stable and its long-run average performance approaches optimal constrained behavior. This blended guarantee forms a more robust and operationally meaningful alternative to pure KKT verification, validating both the safety and the asymptotic efficiency of the proposed guidance strategy under practical maritime conditions.
As shown in Figure 3, the integrated framework for optimization and control of a tail-sitter UAV during its landing process on an ASV is structured around system modeling, game-theoretic modeling, multi-objective optimization, adaptive guidance control, trajectory planning, UAV system modeling, and reinforcement learning integration.
The main methodological framework of this study.
The system modeling block establishes the global environment setup, which serves as the foundation for further processes. Within this setup, game-theoretic modeling is introduced to define the landing objective and mathematically prove the existence of a Nash equilibrium, ensuring that both UAV and ASV reach an optimal cooperative state where risks and energy consumption are minimized. The multi-objective optimization module applies Pareto optimization to balance the conflicting goals of minimum risk, shortest time, and lowest energy consumption, while constraints are transformed and validated through Karush-Kuhn-Tucker conditions to guarantee both necessity and sufficiency of optimal solutions. Adaptive guidance control law governs real-time UAV behavior through line-of-sight tracking, guidance commands, and transformations between different guidance modes, supported by Lyapunov and Barbalat stability validation to ensure robustness under uncertainties. The trajectory planning and smoothing block generates and refines cruise trajectories to provide feasible and smooth paths for landing, feeding directly into the UAV system model that represents state equations, control inputs, motion state space, and mode transformations necessary for precise dynamic modeling. On the other side, reinforcement learning integration enhances adaptability by embedding an Actor-Critic structure: initialization is followed by state observation setup, then the actor generates actions in the form of trajectory strategies, while the critic evaluates the value of these actions by updating objective condition parameters. Actor updates subsequently refine the guidance law parameters, with this iterative learning process repeated up to 1000 epochs to ensure convergence, after which the system terminates training. 
By combining rigorous mathematical modeling, control theory validation, multi-objective optimization, and reinforcement learning, the framework ensures that the UAV achieves stable and precise landing performance under the dynamic and nonlinear environment of ship motion, thereby fulfilling mission requirements of safety, efficiency, and robustness in cooperative UAV-ASV operations.
Figure 4 presents the detailed control architecture of the UAV system, showing how the controllers, feedback systems, and hardware components interact to realize autonomous or remotely controlled missions.
The main methodological framework of this study.
At the top level, the Task Controller integrates the User Interface and Strategy Development modules. The user interface allows for two modes of operation: autonomous task planning, where the UAV follows pre-defined mission objectives, and manual remote control for direct operator intervention. The strategy development block transforms mission objectives into executable paths through multi-objective transformation and generates continuous waypoints that serve as references for the flight controller. The Flight Controller is composed of three main parts: the guidance controller, the attitude controller, and the feedback information system. The guidance controller is responsible for discrete path waypoint generation, application of the adaptive guidance control law, and trajectory planning and smoothing, which ensures that feasible and safe flight paths are provided to the UAV. The attitude controller is structured hierarchically, beginning with the position controller, which defines spatial positioning; followed by the heading controller, which governs UAV orientation; and finally the speed rate controller, which regulates velocity for stability and responsiveness. These control signals are then transmitted to the hardware-level attitude controller, which directly interfaces with the UAV actuators. This hardware control includes servos for aerodynamic control surfaces, electronic speed controllers (ESCs) to regulate power distribution, and brushless DC motors that generate thrust for UAV propulsion. The Feedback Information block continuously updates the system by providing real-time state estimation through the Attitude and Heading Reference System (AHRS) and the Global Navigation Satellite System (GNSS), ensuring accurate positional and orientation data. These signals are fed back into the flight controller for closed-loop control, allowing the UAV to dynamically adjust to environmental conditions and disturbances.
This architecture demonstrates a tightly integrated control framework in which mission objectives are translated into real-time trajectory execution, stabilized by adaptive control laws, supported by feedback information, and implemented through actuator-level commands. This design ensures that the UAV maintains precise control and stability while executing autonomous or remotely commanded operations in complex flight missions.
3. Experimental analysis
To accommodate different ship motion modes, including circular loitering and linear navigation, this study conducts experiments based on three guidance methods. The guidance law is designed around three primary game-theoretic cost factors: time, energy consumption, and risk. A guidance strategy is proposed that integrates the Envelope Method [33] and pursuit-evasion theory [34]. Additionally, constraint conditions and cost functions are designed to facilitate the convergence of the loss function.
The objective of the Envelope Method is to ensure that the UAV remains within a dynamically defined safe region during the landing process, thereby minimizing risk while aiming to reach the landing area in the shortest possible time. Let
By constraining the envelope boundary, the guidance domain is restricted within a predefined range. The pursuit-evasion strategy is then applied to formulate optimal strategies when stochastic factors affect both the ship’s and UAV’s motion. Pursuit-evasion theory is a mathematical framework that studies the optimal strategies between a pursuer and an evader within a finite space and environment. This theory is widely applied in fields such as mathematics, computer science, control engineering, and especially in robotic navigation, UAV control, game theory, and security analysis. The goal of the pursuit-evasion strategy is to enable the UAV to rapidly approach the landing point and complete the landing with the shortest possible distance while maintaining acceptable control accuracy. This method is particularly suitable for time-sensitive scenarios. Given the envelope boundary, the distance between the UAV and the ship is denoted as
The predictive control method is used to perform multi-objective optimization by integrating considerations of energy consumption, time, and risk while dynamically adjusting the UAV’s trajectory. Based on the current state, future landing positions are predicted for adaptive adjustments. Currently, the power consumption rate of the UAV is not yet defined, so energy consumption
When generating the ideal trajectory, discontinuities may arise. In this study, Bézier curve interpolation [35] and spline interpolation are applied to smooth the trajectory and handle discontinuities. Bézier curves are well-suited for managing abrupt changes in position and velocity due to their smooth transition properties. Suppose a set of discontinuous points
To construct a cubic spline function
By applying quadratic Bézier curves to limit the envelope constraints, the UAV’s position undergoes a smooth transition while satisfying the envelope limitations. The pursuit-evasion strategy employs cubic spline interpolation to ensure continuity in displacement, velocity, and acceleration. The predictive control method utilizes time-weighted optimization to smooth velocity and acceleration variations, adapting to predictive changes.
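As a rough illustration of the quadratic Bézier smoothing mentioned above, the following sketch samples the standard curve B(t) = (1 − t)²P₀ + 2(1 − t)tP₁ + t²P₂ between two waypoints. The function name `quadratic_bezier`, the sample count, and the waypoint coordinates are assumptions made for this example and are not taken from the study.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def quadratic_bezier(p0: Point, p1: Point, p2: Point, n: int = 11) -> List[Point]:
    """Sample B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 at n evenly spaced t in [0, 1]."""
    if n < 2:
        raise ValueError("n must be at least 2")
    samples = []
    for i in range(n):
        t = i / (n - 1)
        # Bernstein weights of the quadratic curve; they always sum to 1.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        samples.append(
            tuple(a * x0 + b * x1 + c * x2 for x0, x1, x2 in zip(p0, p1, p2))
        )
    return samples


# Hypothetical waypoints (x, y, altitude in meters): p0 and p2 are the two
# sides of a discontinuity, p1 is a control point that shapes the transition.
curve = quadratic_bezier((0.0, 0.0, 30.0), (10.0, 0.0, 30.0), (10.0, 10.0, 20.0))
```

The curve starts at the first point and ends at the last, so the smoothed segment can replace a jump between two trajectory pieces without moving their endpoints.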
The guidance relationship between the ship and the UAV is established based on a Nash equilibrium game condition. A loss function is formulated by defining three global cost functions. Reinforcement learning, specifically the Actor-Critic method, is employed to optimize the three guidance trajectory designs, treating the UAV’s control strategy as a continuous decision-making process. The global loss function enables the strategy determined by the Actor network to be updated according to the estimated values from the Critic network, ultimately achieving an optimized guidance trajectory. The objective is for the UAV to complete landing in the shortest possible time, making time cost equivalent to the total mission duration. Since energy consumption is approximately proportional to the square of velocity, integrating the square of velocity serves as an approximation for energy cost. The risk cost is defined based on the UAV’s deviation from the envelope boundary and the ship’s instability score. The deviation cost accumulates at each time step, with penalties intensified when the distance exceeds the safety threshold. Considering time, energy consumption, and risk, the global composite loss function is defined as
In the Actor-Critic architecture [36], two networks are used. The Actor generates the strategy
The reinforcement learning training algorithm consists of seven key steps: initialization, state observation, action generation, state update and feedback computation, Critic update, Actor update, and repeat iteration. The process is described as follows:
1. Initialization: Initialize the parameters of the Actor network
2. State Observation: At each time step, observe the global state
3. Action Generation: The Actor generates a predefined action based on the current policy
4. State Update and Reward Computation: After executing the action
5. Critic Update: Compute the temporal difference error
6. Actor Update: Update the
7. Repeat Iteration: Repeat steps 2 through 6 until the specified number of iterations is reached.
The Actor-Critic method enables the UAV to autonomously learn strategies that minimize the loss function across three different guidance trajectory designs under various environmental conditions. This framework integrates UAV decision-making with multi-objective optimization of the loss function, allowing the UAV to achieve an optimal balance between time, energy consumption, and risk. To optimize UAV landing on a ship, two different ship motion trajectories, circular navigation and linear sailing, must be designed and refined using the Actor-Critic method. This design is based on the relative motion between the UAV and the vessel, as well as the optimized guidance trajectory, ensuring that the UAV can land within a short time while minimizing energy consumption and risk. The study further explores how to optimize UAV landing trajectories based on the distinct characteristics of these two navigation modes using the Actor-Critic framework.
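The three cost terms described earlier (mission duration, integrated squared speed as an energy proxy, and accumulated envelope deviation plus ship instability as risk) can be sketched as a single weighted sum. This is only a minimal illustration: the weights, the penalty multiplier, and the function name `composite_loss` are assumptions, and the exact functional form used in the study may differ.

```python
from typing import Sequence, Tuple


def composite_loss(
    speeds: Sequence[float],        # UAV speed at each time step (m/s)
    deviations: Sequence[float],    # distance from the envelope boundary (m)
    instability: Sequence[float],   # ship instability score per step (unitless)
    dt: float,                      # time step (s)
    safe_dist: float,               # safety threshold on deviation (m)
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed weights
    penalty: float = 10.0,          # assumed extra penalty beyond the threshold
) -> float:
    if not (len(speeds) == len(deviations) == len(instability)):
        raise ValueError("all sequences must have the same length")
    # Time cost: total mission duration.
    time_cost = dt * len(speeds)
    # Energy cost: rectangle-rule integral of squared speed.
    energy_cost = sum(v * v for v in speeds) * dt
    # Risk cost: deviation accumulates every step and is penalized more
    # heavily once it exceeds the safety threshold.
    risk_cost = 0.0
    for d, s in zip(deviations, instability):
        excess = max(0.0, d - safe_dist)
        risk_cost += (d + penalty * excess + s) * dt
    w_t, w_e, w_r = weights
    return w_t * time_cost + w_e * energy_cost + w_r * risk_cost


loss = composite_loss([2.0, 2.0], [0.5, 1.5], [0.0, 0.1], dt=0.1, safe_dist=1.0)
```

Changing the weight tuple shifts the trade-off among time, energy, and risk, which is the knob a Pareto-style analysis would sweep.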
Both the actor and critic are implemented as lightweight feed-forward neural networks suitable for deployment on embedded avionics hardware. The actor network consists of three hidden layers with 64-128-64 units respectively, employing rectified linear unit (ReLU) activation in all hidden layers and a hyperbolic tangent output layer to enforce bounded control commands. The critic network mirrors the actor’s topology but outputs a scalar state-action value. Learning rates are set to 3×10⁻⁴ for the actor and 10⁻³ for the critic, optimized using Adam. The discount factor is fixed at γ=0.98, balancing short-horizon landing priorities with energy-sensitive considerations in the earlier approach phase.
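To make the described topology concrete, the sketch below builds the 64-128-64 actor and critic as plain-Python multilayer perceptrons and runs one forward pass. The state and action dimensions (`STATE_DIM`, `ACTION_DIM`), the weight initialization, and the helper names are assumptions for illustration only; a practical implementation would use a deep learning library with the Adam optimizer.

```python
import math
import random


def init_mlp(sizes, rng):
    """Create one (weights, biases) pair per layer with small uniform weights."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers, x, out_act):
    """ReLU on hidden layers, out_act on the output layer."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]   # hidden layer: ReLU
        else:
            x = [out_act(v) for v in x]    # output layer
    return x


STATE_DIM, ACTION_DIM = 12, 3               # assumed sizes, not from the study
HIDDEN = [64, 128, 64]                      # hidden layout stated in the text
ACTOR_LR, CRITIC_LR, GAMMA = 3e-4, 1e-3, 0.98

rng = random.Random(0)
actor = init_mlp([STATE_DIM] + HIDDEN + [ACTION_DIM], rng)
critic = init_mlp([STATE_DIM + ACTION_DIM] + HIDDEN + [1], rng)

state = [0.1] * STATE_DIM
action = forward(actor, state, math.tanh)                   # bounded in [-1, 1]
q_value = forward(critic, state + action, lambda v: v)[0]   # scalar Q(s, a)
```

The tanh output keeps every command component in [-1, 1]; scaling to physical limits would happen outside the network.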
Exploration is implemented via a Gaussian action perturbation strategy with annealed variance, initialized at 0.3 and decaying exponentially to 0.05 over 50,000 steps, allowing gradual transition from exploration to exploitation as the policy stabilizes. To ensure sample efficiency under the partially observable conditions typical in maritime environments, an experience replay buffer of size 100,000 is used, with minibatch updates of 128 samples drawn at each learning step. Temporal difference (TD) error governs critic updates, and the actor adjusts its parameters by following the deterministic policy gradient direction weighted by the TD error, meaning that large TD errors amplify corrective updates while small errors smooth policy refinements. This produces natural adaptive behavior when the ASV induces sudden deck motion, while maintaining stability when the trajectory is near-optimal.
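The exploration schedule, replay buffer, and TD error described above can be sketched as follows. The geometric interpolation used for the decay is one plausible reading of "decaying exponentially" and is an assumption, as are the names `exploration_sigma`, `td_error`, and `ReplayBuffer`.

```python
import random
from collections import deque

SIGMA_START, SIGMA_END, DECAY_STEPS = 0.3, 0.05, 50_000


def exploration_sigma(step: int) -> float:
    """Noise standard deviation: 0.3 at step 0, 0.05 from step 50,000 onward."""
    frac = min(max(step, 0), DECAY_STEPS) / DECAY_STEPS
    return SIGMA_START * (SIGMA_END / SIGMA_START) ** frac


def td_error(reward, value, next_value, gamma=0.98, done=False):
    """Temporal difference error: r + gamma * V(s') - V(s)."""
    target = reward + (0.0 if done else gamma * next_value)
    return target - value


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are discarded first."""

    def __init__(self, capacity: int = 100_000):
        self._data = deque(maxlen=capacity)

    def add(self, transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int = 128, rng=random):
        # Uniform sampling without replacement, capped at the buffer size.
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)


# Perturb a hypothetical actor output with the annealed Gaussian noise.
noisy_action = [a + random.gauss(0.0, exploration_sigma(1000)) for a in (0.2, -0.1)]
```

A large TD error from `td_error` would scale the actor update more strongly, which matches the weighting behavior described in the text.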
This study simulates UAV and unmanned vessel system parameters based on real-world specifications and designs guidance waypoints within the simulation environment. The UAV’s maneuverability is first validated under quantifiable wind speed conditions using the Beaufort wind scale, conducting experimental tests across wind levels from 1 to 5. The primary objectives include assessing fundamental stability and maneuverability under light wind, response performance under moderate wind, flight control and stability under mid-range wind speeds, control performance and tolerance under strong wind conditions, and determining the UAV’s operational limits under high wind speeds. The discrete distances between waypoints, denoted as
In the simulation experiments, the ASV does not execute an active strategy update during UAV landing. Instead, its trajectory is predetermined as either linear or circular, and deck motion disturbances are treated as external inputs. This design reflects realistic maritime operations in which ASVs typically maintain navigation objectives and do not dynamically optimize their trajectory to assist UAV landing. Under this constraint, the UAV becomes the primary decision-making agent, and the game-theoretic model reduces to a single-sided optimization problem in which the ASV’s “strategy” is represented by its disturbance profile.
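A predetermined ASV path of this kind can be sketched as a simple function of time, with the deck disturbance drawn separately as an external input. The speed, radius, heading, and noise level below are placeholder values chosen for illustration, and `ship_position` and `deck_disturbance` are hypothetical helper names.

```python
import math
import random


def ship_position(t, mode, speed=2.0, radius=20.0, heading=0.0):
    """Planar ASV position (x, y) in meters at time t (s) on a fixed path."""
    if mode == "linear":
        return (speed * t * math.cos(heading), speed * t * math.sin(heading))
    if mode == "circular":
        omega = speed / radius          # angular rate for constant speed
        return (radius * math.cos(omega * t), radius * math.sin(omega * t))
    raise ValueError(f"unknown mode: {mode!r}")


def deck_disturbance(rng, sigma=0.05):
    """External deck-motion input modeled here as zero-mean Gaussian noise."""
    return rng.gauss(0.0, sigma)


rng = random.Random(1)
x, y = ship_position(5.0, "linear")
heave = deck_disturbance(rng)
```

Because the path depends only on time, the UAV policy cannot influence it, which mirrors the single-sided optimization setting described above.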
The Actor–Critic framework iteratively approximates the Nash response by learning within this structured environment: the UAV’s critic network evaluates the multi-objective cost induced by the ASV’s motion, while the actor adjusts its waypoint strategy to minimize this compounded cost. This setup ensures that the learned strategy reflects physically realistic autonomy levels without assuming inaccessible perfect cooperation from the ASV.
The experimental evaluation of the proposed guidance framework is conducted under the assumption that both the UAV and the ASV are equipped with a fully functional Attitude and Heading Reference System (AHRS) or an equivalent state estimation module. All state variables used in this study, including position, velocity, vertical speed, attitude angles, and deck motion estimates, are taken after AHRS-level filtering or state-fusion processing. The design, tuning, or validation of the AHRS filters (e.g., complementary filtering, EKF-based sensor fusion, or model-based disturbance rejection) is not within the scope of this research. Instead, we assume that the flight controller already operates with well-designed and fully optimized attitude and stabilization loops, as would be expected in a mature commercial or research-grade UAV autopilot.
Under this assumption, the UAV’s inner-loop control system provides stable, low-level attitude tracking at bandwidths higher than those required for trajectory or waypoint updates. Therefore, the present study focuses exclusively on the generation of optimal waypoints and the associated motion strategy for the landing task. The proposed game-theoretic, Pareto-based, and Actor–Critic-guided framework operates at the outer-loop level, producing a sequence of dynamically feasible waypoints. These waypoints are subsequently processed by the UAV’s onboard autopilot, which is treated as an idealized, stable, and fully tuned system capable of executing the commanded positions and velocities with negligible internal delay or instability.
All sensor streams used in the experiments, including ship deck motion, relative UAV-to-ASV pose, vertical sink rate, and horizontal drift, are assumed to have undergone the full filtering chain implemented on the autopilot platform. The filtered data therefore represent the best available real-time estimates that a practical UAV–ASV system would provide. This modeling choice isolates and emphasizes the contribution of the present study, namely, the investigation of optimal landing waypoint generation, adaptive decision-making, and multi-objective optimization under realistic maritime disturbance profiles.
To replicate marine operational conditions, the deck motion of the ASV is generated based on real IMU recordings from small unmanned surface vessels, filtered to the dominant wave-induced frequency band (0.1–1 Hz). Wind disturbance, communication delay, and GNSS fluctuations are incorporated into the relative pose measurements but are attenuated through the assumed AHRS filtering pipeline. The resulting filtered state vector is then provided to the Actor–Critic network, which updates its waypoint-generation policy based on temporal-difference learning and the multi-objective cost function defined earlier.
By separating low-level attitude control from high-level waypoint planning, the experiment design ensures that the evaluation highlights the effectiveness of the proposed guidance method rather than coupling it with controller-specific tuning. The results therefore reflect the performance of the waypoint-generation strategy under realistic yet appropriately filtered environmental and system conditions, consistent with the study’s scientific focus.
The experimental prototype of the UAV in this study is shown in Figure 5(a), while the unmanned ship is depicted in Figure 5(b). It is explicitly emphasized that this work does not address the design of UAV attitude control laws, AHRS filtering algorithms, or state-estimator calibration strategies; these modules are assumed to be fully functional and optimized. The scientific contribution of this study lies solely in the generation of optimal landing waypoints and the corresponding guidance logic.
Figure 5. Experimental prototypes of (a) UAV and (b) unmanned ship.
The simulation environment is constructed to analyze the control performance. The simulated wind speed follows the Beaufort wind force scale from level 1 to level 5. Figure 6(a)-(e) illustrates the motion responses of the UAV under level 1 to level 5 wind speeds, respectively, and Figure 6(f) displays the tracking error of the UAV for the same wind levels. The results in Figure 6 provide the performance constraints for UAV operations in the simulation environment, which inform the guidance law design and the learning parameter settings.
Figure 6. Motion responses of the UAV under (a) level 1, (b) level 2, (c) level 3, (d) level 4, and (e) level 5 wind speeds, and (f) tracking error responses of the UAV under level 1 to level 5 wind speeds.
In order to ensure that the proposed guidance strategy remains both practically feasible and representative of real-world maritime conditions, the simulations in this study were conducted under Level 3 operational intensity, which corresponds to moderate but realistic environmental disturbances. Specifically, performance constraints were defined to guarantee UAV stability while maintaining experimental fidelity to real-world dynamics. The maximum allowable tracking error was restricted to ±0.5 m in position and ±2° in attitude, ensuring that the UAV remained within safe proximity to the vessel deck during landing maneuvers. Attitude-rate limits were imposed at ±15°/s, preventing excessive angular accelerations that could compromise control authority or structural safety. Furthermore, the upper wind-level limit was set to 8-10 m/s, reflecting sea-state conditions where UAV operations are still viable but subject to significant aerodynamic disturbances. These thresholds were incorporated directly into the simulation environment as boundary conditions, constraining the optimization process and guiding the reinforcement learning model to operate within safe margins.
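The quantified limits above lend themselves to a small boundary-condition check. The sketch below is one possible encoding: it takes the upper end of the stated 8-10 m/s wind range as the cutoff, which is an assumption, and the names `SafetyLimits` and `within_limits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    max_pos_err_m: float = 0.5      # position tracking error bound (m)
    max_att_err_deg: float = 2.0    # attitude tracking error bound (deg)
    max_att_rate_dps: float = 15.0  # attitude-rate bound (deg/s)
    max_wind_mps: float = 10.0      # assumed: upper end of the 8-10 m/s range


def within_limits(pos_err_m, att_err_deg, att_rate_dps, wind_mps,
                  limits=SafetyLimits()):
    """Return True when every monitored quantity is inside its bound."""
    return (
        abs(pos_err_m) <= limits.max_pos_err_m
        and abs(att_err_deg) <= limits.max_att_err_deg
        and abs(att_rate_dps) <= limits.max_att_rate_dps
        and 0.0 <= wind_mps <= limits.max_wind_mps
    )


ok = within_limits(pos_err_m=0.3, att_err_deg=-1.5, att_rate_dps=10.0, wind_mps=6.0)
```

In a training loop, a failed check could terminate the episode or add a penalty to the risk term; which of these the study uses is not stated.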
By adopting these quantified constraints, the study achieves two objectives: first, it ensures that UAV performance remains within operational safety envelopes; second, it enhances the realism of the simulation framework by aligning experimental conditions with maritime standards. The choice of Level 3 intensity reflects a deliberate balance: it is challenging enough to test robustness under dynamic disturbances, yet sufficiently representative of practical deployment scenarios. This approach strengthens the validity of the results and provides confidence that the proposed guidance strategy can be extended to real-world UAV-ASV cooperative operations without compromising safety or reliability.
As shown in Figure 7, the motions of the ship are commanded as linear and circular trajectories, while the risk factors caused by ship attitude are represented as noise in the commanded motion.
Figure 7. Motion commands of the ship: (a) linear and (b) circular trajectories.
Given the ship trajectories in Figure 7, the flight trajectory responses of the UAV in both fixed-wing and vertical modes are shown in Figure 8. Figure 8(a) presents the UAV flight trajectory response in fixed-wing mode, and Figure 8(b) presents the response in vertical mode.
Figure 8. Flight trajectory of the UAV under (a) fixed-wing mode and (b) vertical mode.
In practice, due to limitations in controller update rates and system bandwidth, it is not feasible to directly generate and execute an ideal trajectory using continuous functions. Instead, the waypoint of the actual UAV guidance system is updated at an interval of 0.1 seconds. In vertical flight mode at a speed of 25 mph, each update advances the waypoint by approximately 1.12 meters, while in fixed-wing flight mode at 100 mph each update advances it by 4.47 meters, as shown in Figure 9(a). The update process in Figure 9(b) involves generating 720 candidate waypoints radiating outward from the current waypoint. These waypoints form a decision tree, where each decision is evaluated using the global loss function. The results are then provided to the Actor-Critic framework for iterative optimization, enabling trajectory generation based on waypoint updates.
Figure 9. (a) UAV waypoint update distance and (b) prediction trajectory update method.
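One way to picture the candidate-waypoint step is sketched below. It assumes the 720 candidates are spread evenly in heading (0.5° apart) on a circle whose radius is the distance covered in one update interval, and it selects a single waypoint greedily rather than expanding a full decision tree. The placement rule, the placeholder loss, and all function names are assumptions, since the study does not specify them here.

```python
import math

MPH_TO_MPS = 0.44704  # exact conversion factor from miles per hour to m/s


def step_length(speed_mph, dt=0.1):
    """Distance in meters covered during one waypoint update interval."""
    return speed_mph * MPH_TO_MPS * dt


def candidate_waypoints(current, step_m, n=720):
    """n candidates evenly spaced in heading around the current waypoint."""
    x, y = current
    return [
        (x + step_m * math.cos(2 * math.pi * k / n),
         y + step_m * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def best_waypoint(current, target, step_m, loss):
    """Greedy choice: the candidate with the lowest loss value."""
    return min(candidate_waypoints(current, step_m), key=lambda wp: loss(wp, target))


def distance_loss(wp, target):
    """Placeholder loss (distance to target) standing in for the global loss."""
    return math.hypot(wp[0] - target[0], wp[1] - target[1])


step = step_length(100.0)   # fixed-wing mode at 100 mph
nxt = best_waypoint((0.0, 0.0), (100.0, 0.0), step, distance_loss)
```

In the framework described above, the placeholder `distance_loss` would be replaced by the global loss, and the selected candidates would feed the Actor-Critic update.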
3.1. UAV landing under straight-line ship navigation
For UAV landing under straight-line ship navigation, the motion control of the UAV must adapt to the ship’s speed and direction. Straight-line navigation features a relatively simple motion mode, making the UAV’s landing control and guidance more straightforward and predictable. During the training process, as shown in Figure 10, the UAV undergoes 1,000 iterations to gradually adapt and optimize its landing strategy. The precision of the landing strategy is evaluated using the mean squared error (MSE) of the loss function in each iteration. In the initial iterations, the MSE of the loss function is relatively large because the UAV’s strategy is not yet well developed and requires continuous adjustment and optimization. As training progresses, the strategy improves in handling the ship’s straight-line motion pattern, and the MSE gradually decreases. By the end of 1,000 iterations, the MSE is expected to reach a low level, indicating that the UAV can accurately control and guide its landing process while achieving the objectives of minimal risk, shortest landing time, and lowest energy consumption.
Figure 10. MSE of the loss function over 1,000 iterations for UAV landing under straight-line ship navigation.
The UAV landing motion simulation under straight-line ship navigation demonstrates the UAV’s progress throughout the training process by comparing its top-down landing trajectories from the 1st and 1000th iterations, as shown in Figure 11. In the 1st iteration, the UAV’s landing strategy is not yet mature and remains in the adaptation phase for the ship’s straight-line motion. The top-down trajectory may exhibit significant deviation and instability. To prevent frequent divergence in the early training stages, the study employs manual guidance for the first five epochs. However, the UAV’s landing trajectory remains insufficiently smooth, requiring multiple adjustments and deviating from the intended trajectory. At this stage, the MSE of the loss function is relatively large, indicating that the UAV has not yet fully adapted to the straight-line navigation mode and requires further optimization and learning. After 1000 iterations, the UAV’s landing strategy is fully optimized, achieving effective adaptation to the ship’s straight-line motion characteristics. The top-down trajectory reveals a stable and smooth landing trajectory during the final guidance phase. The landing process can be precisely controlled to maintain high consistency with the ship’s motion trajectory. At this stage, the MSE of the loss function has decreased significantly, demonstrating the efficiency and accuracy of the optimized strategy in achieving the objectives of minimal risk, shortest landing time, and lowest energy consumption.
Figure 11. Top-view trajectory of UAV landing motion during the ship’s straight-line navigation at the 1st and 1000th iterations.
As shown in Figure 12, the 3D trajectory of UAV landing motion offers a more intuitive representation of its movement in both vertical and horizontal dimensions. In the 1st iteration, the landing strategy is still under development and shows limited adaptability to a ship navigating in a straight line, so the 3D trajectory may exhibit significant deviations and instability. In the vertical dimension, multiple height adjustments can be observed, while in the horizontal dimension, the path may lack smoothness and even display lateral offsets. The initial landing trajectory reflects the need for continuous adjustments to align with the ship’s straight-line motion. After 1000 iterations, the UAV’s landing strategy has improved significantly. In the 1000th iteration, the 3D trajectory of the UAV is highly stable and smooth. Adjustments in the vertical dimension are more precise, maintaining a stable height within the predetermined range. On the horizontal plane, the path becomes distinctly linear, allowing the UAV to land accurately along the ship’s straight-line navigation. The refined landing trajectory demonstrates that the UAV has achieved precise control over its landing process, meeting the goals of minimal risk, shortest time, and reduced energy consumption.
Figure 12. (a) 3D trajectories of UAV landing motion during the ship’s straight-line navigation and (b) attitude at the 1st and 1000th iterations.
Compared to straight-line navigation, curved navigation of a vessel involves more complex motion modes with continuously changing speed and direction. This presents greater challenges for drone landing control and guidance. Under such conditions, the drone requires more flexible and precise adjustments to its strategy to adapt to the vessel’s motion characteristics. In this training process, shown in Figure 13, 1000 iterations are again used to progressively optimize the drone’s landing strategy. At the initial stage, due to the uncertainty and complexity of curved navigation, the MSE of the drone’s loss function is likely to be significantly higher than in straight-line navigation scenarios. However, through adaptive optimization using the Actor-Critic framework, the drone gradually learns and adapts to the vessel’s curved motion characteristics, continuously adjusting its landing strategy to reduce the loss function’s MSE. As training progresses, the drone’s strategy becomes increasingly refined, and the MSE of the loss function is expected to decrease, demonstrating substantial improvements and optimization effects. By the end of the training, approaching the 1000th iteration, the drone should be able to effectively handle the vessel’s curved navigation and achieve accurate landings, meeting the goals of minimal risk, shortest time, and reduced energy consumption.
Figure 13. MSE of the loss function over 1000 iterations during drone landing training under curved navigation of a vessel.
The top-view trajectories of drone landing motion for the 1st, 500th, and 1000th iterations during circular navigation of a vessel reflect the learning and adaptation process of the drone under complex motion modes, as shown in Figure 14. In the 1st iteration, the drone’s landing strategy is in its initial phase, lacking sufficient capability to respond to the intricate movement mode of circular navigation. To prevent frequent divergence during the early stages of the experiment, manual guidance is employed for training during the first five epochs of initialization. The top-view trajectory may display notable deviations and instability. The drone’s landing path might appear discontinuous with multiple adjustments, struggling to accurately follow the vessel’s motion trajectory. At this stage, the MSE of the loss function is high, indicating the drone’s lack of adaptation to the circular navigation mode. After 500 iterations, the drone’s landing strategy is progressively optimized, showing moderate adaptability to the vessel’s circular navigation motion. The top-view trajectory reveals a more stable and smoother path compared to the 1st iteration, although certain deviations and adjustments may still be present. The drone demonstrates improved capability to follow the vessel’s motion trajectory, and the MSE of the loss function decreases significantly, reflecting advancements in strategy and optimization effectiveness. By the 1000th iteration, the drone’s landing strategy has been thoroughly refined, achieving high adaptability to the vessel’s circular navigation characteristics. The top-view trajectory indicates a highly stable and smooth landing path. The drone can precisely control its landing process, maintaining consistent alignment with the vessel’s motion trajectory.
At this stage, the MSE of the loss function drops dramatically, demonstrating the strategy’s efficiency and precision in meeting the goals of minimal risk, shortest time, and reduced energy consumption.
Figure 14. Top-view trajectories of drone landing motion for the 1st, 500th, and 1000th iterations during the circular navigation of a vessel.
In the circular navigation experiment of the vessel, the 3D landing trajectories of the drone illustrate its adaptation and learning process under complex motion modes, as shown in Figure 15. The drone initially struggles to adapt to the motion characteristics of circular navigation, with its strategy still under development. During the 1st iteration, the 3D trajectory might exhibit significant deviations and instability. Frequent height adjustments could occur in the vertical dimension, coupled with poor precision in height control. On the horizontal plane, the trajectory might lack smoothness, making it difficult to accurately follow the vessel’s circular navigation path. The landing trajectory at this early stage reflects the need for further learning and strategy refinement to handle the complexities of circular navigation. As the iterations progress, the drone gradually optimizes its landing strategy. By the 500th iteration, the 3D trajectory shows noticeable improvements. Height control in the vertical dimension becomes more stable, though some adjustments may still be necessary. The horizontal trajectory is smoother compared to the initial stage, yet certain deviations and path corrections might persist. The landing trajectory at this intermediate stage demonstrates the drone’s progress in adapting to circular navigation patterns but indicates the need for continued optimization. After 1000 iterations, the drone’s landing strategy achieves full optimization. During the 1000th iteration, the 3D trajectory exhibits exceptional stability and precision. Height control in the vertical dimension becomes highly accurate, consistently maintaining the intended height range. On the horizontal plane, the trajectory becomes remarkably smooth, allowing the drone to accurately follow the vessel’s circular navigation path during landing. 
The final stage of the landing trajectory demonstrates the drone’s ability to achieve precise landings in complex circular navigation modes, meeting the objectives of minimal risk, shortest time, and reduced energy consumption.
Figure 15. (a) 3D trajectories of drone landing motion and (b) attitude for the 1st, 500th, and 1000th iterations during the circular navigation of a vessel.
This study investigates landing control strategies of tail-sitter drones under vessel motion, focusing on straight-line and circular navigation. In early iterations, both modes show unstable trajectories with frequent height adjustments and horizontal deviations. As iterations progress, the strategies gradually improve, with smoother 3D motion and more precise control. By the 1000th iteration, optimization is achieved in both modes: the drone maintains stable height within the intended range, and its horizontal trajectory aligns accurately with the vessel’s path. Overall, the optimized strategies minimize risk, shorten landing time, and reduce energy consumption.
In the integrated framework of adversarial game theory and reinforcement learning, UAV landing optimization on autonomous surface vessels is driven by three objectives: minimizing energy, reducing time, and lowering risk. Energy converges fastest, as effective thrust allocation and trajectory planning are quickly learned through Actor–Critic updates. Landing time converges at a moderate pace due to trade-offs with energy and risk, requiring gradual policy refinement. Risk reduction is the slowest, with early oscillations under adversarial disturbances, but eventually achieves robust safety. A hierarchical convergence prioritizing risk, then time, and finally energy ensures optimal outcomes, guiding UAVs toward Nash equilibrium and Pareto-optimal solutions. This design secures reliable convergence, avoids local optima, and provides a robust learning trajectory for cooperative UAV-ASV landing tasks (as shown in Figures 16–18).
Figure 16. Multi-objective optimization by game theory for the three strategy conditions (energy).
Figure 17. Multi-objective optimization by game theory for the three strategy conditions (time).
Figure 18. Multi-objective optimization by game theory for the three strategy conditions (risk).


This study examines the landing strategies of Tail-sitter drones under straight-line and circular vessel navigation, highlighting iterative optimization. While results show significant improvement, several factors remain uncertain. Meteorological conditions such as wind and rainfall could affect stability, yet experiments assume ideal weather. Vessel trajectories in practice may be more complex than straight or circular paths, requiring further testing. Communication delays and interference, also excluded from experiments, could hinder real-time control. Addressing these variables in future research is essential to enhance resilience and ensure reliable drone landings under diverse operational conditions.
The simulation results presented in this study are designed primarily to evaluate the iterative convergence behavior and the methodological effectiveness of the proposed waypoint-generation framework over a sequence of 1,000 learning iterations. Rather than providing an exhaustive statistical characterization of all possible landing outcomes, the experimental setup aims to demonstrate that the integrated game-theoretic, Pareto-guided, and Actor–Critic structure is capable of producing progressively improved landing trajectories and consistent convergence toward viable guidance solutions. This iterative analysis establishes the feasibility of the approach and lays the theoretical and algorithmic foundation for future work on full terminal guidance integration.
Figure 14 illustrates one of the central contributions of the proposed waypoint-generation framework: the ability to produce a smooth, convergent circular-descent trajectory during the approach phase of the landing process. The significance of this result does not lie in terminal touchdown precision, which is beyond the scope of the present work, but rather in demonstrating that the learning-based guidance strategy can generate progressively refined, dynamically consistent waypoints that yield a stable, non-oscillatory descent pattern.
The circular-descent behavior shown in the figure highlights how the Actor–Critic policy, shaped by Pareto-based multi-objective optimization, effectively suppresses abrupt lateral corrections and avoids discontinuities in commanded motion. This produces a guidance path that is both geometrically smooth and dynamically feasible for the UAV’s inner-loop attitude controller, which is assumed to be fully tuned and stable. From a guidance-system perspective, the ability to maintain a smooth, low-jerk trajectory during descent is critical: it reduces the control burden on the attitude loop, minimizes transient thrust disturbances, and preserves overall vehicle stability as the UAV transitions into tighter proximity with the ASV deck.
Thus, the result in Figure 14 should be interpreted as evidence of trajectory-level convergence and motion regularization, not as an indicator of final landing accuracy. The focus of this study is on learning-based waypoint generation and its capacity to produce structured, stable approach geometries, of which the circular-descent pattern serves as a clear demonstration. The development of a full terminal landing controller capable of precision deck touchdown is reserved for future work and requires additional considerations such as close-range perception, deck pose estimation, and terminal-phase disturbance rejection.
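The notion of a smooth, low-jerk circular descent can be illustrated with a small sketch. The code below generates waypoints on a shrinking-radius helix in a deck-fixed frame and estimates peak jerk by finite differences. The geometry (initial radius, heights, number of turns) and the quintic time-scaling are hypothetical choices for illustration; they are not the waypoints produced by the learned policy in this study.

```python
import numpy as np

def helical_descent(radius0=10.0, height0=20.0, height_end=4.0, turns=2.0, n=200):
    """Waypoints (x, y, z) on a shrinking-radius helix, relative to the deck.

    A quintic blend s(u) with zero velocity and acceleration at both ends
    is used so the path starts and finishes gently. All defaults are
    hypothetical values chosen for illustration.
    """
    u = np.linspace(0.0, 1.0, n)
    s = 10 * u**3 - 15 * u**4 + 6 * u**5      # rises monotonically from 0 to 1
    angle = 2.0 * np.pi * turns * s
    radius = radius0 * (1.0 - s)
    z = height0 + (height_end - height0) * s
    return np.column_stack([radius * np.cos(angle), radius * np.sin(angle), z])

def peak_jerk(waypoints, dt):
    """Largest jerk magnitude, estimated with third-order finite differences."""
    jerk = np.diff(waypoints, n=3, axis=0) / dt**3
    return float(np.linalg.norm(jerk, axis=1).max())

# Compare the smooth path with one containing an abrupt 1 m lateral correction.
smooth = helical_descent()
abrupt = smooth.copy()
abrupt[100:, 0] += 1.0
print(peak_jerk(smooth, dt=0.1), peak_jerk(abrupt, dt=0.1))
```

The comparison at the end shows why an abrupt lateral correction is costly for the inner loop: a single step in the commanded position dominates the jerk estimate, whereas the helical path spreads the same convergence over many waypoints.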
Accordingly, the experimental results focus on evaluating whether the learning architecture achieves (i) monotonic improvement in the multi-objective cost, (ii) consistent reduction in temporal-difference error, (iii) stability under filtered AHRS-based state inputs, and (iv) adherence to the operational constraints modeled in the proposed optimization structure. While comprehensive statistical metrics such as mean landing error, standard deviation, miss distance distributions, and energy-per-attempt analyses are valuable for assessing field robustness, they fall outside the primary scope of the current investigation. These metrics are more appropriate for a later stage of research, when the algorithm is integrated with a complete terminal descent controller and validated under real hardware-in-the-loop or field-test conditions.
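Criteria (i), (ii), and (iv) above can be checked mechanically from per-iteration training logs. The sketch below shows one possible way to do so, assuming the logs are plain arrays of cost, TD error, and height values; the function names, window sizes, and tolerance are hypothetical and do not describe the evaluation code used in this study.

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 samples."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")

def smoothed_cost_is_nonincreasing(costs, window=20, tol=1e-9):
    """Criterion (i): the smoothed multi-objective cost never rises by more than tol."""
    smoothed = moving_average(costs, window)
    return bool(np.all(np.diff(smoothed) <= tol))

def td_error_is_shrinking(td_errors, window=50):
    """Criterion (ii): mean |TD error| over the last window is below the first window."""
    magnitude = np.abs(np.asarray(td_errors, dtype=float))
    return bool(magnitude[-window:].mean() < magnitude[:window].mean())

def constraint_violations(heights, h_min, h_max):
    """Criterion (iv): number of samples outside the allowed height band."""
    heights = np.asarray(heights, dtype=float)
    return int(np.sum((heights < h_min) | (heights > h_max)))
```

Smoothing before the monotonicity check is a deliberate choice in this sketch: raw per-iteration costs in stochastic learning are rarely strictly monotone, so the test is applied to the trend rather than to individual samples.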
Similarly, the simulation environment used in this work adopts representative wave-induced deck motions and wind disturbances, but it does not attempt to model the full range of maritime operational envelopes such as explicit Beaufort scale classifications, rapidly changing sea states, sudden ship maneuvers, communication loss, or GPS-denied conditions. These extreme or failure-mode scenarios involve additional modeling complexities, controller-switching logic, and safety-critical behaviors that go beyond the intermediate objective of validating the waypoint-generation strategy itself. For these reasons, the present study restricts its scope to demonstrating core functional correctness, iterative policy improvement, and adherence to the multi-objective landing logic within a representative but controlled simulation environment.
This study’s applications of multi-objective optimization can be expanded beyond risk, time, and energy to include safety, stability, and cost for more comprehensive outcomes. Reinforcement learning has proven effective, and future work could explore advanced methods such as deep reinforcement learning and imitation learning to enhance adaptability. Beyond landing strategies, broader drone–vessel collaboration in navigation, information sharing, and joint operations could improve system efficiency. Despite demonstrated effectiveness under current vessel motion modes, further research is needed to address unknown factors and extend optimization to diverse objectives, algorithms, and cooperative scenarios. Such advancements would strengthen the theoretical foundation and technical support for UAV operations in complex environments. A detailed discussion of how the proposed framework can be extended toward full operational deployment, covering emergency behaviors, sudden ASV motion changes, packet losses, estimator degradation, and more realistic sea-state spectra, is provided separately in the Discussion section. These considerations outline the required next steps for transitioning from simulation-based waypoint-generation evaluation to a comprehensive end-to-end autonomous maritime landing system. Accordingly, the value of the circular-descent trajectory in Figure 14 lies in its demonstration of stable waypoint evolution and smooth path shaping rather than in terminal touchdown accuracy, which is intentionally excluded from the scope of this study.
4. Discussion
Although the proposed dynamic guidance framework demonstrates promising results in simulation, several limitations must be recognized. The current study relies primarily on modeled disturbances and controlled experimental conditions; the absence of extensive sea trials under highly variable maritime environments constrains the generalizability of the findings. The integration of sliding mode control, game-theoretic optimization, and reinforcement learning has been validated in a unified architecture, yet the computational burden of real-time implementation on resource-constrained UAV platforms remains insufficiently explored. While the Actor-Critic network adapts to evolving vessel dynamics, its performance under sensor degradation, communication delays, or adversarial disturbances has not been systematically evaluated, leaving open questions regarding robustness in operational deployments.
A notable outcome of the proposed waypoint-generation framework is its ability to produce smooth, continuously convergent descent geometries, such as the circular-descent pattern illustrated in Figure 14. This characteristic offers several important advantages for UAV landing in maritime environments, where deck motion, wind shear, and ship-induced turbulence amplify the sensitivity of the vehicle to abrupt guidance commands. Smooth descent geometries inherently reduce high-frequency lateral and vertical accelerations, thereby minimizing the control effort required from the UAV’s inner-loop attitude controller. Because the present study assumes that the attitude controller is fully optimized and operating at stable bandwidths, the smoothness of the outer-loop guidance directly translates into more predictable and less saturated actuator responses, lowering the probability of transient destabilization during the approach phase.
Smooth and gradually converging descent paths mitigate the risk of amplification of deck motion disturbances. Sudden or aggressive trajectory corrections often couple with ship motion in undesirable ways, particularly when the ASV exhibits low-frequency roll or pitch oscillations. By contrast, a circular or quasi-helical descent distributes lateral corrections over extended path segments, effectively averaging out short-term disturbances and preventing phase alignment with the ship’s oscillatory modes. This decoupling effect enhances both dynamic safety and deck-relative positional stability as the UAV transitions into close-range proximity.
A structured smooth descent improves downstream compatibility with terminal landing logic. In practical maritime UAV operations, the terminal 3 to 5 meters above the deck require specialized visual, LiDAR-based, or RF beacon-based localization algorithms to refine final relative positioning. These terminal-phase controllers typically assume that the UAV enters the close-range region with low residual lateral velocity, low jerk, and minimal heading oscillation. The smooth descent trajectories produced in this study satisfy these preconditions by ensuring that motion is already stabilized before the terminal guidance layer activates. In effect, smooth descent geometry acts as a conditioning layer, preparing the UAV for more precise but more sensitive terminal landing behaviors.
Smooth descent patterns offer operational benefits in safety-critical and crewed maritime environments. A predictable and visually interpretable descent profile reduces operational uncertainty for both onboard crew and autonomous ship systems, lowering the likelihood of emergency aborts triggered by erratic UAV movement. This predictability is particularly valuable when integrating UAVs with autonomous vessel systems that rely on cooperative sensing or anticipatory deck-motion compensation.
The smooth descent geometry observed in the results is not a secondary artifact but a structural advantage of the waypoint-generation method proposed in this work. It contributes to stability, reduces control burden, mitigates disturbance coupling, and improves compatibility with terminal landing modules. These benefits reinforce the argument that trajectory shaping, rather than terminal accuracy alone, is a critical component in developing reliable autonomous maritime landing systems.
Although the present study provides the architectural and algorithmic specifications necessary for reproducing the proposed Actor–Critic framework, several practical considerations remain beyond the scope of the current manuscript and therefore constitute avenues for future research. While the experiments include realistic latency assumptions, a full hardware-in-the-loop (HIL) evaluation using an actual embedded flight computer is not presented here. Such real-time tests would allow detailed profiling of computational jitter, thermal throttling effects, and long-duration memory fragmentation, all factors that can influence onboard learning stability. Although the actor–critic design implicitly handles partial observability by incorporating filtered historical states, more advanced techniques such as LSTM-based recurrent critics or belief-state estimators (e.g., extended Kalman filters specifically tuned for nonstationary marine motion) could further improve policy robustness. These approaches, however, introduce significantly higher computational cost and were omitted here to maintain feasibility for small-format UAV processors. While exploration noise is carefully tuned for convergence, a more formal treatment of exploration-exploitation trade-offs under ship-induced nonstationary disturbances could be developed using adaptive exploration schedules or Bayesian uncertainty estimation. Such methods would deepen the theoretical grounding of the learning framework but would also require substantial expansion beyond the intended scope of this study. Onboard training, though conceptually valuable for long-term adaptation, was not evaluated; the present study focuses solely on offline training with online inference. Future work could consider incremental or continual learning strategies that update the policy during flight, accompanied by formal stability safeguards to ensure that live adaptation cannot violate the safety envelope during deck-landing operations.
While the present study focuses on validating the algorithmic feasibility of the proposed guidance framework, several real-world operational challenges remain outside the scope of the current simulation campaign. Sudden ASV maneuvers, unexpected communication interruptions, GPS-denied operation, high-sea-state transitions, rapid wave-slope changes, and emergency abort scenarios all require additional layers of robustness logic and fault-tolerant control that are not modeled here. These behaviors typically involve switching between guidance modes, fail-safe recovery procedures, and sensor-level redundancy management, all of which will be addressed in future work as the waypoint-generation module is integrated with a complete terminal landing controller. The current results should therefore be interpreted as demonstrating the viability of the strategy-generation layer, upon which full-scale maritime UAV landing autonomy can be progressively constructed.
Future research should therefore concentrate on three concrete directions. One avenue is the extension of the guidance strategy to full-scale experimental validation in diverse sea states, thereby assessing the adaptability of the proposed control laws under realistic hydrodynamic and meteorological conditions. Another direction involves the optimization of onboard computation and energy allocation, ensuring that multi-objective decision-making can be executed within the strict hardware and power constraints of tail-sitter UAVs. Finally, further investigation into resilience mechanisms, such as fault-tolerant sensing, secure communication protocols, and adversarial learning defenses, will be essential to guarantee reliable UAV-ASV cooperation in contested or degraded environments. By addressing these limitations, subsequent work can strengthen both the theoretical rigor and the practical applicability of dynamic guidance strategies for autonomous maritime operations.
The use of a Nash equilibrium framework in this study must be interpreted within the operational constraints of hybrid UAV-ASV systems. While classical game theory assumes active, simultaneous optimization by both agents, real-world ASVs rarely modify their trajectory with high frequency to support UAV landing. Therefore, the Nash equilibrium employed here represents an effective equilibrium: the UAV optimizes its landing trajectory under a fixed but disturbance-influenced ASV motion profile. The equilibrium conditions are validated through the convexity of the multi-objective loss, the compactness of feasible landing trajectories, and the KKT-based optimality constraints applied during Pareto front generation.
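One way to write the effective equilibrium described above is as a best-response problem for the UAV against a fixed ASV motion profile. The formulation below is a sketch under stated assumptions: the symbols $u$ (UAV trajectory or waypoint sequence), $x_s$ (ASV motion profile), $\mathcal{U}$ (compact feasible set), $J_E, J_T, J_R$ (energy, time, and risk costs), $w_E, w_T, w_R$ (positive scalarization weights), and $g_i$ (operational constraints) are introduced here for illustration and may differ from the notation used elsewhere in the paper.

```latex
% UAV best response to a fixed, disturbance-influenced ASV profile x_s
u^{*} \in \arg\min_{u \in \mathcal{U}} \; J(u; x_s)
      = w_E J_E(u; x_s) + w_T J_T(u; x_s) + w_R J_R(u; x_s),
\qquad \text{s.t. } g_i(u; x_s) \le 0, \quad i = 1, \dots, m.

% KKT conditions at u^{*} with multipliers \lambda_i
\nabla_u J(u^{*}; x_s) + \sum_{i=1}^{m} \lambda_i \nabla_u g_i(u^{*}; x_s) = 0,
\qquad \lambda_i \ge 0,
\qquad \lambda_i \, g_i(u^{*}; x_s) = 0,
\qquad g_i(u^{*}; x_s) \le 0.
```

Under these assumptions, continuity of $J$ on the compact set $\mathcal{U}$ guarantees that a minimizer exists, and when $J$ and the $g_i$ are convex and differentiable, any point satisfying the KKT conditions is a global minimizer. With strictly positive weights, such a minimizer is also Pareto-optimal with respect to $(J_E, J_T, J_R)$. Because $x_s$ is held fixed, $u^{*}$ is a one-sided best response rather than the outcome of simultaneous optimization by both agents.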
However, the approach does not model adversarial or fully strategic ASVs. Future work should consider extending the game-theoretic formulation to include cooperative ASVs capable of predictive deck alignment or dynamic station-keeping, which would enable true two-sided equilibrium computation. Nevertheless, the current formulation remains theoretically consistent and operationally suitable for the class of ASVs typically deployed in maritime UAV operations.
5. Conclusions
This study presents a unified guidance control framework enabling tail-sitter UAVs to achieve autonomous landings on moving vessels with minimal risk, reduced energy consumption, and shortened landing time. By combining sliding mode control with Lyapunov-based stability guarantees, the system maintains robustness under nonlinear and dynamic conditions. The integration of game-theoretic modeling ensures Nash equilibrium between UAV and vessel dynamics, while Pareto optimization, validated through Karush-Kuhn-Tucker conditions, provides a mathematically rigorous trade-off among time, energy, and risk. Reinforcement learning, implemented via an Actor-Critic network, further enhances adaptability by dynamically updating landing strategies in response to vessel motion modes such as circular navigation and straight-line travel. Simulation results indicate that the proposed method achieves stable landing approaches across the tested dynamic scenarios, reducing throttle demand and demonstrating consistent convergence of trajectory errors, thereby supporting its effectiveness for UAV-ASV cooperative operations.
Nevertheless, three limitations remain. The reinforcement learning model, though effective in controlled environments, requires further generalization to cope with complex maritime conditions involving variable wind fields and wave interference. The current framework also focuses on single UAV-vessel interactions, leaving multi-drone coordination and cooperative landing strategies unexplored. Finally, energy optimization is addressed only at the trajectory level, without incorporating bio-inspired endurance mechanisms or energy recovery processes. Future research should therefore extend reinforcement learning toward multimodal adaptation, investigate coordinated landing strategies through multi-agent game theory, and develop dynamic energy management models to enhance UAV endurance. Addressing these challenges will strengthen both the theoretical rigor and the practical applicability of autonomous UAV landings, paving the way for broader deployment in ocean monitoring, logistics, and emergency rescue operations.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
