Abstract
This paper presents an optimal trajectory tracking control algorithm for autonomous surface vessels (ASVs) using data-driven reinforcement learning (RL) to address challenges arising from model uncertainties and time-varying external disturbances in complex marine environments. To ensure robust performance under these conditions, we first employ the H∞ control method. Then, we design a model-based RL algorithm to achieve the optimal trajectory tracking control law for the ASV despite uncertainties and disturbances. In addition, we extend the model-based RL algorithm to a model-free data-driven RL algorithm, removing the requirement for a model of the ASV. The model-free algorithm directly learns the optimal control law from real-time data, providing a more flexible solution when the model of the ASV is unknown. Simulations are conducted to verify the proposed algorithms.
Keywords
Introduction
Autonomous surface vessels (ASVs), as prominent representatives of autonomous maritime platforms, possess immense potential for various applications (Hou et al., 2024; Jiang et al., 2025; Qiao et al., 2023; Yan et al., 2025). These applications include search and rescue operations, marine resource exploration and security patrols (Cao et al., 2024, 2022; Wang et al., 2024a). In these domains, the ability of ASVs to track a desired trajectory is a prerequisite for achieving mission objectives (Meng et al., 2024; Mu et al., 2024; Wang et al., 2022a,b; Zhang and Chen, 2022). Due to the significant energy demands in marine transportation and exploration, it is crucial to incorporate optimal control into ASV design. Optimal control enhances the system performance of ASVs and addresses trajectory tracking control problems (Chen et al., 2024; Lewis et al., 2012). Therefore, ensuring safe, accurate, and energy-efficient optimal trajectory tracking of ASVs has emerged as a significant challenge in marine engineering and control. Although optimal trajectory tracking for ASVs is crucial, deriving the optimal control law from classical optimal control theory is exceedingly difficult because of the complex marine environment, the inherent nonlinearity of the ASV system and the uncertainties in system parameters.
The development of robust control techniques aims to address the challenges posed by uncertainties in dynamic models, ensuring that systems can maintain reliable performance despite modeling uncertainties and disturbances (Khalaji and Bahrami, 2022; Li et al., 2024; Liu and Yao, 2016; Wang et al., 2021). As a robust control method, H∞ control aims to design control strategies capable of handling system model uncertainties and disturbances (Chen, 2013). As a result, H∞ control has been widely applied to ASV systems (Huang et al., 2020; Rigatos, 2023). For the ASV system with parametric uncertainties and external disturbances, its course control system can be modeled as a polynomial system with polytopic uncertainties. Stability and H∞ conditions are derived using parameter-dependent Lyapunov function methods and positive polynomial theory, enabling effective control of the ASV system (Huang et al., 2020). A nonlinear optimal control law based on H∞ control is proposed to address the ASV control problem under model uncertainties and disturbances, aiming to enhance autonomy and reduce energy consumption (Rigatos, 2023). However, these control strategies are based on nominal ASV models and may fail to achieve optimal performance in uncertain real-world systems. The controller proposed in this paper effectively addresses uncertain systems in real-world ASV applications, achieving optimal trajectory tracking.
Classical optimal controllers typically rely on precise dynamic system models, which present challenges in their application to complex environments, particularly when the system model is difficult to obtain accurately (Lewis et al., 2012; Zhou et al., 2023a). Several approaches have been introduced to improve the trajectory tracking performance of ASVs in order to overcome these obstacles (Gu et al., 2023; Huang et al., 2024; Zhou et al., 2023b). For the optimal path tracking problem of ASVs under state constraints, a control method combining backstepping, adaptive dynamic programming and event-triggered mechanisms has been proposed by Zhou et al. (2023b). This method significantly reduces communication and computational burdens by introducing guidance laws, dynamic controllers, and event-triggered modules, ensuring near-optimal performance in the process. A technique is introduced for optimal trajectory control of ASVs with obstacles, using a fixed-time extended state observer and a recursive neural network (NN) optimization algorithm to ensure safety and achieve the desired objectives (Gu et al., 2023). Furthermore, an integrated optimal control framework is proposed to tackle the distributed optimal coordination challenge for ASVs, combining trajectory optimization for coordination with local tracking subsystems to ensure global optimality (Huang et al., 2024). However, all of these methods depend on precise dynamic system data of the ASV, which is often challenging to obtain in the complex and variable marine environment. In contrast, the controller introduced in this paper can interact iteratively with the ASV in real time in such environments, continuously learning the optimal control law from the generated state data, thus overcoming the limitations of relying on a system model.
In recent years, inspired by biological systems, reinforcement learning (RL) has become an effective tool for addressing complex problems such as nonlinear dynamics, high-dimensional state spaces, and time-varying disturbances (Buşoniu et al., 2018; Cheng et al., 2023; Huang et al., 2025; Wang et al., 2019; Zhao et al., 2020). In the face of these problems, RL, as a direct adaptive control method, mainly focuses on decision-making in complex environments to minimize long-term costs, thereby achieving optimal control (Lewis and Vrabie, 2009; Modares and Lewis, 2014; Wang et al., 2024b; Yuan et al., 2022). As a result, RL has shown significant potential for realizing optimal control of ASVs (Liu et al., 2024; Nguyen et al., 2023; Vu et al., 2022a; Wang et al., 2023). To address the optimal control problem of ASVs under external disturbance uncertainties, researchers have proposed an adaptive optimal control strategy to achieve optimal trajectory tracking for ASVs (Vu et al., 2022a). Considering the decoupling of the kinematic and dynamic subsystems in ASVs, an adaptive RL algorithm is combined with a kinematic controller to achieve precise trajectory tracking control (Vu et al., 2022b). In addition, some studies combine the kinematic and dynamic models of ASVs with formation control and RL to design control laws, thereby achieving optimal trajectory tracking for ASV formations (Nguyen et al., 2023). Although these studies provide robust control strategies, they often overlook the impact of system parameter variations on the optimal control strategy. The aforementioned methods have not sufficiently explored how to achieve optimal trajectory tracking control for ASVs when the system model is uncertain and environmental information is unknown, which is the core motivation behind this study.
Meanwhile, unlike the aforementioned methods, this paper adopts a model-free data-driven RL method, which learns the control law directly from collected system data, thereby avoiding the influence of model parameters on control performance.
Inspired by the above discussion, we propose a model-free data-driven RL algorithm to address the optimal trajectory tracking control problem of ASVs under the influence of model uncertainties and external disturbances. The main contributions of this paper are as follows:
This paper introduces the H∞ control framework to enhance the robustness of the ASV system. Building on this, we propose an RL control algorithm based on the dynamic model of the ASV system. This algorithm effectively addresses the challenges of model uncertainties and disturbances in complex marine environments under specific operational conditions by incorporating dynamic model information of the ASV.
This paper overcomes the limitations of traditional model-dependent methods by proposing a model-free data-driven RL control algorithm. This algorithm eliminates reliance on prior information about the dynamic model of the ASV system and directly learns the optimal control law from online interactive data. Compared to the existing control methods of Zhou et al. (2023a) and Hasanvand and Seif (2024), the proposed algorithm not only mitigates the impact of model uncertainties and disturbances on control performance but also enhances adaptability to complex environments, thereby improving feasibility for engineering deployment.
This paper is organized as follows. Section 2 provides the background and the problem description. Section 3 develops a model-based RL algorithm and a model-free data-driven RL algorithm for ASVs. Simulation and comparison results are presented in Section 4, and the conclusion is given in Section 5.
Preliminaries
The ASV model with 3 degrees of freedom can be represented as follows
where
Letting
where
where
According to Modares et al. (2015) and from equation (3), we present the following Assumption 1 and Definition 1.
where
where
This paper proposes an optimal trajectory tracking control law for ASVs in the presence of model uncertainties and external disturbances. Specifically, to address the challenges posed by uncertainties in the ASV model and disturbances, a model-free data-driven RL algorithm is proposed to learn the optimal control law, ensuring optimal trajectory tracking performance.
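For orientation, the 3-degree-of-freedom motion introduced at the start of this section can be sketched numerically. The snippet below is a minimal Python illustration that assumes the standard kinematic relation η̇ = R(ψ)ν with pose η = (x, y, ψ) and body-fixed velocity ν = (u, v, r). The symbols η, ν and R, the function names, and the forward-Euler step are assumptions made for illustration; the vessel dynamics and the lumped uncertainties are deliberately left out.

```python
import math


def rotation(psi):
    """Rotation matrix R(psi) mapping body-fixed velocities to the earth-fixed frame."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def eta_dot(eta, nu):
    """Assumed kinematics: eta_dot = R(psi) * nu, eta = (x, y, psi), nu = (u, v, r)."""
    R = rotation(eta[2])
    return [sum(R[i][j] * nu[j] for j in range(3)) for i in range(3)]


def euler_step(eta, nu, dt):
    """Advance the pose by one forward-Euler step of length dt."""
    rate = eta_dot(eta, nu)
    return [eta[i] + dt * rate[i] for i in range(3)]


# A vessel heading along the earth-fixed y axis with pure surge speed moves along y.
pose = euler_step([0.0, 0.0, math.pi / 2.0], [1.0, 0.0, 0.0], 0.1)
```

With the heading ψ = π/2 and only surge velocity u, the pose changes in y alone, which is the expected effect of the rotation.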
Main results
Model-based tracking control law
This subsection first addresses the lumped uncertainties using the H∞ method and then designs a model-based RL algorithm to tackle the optimal trajectory tracking control problem for ASVs with model uncertainties and disturbances.
To mitigate the impact of the lumped uncertainties h on the ASV system (3), we employ the lumped uncertainties attenuation condition established in Definition 1. As given in equation (5), the impact of the lumped uncertainties h on the position tracking performance can be reduced by a certain degree at least equal to γ. The cost function is expressed as follows
where Ψ and Λ are positive definite matrices. Then the value function is represented as
From the ASV system (3) and the cost function (6), the Hamilton–Jacobi–Isaacs (HJI) equation is derived as
In fact, the control input τ and the lumped uncertainties h can be viewed as two players in a zero-sum differential game. According to the stationary conditions
Substituting equations (9) and (10) into equation (8), we obtain
Based on the preceding discussion, we propose Theorem 1.
Substituting
Therefore, it can be concluded that
where
Multiplying both sides of equation (15) by
Since
From equation (17), it is clear that the ASV system (3) meets the lumped uncertainties attenuation condition (5) when the optimal control law
Based on equation (8), we propose the model-based RL algorithm for finding the solutions
Next, Lemma 1 is introduced to prove the convergence of Algorithm 1.
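To make the idea of a model-based iteration concrete, the following Python sketch runs policy iteration on a hypothetical scalar zero-sum game. It is not Algorithm 1 itself. The plant ẋ = a·x + b·u + d·w, the running cost q·x² + r·u² − γ²·w², the quadratic value V = p·x², and all numerical values are assumptions chosen for illustration. Each pass evaluates the current pair of policies through a scalar Lyapunov equation and then improves both players using the stationarity conditions.

```python
import math

# Hypothetical scalar plant and weights (illustrative values, not from the paper).
a, b, d = -1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0


def policy_iteration(iters=20):
    """Iterate on V = p*x^2 with policies u = -k*x (control) and w = l*x (disturbance)."""
    k, l, p = 0.0, 0.0, 0.0  # the zero policies are admissible because a < 0
    for _ in range(iters):
        a_cl = a - b * k + d * l
        # Policy evaluation: 2*p*a_cl + q + r*k^2 - gamma^2*l^2 = 0.
        p = (q + r * k * k - gamma ** 2 * l * l) / (-2.0 * a_cl)
        # Policy improvement from the stationarity conditions of the Hamiltonian.
        k = b * p / r
        l = d * p / gamma ** 2
    return p, k, l


def riccati_root():
    """Positive root of the scalar game Riccati equation q + 2*a*p - s*p^2 = 0."""
    s = b * b / r - d * d / gamma ** 2
    return (a + math.sqrt(a * a + s * q)) / s


p, k, l = policy_iteration()
```

For these values the iterates settle on the positive root returned by `riccati_root()` within a few passes, which is the scalar counterpart of solving the HJI equation by successive approximation.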
Data-driven tracking control law
This subsection adapts the model-based RL algorithm into a model-free data-driven RL algorithm, enabling optimal trajectory tracking control for the ASV system (3) without requiring a system model or external disturbance information.
Based on Algorithm 1, we propose Algorithm 2.
where
where
The model-free data-driven RL control algorithm is depicted in Figure 1. As seen in Figure 1, we use the collected data and apply the model-free data-driven RL control algorithm through iterative updates to ultimately acquire the optimal tracking control law, enabling optimal trajectory tracking control of the ASV system.

The block diagram illustrating the proposed data-driven RL controller.
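The data-driven idea of reusing one batch of collected data across all iterations can be illustrated with a small Python sketch. It is an off-policy, least-squares scheme on a hypothetical scalar plant and is not Algorithm 2: it uses no neural networks, and it assumes that the disturbance samples are recorded during data collection, which is a simplification. The plant parameters a, b and d are used only to generate data; the learning step sees only the stored products of x, u and w. All names and values are illustrative assumptions.

```python
import math

a, b, d = -1.0, 1.0, 1.0      # hypothetical plant, used ONLY to generate data
q, r, gamma = 1.0, 1.0, 2.0   # design weights, known to the learner


def collect(dt=1e-3, window=0.1, n_windows=60):
    """Simulate once under exploratory inputs and store per-window sums."""
    steps = round(window / dt)
    x, t, data = 1.0, 0.0, []
    for _ in range(n_windows):
        x_start, ixx, ixu, ixw = x, 0.0, 0.0, 0.0
        for _ in range(steps):
            u = math.sin(3.0 * t) + 0.5 * math.sin(7.0 * t)        # exploration input
            w = 0.5 * math.sin(5.0 * t) + 0.3 * math.cos(1.3 * t)  # recorded disturbance
            x_next = x + dt * (a * x + b * u + d * w)
            x_mid = 0.5 * (x + x_next)  # averaged state keeps the energy balance exact
            ixx += x_mid * x * dt
            ixu += x_mid * u * dt
            ixw += x_mid * w * dt
            x, t = x_next, t + dt
        data.append((x * x - x_start * x_start, ixx, ixu, ixw))
    return data


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, 3):
            f = aug[i][c] / aug[c][c]
            for j in range(c, 4):
                aug[i][j] -= f * aug[c][j]
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        sol[i] = (aug[i][3] - sum(aug[i][j] * sol[j] for j in range(i + 1, 3))) / aug[i][i]
    return sol


def learn(data, iters=10):
    """Least-squares update of (p, k, l) from stored data, without using a, b or d."""
    p, k, l = 0.0, 0.0, 0.0
    for _ in range(iters):
        rows = [[dxx, -2.0 * r * (ixu + k * ixx), -2.0 * gamma ** 2 * (ixw - l * ixx)]
                for dxx, ixx, ixu, ixw in data]
        ys = [-(q + r * k * k - gamma ** 2 * l * l) * ixx for _, ixx, _, _ in data]
        ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
        aty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
        p, k, l = solve3(ata, aty)
    return p, k, l


p, k, l = learn(collect())
```

Each least-squares solve returns the value parameter p of the current policies together with the improved gains k and l, so the loop mirrors a model-based policy iteration while replacing the model with measured data.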
Next, Lemma 2 is introduced to demonstrate the convergence of Algorithm 2.
From equation (3), we obtain
Substituting equations (18) and (27) into equation (26), we obtain
From equations (19) and (20), it follows that
Substituting equations (29) and (30) into equation (28) yields
Multiplying both sides of equation (31) by
Since
Next, a proof by contradiction is used to establish the uniqueness of the solution. Assume the existence of another solution
Since the derivatives of
Equation (6) shows that when the system is at the origin, its value is 0. Therefore, we have
Lemma 2 is derived from Lemma 1 and has a unique solution, meaning that it is equivalent to Lemma 1. The proof is complete. □
Next, we propose Theorem 2 and prove that the stability of the ASV system is guaranteed by the control law derived from Algorithm 2.
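A simple numerical check can illustrate the two properties discussed in this section, closed-loop stability and disturbance attenuation. The Python sketch below reuses a hypothetical scalar plant ẋ = a·x + b·u + d·w with the saddle-point feedback u = −k·x, and it assumes an L2-gain reading of the attenuation condition, namely that ∫(q·x² + r·u²)dt does not exceed γ²∫w²dt when the state starts at zero. The plant, the weights, and the sinusoidal disturbance are all illustrative assumptions and are not taken from the ASV model.

```python
import math

# Hypothetical scalar plant, weights and attenuation level (illustrative only).
a, b, d = -1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0

s = b * b / r - d * d / gamma ** 2
p = (a + math.sqrt(a * a + s * q)) / s  # positive root of the scalar game Riccati equation
k = b * p / r                           # feedback gain of the control player, u = -k*x


def simulate(x0, disturbance, t_end=20.0, dt=1e-3):
    """Forward-Euler closed-loop run; returns final state, cost and disturbance energy."""
    x, t = x0, 0.0
    cost, dist_energy = 0.0, 0.0
    for _ in range(round(t_end / dt)):
        w = disturbance(t)
        u = -k * x
        cost += (q * x * x + r * u * u) * dt
        dist_energy += w * w * dt
        x += dt * (a * x + b * u + d * w)
        t += dt
    return x, cost, dist_energy


# Stability check: without disturbance the state decays from a nonzero start.
x_free, _, _ = simulate(1.0, lambda t: 0.0)

# Attenuation check: zero initial state, sinusoidal disturbance.
_, cost, dist_energy = simulate(0.0, math.sin)
attenuated = cost <= gamma ** 2 * dist_energy
```

A single disturbance signal cannot prove the inequality, but a violation in such a run would indicate an error in the gain or in the chosen γ.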
Simulation results
In this section, we provide examples to illustrate the effectiveness of the designed model-free data-driven RL algorithm. We use the ASV called Cybership II, with its parameters detailed by Skjetne et al. (2005). The reference trajectory is defined as

Convergence of weights of NNs.
To emphasize the benefits of the RL method in Algorithm 2 introduced in this paper, we compare its performance with the RBF ASMC control method proposed by Jiang et al. (2022). In the comparison figure, the red line represents the reference trajectory, the blue line corresponds to the method proposed in this paper, and the green line represents the RBF ASMC method.
Figure 3 presents a comparison of the ASV’s desired trajectory tracking performance. The results demonstrate that the RL method can track the desired trajectory quickly and accurately, validating the effectiveness of Algorithm 2.

Desired and actual trajectories compared with RBF ASMC in the
Figure 4 compares the tracking performance of the ASV for desired position trajectories. The x and y plots in Figure 4 show that the RL method achieves faster trajectory tracking under the same conditions, demonstrating superior response capability and control efficiency. The ψ plot in Figure 4 shows that the RL method also tracks the desired ψ trajectory accurately. Figure 5 compares the position tracking errors between the RL method and the RBF ASMC method. Overall, the results in Figures 4 and 5 show that the RL method tracks the desired trajectory faster and demonstrates superior accuracy compared to the RBF ASMC method.
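Comparisons of speed and accuracy such as these are often summarized with scalar metrics. The following Python sketch shows two common choices, the root-mean-square error and a settling time defined by an error band. The function names, the band value, and the synthetic exponential error signals are assumptions made for illustration; they are not the data behind Figures 4 and 5.

```python
import math


def rmse(errors):
    """Root-mean-square of a sequence of tracking-error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def settling_time(times, errors, band):
    """First time after which |error| stays inside the band until the end of the record.

    Returns None if the error is still outside the band at the last sample.
    """
    settled_at = None
    for t, e in zip(times, errors):
        if abs(e) > band:
            settled_at = None      # left the band again, reset
        elif settled_at is None:
            settled_at = t         # entered the band
    return settled_at


# Synthetic error signals (NOT results from the paper): a fast and a slow decay.
times = [0.01 * i for i in range(2001)]
fast = [math.exp(-2.0 * t) for t in times]
slow = [math.exp(-0.5 * t) for t in times]

fast_settle = settling_time(times, fast, band=0.05)
slow_settle = settling_time(times, slow, band=0.05)
```

On these synthetic signals the faster decay yields both the smaller RMSE and the earlier settling time, which is the kind of ordering the figures are meant to convey.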

Desired and actual state x, y, and ψ compared with RBF ASMC.

Tracking errors
Figure 6 compares the tracking performance of the ASV for desired velocity trajectories. The u and r plots in Figure 6 demonstrate that the RL method tracks the desired velocity more smoothly under identical conditions, significantly reducing fluctuations and errors in the tracking process, which in turn improves the system’s overall performance and reliability. The v plot in Figure 6 shows that the RL method tracks the desired velocity significantly faster than the RBF ASMC method. Figure 7 compares the velocity tracking errors of the RL and RBF ASMC methods. In summary, the results in Figures 6 and 7 highlight the advantage of the RL method in achieving stable tracking of the desired velocity trajectories and demonstrate its faster response speed.

Desired and actual state u, v, and r compared with RBF ASMC.

Tracking errors
The comparisons above show that the RL method outperforms the RBF ASMC method in tracking accuracy, stability, and response speed. These advantages indicate the potential of the RL method in practical applications and its contribution to improving the system’s overall performance. Consequently, the results support the practical significance of the proposed approach and show that the RL method addresses the shortcomings of the compared method.
Conclusion
This paper presents a model-free data-driven RL algorithm to address the optimal trajectory tracking control problem of ASVs in complex environments. The proposed algorithm eliminates the reliance on system models and environmental information. Instead, it learns the optimal control law directly from interaction data, effectively addressing the challenges posed by complex environments. Theoretical analysis shows that this method can yield the optimal control law. Future work will focus on the high-performance cooperative control of multiple ASVs in complex environments. In addition, we will explore control techniques for heterogeneous agents to overcome the reliance on system models and environmental information in traditional methods.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China (grant nos. 62173054 and 62073054).
Data availability statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
