Ship hydrodynamic prediction by a novel transformer model with incremental learning

Abstract

With the rapid development of maritime commerce, ship operating environments are becoming increasingly complex, so it is crucial for ship designers to forecast ship performance accurately under various sea conditions. However, ship performance evaluation and optimization face two challenges. The first is the high computational cost and slow response of numerical simulations based on Computational Fluid Dynamics (CFD); the second is that traditional machine learning cannot match the accuracy of numerical simulations. To address these issues, this paper proposes an online surrogate model, called Increformer, for ship performance prediction. Increformer uses a continual self-attention mechanism to explore the temporal dependencies between feature variables and a continual normalization mechanism to handle non-stationary data. In addition, to improve prediction accuracy, we employ an incremental training strategy based on elastic weight consolidation to acquire new knowledge from data streams. Experiments are conducted on historical performance data from several vessel types, including KCS, Wigley-III, and C60. The results demonstrate that Increformer effectively captures the temporal information and inter-dimensional correlations in the data and significantly improves the accuracy of ship performance prediction. Furthermore, ablation experiments assess the effectiveness and necessity of the continual normalization mechanism, the continual attention mechanism, and the incremental training strategy. The findings validate the accuracy and universality of the proposed model and show that Increformer captures trends and fluctuations in ship sequence data, providing a reliable solution for ship performance prediction.

Keywords

Time series prediction; transformer; incremental learning; continual attention

1 Introduction

With the continuous development of computing technology, simulation-based design (SBD)¹,² has become the mainstream method for the optimal design of ships. It combines computational fluid dynamics (CFD) numerical methods with hull geometry reconstruction techniques to achieve optimal sailing performance under given constraints. SBD has thus substantially advanced modern hull optimization methods, driving ship design from traditional experience-based approaches toward data-driven intelligent design.

CFD is the main tool in the SBD method. However, when applied to practical hull optimization, it incurs high computational costs that hamper the design process. It is therefore of great significance to develop ship performance prediction and evaluation methods that require less computation and are more efficient, based on physical experimental data and CFD simulation data. To this end, many scholars have introduced surrogate models to replace CFD computations.

A surrogate model is a fast, nonlinear response model built from existing ship performance data. It can effectively reduce the computational cost of CFD numerical simulations during hull design and performance testing.¹ With a suitable machine learning model, a surrogate model with high prediction efficiency and accuracy can be constructed to predict the individual sub-performances and the overall performance of a ship during the design process, thereby providing an objective function for ship performance optimization.²⁻⁵

Machine learning-based surrogate models certainly make ship design more efficient. However, traditional machine learning and deep learning models are generally designed for static scenarios and cannot adapt or extend their prediction capabilities over time. Because ship data are rarely obtained and fed to the model all at once, the already trained model should be updated iteratively with newly acquired data. Moreover, owing to limited storage and computing capacity, such iterative updates generally cannot revisit old data.⁶⁻⁹ Incremental learning methods¹⁰⁻¹² can therefore be used to update the surrogate model from data streams, enabling it to predict ship performance from new ship data. This is especially important for evaluating ship hydrodynamic performance, which depends not only on the ship's inherent characteristics but also on the continuous flow of external information during navigation. The surrogate model thus needs to learn and retain relevant knowledge from a dynamic data distribution to predict ship performance effectively. Taking ship rolling motion as an example, the effectiveness of anti-rolling devices typically depends on the ship's current motion state. If the future roll angle and other motion states can be predicted over a short horizon, this information can be used to optimize the control strategy of the anti-rolling devices. By adjusting these devices in advance, the rolling amplitude can be controlled more precisely, reducing rolling during navigation and improving the ship's seaworthiness and safety.

In summary, traditional physical model testing involves a complex design process and yields limited data, while CFD numerical simulation is computationally intensive and inefficient and thus cannot provide timely responses. Furthermore, surrogate models built with traditional machine learning methods cannot dynamically adapt to real-world scenarios for real-time prediction of ship performance. It is therefore of great significance to develop efficient and effective intelligent surrogate models for ship performance prediction using incremental learning. However, no such studies have been reported in this area.

This work aims to construct an incremental-learning-based surrogate model for accurate, real-time prediction of ship hydrodynamic performance. To this end, we design an online time-series prediction model called Increformer and train it with incremental learning. The novelty of our method is that it combines an improved EWC incremental learning algorithm with the Transformer deep learning framework to provide more accurate hydrodynamic performance predictions during ship navigation. Specifically, the novelty and contributions of this work are as follows.

First, the Increformer model is inspired by the encoder-decoder structure of the Transformer, an architecture that has achieved tremendous success in natural language processing. We make significant improvements to this structure, particularly in the attention mechanism and in the normalization of the network layers. With these improvements, Increformer can not only handle long-range dependencies in ship navigation data but also adapt to constantly changing marine environments and ship operating conditions, thereby enabling effective incremental learning.

Second, we adopt a continual multi-head attention mechanism that enables the model to adjust its focus dynamically according to the characteristics of the input data. This allows Increformer to capture accurately the motion state of the ship, as measured by key performance indicators such as roll, pitch, and heave, in both calm and rough sea conditions. The model can also predict the resistance the ship encounters in waves, which is crucial for evaluating its energy efficiency and structural safety.

Third, to further improve performance, we optimize the normalization of the network layers. Normalization is a common deep learning technique that accelerates training and improves generalization. In our model, the improved normalization method keeps the learning speed and prediction accuracy stable as new data are continuously received.

Fourth, we combine Increformer with an improved Elastic Weight Consolidation (EWC) algorithm for effective incremental training, allowing the model to update continuously and maintain its prediction accuracy as new data arrive.

Finally, we conduct experiments on historical performance data of various ship types, including container ships, cargo ships, and tankers. Through comparative analysis against other incremental learning and deep learning baseline models, we verify the effectiveness and practical value of Increformer for ship hydrodynamic performance prediction.

2 Related work

2.1 Batch learning methods

With the rapid development of artificial intelligence technology, machine learning methods such as neural networks and support vector machines have shown promising application prospects in ship performance forecasting. Recently, researchers have begun to apply machine learning-based surrogate models to hull-form design optimization. To address the high cost of CFD calculations, Peri¹ studied the application of surrogate models, such as response surface models, variable-fidelity models, Kriging models, and radial basis function models, to hull-form design optimization. Pedersen et al.² used a neural network model to predict propulsive power from real ship data. In addition, there has been extensive research on the performance prediction of ships in waves, for example, forecasting the rolling motion and trajectory of ships during navigation. Hamid et al.³ optimized the underwater profile parameters of hydrofoils using neural networks and response surface methods. Petersen et al.⁴ established a functional relationship between hull forms and actual ship performance using artificial neural networks and Gaussian process models. Babadi et al.⁵ employed a backpropagation (BP) neural network to investigate the impact of the waterplane area coefficient on seakeeping performance. Wang et al.⁶ proposed a ship heading time-series prediction method based on a backpropagation wavelet neural network. To address the degradation of control performance caused by time delays in wave compensation control systems, Tang et al.⁷ studied time-series modeling methods such as AR (autoregressive), ARMA (autoregressive moving average), and ARIMA (autoregressive integrated moving average) models. Ferrandis et al.⁸ trained a recurrent neural network (RNN) to predict the pitch, heave, and roll motions of ships.

Li et al.⁹ proposed a short-term roll motion prediction model based on a convolutional long short-term memory network and an attention mechanism. Xu et al.¹³ proposed an online prediction method based on a least squares support vector machine with automatic moving-grid search. Guan et al.¹⁴ proposed a ship roll motion prediction method based on the extreme learning machine. Yin et al.¹⁵ combined the discrete wavelet transform with a variable-structure radial basis function network to predict ship roll motion. Hou et al.¹⁶ combined a non-parametric identification method with the random decrement technique and support vector regression to predict the nonlinear roll motion of ships at sea. Wei et al.¹⁷ proposed an integrated multi-step forecasting model composed of adaptive quadratic decomposition, a multi-input multi-output deep belief network, multi-objective optimization, and adaptive error correction.

2.2 Incremental learning methods

The models and methods in the studies above are trained on large amounts of historical data to achieve high prediction accuracy. However, these studies adopt a batch learning setting, which requires the whole training dataset to be available a priori and assumes that the relationship between input and output remains static. When data arrive gradually as a stream, the data distribution may change over time, degrading the model's prediction accuracy. Retraining a model from scratch can be time-consuming, and historical data can be difficult to obtain because of privacy and security issues. It is therefore particularly important to update the model in real time through incremental learning. However, catastrophic forgetting can easily occur in this case: training the model on a new dataset can significantly degrade its performance on the old dataset. Overcoming this problem involves the “stability-plasticity” dilemma, that is, how to learn knowledge from new data while maintaining prediction accuracy on old data. There are three main paradigms for addressing this problem in incremental learning: regularization, replay, and parameter isolation. For example, the elastic weight consolidation (EWC)¹⁰ algorithm is a regularization method; experience replay (ER)¹¹ and dark experience replay (DER++)¹² are based on replay; and PackNet¹⁸ is a representative parameter isolation algorithm. However, current research on incremental learning mainly focuses on incremental tasks, and in-depth research on regression prediction is lacking.

Wang¹⁹ proposed combining the extreme learning machine (ELM) with incremental training to achieve integrated online learning, using the extreme learning machine with kernels (ELMK) as the base model. In another study,²⁰ an incremental learning algorithm with dynamically weighted ensemble learning (DWE-IL) was proposed. This algorithm dynamically updates the weights of the model parameters, giving the model better generalization performance. However, the differences between the errors of the ensemble members may grow over time. Besides ensemble learning, some researchers have modified the network structure of deep learning models to adapt them to online learning scenarios. Wang²¹ proposed an incremental ensemble LSTM model (IncLSTM), which achieves incremental prediction by optimizing the LSTM network structure. Woo²² used the temporal convolutional network (TCN) as the backbone of an online-TCN model, which maintains the prediction accuracy on time series data while performing incremental updates. Huang²³ employed Bayesian estimation to improve the transformer model, making it more suitable for processing streaming data.

Recent studies on time series forecasting have mainly focused on improving model architectures and representation learning for long sequences. Chen et al.²⁴ provide a comprehensive survey of deep learning methods for long-sequence forecasting, highlighting the trend toward attention-based and transformer-style models. Wang et al.²⁵ further benchmark a wide range of deep time series models, showing that scalability and generalization remain key challenges. Zeng et al.²⁶ systematically analyze the effectiveness of transformers for time series forecasting, indicating that architectural inductive bias is often more critical than model complexity. Qiu et al.²⁷ propose a dual clustering framework to enhance multivariate forecasting, reflecting a recent trend of exploiting latent structure and correlations across variables. In parallel, incremental and continual learning methods have gained attention to handle evolving data distributions. Lopez-Paz & Ranzato²⁸ propose Gradient Episodic Memory (GEM), which leverages episodic memory to maintain performance on past tasks. Chaudhry et al.²⁹ introduce A-GEM, an efficient approach for lifelong learning that reduces forgetting via gradient projection. Zeng et al.³⁰ explore context-dependent continual learning in neural networks, highlighting mechanisms to preserve previously learned knowledge while adapting to new tasks.

Time series forecasting is increasingly applied through diverse and adaptive architectures to address distribution shifts and complex temporal dependencies.³¹ At the same time, multimodal integration and large pre-trained models enable applications that jointly process time series and textual data, supporting more flexible and intelligent analysis.³² Additionally, hybrid models combined with transfer learning demonstrate strong effectiveness in real-time prediction tasks within complex systems.³³ Overall, these developments highlight the broad applicability of time series forecasting in handling dynamic, multimodal, and real-world scenarios. However, there are no reported studies on the application of incremental learning to surrogate models for ship performance prediction.

3 Methodology

In this work, we propose a ship hydrodynamic prediction method based on an Incremental Transformer (Increformer). Increformer is a time-series prediction framework that combines deep learning with incremental learning and is built on the Transformer. The input encoding module extracts temporal features and produces the input vectors of the encoder. The encoder and decoder use a continual attention mechanism that is better suited to incremental scenarios, and a continual normalization mechanism replaces layer normalization. The encoder extracts features from the input time series, and the decoder uses the extracted features to predict the time series; its output is fed into a linear fully connected layer to obtain the final prediction. In addition, the training framework updates the model through an improved elastic weight consolidation algorithm, called TS-EWC, enabling the model to continuously learn new knowledge and improve forecasting accuracy. The framework of Increformer for time-series forecasting is shown in Figure 1.

Figure 1. The model structure of Increformer.

The overall workflow of the proposed framework can be summarized as follows. The input embedding module maps raw hydrodynamic variables and timestamp information into a common feature space. The encoder-decoder backbone then models temporal dependencies and cross-variable interactions through continual multi-head attention. Continual normalization stabilizes hidden representations when the statistics of the incoming stream change, and TS-EWC constrains parameter updates when the novelty buffer triggers online retraining. In this way, feature extraction, statistical stabilization, and knowledge retention are handled within one unified online framework.

3.1 Input embedding

The input embedding follows the settings of Informer³⁴ and comprises three parts: data embedding, positional embedding, and timestamp embedding. A one-dimensional convolutional network with a kernel size of 3 maps the sequence $X_{in}$ from the $d_{in}$-dimensional input space to the $d_{model}$-dimensional model space, aligning the data dimensions with the model and yielding a numerical encoding vector. Meanwhile, to let the model capture the relative position information of the input sequence, a positional encoding of $X_{in}$ is added to obtain the sequential features of the input. Besides the data values, the corresponding timestamp is also part of the input features, so a timestamp encoding introduces global time characteristics into the input. The timestamp of a time series generally contains information such as the year, month, day, and week. Summing the numerical encoding, the positional encoding, and the timestamp encoding yields the final input embedding:

$$x_{emb} = data_{emb} + pos_{emb} + time_{emb} \tag{1}$$

where $data_{emb}$, $pos_{emb}$, and $time_{emb}$ are the numerical encoding of the sequence, its positional encoding, and its timestamp encoding, respectively.
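A minimal plain-Python sketch of Eq. (1) may make the three embedding terms concrete. It assumes a kernel-size-3 convolution with circular padding for the value embedding (our assumption, following common Informer-style implementations), the sinusoidal positional encoding of the original Transformer, and a hypothetical linear timestamp embedding over normalized calendar features. All weights are random placeholders, not trained values.

```python
import math
import random


def conv1d_k3(x, weight):
    """Value embedding: kernel-size-3 1-D convolution over time, circular padding.

    x: T rows of d_in values; weight[o][j][c]: output channel o, kernel tap j,
    input channel c. Returns T rows of d_model = len(weight) values.
    """
    T, d_in = len(x), len(x[0])
    out = []
    for t in range(T):
        row = []
        for w_o in weight:
            s = 0.0
            for j in range(3):
                src = x[(t + j - 1) % T]  # circular padding: wrap around at both ends
                s += sum(w_o[j][c] * src[c] for c in range(d_in))
            row.append(s)
        out.append(row)
    return out


def positional_encoding(T, d_model):
    """Sinusoidal positional encoding of the original Transformer."""
    return [[math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
             else math.cos(pos / 10000 ** ((i - 1) / d_model))
             for i in range(d_model)] for pos in range(T)]


def time_embedding(stamps, weight):
    """Hypothetical timestamp embedding: a linear map of calendar features."""
    return [[sum(w_c * s_c for w_c, s_c in zip(w, s)) for w in weight] for s in stamps]


def input_embedding(x, stamps, w_conv, w_time):
    """Eq. (1): x_emb = data_emb + pos_emb + time_emb, element-wise."""
    data = conv1d_k3(x, w_conv)
    pos = positional_encoding(len(x), len(w_conv))
    tim = time_embedding(stamps, w_time)
    return [[a + b + c for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(data, pos, tim)]


random.seed(0)
T, d_in, d_model, d_time = 6, 4, 8, 3
x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(T)]
stamps = [[t / 23 - 0.5, 0.0, 0.0] for t in range(T)]  # toy hour-of-day feature
w_conv = [[[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(3)] for _ in range(d_model)]
w_time = [[random.gauss(0, 0.1) for _ in range(d_time)] for _ in range(d_model)]
emb = input_embedding(x, stamps, w_conv, w_time)
print(len(emb), len(emb[0]))  # 6 8
```

Because all three terms share the $d_{model}$ width, Eq. (1) is a plain element-wise sum; setting the convolution and timestamp weights to zero leaves only the positional encoding, which is a quick sanity check.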

3.2 Encoder and decoder

The encoder of Increformer is a stack of identical layers, each containing three sublayers: continual multi-head attention (ConMHA), a feed-forward neural network (FFN), and the continual normalization mechanism (CN). The input encoding module converts the original input sequence $X_{in}$ into the encoder input sequence $X_{en}$.

Like the encoder, the decoder of Increformer is composed of multiple identical layers. In addition, each decoder layer contains a masked attention sublayer, which assigns weights to the decoder's intermediate queries Q according to the encoder outputs K and V, based on the degree of correlation between Q and K. The decoder also applies continual normalization (CN) between its sublayers. The input vector to the decoder is:

$$X_{de} = \mathrm{Concat}(X_{token}, X_0) \tag{2}$$

where $X_{token}$ is the initial (start-token) input taken from the historical sequence, of length $T_{token}$; $X_0$ is a placeholder for the sequence to be predicted, of length $H$, padded with zeros; and $\mathrm{Concat}(\cdot)$ denotes concatenation. After $X_{de}$ is fed into the decoder, it passes through the decoder layers, and a final linear fully connected layer adjusts the output length to produce the prediction.
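The decoder input of Eq. (2) can be sketched as follows. The sketch assumes, as in Informer, that $X_{token}$ is the last $T_{token}$ steps of the history; the function name `decoder_input` is illustrative only.

```python
def decoder_input(x_hist, t_token, horizon):
    """Eq. (2): X_de = Concat(X_token, X_0) along the time axis.

    x_hist: history as a list of time steps, each a list of feature values.
    X_token: the last t_token steps of the history (assumed start-token choice).
    X_0: `horizon` all-zero rows standing in for the values to be predicted.
    """
    if not 0 < t_token <= len(x_hist):
        raise ValueError("t_token must be between 1 and len(x_hist)")
    n_feat = len(x_hist[0])
    x_token = [list(row) for row in x_hist[-t_token:]]
    x_zero = [[0.0] * n_feat for _ in range(horizon)]
    return x_token + x_zero


history = [[float(t), 0.5 * t] for t in range(10)]  # 10 steps, 2 features
x_de = decoder_input(history, t_token=4, horizon=3)
for row in x_de:
    print(row)
# [6.0, 3.0], [7.0, 3.5], [8.0, 4.0], [9.0, 4.5], then three [0.0, 0.0] rows
```

Only the length of $X_0$ depends on the horizon $H$; the start token lets the decoder condition on known recent values before it reaches the zero placeholders.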

3.3 Continual multi-head attention mechanism

The self-attention mechanism in the Transformer models relationships among data points and retains important interactions between elements. Using queries, keys, and values (QKV), it computes the similarity between each element and all other elements to extract temporal dependencies:

$$\mathrm{Att}(Q, K, V) = D^{-1} A V \tag{3}$$

$$A = \exp\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right), \quad D = \mathrm{diag}(A I_L) \tag{4}$$

where $Q, K, V \in \mathbb{R}^{L \times d_{model}}$ and $d_{model}$ denotes the dimension of the hidden layer. $I_L$ is a column vector whose $L$ elements are all 1, and $\exp(\cdot)$ is applied element-wise. The time complexity of this computation is $O(nL^{2})$ and the space complexity is $O(L^{2})$, where $L$ and $n$ denote the observation length and the observation dimension of the time-series data, respectively.
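To make Eqs. (3) and (4) concrete, the following plain-Python sketch builds the unnormalized matrix $A$ and the row sums $D$ explicitly and checks the result against the numerically stable softmax form. It illustrates the formulas only and is not the authors' implementation; materializing the full $L \times L$ matrix is exactly what produces the quadratic cost.

```python
import math


def attention(Q, K, V):
    """Eqs. (3)-(4): Att(Q, K, V) = D^{-1} A V, A = exp(Q K^T / sqrt(d)), D = diag(A 1_L).

    Q, K, V are lists of L rows of width d. The full L x L matrix A is built,
    so time and memory grow quadratically with the window length L.
    """
    scale = math.sqrt(len(Q[0]))
    A = [[math.exp(sum(qc * kc for qc, kc in zip(q, k)) / scale) for k in K] for q in Q]
    D = [sum(row) for row in A]  # row sums, i.e. the diagonal of D
    return [[sum(a_ij * v[c] for a_ij, v in zip(row, V)) / d_i for c in range(len(V[0]))]
            for row, d_i in zip(A, D)]


def softmax_attention(Q, K, V):
    """The same quantity via the max-shifted softmax, for comparison."""
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        s = [sum(qc * kc for qc, kc in zip(q, k)) / scale for k in K]
        m = max(s)
        w = [math.exp(si - m) for si in s]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out


Q = [[0.1, 0.4], [0.3, -0.2], [-0.5, 0.2]]
K = [[0.2, 0.1], [-0.3, 0.5], [0.4, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(Q, K, V))
print(softmax_attention(Q, K, V))
```

Dividing by the row sums in $D$ is the same normalization a softmax performs, so each output row is a convex combination of the rows of $V$.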

However, conventional multi-head attention is primarily designed for static or offline settings, where the data distribution is assumed to remain relatively stable. In streaming and incremental scenarios, newly arriving data often exhibit evolving patterns and distribution shifts, which cannot be effectively captured by standard attention mechanisms. Moreover, directly applying conventional multi-head attention may lead to an overemphasis on historical information while lacking sufficient adaptability to recent changes. Motivated by these limitations, we propose a continual multi-head attention mechanism that explicitly accounts for temporal dynamics, enabling the model to continuously adapt to new data while preserving previously learned representations.

For online time-series forecasting, Increformer updates the attention step by step to accommodate the sequential arrival of the data stream. At each time step, $Q$, $K$, and $V$ are updated in a first-in, first-out manner: the oldest token is discarded and the new token is appended. The attention scores can then be computed from the cached historical results and the latest query, key, and value ($q_{new}, k_{new}, v_{new} \in \mathbb{R}^{1 \times n}$). The row sums of the previous time step are cached as $d_{mem} = A_{pre} I_L$, where $A_{pre}$ is the attention matrix of the previous step, and are updated as:

$$d^{(1:L-1)} = d_{mem}^{(2:L)} - \exp\left(\frac{Q_{mem} k_{old}^{T}}{\sqrt{d_{model}}}\right) + \exp\left(\frac{Q_{mem} k_{new}^{T}}{\sqrt{d_{model}}}\right) \tag{5}$$

$$d^{(L)} = \exp\left(\frac{q_{new}\, \mathrm{Concat}(K_{mem}, k_{new})^{T}}{\sqrt{d_{model}}}\right) I_L \tag{6}$$

where $Q_{mem}$, $K_{mem}$, and $V_{mem}$ are the cached queries, keys, and values of the $L-1$ most recent tokens from the previous step, and $k_{old}$ and $v_{old}$ are the key and value of the oldest token of the previous window, which is discarded at the current step. The product $AV$ of $A$ and $V$ is updated in the same way from its cached value $AV_{mem}$ as part of the continual self-attention mechanism:

$$AV^{(1:L-1)} = AV_{mem}^{(2:L)} - \exp\left(\frac{Q_{mem} k_{old}^{T}}{\sqrt{d_{model}}}\right) v_{old} + \exp\left(\frac{Q_{mem} k_{new}^{T}}{\sqrt{d_{model}}}\right) v_{new} \tag{7}$$

$$AV^{(L)} = \exp\left(\frac{q_{new}\, \mathrm{Concat}(K_{mem}, k_{new})^{T}}{\sqrt{d_{model}}}\right) \mathrm{Concat}(V_{mem}, v_{new}) \tag{8}$$

The final continual attention output is:

$$\mathrm{ConAtt}(q_{new}, k_{new}, v_{new}) = d^{-1} \odot AV \tag{9}$$

Both the time complexity and the space complexity of the continual attention are therefore $O(nL)$ at each time step, so the mechanism is better suited to streaming computation and greatly reduces the computational overhead.
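The caching scheme of Eqs. (5) to (9) can be checked numerically. The sketch below is written under our own assumptions (a single head, a fixed window length $L$, and the window's own queries, keys, and values as inputs). It keeps $d$ and $AV$ between steps, subtracts the dropped key's contribution and adds the new key's contribution for the $L-1$ retained queries, and computes the new query's row from scratch. The class name `ContinualAttention` is hypothetical. Each step touches $O(L)$ rows of fixed width, in line with the per-step linear cost stated above, and the result is compared with a full recomputation of the window.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(Q, K, V):
    """Reference: Eqs. (3)-(4) recomputed from scratch for the whole window."""
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        w = [math.exp(dot(q, k) / scale) for k in K]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out


class ContinualAttention:
    """Hypothetical sliding-window attention that caches the row sums d and the
    products AV and updates them per step (Eqs. 5-9) instead of rebuilding A."""

    def __init__(self, Q, K, V):
        self.scale = math.sqrt(len(Q[0]))
        self.Q = [list(r) for r in Q]
        self.K = [list(r) for r in K]
        self.V = [list(r) for r in V]
        rows = [self._row(q, self.K, self.V) for q in self.Q]
        self.d = [r[0] for r in rows]
        self.AV = [r[1] for r in rows]

    def _w(self, q, k):
        return math.exp(dot(q, k) / self.scale)

    def _row(self, q, K, V):
        """One row of A for query q: (sum_j A_j, sum_j A_j v_j), cost O(L d)."""
        w = [self._w(q, k) for k in K]
        return sum(w), [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]

    def step(self, q_new, k_new, v_new):
        k_old, v_old = self.K[0], self.V[0]  # token leaving the window
        Q_mem, K_mem, V_mem = self.Q[1:], self.K[1:], self.V[1:]
        d, AV = [], []
        for q, d_m, av_m in zip(Q_mem, self.d[1:], self.AV[1:]):
            w_old, w_new = self._w(q, k_old), self._w(q, k_new)
            d.append(d_m - w_old + w_new)                                  # Eq. (5)
            AV.append([a - w_old * vo + w_new * vn
                       for a, vo, vn in zip(av_m, v_old, v_new)])          # Eq. (7)
        self.K, self.V = K_mem + [list(k_new)], V_mem + [list(v_new)]
        d_L, av_L = self._row(q_new, self.K, self.V)                       # Eqs. (6), (8)
        self.Q, self.d, self.AV = Q_mem + [list(q_new)], d + [d_L], AV + [av_L]
        return [[a / z for a in row] for row, z in zip(self.AV, self.d)]  # Eq. (9)


def random_token(dim, rng):
    return [rng.gauss(0.0, 0.5) for _ in range(dim)]


rng = random.Random(1)
L, dim = 5, 4
stream = [tuple(random_token(dim, rng) for _ in range(3)) for _ in range(12)]  # (q, k, v)
Q0, K0, V0 = (list(part) for part in zip(*stream[:L]))
att = ContinualAttention(Q0, K0, V0)
max_err = 0.0
for t in range(L, len(stream)):
    out = att.step(*stream[t])
    window = stream[t - L + 1:t + 1]
    ref = full_attention(*(list(part) for part in zip(*window)))
    max_err = max(max_err, max(abs(a - b) for ro, rr in zip(out, ref) for a, b in zip(ro, rr)))
print(f"max deviation from full recomputation: {max_err:.1e}")
```

Because the update subtracts and re-adds exponentials, floating-point error can accumulate over very long streams; an occasional full recomputation of the window is one simple safeguard (a design choice of this sketch, not something the text specifies).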

In practice, under the multi-head attention mechanism, multiple different sets of Q, K, and V are projected into different subspaces. This enhances the network's ability to capture dependencies between different dimensions of the sequence. Given a new query, key, and value, the continual multi-head attention mechanism (ConMHA) can be described as:

$$\mathrm{ConMHA}(q, k, v) = \mathrm{Concat}(head_1, head_2, \dots, head_h) W_O \tag{10}$$

$$head_i = \mathrm{ConAtt}(q W_Q^{i}, k W_K^{i}, v W_V^{i}) \tag{11}$$

where $h$ is the number of heads, $W_Q^{i}, W_K^{i}, W_V^{i} \in \mathbb{R}^{d_{model} \times d_{k}/h}$ are the projection matrices of head $i$, and $W_O \in \mathbb{R}^{d_v \times d_{model}}$ is the output projection matrix.
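Eqs. (10) and (11) amount to per-head projections, attention within each head, concatenation, and an output projection. The sketch below assumes a per-head width of $d_{model}/h$ and, for brevity, uses full (non-continual) attention inside each head; in Increformer each head would use the continual update $\mathrm{ConAtt}$ of Eq. (9) instead. The weights are random placeholders with the shapes of Eq. (11).

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of row-major lists: (n x m) @ (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(Q, K, V):
    """Single-head attention D^{-1} A V (Eqs. 3-4), full recomputation."""
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        w = [math.exp(sum(a * b for a, b in zip(q, k)) / scale) for k in K]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out


def multi_head(q, k, v, W_Q, W_K, W_V, W_O):
    """Eqs. (10)-(11): project per head, attend, concatenate heads, apply W_O."""
    heads = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    concat = [[x for head in heads for x in head[t]] for t in range(len(q))]
    return matmul(concat, W_O)


rng = random.Random(0)
L, d_model, h = 4, 8, 2
d_head = d_model // h


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


x = rand_matrix(L, d_model)
W_Q = [rand_matrix(d_model, d_head) for _ in range(h)]
W_K = [rand_matrix(d_model, d_head) for _ in range(h)]
W_V = [rand_matrix(d_model, d_head) for _ in range(h)]
W_O = rand_matrix(h * d_head, d_model)
y = multi_head(x, x, x, W_Q, W_K, W_V, W_O)
print(len(y), len(y[0]))  # 4 8
```

With a single head and identity projections, the function reduces to plain attention, which makes the role of the projections easy to isolate.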

3.4 Continual normalization mechanism

The Transformer uses layer normalization (LN), which is suitable for processing sequence data of varying lengths; it normalizes across the features of each individual sample. Group normalization (GN)³⁵ divides the features of a single sample into groups and normalizes each group, so LN is equivalent to GN with a single group. Batch normalization (BN)³⁶ normalizes each feature across different samples, which helps network training converge faster and achieve better performance. The standard normalization is computed as follows:

$$y = \gamma \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta \tag{12}$$

where $\mu$ and $\sigma^{2}$ are the mean and variance of the input features, $\epsilon$ is a small constant that avoids division by zero, and $\gamma$ and $\beta$ are affine transformation parameters learned by backpropagation during training. They are initialized to $\gamma = 1$ and $\beta = 0$, and their learned values depend on the specific dataset. The mean and variance in BN and GN are computed as in Equations (13) to (15):

$$\mu_{BN} = \frac{1}{BHW} \sum_{b=1}^{B} \sum_{w=1}^{W} \sum_{h=1}^{H} x_{bcwh}, \quad \sigma_{BN}^{2} = \frac{1}{BHW} \sum_{b=1}^{B} \sum_{w=1}^{W} \sum_{h=1}^{H} \left(x_{bcwh} - \mu_{BN}\right)^{2} \tag{13}$$

$$x'_{bgkwh} \leftarrow x_{bcwh}, \quad \text{where } K = \frac{C}{G} \tag{14}$$

$$\mu_{GN}^{(g)} = \frac{1}{m} \sum_{k=1}^{K} \sum_{w=1}^{W} \sum_{h=1}^{H} x'_{bgkwh}, \quad \left(\sigma_{GN}^{(g)}\right)^{2} = \frac{1}{m} \sum_{k=1}^{K} \sum_{w=1}^{W} \sum_{h=1}^{H} \left(x'_{bgkwh} - \mu_{GN}^{(g)}\right)^{2} \tag{15}$$

where $B$, $C$, $H$, and $W$ are the batch size, number of channels, height, and width; the BN statistics are computed separately for each channel $c$; GN splits the $C$ channels into $G$ groups of $K$ channels each, so that channel $c$ becomes the $k$-th channel of group $g$; and $m = KHW$ is the number of elements in a group.
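The difference between the BN and GN statistics of Eqs. (13) to (15) is only which axes are averaged. The plain-Python sketch below assumes inputs shaped [batch][channel][time], flattening the $H \times W$ positions of the equations into a single time axis, and applies Eq. (12) with $\gamma = 1$ and $\beta = 0$. GN with a single group reproduces LN, as noted above.

```python
import math


def mean_var(values):
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / len(values)


def normalize(values, mu, var, eps=1e-5, gamma=1.0, beta=0.0):
    """Eq. (12): y = gamma * (x - mu) / sqrt(var + eps) + beta."""
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in values]


def batch_norm(x, eps=1e-5):
    """Eq. (13): one mean/variance per channel c, pooled over batch and time.
    x is shaped [B][C][T]; T stands in for the H x W positions of Eq. (13)."""
    B, C = len(x), len(x[0])
    y = [[None] * C for _ in range(B)]
    for c in range(C):
        mu, var = mean_var([v for b in range(B) for v in x[b][c]])
        for b in range(B):
            y[b][c] = normalize(x[b][c], mu, var, eps)
    return y


def group_norm(x, G, eps=1e-5):
    """Eqs. (14)-(15): per sample, split the C channels into G groups of K = C / G
    channels and normalize each group over its m = K * T elements."""
    B, C = len(x), len(x[0])
    if C % G:
        raise ValueError("C must be divisible by G")
    K = C // G
    y = [[None] * C for _ in range(B)]
    for b in range(B):
        for g in range(G):
            chans = range(g * K, (g + 1) * K)  # channel c = g*K + k (0-based)
            mu, var = mean_var([v for c in chans for v in x[b][c]])
            for c in chans:
                y[b][c] = normalize(x[b][c], mu, var, eps)
    return y


x = [[[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 0.0], [5.0, 5.0, 7.0]],
     [[3.0, 2.0, 1.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0], [2.0, 8.0, 5.0]]]
bn = batch_norm(x)
gn = group_norm(x, G=2)
ln = group_norm(x, G=1)  # LN as the single-group special case of GN
```

BN couples different samples through shared per-channel statistics, whereas GN and LN normalize each sample independently, which is the property the continual normalization below builds on.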

For time series, BN can easily disrupt the correlations along the temporal dimension. In this work, we therefore combine the advantages of GN and BN in a continual normalization (CN) mechanism suited to data-incremental scenarios. CN first normalizes the feature map with GN and then further normalizes the features with BN:

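$$x_{GN} \leftarrow GN_{1,0}(x); \quad x_{CN} \leftarrow \gamma\, BN_{1,0}(x_{GN}) + \beta \tag{16}$$

where the subscript $1,0$ indicates that the affine parameters of the normalization layer are fixed at $\gamma = 1$ and $\beta = 0$.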

In incremental learning scenarios, data arrive sequentially, so CN replaces the mini-batch mean and variance of its BN stage with global estimates accumulated during training:

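$$\mu \leftarrow \mu + \eta (\mu_B - \mu), \quad \sigma \leftarrow \sigma + \eta (\sigma_B - \sigma) \tag{17}$$

where $\mu_B$ and $\sigma_B$ are the statistics of the current mini-batch and $\eta$ is the update rate of the moving average.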
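A minimal sketch of Eqs. (16) and (17), under stated assumptions: inputs are shaped [batch][channel][time]; the BN stage uses mini-batch statistics during training and the running estimates at inference (the usual BN convention, assumed here); and the running estimate tracks the variance, treating $\sigma$ in Eq. (17) as the variance statistic. The class name `ContinualNorm` and the default $\eta = 0.1$ are illustrative choices, not values from the paper.

```python
import math
import random


def _mean_var(vals):
    mu = sum(vals) / len(vals)
    return mu, sum((v - mu) ** 2 for v in vals) / len(vals)


class ContinualNorm:
    """Sketch of Eqs. (16)-(17) for inputs shaped [B][C][T].

    Stage 1: GN_{1,0}, group normalization with affine parameters fixed to 1 and 0.
    Stage 2: BN with learnable per-channel gamma/beta. In training mode it uses the
    mini-batch statistics and moves the running estimates toward them with rate
    eta (Eq. 17); in inference mode it uses the running estimates instead.
    """

    def __init__(self, C, G, eta=0.1, eps=1e-5):
        if C % G:
            raise ValueError("C must be divisible by G")
        self.C, self.G, self.eta, self.eps = C, G, eta, eps
        self.gamma, self.beta = [1.0] * C, [0.0] * C
        self.mu, self.var = [0.0] * C, [1.0] * C  # running (global) estimates

    def _group_norm(self, x):
        K = self.C // self.G
        out = []
        for sample in x:
            y = [None] * self.C
            for g in range(self.G):
                chans = range(g * K, (g + 1) * K)
                mu, var = _mean_var([v for c in chans for v in sample[c]])
                for c in chans:
                    y[c] = [(v - mu) / math.sqrt(var + self.eps) for v in sample[c]]
            out.append(y)
        return out

    def __call__(self, x, training=True):
        x_gn = self._group_norm(x)
        out = [[None] * self.C for _ in x_gn]
        for c in range(self.C):
            if training:
                mu, var = _mean_var([v for s in x_gn for v in s[c]])
                self.mu[c] += self.eta * (mu - self.mu[c])     # Eq. (17), mean
                self.var[c] += self.eta * (var - self.var[c])  # Eq. (17), variance
            else:
                mu, var = self.mu[c], self.var[c]
            for b, s in enumerate(x_gn):
                out[b][c] = [self.gamma[c] * (v - mu) / math.sqrt(var + self.eps) + self.beta[c]
                             for v in s[c]]
        return out


rng = random.Random(0)
cn = ContinualNorm(C=4, G=2, eta=0.1)
for step in range(50):
    # Slowly drifting stream: 3 samples, 4 channels, 8 time steps per batch.
    batch = [[[rng.gauss(0.1 * step, 1.0) for _ in range(8)] for _ in range(4)]
             for _ in range(3)]
    y_train = cn(batch, training=True)
y_eval = cn(batch, training=False)
print([round(m, 3) for m in cn.mu], [round(v, 3) for v in cn.var])
```

In this toy stream, a uniform drift in the mean is already removed by the GN stage, so the running BN statistics change only slowly; this is one intuition for why the GN-then-BN ordering is less sensitive to non-stationary inputs.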

3.5 Incremental learning algorithm TS-EWC

The EWC incremental learning algorithm selectively regularizes network parameters. It uses the Fisher information matrix to determine the directions in parameter space that are critical for performing a learned task. It identifies the weights that are particularly important for that task and effectively reduces their learning rates by keeping them close to their previous values. Parameters in other directions can therefore be updated more freely without forgetting the learned task.

Let $D$ denote the input dataset. By Bayes’ rule, the posterior $p(\theta \mid D)$ of the parameters follows from the prior $p(\theta)$ and the likelihood $p(D \mid \theta)$ of the data:

$$\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) - \log p(D) \tag{18}$$

For given parameters, the log-likelihood $\log p(D \mid \theta)$ is the negative of the loss function. Suppose the data are split into two mutually independent parts, $D_A$ and $D_B$. Equation (18) can then be rewritten as:

$$\log p(\theta \mid D) = \log p(D_B \mid \theta) + \log p(\theta \mid D_A) - \log p(D_B) \tag{19}$$

In incremental learning scenarios, the left side of Equation (19) is the posterior of the parameters given the entire dataset, while the only loss term on the right side is that of task B. All information about task A must therefore be absorbed into the posterior $p(\theta \mid D_A)$. Because this posterior is difficult to compute exactly, it is approximated by a Gaussian distribution whose mean is the parameter vector $\theta_A^{*}$ learned on task A and whose precision is given by the diagonal of the Fisher information matrix; such an approximate posterior can be kept for each sub-task. With this approximation, EWC minimizes the loss function:

$$L(\theta) = L_B(\theta) + \sum_{i} \frac{\lambda}{2} F_i \left(\theta_i - \theta_{A,i}^{*}\right)^{2} \tag{20}$$

where $L_B(\theta)$ is the loss for task B alone, $i$ indexes the parameters, $F_i$ is the $i$-th diagonal element of the Fisher information matrix, and $\lambda$ indicates the importance of the old task relative to the new task.
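The EWC objective of Eq. (20) can be illustrated on a toy linear model. The sketch assumes a Gaussian likelihood with unit variance, so the per-sample gradient of the log-likelihood is $(y - \hat{y})x_i$, and it uses the empirical diagonal Fisher (the mean squared gradient over task-A data), a common approximation. The helper names, data, and $\lambda$ value are hypothetical.

```python
def predict(theta, x):
    """Toy linear model y_hat = theta . x (a constant 1.0 feature acts as the bias)."""
    return sum(t * xi for t, xi in zip(theta, x))


def mse(theta, data):
    return sum((y - predict(theta, x)) ** 2 for x, y in data) / len(data)


def diag_fisher(theta, data):
    """Empirical diagonal Fisher: mean over samples of (d log p / d theta_i)^2.
    With a unit-variance Gaussian likelihood, d log p / d theta_i = (y - y_hat) * x_i."""
    F = [0.0] * len(theta)
    for x, y in data:
        r = y - predict(theta, x)
        for i, xi in enumerate(x):
            F[i] += (r * xi) ** 2 / len(data)
    return F


def ewc_loss(theta, data_B, theta_A, F_A, lam):
    """Eq. (20) with L_B taken as the MSE: L_B + sum_i lam/2 * F_i * (theta_i - theta*_A,i)^2."""
    penalty = sum(f * (t - ta) ** 2 for f, t, ta in zip(F_A, theta, theta_A))
    return mse(theta, data_B) + lam / 2 * penalty


data_A = [([1.0, 0.0, 1.0], 2.1), ([2.0, 0.0, 1.0], 4.0), ([3.0, 0.0, 1.0], 6.2)]
theta_A = [2.0, 0.0, 0.0]  # assumed result of training on task A
F_A = diag_fisher(theta_A, data_A)
data_B = [([0.0, 1.0, 1.0], 1.0), ([0.0, 2.0, 1.0], 2.0)]
lam = 100.0
moved_w0 = [2.5, 0.0, 0.0]  # changes a parameter that task A relied on
moved_w1 = [2.0, 1.0, 0.0]  # changes a parameter task A never used (F = 0)
print(F_A)
print(ewc_loss(moved_w0, data_B, theta_A, F_A, lam) - mse(moved_w0, data_B))  # > 0
print(ewc_loss(moved_w1, data_B, theta_A, F_A, lam) - mse(moved_w1, data_B))  # 0.0
```

Moving a parameter that task A relied on is penalized, whereas moving a parameter with zero Fisher information is free, which is the selective behavior described above.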

In the Increformer framework, the model update method and the triggering conditions play a crucial role. Because ship motion data are noisy and non-stationary, the model needs to learn incrementally from new data when the data distribution changes. For online time-series forecasting, we improve the EWC algorithm and propose a time-series elastic weight consolidation algorithm (TS-EWC). Specifically, whenever the data distribution changes, the new data are treated as a new task. Over time, standard EWC maintains one penalty term for each historical task generated by the data stream, so the number of penalties grows linearly with the number of tasks, resulting in significant computational overhead. Because the parameters learned for each new task are already constrained by the penalty terms of the previous tasks, it suffices, when updating the model, to keep a single penalty anchored at the most recent parameters and to take a weighted sum of the Fisher matrices of the preceding tasks. The loss function of TS-EWC is:

L (θ) = L_{D} (θ) + \sum_{i} \frac{λ}{2} \sum_{d < D} λ_{d} F_{d, i} (θ_{i} - θ_{D - 1, i}^{*})^{2}

(21)

where $L_{D}(θ)$ is the loss for the current task $D$, $F_{d, i}$ is the $i$-th diagonal element of the Fisher Information Matrix of an earlier task $d$, $λ_{d}$ is the weight assigned to that task, and $θ_{D - 1}^{*}$ are the parameters learned for the previous task. A hyperparameter controls when a model update is triggered. Two buffers, a novelty buffer and a familiarity buffer, are maintained, and incoming data are assigned to one or the other by comparing the prediction error (mean squared error) with a threshold. The novelty buffer detects changes in the probability distribution of the data stream and triggers training and updating of the model, after which the threshold is adjusted dynamically. The familiarity buffer records data the model is already familiar with; it allows testing whether the model retains old knowledge after updates and enables fast, accurate predictions if repeated patterns appear later in the data stream.
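Because every term in Equation (21) is centred on the same parameters $θ_{D-1}^{*}$, the weighted sum over earlier tasks can be folded into a single consolidated Fisher vector, which keeps the cost of TS-EWC constant as tasks accumulate. The Python sketch below illustrates this together with a threshold-based novelty/familiarity buffer. It is a minimal sketch under stated assumptions, not the authors' implementation: the decay weights $λ_{d} = γ^{D-1-d}$, the buffer capacity, the class names, and the threshold-update rule are all hypothetical, since the text does not specify them.

```python
from collections import deque


class TSEWCPenalty:
    """Consolidated penalty of Eq. (21) for diagonal Fisher vectors.

    All terms are centred on theta*_{D-1}, so sum_d lambda_d * F_d can be
    accumulated into one vector: memory is O(#params), not O(#tasks * #params).
    The weights lambda_d = gamma ** (D - 1 - d) are a HYPOTHETICAL choice.
    """

    def __init__(self, lam, gamma=1.0):
        self.lam = lam
        self.gamma = gamma
        self.fisher_sum = None  # running sum_{d<D} lambda_d * F_d
        self.anchor = None      # theta*_{D-1}

    def consolidate(self, theta_star, fisher):
        """Call once a task ends, with its learned parameters and Fisher diagonal."""
        if self.fisher_sum is None:
            self.fisher_sum = list(fisher)
        else:
            # Decay older tasks by gamma; the task just finished gets weight 1.
            self.fisher_sum = [self.gamma * s + f
                               for s, f in zip(self.fisher_sum, fisher)]
        self.anchor = list(theta_star)

    def __call__(self, theta):
        """Penalty added to L_D(theta); zero before the first task has ended."""
        if self.fisher_sum is None:
            return 0.0
        return sum(self.lam / 2.0 * f * (t - a) ** 2
                   for f, t, a in zip(self.fisher_sum, theta, self.anchor))


class BufferTrigger:
    """HYPOTHETICAL novelty/familiarity buffers driven by an MSE threshold."""

    def __init__(self, threshold, capacity):
        self.threshold = threshold
        self.capacity = capacity
        self.novelty = []                          # windows predicted poorly
        self.familiarity = deque(maxlen=capacity)  # windows already handled well

    def observe(self, window, error):
        """File a window by its prediction MSE; return True when an update is due."""
        if error > self.threshold:
            self.novelty.append(window)
        else:
            self.familiarity.append(window)
        return len(self.novelty) >= self.capacity

    def flush(self):
        """Hand the novel windows to the training step and empty the buffer."""
        windows, self.novelty = self.novelty, []
        return windows

    def adapt_threshold(self, post_update_errors, scale=1.5):
        """Assumed rule: new threshold = scale * mean error after the update."""
        self.threshold = scale * sum(post_update_errors) / len(post_update_errors)
```

In an online loop, each incoming window would be scored with `observe`; when it returns `True`, the model would be trained on `flush()` with loss $L_{D}(θ)$ plus the penalty, `consolidate` would then be called with the new parameters and their Fisher diagonal, and `adapt_threshold` would reset the threshold from the post-update errors.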

4 Results and analysis

To comprehensively evaluate the ability of the proposed model to forecast ship performance under wave conditions, experiments were conducted on three ship navigation time-series datasets. The selected hulls are the Wigley-III model, which is widely recognized in the shipbuilding field, the KCS container ship, and the C60 cargo ship. The hydrodynamic performance data for the Wigley-III and C60 come from physical model experiments conducted at the Iowa Institute of Hydraulic Research (IIHR), whereas the data for the KCS were obtained through simulation. A three-dimensional model was first created in the Maxsurf software to obtain detailed ship parameters and lines drawings; these data were then imported into the CFD software Fluent to simulate the ship's motion and compute its drag in oblique waves. The experiments collected time-series data of the ship navigating in oblique waves, including ship parameters and environmental factors such as wind speed, wave height, and wave period, sampled once every 5 s over a continuous period of 10 h. From this historical data, the model predicts key performance indicators of the ship in waves: the roll angle, pitch angle, heave value, and total drag coefficient. Taking the KCS as an example, it is a 3600 TEU (twenty-foot equivalent unit) container ship that is widely used internationally as a standard hull. Its hull geometry is shown in Figure 2, and Table 1 lists its main dimensions. These experiments aim to validate the model's ability to predict ship performance in complex marine environments.

Figure 2.

Geometric schematic of the KCS container ship.

Table 1.

Main Dimensions of the KCS Container Ship.

Main dimensional parameters	Value
Length Between Perpendiculars (m)	230
Beam (m)	32.2
Depth (m)	19.0
Draft (m)	10.8
Wetted Surface Area (m²)	9556
Block Coefficient	0.6505
Froude Number	0.26

To eliminate the influence of the different units of measurement across feature dimensions, the sequential data were standardized. Each dataset was divided into warm-up, validation, and online prediction segments in a ratio of 2:1:7. The model was pre-trained during the warm-up and validation phases; during the online prediction phase, the data were fed to the model sequentially to simulate a continuously growing data stream. The look-back window length was set to 60 time steps and the prediction horizon to 12 steps. After each round of prediction, the predicted values were compared with the actual values. The experiments were run on an NVIDIA Tesla T4 GPU. The Increformer consists of N = 2 encoder layers and M = 1 decoder layer. During the warm-up phase, the Adam optimizer with a learning rate of 0.0001 was used, with the mean squared error (MSE) as the loss function. During the online prediction phase, the TS-EWC algorithm was used to compute the loss function and perform incremental updates.
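The preprocessing described above can be sketched in Python as follows. Sampling once every 5 s for 10 h gives about 10 × 3600 / 5 = 7200 time steps per record, which the 2:1:7 ratio splits into 1440 warm-up, 720 validation, and 5040 online steps. The sketch assumes per-feature z-score standardization with statistics taken from the warm-up segment only (the text does not say which segment supplies them) and a stride of 1 between consecutive windows; the function names and the synthetic data are illustrative.

```python
def split_2_1_7(rows):
    """Split a time-ordered sequence into warm-up, validation and online parts (2:1:7)."""
    n = len(rows)
    n_warm, n_val = n * 2 // 10, n // 10
    return rows[:n_warm], rows[n_warm:n_warm + n_val], rows[n_warm + n_val:]


def fit_standardizer(rows):
    """Per-feature mean and (population) standard deviation."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    stds = []
    for j in range(dim):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(var ** 0.5 if var > 0 else 1.0)  # guard constant features
    return means, stds


def standardize(rows, means, stds):
    """Apply z-scores feature by feature."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sliding_windows(rows, lookback=60, horizon=12, stride=1):
    """(input, target) pairs: `lookback` past steps -> next `horizon` steps."""
    pairs = []
    for start in range(0, len(rows) - lookback - horizon + 1, stride):
        pairs.append((rows[start:start + lookback],
                      rows[start + lookback:start + lookback + horizon]))
    return pairs


# Usage with synthetic data: 10 h sampled every 5 s, two features.
n_steps = 10 * 3600 // 5
rows = [[float(t % 50), 10.0 * (t % 7)] for t in range(n_steps)]
warm, val, online = split_2_1_7(rows)
means, stds = fit_standardizer(warm)
windows = sliding_windows(standardize(online, means, stds))
print(len(warm), len(val), len(online), len(windows))  # 1440 720 5040 4969
```

For brevity the targets here keep every feature; in the experiments only the predicted quantities (roll, pitch, heave, and total resistance coefficient) would be targets.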

In regression tasks and time series forecasting, the following evaluation metrics are commonly used. Mean Squared Error (MSE) is a statistical measure that assesses the accuracy of predictions. It is calculated as the average of the squares of the differences between observed and predicted values. Minimizing MSE enhances the precision and reliability of forecasts, so it is a crucial metric for evaluating a model's predictive performance. The MSE is given by:

M S E = \frac{1}{N} \sum_{n = 1}^{N} ({\hat{y}}_{n} - y_{n})^{2}

(22)

where $y_{n}$ is the actual value, $\hat{y}_{n}$ is the predicted value, and $N$ is the number of observations.

Mean Absolute Error (MAE), also referred to as the L1 norm loss, is a widely utilized metric for measuring the error in regression prediction. It evaluates the model's predictive performance by averaging the absolute differences between the actual and predicted values. The MAE is given by:

M A E = \frac{1}{N} \sum_{n = 1}^{N} | {\hat{y}}_{n} - y_{n} |

(23)

where $y_{n}$ and $\hat{y}_{n}$ are as defined above.

The smaller the values of the evaluation metrics MAE and MSE, the smaller the difference between the predicted and actual values, indicating that the model's predictive results are more accurate.
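Equations (22) and (23) translate directly into code. The short Python sketch below (function names are illustrative) also shows why the two metrics can rank models differently: squaring makes MSE weigh a single large error more heavily than MAE does.

```python
def _check(y_true, y_pred):
    """Both metrics require equally long, non-empty sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def mse(y_true, y_pred):
    """Eq. (22): mean of squared prediction errors over N observations."""
    _check(y_true, y_pred)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Eq. (23): mean of absolute prediction errors over N observations."""
    _check(y_true, y_pred)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Same total absolute error (1.0), distributed differently:
actual = [0.0, 0.0, 0.0, 0.0]
spread = [0.25, -0.25, 0.25, -0.25]   # four small errors
spike = [1.0, 0.0, 0.0, 0.0]          # one large error
print(mae(actual, spread), mae(actual, spike))  # 0.25 0.25
print(mse(actual, spread), mse(actual, spike))  # 0.0625 0.25
```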

4.1 Ship performance forecast analysis

To validate the effectiveness of the method proposed in this paper, a comparative analysis was conducted on time-series datasets of ship navigation in ocean waves for three types of ship models: the Wigley-III, KCS container ship, and C60 cargo ship. Several mainstream deep learning and incremental learning prediction methods were selected for comparative analysis in the context of ship performance prediction. Historical data, including the main dimensions of the ship and external environmental conditions during navigation, were selected as input sequences for the model to predict the roll angle, pitch angle, heave displacement, and total resistance coefficient of the ship during its oscillatory motion in waves.

In this part of the experiment, several advanced deep learning methods were used as baselines to evaluate the proposed model: LSTM, whose gating mechanism captures long-term dependencies and mitigates the vanishing-gradient problem; TCN, which uses causal convolutions to capture local patterns so that each prediction depends only on past information; and the Transformer, which uses an encoder-decoder architecture and attention to capture global dependencies, with masking in the decoder to prevent information leakage from future time steps. Ensemble incremental learning methods, namely IncLSTM and DWE-IL, which improve prediction accuracy and computational efficiency by integrating multiple models, were also tested. IncLSTM builds on LSTM and uses TrAdaBoost for ensemble learning, while DWE-IL builds on the kernel extreme learning machine (ELMK) with a radial basis function (RBF) kernel. These methods continuously update the model during the online prediction phase to adapt to new data. Finally, the replay-based incremental learning methods ER and DER++ were also included. To ensure robust results, 20 independent prediction runs were performed for each model, and the mean MSE and MAE were reported.
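The causality constraints mentioned for TCN and the Transformer can be illustrated with a small Python sketch: a dilated causal 1-D convolution whose output at step $t$ uses only inputs at steps $t, t-d, t-2d, \dots$, and a look-ahead mask that lets decoder position $i$ attend only to positions $j \le i$. These are generic textbook constructions with illustrative names, not the baselines' actual implementations.

```python
def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x[t'] = 0 for t' < 0.

    Implicit left zero padding means y[t] never depends on x[t+1:], the
    property TCN relies on when forecasting.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def causal_mask(n):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


x = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(x, [1.0, 1.0]))              # [1.0, 3.0, 5.0, 7.0]
print(causal_conv1d(x, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
print(causal_mask(3))
```

Stacking such layers with growing dilation widens the receptive field exponentially, while the mask plays the same "no peeking at the future" role inside attention.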

In the experimental setup, appropriate loss functions, optimizers, network structures, and parameters were configured for each model. For instance, the LSTM model had 50 hidden units and a ReLU activation function, while the TCN consisted of residual blocks followed by a fully connected layer, with the number of filters and the kernel size chosen carefully. For a fair comparison with the algorithms above, the ER and DER++ methods both use a Transformer as the backbone network for online prediction, and both keep a memory buffer of the same size as Increformer's for replaying old samples. This design helps them retain previously learned knowledge while continuously receiving new data.
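Both replay baselines depend on a fixed-size memory of past samples. The text does not say how that memory is filled; a common choice, and the one used in the original DER++ work, is reservoir sampling, sketched below in Python, under which each of the `seen` samples remains in the buffer with equal probability `capacity / seen`. ER replays the stored (input, target) pairs alongside the current batch, whereas DER++ additionally stores the network outputs at insertion time and penalizes deviations from them. The class name and interface are illustrative assumptions.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity replay memory filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0                    # number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)      # fill phase
        else:
            j = self.rng.randrange(self.seen)  # uniform over [0, seen)
            if j < self.capacity:
                self.items[j] = item     # overwrite a random slot

    def sample(self, k):
        """Draw up to k distinct stored samples for one replay step."""
        return self.rng.sample(self.items, min(k, len(self.items)))


buf = ReservoirBuffer(capacity=5, seed=0)
for step in range(100):
    buf.add(step)  # in ER an item would be an (input, target) pair
print(buf.seen, len(buf.items), buf.sample(3))
```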

During the online prediction phase, the accuracy of different models for ship performance prediction was compared. These models include not only traditional LSTM, TCN, and Transformer but also ensemble incremental learning methods such as IncLSTM and DWE-IL, as well as online learning baseline models ER and DER++. These methods handle and forecast the hydrodynamic performance of ships under wave conditions through different mechanisms, providing us with a comprehensive benchmark for comparison. The specific experimental results of the aforementioned models for ship performance forecasting are shown in Tables 2–4.

Table 2.
Comparative experimental results on Wigley-III.

Performance	Model	MSE	MAE
Roll	LSTM	0.9461	0.4349
	TCN	0.9592	0.4062
	Transformer	0.8946	0.3954
	IncLSTM	0.8153	0.3984
	DWE-IL	0.7468	0.3845
	Transformer-ER	0.7196	0.3634
	Transformer-DER++	0.6682	0.3572
	Increformer	0.6527	0.3329
Pitch	LSTM	1.0461	0.5389
	TCN	0.9892	0.5042
	Transformer	0.9446	0.4261
	IncLSTM	0.9164	0.3973
	DWE-IL	0.8461	0.3915
	Transformer-ER	0.7644	0.3544
	Transformer-DER++	0.7389	0.3536
	Increformer	0.7041	0.3421
Heave	LSTM	1.1261	0.4494
	TCN	0.9916	0.4647
	Transformer	0.9748	0.3914
	IncLSTM	0.9148	0.3959
	DWE-IL	0.8618	0.3846
	Transformer-ER	0.7647	0.3564
	Transformer-DER++	0.7044	0.3345
	Increformer	0.6913	0.3247
Total Resistance Coefficient	LSTM	1.1056	0.4794
	TCN	0.9162	0.4648
	Transformer	0.8941	0.3941
	IncLSTM	0.9116	0.4016
	DWE-IL	0.8419	0.3912
	Transformer-ER	0.8275	0.3748
	Transformer-DER++	0.7648	0.3644
	Increformer	0.7583	0.3567

The best results are highlighted in bold, and the same notation is adopted in the following tables.

Table 3.

Comparative experimental results on KCS.

Performance	Model	MSE	MAE
Roll	LSTM	0.8615	0.3914
	TCN	0.8137	0.3617
	Transformer	0.7429	0.3293
	IncLSTM	0.6776	0.3376
	DWE-IL	0.7191	0.3266
	Transformer-ER	0.6825	0.3127
	Transformer-DER++	0.6407	0.3088
	Increformer	0.6312	0.2974
Pitch	LSTM	1.0448	0.5347
	TCN	0.9878	0.5017
	Transformer	0.9618	0.4292
	IncLSTM	0.9516	0.3916
	DWE-IL	0.9101	0.3845
	Transformer-ER	0.8847	0.3625
	Transformer-DER++	0.8756	0.3498
	Increformer	0.8573	0.3456
Heave	LSTM	1.1245	0.4476
	TCN	0.9945	0.4379
	Transformer	0.9725	0.4391
	IncLSTM	0.9149	0.4259
	DWE-IL	0.8636	0.3946
	Transformer-ER	0.7624	0.3418
	Transformer-DER++	0.7158	0.3334
	Increformer	0.6974	0.3268
Total Resistance Coefficient	LSTM	1.2405	0.4471
	TCN	0.9842	0.4248
	Transformer	0.8961	0.3986
	IncLSTM	0.8421	0.3951
	DWE-IL	0.8109	0.3734
	Transformer-ER	0.7835	0.3645
	Transformer-DER++	0.7832	0.3461
	Increformer	0.7643	0.3328

Table 4.

Comparative experimental results on C60.

Performance	Model	MSE	MAE
Roll	LSTM	0.9407	0.4314
	TCN	0.9544	0.4045
	Transformer	0.8927	0.3927
	IncLSTM	0.8127	0.3965
	DWE-IL	0.7446	0.3829
	Transformer-ER	0.7267	0.3476
	Transformer-DER++	0.6595	0.3404
	Increformer	0.6503	0.3328
Pitch	LSTM	1.0405	0.5335
	TCN	0.9887	0.5023
	Transformer	0.9431	0.4246
	IncLSTM	0.9151	0.3961
	DWE-IL	0.8450	0.3903
	Transformer-ER	0.7852	0.3537
	Transformer-DER++	0.7444	0.3456
	Increformer	0.7317	0.3394
Heave	LSTM	1.1461	0.4476
	TCN	0.9861	0.4617
	Transformer	0.9776	0.3949
	IncLSTM	0.9134	0.3967
	DWE-IL	0.8619	0.3894
	Transformer-ER	0.7448	0.3516
	Transformer-DER++	0.7357	0.3384
	Increformer	0.7254	0.3273
Total Resistance Coefficient	LSTM	1.1147	0.4774
	TCN	0.9642	0.4628
	Transformer	0.8792	0.3934
	IncLSTM	0.9015	0.4062
	DWE-IL	0.8478	0.3946
	Transformer-ER	0.7847	0.3647
	Transformer-DER++	0.7629	0.3623
	Increformer	0.7447	0.3489

The results in Table 2 show that, on the Wigley-III dataset, the proposed Increformer model outperformed the other tested models in predicting the key ship motion parameters (roll angle, pitch angle, heave value, and total resistance coefficient). Compared with the ER model, the Increformer reduced the MSE by 9.2%, 7.8%, 9.5%, and 8.3%, respectively. These improvements are consistent with the role of the continuous normalization mechanism in stabilizing model learning. The continuous attention mechanism enables the Increformer to adapt rapidly to new data during the online prediction phase, while the TS-EWC method retains acquired knowledge over the long term; together they address the characteristics of time-series data and yield strong ship performance forecasts. Compared with the DER++ method, the Increformer also achieved lower MSE on the Wigley-III dataset, with reductions of 2.3%, 4.7%, 1.8%, and 0.8%, respectively. The Increformer performed particularly well in predicting the roll angle and total resistance coefficient. This indicates that its incremental learning strategy is more effective overall, especially for roll motion, where the continuous attention mechanism captures short-term data fluctuations and extracts temporal dependencies to forecast roll precisely. For rapid ship performance forecasting, the model also extracts features from new data effectively, predicts the long-term trend of the total resistance coefficient, and delivers better predictive performance.

Furthermore, the Increformer model, by integrating the Transformer architecture and the EWC (Elastic Weight Consolidation) method, has further enhanced its capability to process time-series data. The introduction of the continuous attention mechanism and continuous normalization mechanism allows the model to more effectively capture trends and periodicity in time-series data related to ship performance. The application of the time-series Elastic Weight Consolidation algorithm, TS-EWC, an improvement based on EWC, enables the model to better balance the learning of new and old knowledge during the incremental learning process, dynamically adjust model weights, and efficiently and accurately forecast the hydrodynamic performance of ships. These advantages have increased the efficiency and accuracy of the Increformer model in ship performance forecasting tasks.

Based on the results in Table 3, it can be observed that on the KCS ship model dataset, the proposed Increformer model achieved superior results in predicting ship motion in waves, including key performance indicators such as the roll angle, pitch angle, heave value, and total resistance coefficient. Compared with the benchmark Transformer model, the Increformer model demonstrated significant improvements in the MSE metric, with reductions of 15%, 10.8%, 28.2%, and 14.7%, respectively. These results reflect the effectiveness of the incremental learning strategy.
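The percentages above are relative MSE reductions, $(\mathrm{MSE}_{\text{Transformer}} - \mathrm{MSE}_{\text{Increformer}}) / \mathrm{MSE}_{\text{Transformer}} \times 100\%$, computed from Table 3. The short Python check below recomputes them; truncating to one decimal place reproduces the quoted values, whereas conventional rounding would give 10.9% for pitch and 28.3% for heave. The helper names are illustrative.

```python
import math


def relative_reduction_pct(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


def truncate(value, decimals=1):
    """Drop (do not round) digits beyond `decimals` for a non-negative value."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


# MSE values from Table 3 (KCS): (Transformer, Increformer)
kcs_mse = {
    "roll": (0.7429, 0.6312),
    "pitch": (0.9618, 0.8573),
    "heave": (0.9725, 0.6974),
    "total_resistance": (0.8961, 0.7643),
}
reductions = {name: relative_reduction_pct(b, p) for name, (b, p) in kcs_mse.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.3f}% (truncated {truncate(pct)}%, rounded {round(pct, 1)}%)")
```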

Furthermore, the Increformer model showed stronger predictive performance on the KCS ship type. This is attributed to the bulbous bow of the KCS, which mitigates the impact of waves on the hull and leads to more stable resistance fluctuations. At the same time, anti-rolling devices such as bilge keels keep the roll amplitude smaller than that of the Wigley-III. The KCS data therefore show more pronounced periodicity and less noise, so the model can extract features more effectively. Models based on the Transformer framework outperformed those based on the LSTM and TCN frameworks, because the Transformer's attention mechanism extracts features from time-series data effectively, capturing both long-term trends and short-term fluctuations. On the KCS dataset, owing to the high stability of the data, the difference between incremental learning methods and traditional deep learning methods is not significant. This may be because, with limited data in new modes, the advantages of the incremental learning strategy have not yet been fully realized. Overall, however, models based on incremental learning strategies still outperform traditional deep learning models.

As shown in Table 4, the Increformer model has a significant advantage over the Transformer baseline on the C60 dataset. The other Transformer-based incremental learning models, ER and DER++, also performed well, indicating that Transformer-based models are more effective than TCN for this time-series forecasting task. The proposed model leverages the representation capability of the Transformer and integrates incremental learning strategies, which enhances its ability to capture dependencies within sequential data and to continuously absorb new trends and fluctuation patterns from incoming data. On the C60 dataset, Increformer achieved the lowest MSE and MAE for all four predicted quantities.

Figure 3 shows the roll motion predictions of the proposed model and the comparison models on the KCS dataset. The non-incremental deep learning methods performed poorly on new data with a different distribution, sometimes predicting the opposite trend or fluctuating strongly at inflection points. In contrast, the incremental learning methods update their weights and structure in a timely manner and therefore predict new patterns better. Among them, the Increformer shows the best fit, with predictions closest to the actual values.

Figure 3.

Prediction results of ship roll motion on the KCS dataset.

4.2 Ablation experimental results

To verify the role and effectiveness of each functional module of the proposed Increformer model in ship performance forecasting, ablation experiments were conducted on the KCS dataset. Variant models were obtained by replacing individual modules, and their MAE and MSE were compared. First, the continuous attention module was replaced with other advanced attention modules, namely LogSparse attention and ProbSparse attention; the results are presented in Table 5. Next, the continuous normalization (CN) module was replaced with layer normalization (LN); the results are presented in Table 6. Finally, the incremental training method was switched to EWC, ER, and DER++; the results are presented in Table 7. In each case, all other parts of the model were kept unchanged.

Table 5.
Ablation experimental results of different attention mechanisms on the KCS ship model.

Performance	Attention	MSE	MAE
Roll	LogSparse attention	0.6462	0.2988
	ProbSparse attention	0.6335	0.2998
	Continual attention	0.6312	0.2974
Pitch	LogSparse attention	0.8602	0.3493
	ProbSparse attention	0.8591	0.3472
	Continual attention	0.8573	0.3456
Heave	LogSparse attention	0.7121	0.3299
	ProbSparse attention	0.7039	0.3285
	Continual attention	0.6974	0.3268
Total Resistance Coefficient	LogSparse attention	0.7769	0.3467
	ProbSparse attention	0.7696	0.3441
	Continual attention	0.7643	0.3328

Table 6.

Ablation experimental results of different normalization methods on the KCS ship model.

Performance	Normalization	MSE	MAE
Roll	LN	0.6364	0.2988
Roll	CN	0.6312	0.2974
Pitch	LN	0.8583	0.3476
Pitch	CN	0.8573	0.3456
Heave	LN	0.6989	0.3279
Heave	CN	0.6974	0.3268
Total Resistance Coefficient	LN	0.7668	0.3354
Total Resistance Coefficient	CN	0.7643	0.3328

Table 7.

Ablation experimental results of different incremental strategies on the KCS ship model.

Performance	Model	MSE	MAE
Roll	Increformer-ER	0.6360	0.3079
	Increformer-DER++	0.6365	0.3058
	Increformer-EWC	0.6348	0.3045
	Increformer	0.6312	0.2974
Pitch	Increformer-ER	0.8621	0.3533
	Increformer-DER++	0.8590	0.3381
	Increformer-EWC	0.8576	0.3364
	Increformer	0.8573	0.3456
Heave	Increformer-ER	0.7041	0.3329
	Increformer-DER++	0.7019	0.3304
	Increformer-EWC	0.6984	0.3277
	Increformer	0.6974	0.3268
Total Resistance Coefficient	Increformer-ER	0.7735	0.3427
	Increformer-DER++	0.7706	0.3381
	Increformer-EWC	0.7657	0.3341
	Increformer	0.7643	0.3328

The results in Table 5 show that the continuous attention mechanism used in the Increformer model achieved the lowest MSE and MAE for all four quantities (roll angle, pitch angle, heave value, and total resistance coefficient). Relative to LogSparse attention, the MSE decreased by 2.3%, 0.3%, 2.1%, and 1.6%, and the MAE by 0.5%, 1.1%, 0.9%, and 4.0%, respectively; the MSE margins over ProbSparse attention are smaller, all below 1%. This consistent improvement suggests that the proposed continuous attention module is well suited to online prediction and the processing of streaming data: it can capture new trends and features in the data, improving both the model's response to new information and the accuracy of its predictions.

Additionally, Table 6 shows that the model's error metrics increase, albeit modestly, when the continuous normalization mechanism is replaced with layer normalization. This result further supports the role of the continuous normalization module in stabilizing the data and in helping the model learn new knowledge continuously: by mitigating the impact of noise and outliers in the data, it allows the model to learn more stably from continuously incoming data.

Table 7 compares the incremental learning methods. Both EWC (Elastic Weight Consolidation) and TS-EWC (Time-Series Elastic Weight Consolidation) deliver favorable predictive performance; TS-EWC gives the lowest MSE for all four quantities and the lowest MAE for all except the pitch angle, where the EWC and DER++ variants are slightly lower. In addition, TS-EWC has lower computational overhead and updates the model more efficiently, because it maintains a single consolidated penalty term instead of one penalty term per task. This advantage over EWC grows as new data patterns keep appearing, that is, as the number of tasks the network parameters must learn continues to increase. In summary, the modules of the Increformer model reduce computational cost and enhance the model's ability to capture associations in both the temporal and feature dimensions of the data, which improves the timeliness and accuracy of ship performance forecasting.

Furthermore, the ablation results suggest that the performance gains arise not only from the individual modules but also from how they complement one another. Continuous normalization stabilizes feature distributions, enabling continuous attention to better capture evolving temporal patterns, while TS-EWC regulates parameter updates to balance adaptation and knowledge retention. Replacing any one component weakens this coordination and degrades performance.

5 Conclusion

To improve the accuracy of ship performance forecasting under wave conditions, this paper has proposed Increformer, an online prediction model based on incremental learning and the Transformer framework, to address the deterioration of online forecasting performance for ship hydrodynamics as prediction time increases and new data emerge. The model incorporates a continuous self-attention mechanism to explore the temporal dependencies between feature variables and a continuous normalization mechanism that stabilizes the data in incremental scenarios. Meanwhile, the TS-EWC (Time-Series Elastic Weight Consolidation) algorithm drives dynamic model updates so that new data are forecast more effectively. Experimental results show that Increformer outperforms classic deep learning and incremental learning methods, namely LSTM, TCN, Transformer, IncLSTM, DWE-IL, ER, and DER++, in predictive accuracy and curve fitting, and results on datasets for three ship types validate the model's universality.

Future work will focus on several directions to further improve Increformer. First, additional influential factors, beyond the wind speed, wave height, and wave period already used as inputs, will be incorporated to enhance predictive accuracy. Second, the parameter configuration and computational efficiency of the model will be optimized to improve scalability and support real-world deployment. Finally, interpretable optimization techniques will be integrated to provide more practical insights for ship performance forecasting.

Footnotes

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant numbers 61672263 and 62272202) and the National Science and Technology Major Project of China (No. 2025ZD1600600).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Jianhai Jin

Xing Wang

Chao Zhou

Jun Sun

References

1. Peri, Campana. Variable Fidelity and Surrogate Modeling in Simulation-based Design[C]. In: 27th Symposium on Naval Hydrodynamics. Seoul, Korea. 2008.

2. Pedersen, Larsen. Prediction of full-scale propulsion power using artificial neural networks[C]. In: 8th International Conference of Computer and IT Applications in the Maritime Industries, Budapest. Budapest, Hungary, 2009, pp.10–12.

3. Hefazi, Mizine, Schmitz, et al. Multidisciplinary synthesis optimization process in multihull ship design[J]. Nav Eng J 2010; 122: 29–47.

4. Petersen, Winther. Mining of Ship Operation Data for Energy Conservation[D]. Technical University of Denmark, 2011.

5. Babadi, Ghassemi. Effect of hull form coefficients on the vessel seakeeping performance[J]. J Mar Sci Technol 2013; 21: 594–604.

6. Wang, Liu. Ship rolling motion prediction based on wavelet neural network[J]. Applied Mechanics and Materials 2012; 1923: 724–728.

7. Tang, Yao, et al. Prediction about the vessel’s heave motion under different sea states based on hybrid PSO_ARMA model[J]. Ocean Eng 2022; 263: 112247. ISSN 0029-8018.

8. Ferrandis, Triantafyllou, Chryssostomidis. Learning functionals via LSTM neural networks for predicting vessel dynamics in extreme sea states. Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences 2021; 477: 20190897.

9. Ren. Ship roll motion prediction using ConvLSTM with attention mechanism[C]. In: 2022 41st Chinese Control Conference (CCC). IEEE, 2022, pp.5616–5620.

10. Kirkpatrick, Pascanu, Rabinowitz, et al. Overcoming catastrophic forgetting in neural networks[J]. Proc Natl Acad Sci USA 2017; 114: 3521–3526.

11. Aljundi, Kelchtermans, Tuytelaars. Task-free continual learning[C]. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019, pp.11254–11263.

12. Buzzega, Boschini, Porrello, et al. Dark experience for general continual learning: a strong, simple baseline[J]. Adv Neural Inf Process Syst 2020; 33: 15920–15930.

13. C-Z, Zou Z-J. Online prediction of ship roll motion in waves based on auto-moving gird search-least square support vector machine[J]. Math Probl Eng 2021; 2021: 366–372.

14. Guan, Yang, Wang, et al. Ship roll motion prediction based on ℓ1 regularized extreme learning machine[J]. PLOS ONE 2018; 13: e0206476.

15. Yin J-C, Perakis, Wang. A real-time ship roll motion prediction using wavelet transform and variable RBF network[J]. Ocean Eng 2018; 160: 10–19.

16. Hou X-R, Zou Z-J, Liu. Nonparametric identification of nonlinear ship roll motion by using the motion response in irregular waves[J]. Appl Ocean Res 2018; 73: 88–89.

17. Wei, Chen, Zhao, et al. An ensemble multi-step forecasting model for ship roll motion under different external conditions: a case study on the South China Sea[J]. Measurement (Mahwah N J) 2022; 201: 86–89.

18. Mallya, Lazebnik. Packnet: adding multiple tasks to a single network by iterative pruning[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp.7765–7773.

19. Wang, Han. Online sequential extreme learning machine with kernels for nonstationary time series prediction[J]. Neurocomputing 2014; 145: 90–97.

20. Dai. Dwe IL: a new incremental learning algorithm for non-stationary time series prediction via dynamically weighting ensemble learning[J]. Applied Intelligence 2022; 52: 174–194.

21. Wang, Yue. IncLSTM: incremental ensemble LSTM model towards time series data[J]. Comput Electr Eng 2021; 92: 107156.

22. Woo, Liu, Sahoo, et al. CoST: Contrastive learning of disentangled seasonal-trend representations for time series forecasting[J]. arXiv preprint arXiv:2202.01575, 2022.

23. Huang, Chen, Zhang, et al. A transferable time series forecasting service using deep transformer model for online systems[C]. In: Proceedings of the 37th IEEE/ACM international conference on automated software engineering. 2022, pp.1–12.

24. Chen, et al. Long sequence time-series forecasting with deep learning: a survey[J]. Inf Fus 2023; 97: 101819.

25. Wang, Dong, et al. Deep time series models: A comprehensive survey and benchmark[J]. arXiv preprint arXiv:2407.13278, 2024.

26. Zeng, Chen, Zhang, et al. Are transformers effective for time series forecasting?[C]. Proceedings of the AAAI Conference on Artificial Intelligence 2023; 37: 11121–11128.

27. Qiu, Lin, et al. Duet: dual clustering enhanced multivariate time series forecasting[C]. In: Proceedings of the 31st ACM SIGKDD conference on knowledge discovery and data mining V. 1. 2025, pp.1185–1196.

28. Lopez-Paz, Ranzato. Gradient episodic memory for continual learning[J]. Adv Neural Inf Process Syst 2017: 30.

29. Chaudhry, Ranzato, Rohrbach, et al. Efficient lifelong learning with a-gem[J]. arXiv preprint arXiv:1812.00420, 2018.

30. Zeng, Chen, Cui, et al. Continual learning of context-dependent processing in neural networks[J]. Nature Machine Intelligence 2019; 1: 364–372.

31. Kim, et al. A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges[J]. Artif Intell Rev 2025; 58: 216.

32. Wang, et al. Chattime: a unified multimodal time series foundation model bridging numerical and textual data[C]. Proceedings of the AAAI Conference on Artificial Intelligence 2025; 39: 12694–12702.

33. Zhao, Wang, et al. Predictive pretrained transformer (PPT) for real-time battery health diagnostics[J]. Appl Energy 2025; 377: 124746.

34. Zhou, Zhang, Peng, et al. Informer: beyond efficient transformer for long sequence time-series forecasting[C]. In: Proceedings of the 2021 AAAI conference on artificial intelligence. Menlo Park: AAAI, 2021, pp.11106–11115.

35. Group normalization[C]. In: Proceedings of the European conference on computer vision (ECCV). 2018, pp.3–19.

36. Ioffe, Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]. In: International conference on machine learning. PMLR, 2015, pp.448–456.