Active control of vehicle interior noise via attention mechanism and prioritized experience replay

Abstract

Although Deep Reinforcement Learning (DRL) has shown potential in Active Noise Control (ANC), its effectiveness is often limited by the random and non-stationary sound field inside the vehicle. Existing methods generally extract features insufficiently and cannot make effective use of important training samples. To address these shortcomings, this paper proposes an enhanced generative fixed-filter ANC method that combines the Channel-Spatial Attention Mechanism with Prioritized Experience Replay (GFANC-RL-CP). Specifically, the Channel-Spatial Attention Mechanism (CBAM) is integrated into the system to extract key channel and temporal features, improving its ability to identify complex noise features. In addition, the Prioritized Experience Replay (PER) mechanism is deployed in the learning process so that high-value samples are purposefully reused. Verification on real vehicle data shows a clear performance improvement. The proposed method reduces the number of training steps required for convergence by about 50%. Compared with the traditional FxLMS and GFANC-RL, it improves the average noise reduction by 8–16 dB and 4–8 dB, respectively, and suppresses noise effectively over the entire frequency range of 0–8000 Hz.

Keywords

vehicle interior noise; active noise control; attention mechanism; deep reinforcement learning; prioritized experience replay

1. Introduction

With advances in automotive technology and rising expectations for ride comfort, noise control has become an important issue in vehicle engineering. Traditional passive control techniques have inherent physical limitations, such as bulky structures, high implementation cost and poor low-frequency performance, which make it difficult for them to meet the requirements of modern vehicles for lightweight design and acoustic comfort. Active Noise Control (ANC) is therefore regarded as an effective alternative (Kuo and Morgan, 1996; Nelson and Elliott, 1991). ANC works on the principle of acoustic superposition: it generates a secondary anti-noise signal with the same magnitude as the primary noise but opposite phase, so that the two cancel. In practice, this technique is particularly effective at suppressing low-frequency noise.

Owing to its simple structure and high computational efficiency, the Filtered-x Least Mean Square (FxLMS) algorithm (Morgan, 1980) has become the standard solution in adaptive ANC systems. However, a fixed step size forces a trade-off between convergence speed and steady-state error, which limits the tracking ability for non-stationary noise. A series of modified strategies has been developed to address this problem. For example, Variable Step Size (VSS) FxLMS (Aboulnasr and Mayyas, 1997) balances convergence speed and system stability, and algorithms based on the Volterra series (Tan and Jiang, 2001) are used to deal with nonlinear distortion. However, most of these methods adopt linear or shallow nonlinear models. In a rapidly changing, strongly nonlinear sound field, their parameters generally cannot be updated to the optimum in real time, so effective noise reduction is difficult to achieve (Lu et al., 2021).

Due to its good nonlinear modeling ability (Bianco et al., 2019), the deep neural network (DNN) is increasingly used in ANC to solve the problem that traditional algorithms cannot handle complex, nonlinear, and stochastic control environments well (Yao et al., 2026). Zhang and Wang (2020) proposed the deep ANC framework, which uses a Convolutional Recurrent Network (CRN) to model secondary paths and noise signals, which has achieved obvious results in broadband noise elimination. Due to the inherent strong time correlation of acoustic signals, the architecture of Long Short-Term Memory (LSTM) network has also attracted a lot of attention. These architectures use memory units to handle time-varying on-board noise well (Kwon et al., 2022; Park et al., 2019). Building on these capabilities, recent studies have further expanded the application of deep learning to complex and non-stationary acoustic environments. For instance, to mitigate highly nonlinear and transient noises, Mostafavi and Cha (2023) proposed a high-performance feedforward ANC controller incorporating LSTM and attention mechanisms, while Cha et al. (2023) developed DNoiseNet, a robust deep learning-based feedback ANC system. However, although they improve the ability of nonlinear fitting, most deep learning models will cause a large computational burden and require high storage capacity (Le and Mai, 2024). Therefore, for latency-sensitive embedded ANC systems, deploying such intensive algorithms on resource-limited hardware is still a major problem, and strict real-time performance is hard to guarantee.

In order to achieve a balance between real-time performance and nonlinear control on hardware with limited resources, fixed filter strategies are increasingly adopted owing to their computational efficiency and stability. Because traditional fixed filters often fail in complex acoustic fields, Shi et al. (2022) proposed the framework of Selective Fixed-Filter ANC (SFANC). The architecture uses a Convolutional Neural Network (CNN) to find the optimal controller from a set of pre-trained filters to achieve the purpose of reducing latency. However, the effectiveness of SFANC is limited by a finite number of pre-trained filters. Therefore, when there is a significant difference between the actual noise and the noise in training, the performance will decline rapidly. In order to solve the above problems, Luo et al. (2023, 2024b) proposed the Generative Fixed-Filter ANC (GFANC) method. This method is not based on fixed pre-trained filters, but generates corresponding control filters according to different noise types, and adopts Bayesian filtering and no-delay structure to improve the accuracy and real-time generation. However, GFANC still relies on CNN-based supervised learning. If the noise label is not accurate, it will degrade the accuracy of the filter generation and have an adverse impact on noise reduction. Inspired by the superior adaptability of Reinforcement Learning (RL) in dynamic, nonlinear acoustic environments (Li et al., 2025), Luo et al. (2024a) proposed an RL-based GFANC (GFANC-RL) strategy. The method relies on the exploration properties of RL to eliminate the requirements for the labeling process, which effectively solves the problem of non-differentiability caused by the binary combination weights in the original GFANC.

Although the GFANC-RL strategy removes the dependence on labels, problems remain in complex acoustic environments. The lightweight CNN feature extraction module alone cannot adaptively capture the key characteristics of noise signals. In addition, the traditional random experience replay mechanism does not distinguish between samples of different learning value, so a large amount of simple data is trained on indiscriminately, which reduces the convergence efficiency of the algorithm.

In order to solve the problems of poor adaptability, weak anti-interference capability and slow convergence speed of the GFANC-RL method in the complex in-vehicle acoustic environment, this paper proposes an enhanced method that combines the Channel-Spatial Attention Mechanism (CBAM) (Woo et al., 2018) and Prioritized Experience Replay (PER) (Schaul et al., 2016). Specifically, the CBAM module is added to the front end of the network to focus on the important channel and temporal features, so as to improve the model’s ability to perceive and represent the features of non-stationary noise. The PER mechanism is adopted in the training stage. Unlike the traditional random replay, this mechanism prioritizes learning from key samples, that is, those that play a decisive role in policy update, thus accelerating the speed of convergence and enhancing the anti-interference capability in complex environments. During the noise suppression process, taking the feature representations enhanced by the CBAM module as input, the neural network outputs a binary weight vector. Then, the vector is used to weight and combine the pre-trained sub-filters to obtain a control filter suitable for the current operating conditions, so as to achieve the purpose of suppressing non-stationary noise in the car. Finally, real vehicle data is used to simulate and systematically verify the noise reduction effect, robustness and generalization ability of the proposed method.

2. Algorithm architecture design of GFANC-RL-CP

Figure 1 shows the overall structure of the proposed GFANC-RL-CP method. The reference signal is first processed by an initial 1D convolution layer to generate multiple feature channels, which are subsequently fed into the CBAM module. The attention-refined feature representations are then fed into the 1D CNN to generate a binary weight vector, which is used to weight and combine the pre-trained sub-filters into a control filter suitable for the current noise environment.

Figure 1.

Overall architecture of the GFANC-RL-CP method.

A frame-level processing strategy is employed with a sampling rate $f_{s}$ of 16 kHz. The length of each frame is set to 1 s, corresponding to 16,000 sampling points. In this notation, $t$ denotes the frame index, and $x_{t}$ represents the reference noise at the $t$ -th frame, while $n$ designates the sample index within each frame.

2.1. Construction of sub-filters

The construction of sub-filters is the basis of the GFANC-RL-CP method. In this study, a filter decomposition technique based on perfect reconstruction filter bank theory is adopted. For the broadband primary noise $x (n)$ containing the target frequency components, the FxLMS algorithm is employed to perform adaptive iterative control, yielding a pre-trained broadband control filter. Using the Discrete Fourier Transform (DFT), this broadband filter is then decomposed in the frequency domain into 15 sub-filters.

Figure 2(a) and (b) illustrate the frequency responses of the broadband control filter and sub-filters, respectively. The order of the control filters is set to 1024, with a sampling rate of 16 kHz. Furthermore, bandpass filters with a frequency range of 20 Hz–7980 Hz are used to model the primary and secondary paths.

Figure 2.

Frequency responses of the pre-trained broadband control filter and sub-filters.

The use of 15 sub-filters is a practical choice to balance system performance and training feasibility. It not only provides 2¹⁵ possible combinations to cover the target frequency range (such as the distinct harmonic profiles of electric and internal combustion engine vehicles), but also prevents the exponential expansion of the action space, ensuring that the reinforcement learning training can converge effectively.
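The decomposition and recombination described in this subsection can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 64-tap filter, the equal-width contiguous band partition, and the helper names `split_into_subfilters` and `combine` are assumptions made for illustration (the paper uses a 1024-tap filter). The sketch masks DFT bins band by band, so that summing all 15 sub-filters reconstructs the broadband filter, and a binary weight vector selects a subset of bands.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for a short toy filter)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def split_into_subfilters(c, num_bands):
    """Split filter c into num_bands real sub-filters by masking DFT bins."""
    n = len(c)
    spectrum = dft(c)
    half = n // 2 + 1  # non-negative-frequency bins
    subfilters = []
    for m in range(num_bands):
        masked = [0j] * n
        for k in range(half):
            if k * num_bands // half == m:
                mirror = (n - k) % n  # keep the conjugate bin so the result is real
                masked[k] = spectrum[k]
                masked[mirror] = spectrum[mirror]
        subfilters.append([v.real for v in idft(masked)])
    return subfilters

def combine(subfilters, g):
    """Control filter as the binary-weighted sum of the sub-filters."""
    n = len(subfilters[0])
    return [sum(gm * sf[t] for gm, sf in zip(g, subfilters)) for t in range(n)]

# Toy broadband filter (hypothetical values, not taken from the paper).
broadband = [math.exp(-0.1 * t) * math.cos(0.7 * t) for t in range(64)]
subfilters = split_into_subfilters(broadband, 15)

full = combine(subfilters, [1] * 15)                 # all bands switched on
max_err = max(abs(a - b) for a, b in zip(full, broadband))
partial = combine(subfilters, [1, 1, 1] + [0] * 12)  # only the three lowest bands
num_combinations = 2 ** 15
```

With all weights set to 1 the broadband filter is recovered up to rounding error, which is the perfect-reconstruction property the method relies on; `num_combinations` evaluates to 32,768, the size of the action space mentioned above.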

2.2. Feature extraction based on channel-spatial attention mechanism

For the ANC system, the ability to extract effective features largely determines the final noise reduction effect. The main disadvantage of the GFANC-RL is that it relies on the lightweight 1D CNN, which treats all input features indiscriminately. It does not have a mechanism to give higher importance to specific features. In order to overcome the above problems, the CBAM module is applied to the process of feature extraction, so as to achieve the purpose of automatically enhancing the input features. It achieves this by dynamically adjusting the feature weights, allowing the network to primarily focus on the important channel and temporal features. Figure 3 shows the internal structure of the CBAM used, which consists of two sub-modules: Channel Attention Module (CAM) and Spatial Attention Module (SAM). Here, the current noise frame vector $x_{t}$ is first processed by an initial 1D convolution layer to generate multiple feature channels, which serve as the input to the CBAM. After double weighting, the system outputs the enhanced state vector $s_{t}$ .

Figure 3.

Structure of the CBAM module.

2.2.1. Channel attention module

To aggregate temporal information, Global Average Pooling and Global Max Pooling operations are first applied to the multi-channel input feature $F$ (generated by the initial 1D convolution layer). The resulting vectors are then input into a shared Multi-Layer Perceptron (MLP). This MLP is constructed with two fully connected layers. The outputs from this network are summed element-wise and subsequently activated by a sigmoid function to yield the channel attention weights $M_{c}$ . The specific calculation is expressed as

\begin{array}{l} M_{c} (F) = σ (MLP (AvgPool (F)) + MLP (MaxPool (F))) \\ = σ (W_{1} (ReLU (W_{0} (F^{avg}))) + W_{1} (ReLU (W_{0} (F^{\max})))) \end{array}

(1)

where $σ$ represents the sigmoid function, and $W_{0}$ and $W_{1}$ correspond to the weight parameters of the shared MLP. The channel-refined intermediate feature $F^{'}$ is calculated as

F^{'} = M_{c} (F) \otimes F

(2)

where $\otimes$ denotes element-wise multiplication with broadcasting.

Through this process, the CAM calculates a channel weight vector. These weights represent the relative importance of different feature channels, enabling the network to adaptively focus on critical acoustic representations.
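A minimal sketch of equations (1) and (2) for a feature map stored as a list of channels, each a list of time samples, is given below. The tiny dimensions and the weight values `W0` and `W1` are hypothetical placeholders rather than trained parameters, and bias terms are omitted for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def shared_mlp(v, W0, W1):
    """Two fully connected layers with a ReLU in between (no biases)."""
    hidden = [max(0.0, h) for h in matvec(W0, v)]
    return matvec(W1, hidden)

def channel_attention(F, W0, W1):
    """Return channel weights M_c and the refined feature F' = M_c * F."""
    f_avg = [sum(ch) / len(ch) for ch in F]   # global average pooling over time
    f_max = [max(ch) for ch in F]             # global max pooling over time
    a = shared_mlp(f_avg, W0, W1)
    b = shared_mlp(f_max, W0, W1)
    m_c = [sigmoid(x + y) for x, y in zip(a, b)]
    f_ref = [[m * v for v in ch] for m, ch in zip(m_c, F)]  # broadcast over time
    return m_c, f_ref

# Toy input: 4 channels x 6 time steps (illustrative values only).
F = [[0.1 * (c + 1) * math.sin(0.5 * t) for t in range(6)] for c in range(4)]
W0 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]      # 4 -> 2 (reduction)
W1 = [[0.3, -0.5], [0.2, 0.4], [-0.6, 0.1], [0.7, 0.2]]   # 2 -> 4
M_c, F_refined = channel_attention(F, W0, W1)
```

Each entry of `M_c` lies strictly between 0 and 1 and scales one whole channel, which is the broadcasting described after equation (2).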

2.2.2. Spatial attention module

Distinct from the channel-wise focus, the SAM targets the temporal distribution within the feature sequence. Its primary function is to assign weights to critical time steps, thereby precisely localizing significant noise regions.

First, Average Pooling and Max Pooling operations are applied along the channel axis of the intermediate feature $F^{'}$ . The resulting features are concatenated and processed by a convolution layer with a kernel size of 7 to generate the spatial attention weights $M_{s}$ . The kernel size $k = 7$ is chosen to broaden the receptive field in the temporal domain while preserving components up to the highest target frequency (8000 Hz) without over-smoothing. The computation of spatial attention weights is formulated as

\begin{array}{l} M_{s} (F^{'}) = σ (f^{7} ([AvgPool (F^{'}); MaxPool (F^{'})])) \\ = σ (f^{7} ([F_{avg}^{'}; F_{\max}^{'}])) \end{array}

(3)

where $f^{7}$ represents the convolution operation. Finally, the computed spatial attention weights are element-wise multiplied with the intermediate feature $F^{'}$ to yield the final enhanced state vector $s_{t}$ :

s_{t} = M_{s} (F^{'}) \otimes F^{'}

(4)
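Equations (3) and (4) can be sketched in the same list-based style. The zero ("same") padding at the frame edges, the absence of a learned bias by default, and the kernel values are assumptions of this sketch; the text only specifies a kernel size of 7.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spatial_attention(f_ref, kernel_avg, kernel_max, bias=0.0):
    """Return temporal weights M_s and the enhanced state s = M_s * F'."""
    num_ch, num_t = len(f_ref), len(f_ref[0])
    # Pool along the channel axis, giving one value per time step.
    f_avg = [sum(f_ref[c][t] for c in range(num_ch)) / num_ch for t in range(num_t)]
    f_max = [max(f_ref[c][t] for c in range(num_ch)) for t in range(num_t)]
    k = len(kernel_avg)
    pad = k // 2
    m_s = []
    for t in range(num_t):
        z = bias
        for j in range(k):
            idx = t + j - pad
            if 0 <= idx < num_t:  # samples outside the frame are treated as zero
                z += kernel_avg[j] * f_avg[idx] + kernel_max[j] * f_max[idx]
        m_s.append(sigmoid(z))
    s = [[m_s[t] * f_ref[c][t] for t in range(num_t)] for c in range(num_ch)]
    return m_s, s

# Toy refined feature: 2 channels x 5 time steps (illustrative values only).
F_ref = [[0.2, -0.1, 0.4, 0.0, 0.3], [0.5, 0.1, -0.3, 0.2, 0.1]]
k_avg = [0.1, -0.2, 0.3, 0.5, 0.3, -0.2, 0.1]   # hypothetical kernel, size 7
k_max = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0]
M_s, s_t = spatial_attention(F_ref, k_avg, k_max)
```

Unlike the channel weights, `M_s` holds one weight per time step, shared by all channels, so it emphasizes when in the frame the salient noise occurs rather than which feature channel carries it.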

2.3. Improved RL training strategy

When facing complex non-stationary noise, traditional DRL algorithms often encounter bottlenecks caused by low sample efficiency and unstable convergence. To model the interaction between the agent and the acoustic environment systematically, the ANC task is framed as a Markov Decision Process (MDP).

Within this framework, the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018) is adopted as the main optimization algorithm, and the PER mechanism replaces the original uniform sampling. This modification improves the speed and stability of convergence in dynamic acoustic scenarios.

2.3.1. Modeling of Markov decision process

In this study, the active noise control task is modeled as an MDP, represented by the tuple $(S, A, P, R, γ)$ . The state space $S$ comprises the set of reference signal vectors $x_{t}$ . The action space $A$ consists of the binary weight vectors generated by the Actor network, expressed as $g_{t} = [g_{t, 1} \dots g_{t, m} \dots g_{t, M}]$ , where $M$ is the number of sub-filters. The transition probability $P (x_{t + 1} | x_{t}, g_{t})$ describes the probability of the system moving to the next state $x_{t + 1}$ given the state $x_{t}$ and action $g_{t}$ . Since the subsequent noise frame is generated independently of the current control action, this probability simplifies to $P (x_{t + 1} | x_{t}, g_{t}) = P (x_{t + 1} | x_{t})$ . The reward function $R (x_{t}, g_{t})$ quantifies the immediate feedback from the environment after the agent executes action $g_{t}$ in state $x_{t}$ . In this work, the Noise Reduction (NR) metric is used as the immediate reward $r_{t}$ :

r_{t} = R (x_{t}, g_{t}) = 10 \log_{10} \frac{\sum_{n = 1}^{L} d_{t}^{2} (n)}{\sum_{n = 1}^{L} e_{t}^{2} (n)}

(5)

where $d_{t} (n)$ and $e_{t} (n)$ correspond to the disturbance signal and error signal associated with the reference signal $x_{t}$ , respectively. A higher reward value $r_{t}$ signifies that the anti-noise generated by the current policy neutralizes the disturbance signal more thoroughly, resulting in superior noise reduction performance. Specifically, since the residual error energy is located in the denominator of equation (5), a reduction in the error energy will increase the reward value $r_{t}$ . This mathematically ensures consistency between noise reduction performance and reward maximization.
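The reward of equation (5) reduces to a few lines of code. In the sketch below, the function name and the small constant `eps`, added to guard against a zero-energy error frame, are assumptions not stated in the text.

```python
import math

def noise_reduction_db(d, e, eps=1e-12):
    """NR reward: ratio of disturbance energy to residual error energy, in dB."""
    disturbance_energy = sum(v * v for v in d)
    error_energy = sum(v * v for v in e)
    return 10.0 * math.log10((disturbance_energy + eps) / (error_energy + eps))

# Residual amplitude at one tenth of the disturbance: energy ratio of 100.
d_frame = [1.0] * 100
e_frame = [0.1] * 100
reward = noise_reduction_db(d_frame, e_frame)
no_control = noise_reduction_db(d_frame, d_frame)  # error equals disturbance
```

Reducing the residual amplitude by a factor of ten raises the reward by 20 dB, while leaving the disturbance untouched yields a reward of 0 dB.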

The discount factor $γ \in [0, 1)$ balances immediate and future rewards. In this work, $γ$ is set to 0.99 so that the agent prioritizes long-term system stability and cumulative noise reduction.

Based on the MDP model, the discounted cumulative return $G_{t}$ is defined as

G_{t} = r_{t} + \sum_{k = 1}^{\infty} γ^{k} r_{t + k}

(6)
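As a worked illustration of equation (6), assume a constant per-frame reward $r$ (an assumption made here only for the arithmetic). The geometric series then gives

G_{t} = r \sum_{k = 0}^{\infty} γ^{k} = \frac{r}{1 - γ} = 100 r \quad (γ = 0.99)

so the return aggregates rewards over an effective horizon of roughly $1 / (1 - γ) = 100$ frames, and a reward 69 frames ahead still carries about half the weight of the immediate one, since $0.99^{69} \approx 0.50$ .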

The optimal policy $π^{*}$ can be obtained by maximizing the expectation of the cumulative return:

π^{*} = \underset{π}{\arg \max} \underset{g_{t} \sim π (\cdot | x_{t})}{E} [G_{t}]

(7)

2.3.2. SAC algorithm

Conventional RL algorithms typically focus on the maximization of cumulative returns. However, in non-stationary active noise control environments, this deterministic optimization strategy often causes the agent to get trapped in local optima, failing to adapt to rapid acoustic field variations.

To address this limitation, the SAC algorithm is adopted. Its distinct advantage lies in incorporating the maximum entropy principle into the optimization goal.

The objective function $J (π)$ for maximum entropy RL is formulated as

J (π) = \sum_{t = 1}^{\infty} γ^{t - 1} \underset{g_{t} \sim π (\cdot | x_{t})}{E} [r (x_{t}, g_{t}) + α H (π (\cdot | x_{t}))]

(8)

where $α$ is the temperature parameter and $H (π (\cdot | x_{t}))$ represents the entropy of the policy at state $x_{t}$ , which is calculated as

H (π (\cdot | x_{t})) = \underset{g_{t} \sim π (\cdot | x_{t})}{E} [- \log π (g_{t} | x_{t})]

(9)
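To make equation (9) concrete for binary weight vectors, the sketch below assumes that the policy factorizes into independent Bernoulli distributions, one per sub-filter. The text does not state this factorization; it is adopted here only because it gives the entropy and the log-probability simple closed forms.

```python
import math

def bernoulli_policy_entropy(probs):
    """Entropy (in nats) of independent Bernoulli components with P(g_m = 1) = probs[m]."""
    entropy = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # the limit of q * log(q) as q -> 0 is 0
                entropy -= q * math.log(q)
    return entropy

def log_prob(g, probs):
    """log pi(g | x) for a binary vector g; assumes 0 < probs[m] < 1."""
    return sum(math.log(p) if gm == 1 else math.log(1.0 - p)
               for gm, p in zip(g, probs))

uniform = [0.5] * 15
h_uniform = bernoulli_policy_entropy(uniform)            # maximal exploration
h_deterministic = bernoulli_policy_entropy([1.0] * 15)   # no exploration
lp = log_prob([1, 0] * 7 + [1], uniform)
```

Under this assumption the entropy is largest, $15 \ln 2$ , when every sub-filter is switched on with probability 0.5 and falls to zero for a deterministic policy, which is the behavior the temperature parameter $α$ trades off against the reward.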

Based on this framework, the optimal policy $π^{*}$ is derived by maximizing the entropy-regularized objective:

π^{*} = \underset{π}{\arg \max} \sum_{t = 1}^{\infty} γ^{t - 1} \underset{g_{t} \sim π (\cdot | x_{t})}{E} [r (x_{t}, g_{t}) + α H (π (\cdot | x_{t}))]

(10)

2.3.3. Network update and prioritized experience replay

To obtain the optimal policy $π^{*}$ that maximizes the objective function, an Actor-Critic deep neural network architecture is constructed. Furthermore, the PER mechanism is adopted to iteratively update the network parameters. Unlike traditional uniform sampling, PER assigns sampling priorities based on the Temporal Difference (TD) error of the samples, enabling the system to effectively address sudden non-stationary disturbances, such as impulse noises, which typically generate high TD errors. Let a batch of size $B$ be sampled from the replay buffer. For the $i$ -th sample tuple $(x_{i}, g_{i}, r_{i}, x_{i + 1})$ , an importance sampling weight $ω_{i}$ is introduced to correct the distribution bias and prevent over-fitting by assigning smaller update weights to frequently sampled outliers:

ω_{i} = {(\frac{1}{N} \cdot \frac{1}{P (i)})}^{β}

(11)

where $N$ represents the capacity of the replay buffer, $P (i)$ denotes the sampling probability of the sample, and $β$ is the bias correction coefficient.
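A minimal sketch of prioritized sampling and of the weight in equation (11) follows. The proportional priority rule with exponent `priority_exponent`, the small constant `eps`, and the normalization of the weights by their maximum follow the common formulation of Schaul et al. (2016) and are assumptions here, since the text specifies only equation (11); the buffer is also assumed to be full, so that $N$ equals the number of stored samples.

```python
import random

def sampling_probabilities(td_errors, priority_exponent=0.6, eps=1e-6):
    """P(i) proportional to (|TD error| + eps) ** priority_exponent."""
    priorities = [(abs(d) + eps) ** priority_exponent for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def importance_weights(probs, indices, beta=0.4):
    """Equation (11) for the sampled indices, scaled so the largest weight is 1."""
    n = len(probs)
    raw = [(1.0 / (n * probs[i])) ** beta for i in indices]
    largest = max(raw)
    return [w / largest for w in raw]

def sample_indices(probs, batch_size, rng):
    """Draw a batch of buffer indices according to the priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Two stored transitions: a well-learned one and a surprising one.
probs = sampling_probabilities([0.1, 2.0])
weights = importance_weights(probs, [0, 1])
batch = sample_indices(probs, batch_size=8, rng=random.Random(0))
```

The transition with the larger TD error is sampled more often but receives the smaller importance weight, which is how the correction counteracts the bias introduced by non-uniform sampling.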

To alleviate the issue of value function overestimation common in DRL, the Clipped Double Q-learning technique (Fujimoto et al., 2018) is incorporated. Specifically, two Critic networks, $Q_{ϕ_{1}}$ and $Q_{ϕ_{2}}$ , with identical structures but independent parameters ( $ϕ_{1}$ and $ϕ_{2}$ , respectively), are constructed (Figures 4 and 5).

Figure 4.

Critic parameter update.

Figure 5.

Actor parameter update.

First, to calculate the target value $y_{i}$ for the $i$ -th sample, the next action $g_{i + 1}^{π}$ is sampled from the Actor network based on the next state $x_{i + 1}$ :

g_{i + 1}^{π} \sim π_{θ} (\cdot | x_{i + 1})

(12)

Subsequently, the target value is computed by incorporating the target Critic network parameters $ϕ_{j}^{'}$ :

y_{i} = r_{i} + γ (1 - d_{i}) (\min_{j = 1, 2} Q_{ϕ_{j}^{'}} (x_{i + 1}, g_{i + 1}^{π}) - α \log π_{θ} (g_{i + 1}^{π} | x_{i + 1}))

(13)

where $d_{i}$ represents the terminal signal ( $d_{i} = 1$ if $x_{i + 1}$ is a terminal state; otherwise, $d_{i} = 0$ ).
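For scalar inputs, the target of equation (13) can be written as a short function. The default value of the temperature `alpha` is a placeholder, since its value is not given in the text.

```python
def soft_td_target(reward, done, q1_next, q2_next, log_prob_next,
                   gamma=0.99, alpha=0.2):
    """Clipped double-Q soft target y_i for one transition."""
    # Taking the smaller of the two target-critic estimates limits overestimation.
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value

# Non-terminal transition: 1 + 0.5 * (4 - 0.5 * (-2)) = 3.5
y = soft_td_target(1.0, 0.0, q1_next=5.0, q2_next=4.0, log_prob_next=-2.0,
                   gamma=0.5, alpha=0.5)
# Terminal transition: the bootstrap term is switched off.
y_terminal = soft_td_target(1.0, 1.0, q1_next=5.0, q2_next=4.0, log_prob_next=-2.0)
```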

The objective of the Critic network is to estimate the action value by minimizing the discrepancy between the predicted value and the target value. By introducing the importance sampling weight $ω_{i}$ into the loss function, the parameters $ϕ_{j} (j = 1, 2)$ of the two Critic networks are updated by minimizing the weighted Bellman residual. The loss function is defined as

L (ϕ_{j}) = \frac{1}{B} \sum_{i = 1}^{B} ω_{i} {(Q_{ϕ_{j}} (x_{i}, g_{i}) - y_{i})}^{2}

(14)

Finally, the Critic parameters $ϕ_{j}$ are updated using the gradient descent method:

ϕ_{j} \leftarrow ϕ_{j} - λ_{Q} \nabla_{ϕ_{j}} L (ϕ_{j})

(15)

After each update of the Critic network parameters $ϕ_{j}$ , the target network parameters $ϕ_{j}^{'}$ are synchronized using the soft update mechanism to ensure training stability. The soft update formula is as follows:

ϕ_{j}^{'} \leftarrow τ ϕ_{j} + (1 - τ) ϕ_{j}^{'}

(16)

where $τ$ denotes the soft update coefficient.
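The weighted loss of equation (14) and the soft update of equation (16) are sketched below on plain lists. Returning the per-sample TD errors alongside the loss is a convenience of this sketch, reflecting the usual PER practice of refreshing priorities from them, and the default `tau` is a placeholder value not given in the text.

```python
def weighted_critic_loss(q_pred, targets, weights):
    """Importance-weighted Bellman residual and the per-sample TD errors."""
    td_errors = [q - y for q, y in zip(q_pred, targets)]
    loss = sum(w * e * e for w, e in zip(weights, td_errors)) / len(td_errors)
    return loss, td_errors

def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(online_params, target_params)]

# (1.0 * 1^2 + 0.5 * 2^2) / 2 = 1.5
loss, td = weighted_critic_loss([1.0, 3.0], [0.0, 1.0], [1.0, 0.5])
new_target = soft_update([0.0, 10.0], [1.0, 0.0], tau=0.1)
```

A small `tau` makes the target networks trail the online critics slowly, which keeps the regression target in equation (13) from shifting abruptly between updates.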

The Actor network comprises a sigmoid layer and a binarization layer to generate discrete binary weight vectors $g_{i}^{π}$ . The update of the policy parameters $θ$ aims to simultaneously maximize the action value and the policy entropy. The objective function $J (π_{θ})$ is defined as follows:

J (π_{θ}) = \frac{1}{B} \sum_{i = 1}^{B} (\min_{j = 1, 2} Q_{ϕ_{j}} (x_{i}, g_{i}^{π}) - α \log π_{θ} (g_{i}^{π} | x_{i}))

(17)

where $\min_{j = 1, 2} Q_{ϕ_{j}} (x_{i}, g_{i}^{π})$ denotes the minimum output value of the two Critic networks.
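The output stage of the Actor and the batch objective of equation (17) can be sketched as follows. The threshold of 0.5 in the binarization layer and the default temperature are assumptions, and the sketch does not model how gradients are propagated through the binarization, which the text does not specify.

```python
import math

def binarize(logits, threshold=0.5):
    """Sigmoid layer followed by a hard threshold, giving a binary weight vector."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    g = [1 if p >= threshold else 0 for p in probs]
    return probs, g

def actor_objective(q1, q2, log_probs, alpha=0.2):
    """Batch average of min(Q1, Q2) - alpha * log pi, as in equation (17)."""
    terms = [min(a, b) - alpha * lp for a, b, lp in zip(q1, q2, log_probs)]
    return sum(terms) / len(terms)

probs, g = binarize([-2.0, 0.0, 3.0])
# ((2 + 0.5 * 1) + (1 + 0.5 * 2)) / 2 = 2.25
j_value = actor_objective([2.0, 4.0], [3.0, 1.0], [-1.0, -2.0], alpha=0.5)
```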

Finally, the Actor parameters $θ$ are updated using the gradient ascent method:

θ \leftarrow θ + λ_{π} \nabla_{θ} J (π_{θ})

(18)

2.3.4. Network training

To enable the agent to learn the optimal control policy in complex noise environments, the proposed GFANC-RL-CP method is trained offline. The synthetic noise dataset from Luo et al. (2024a) is adopted.

Based on the MDP formulation, the agent receives the reference signal vector $x_{t}$ , and generates the weight vector $g_{t}$ via the current Actor network. The environment provides a reward value $r_{t}$ based on the noise reduction performance. All generated interaction data tuples $(x_{t}, g_{t}, r_{t}, x_{t + 1})$ are stored in the experience replay buffer. Utilizing the SAC algorithm and the PER mechanism, high-value samples are selected from the buffer. The Critic network parameters $ϕ_{j}$ are updated via gradient descent, while the Actor network parameters $θ$ are updated via gradient ascent to maximize the cumulative expected reward and policy entropy. The Adam optimizer (Kingma and Ba, 2015) is used for the optimization and update of network parameters throughout the process.

To verify the improvement in convergence efficiency brought by the PER mechanism, the training performance of the model with and without this mechanism is compared. For both methods, learning rates were tuned to ensure stable convergence. As shown in Figure 6 (where solid lines represent smoothed convergence curves and the faint traces represent raw training data), both methods benefit from the effective feature extraction of the CBAM module and eventually converge to a high reward level. The proposed algorithm stabilizes rapidly, reaching a steady state at approximately 100,000 steps, whereas the uniform sampling counterpart requires roughly 190,000 steps. Furthermore, without the PER mechanism, the uniform sampling method exhibits visible fluctuations at high learning rates, necessitating a more conservative configuration. The proposed GFANC-RL-CP method, on the other hand, sustains higher learning rates without significant fluctuations, which directly accelerates convergence. These results indicate that the PER mechanism reduces the number of training steps needed to reach a steady state by nearly 50% (from about 190,000 to about 100,000).

Figure 6.

Training reward convergence curves.

Upon completion of training, the Actor network with the optimal parameters $θ^{*}$ is saved. For each reference signal $x_{t}$ , the network generates the weight vector $g_{t}$ :

g_{t} = Actor (x_{t}; θ^{*})

(19)

For this framework, the computationally intensive training processes (including SAC algorithm iterations and the PER mechanism) are conducted entirely offline. During the online control phase, the system only involves the forward inference of the Actor network to compute equation (19). Since this inference process is conducted at the frame-level rather than sample-by-sample, the online execution frequency is effectively controlled. Furthermore, the 1D convolutions utilized by the network primarily involve basic multiply-accumulate operations. Therefore, assessed from the perspective of algorithmic complexity, the online computational load of this method is relatively limited, which is expected to meet future deployment requirements on resource-constrained hardware.

3. Performance simulation and analysis

In this section, an offline ANC simulation environment is established based on real-world in-vehicle noise data. The proposed GFANC-RL-CP is compared with the GFANC-RL and the FxLMS algorithm to demonstrate that the proposed algorithm outperforms the comparative methods in terms of noise reduction performance, robustness, and generalization ability.

3.1. In-vehicle noise data acquisition

To ensure the simulation experiments closely reflect the real acoustic field environment, a standard passenger vehicle was selected as the test vehicle, and a high-precision acoustic data acquisition system was established. The experimental setup and hardware equipment are illustrated in Figure 7. The core hardware of the acquisition system primarily consists of a microphone array comprising 8 Ono Sokki MI-1271 microphones, an acoustic data acquisition unit, and an experimental computer. The software platform used is HEAD Recorder 9.1.

Figure 7.

Experimental setup. (a) Experimental vehicle; (b) Experimental microphone; (c) HEADlab V12 data acquisition system; (d) Laptop with HEAD recorder software.

A total of eight microphones were installed on both sides of the seat headrests to collect noise data. The specific measurement positions included FL-in, FL-out, FR-in, FR-out, RL-in, RL-out, RR-in, and RR-out. Here, FL, FR, RL, and RR denote the front-left, front-right, rear-left, and rear-right seats. The suffixes “in” and “out” refer to the inner and outer sides of the headrest.

To investigate the influence of different road surfaces on vehicle noise, data acquisition was conducted at a typical urban cruising speed of 60 km/h on both smooth and rough urban roads, forming a complex acoustic environment with highly coupled noise sources. The sampling rate of the noise signals was set to 16 kHz. We normalized the collected raw data to a range of [0, 1] via Min-Max normalization:

x (n) = \frac{x_{raw} (n) - \min [x_{raw} (n)]}{\max [x_{raw} (n)] - \min [x_{raw} (n)]}

(20)

where $x_{raw} (n)$ denotes the raw noise signal sequence, while $\max [x_{raw} (n)]$ and $\min [x_{raw} (n)]$ represent the maximum and minimum values of the signal in the time domain, respectively.
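Equation (20) corresponds to the following short function. Returning zeros for a constant signal, where the denominator vanishes, is an assumption of this sketch and is not discussed in the text.

```python
def min_max_normalize(x_raw):
    """Scale a signal linearly so its minimum maps to 0 and its maximum to 1."""
    lowest, highest = min(x_raw), max(x_raw)
    if highest == lowest:
        return [0.0 for _ in x_raw]  # assumed convention for a constant signal
    span = highest - lowest
    return [(v - lowest) / span for v in x_raw]

normalized = min_max_normalize([-2.0, 0.0, 2.0])
flat = min_max_normalize([3.0, 3.0])
```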

3.2. Simulation with real-world noise

Using the policy network obtained from the offline training stage, the GFANC-RL-CP method is applied to real-world in-vehicle noise for ANC simulation. Four typical noise samples recorded at the front-right (FR) seat are selected to evaluate noise reduction performance, stability and generalization under complex non-stationary conditions. The real-world data exhibit complex acoustic characteristics that differ markedly from the synthetic noise dataset used in the training stage.

The GFANC-RL-CP method is compared with the GFANC-RL and the FxLMS to verify the performance of the proposed method. For the FxLMS algorithm, the step size is set to 0.0001.
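For readers unfamiliar with the baseline, a minimal single-channel FxLMS sketch is given below. It is a toy setup and not the simulation used in this paper: the sinusoidal reference, the short primary and secondary paths, the 8-tap control filter, the perfect secondary-path estimate and the step size of 0.05 are all illustrative assumptions (the comparison above uses a step size of 0.0001 on real recordings).

```python
import math

def fir(taps, signal):
    """Causal FIR filtering with zero initial state."""
    return [sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]

def fxlms(x, d, s_true, s_hat, n_taps, mu):
    """Single-channel FxLMS; returns the final weights and the error sequence."""
    w = [0.0] * n_taps
    x_buf = [0.0] * n_taps        # reference history, newest first
    fx_buf = [0.0] * n_taps       # filtered-reference history
    xs_buf = [0.0] * len(s_hat)   # reference history for the path estimate
    y_buf = [0.0] * len(s_true)   # control-output history
    errors = []
    for xn, dn in zip(x, d):
        x_buf = [xn] + x_buf[:-1]
        xs_buf = [xn] + xs_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))            # control output
        y_buf = [y] + y_buf[:-1]
        anti_noise = sum(s * v for s, v in zip(s_true, y_buf))  # after secondary path
        e = dn - anti_noise                                     # residual at error mic
        fx = sum(s * v for s, v in zip(s_hat, xs_buf))          # filtered reference
        fx_buf = [fx] + fx_buf[:-1]
        w = [wi + mu * e * fxi for wi, fxi in zip(w, fx_buf)]   # LMS weight update
        errors.append(e)
    return w, errors

reference = [math.sin(0.1 * math.pi * n) for n in range(4000)]
primary_path = [0.0, 0.8, 0.3]     # hypothetical acoustic paths
secondary_path = [0.5, 0.2]
disturbance = fir(primary_path, reference)
_, err = fxlms(reference, disturbance, secondary_path, secondary_path,
               n_taps=8, mu=0.05)

tail = 500  # evaluate after convergence
d_energy = sum(v * v for v in disturbance[-tail:])
e_energy = sum(v * v for v in err[-tail:])
nr_db = 10.0 * math.log10((d_energy + 1e-30) / (e_energy + 1e-30))
```

The single step size `mu` governs both how quickly the weights adapt and how much they fluctuate once adapted, which is the convergence versus steady-state trade-off noted in the Introduction.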

The experimental results in Figures 8–11 and Table 1 show that the GFANC-RL-CP method achieves good control performance for complex and non-stationary sound sources. While remaining stable, it converges faster than the FxLMS. The GFANC-RL exhibits serrated oscillation under complex conditions (such as rough road surfaces), whereas the improved method uses the PER mechanism to prioritize learning from key samples and thus effectively suppresses this instability. The proposed method maintains an average noise reduction (NR) of about 25.6–27.2 dB across the four operating conditions (Table 1), 4–8 dB higher than the GFANC-RL method. The consistent behavior across road surfaces and microphone positions demonstrates the effectiveness and generalization ability of the improved method.

Figure 8.

Comparison of noise suppression effects at the FR-in position for smooth road scenarios.

Figure 9.

Comparison of noise suppression effects at the FR-out position for smooth road scenarios.

Figure 10.

Comparison of noise suppression effects at the FR-in position for rough road scenarios.

Figure 11.

Comparison of noise suppression effects at the FR-out position for rough road scenarios.

Table 1.

Comparison of average noise reduction levels (dB) under different operating conditions.

ANC algorithm	Smooth-FR-in	Smooth-FR-out	Rough-FR-in	Rough-FR-out
GFANC-RL-CP	26.81	27.24	25.56	25.77
GFANC-RL	19.22	20.94	20.81	21.57
FxLMS	10.42	12.10	14.61	17.33
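The improvement ranges quoted in the text can be checked directly against Table 1. The short script below uses only the values in the table; the variable and function names are illustrative.

```python
# Average NR (dB) from Table 1, in the column order
# Smooth-FR-in, Smooth-FR-out, Rough-FR-in, Rough-FR-out.
table_1 = {
    "GFANC-RL-CP": [26.81, 27.24, 25.56, 25.77],
    "GFANC-RL": [19.22, 20.94, 20.81, 21.57],
    "FxLMS": [10.42, 12.10, 14.61, 17.33],
}

def gains_over(baseline):
    """Per-condition NR gain (dB) of GFANC-RL-CP over the given baseline."""
    return [round(p - b, 2)
            for p, b in zip(table_1["GFANC-RL-CP"], table_1[baseline])]

gain_vs_rl = gains_over("GFANC-RL")
gain_vs_fxlms = gains_over("FxLMS")
range_vs_rl = (min(gain_vs_rl), max(gain_vs_rl))
range_vs_fxlms = (min(gain_vs_fxlms), max(gain_vs_fxlms))
```

The gains over GFANC-RL span 4.20 to 7.59 dB and those over FxLMS span 8.44 to 16.39 dB, consistent with the approximate 4–8 dB and 8–16 dB ranges reported.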

Figure 12 presents the power spectral density (PSD) comparisons across the four test scenarios, providing a detailed view of the control performance and energy distribution in the frequency domain.

Figure 12.

Comparison of power spectral density (PSD) under different operating conditions.

Although the traditional FxLMS algorithm lowers the amplitude in the low-frequency range, its PSD curve nearly coincides with that of the primary noise at high frequencies, exposing its limited ability to suppress high-frequency noise. In contrast, GFANC-RL-CP suppresses noise effectively across the whole spectrum. Notably, the PSD of the proposed GFANC-RL-CP is generally lower than that of the GFANC-RL and shows no abnormal high-frequency peaks. The reduced energy level over the whole frequency band demonstrates that the improved algorithm attenuates in-vehicle noise energy more effectively in complex environments, so that the numerical noise reduction translates into improved acoustic comfort for passengers.

4. Conclusions

This paper studies an RL-based generative fixed-filter active noise control strategy to cope with the complex and non-stationary acoustic environment inside the vehicle, and proposes an improved GFANC-RL-CP method based on the CBAM module and the PER mechanism. The main conclusions are as follows:

(1) The integrated CBAM module improves the ability of the network to extract features in complex sound fields. It utilizes the dual attention weighting of channel and spatial dimensions to better identify key noise characteristics. It overcomes the shortcomings of traditional methods regarding insufficient feature extraction in the dynamic environment, and lays a solid foundation for generating optimal control filter parameters.

(2) The PER mechanism improves the training efficiency of the algorithm. It assigns higher sampling probabilities to samples with large TD errors, which overcomes the inefficiency of traditional uniform sampling. The experimental results show that the training time is reduced by about 50%.

(3) Based on real vehicle data, the proposed method exhibits good noise reduction, strong robustness, and high adaptability to various environments. Across the tested cases, the average NR of GFANC-RL-CP is 4–8 dB and 8–16 dB higher than that of GFANC-RL and FxLMS, respectively. Its consistent performance under different conditions demonstrates good generalization ability. Frequency-domain analysis shows that GFANC-RL-CP achieves superior broadband noise suppression over the entire frequency band from 0 to 8000 Hz.
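
The sampling rule behind conclusion (2) can be sketched as follows. This is a minimal illustration of proportional prioritization in the general form used for prioritized experience replay, not the authors' implementation: the exponents `alpha` and `beta`, the offset `eps`, the TD-error values, and all function names are illustrative assumptions.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i)) ** -beta, normalized by the maximum."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]


def sample_batch(probs, batch_size, rng):
    """Draw transition indices with replacement according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Hypothetical TD errors for five stored transitions.
td_errors = [0.05, 0.1, 2.0, 0.02, 1.0]
probs = per_probabilities(td_errors)
weights = importance_weights(probs)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
batch = sample_batch(probs, 1000, rng)
counts = [batch.count(i) for i in range(len(td_errors))]
print(counts)
```

Transitions with large TD errors are drawn far more often than under uniform sampling, while the importance weights scale down their updates to limit the bias that non-uniform sampling introduces.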

Footnotes

ORCID iDs

Yancui Jiang

Shiyao Hu

Rongyi Li

Ethical considerations

This article does not contain any studies with human or animal participants.

Author contributions

Yancui Jiang: Conceptualization, Methodology, Supervision, Writing - review & editing. Shiyao Hu: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Rongyi Li: Resources, Funding acquisition, Supervision, Project administration, Writing - review & editing. Deming Li: Investigation, Writing - review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Heilongjiang Natural Science Foundation Outstanding Youth Project [grant number YQ2024E041]; the National Natural Science Foundation of China [grant number 52575487]; the “Chunyan” Team Support Plan Project of Heilongjiang [grant number CYCX24001]; and the Opening Project of the Key Laboratory of Advanced Processing Technology and Intelligent Manufacturing (Heilongjiang Province), Harbin University of Science and Technology [grant number KFKT202201].

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Aboulnasr, Mayyas (1997) A robust variable step-size LMS-type algorithm: analysis and simulations. IEEE Transactions on Signal Processing 45(3): 631–639.

Bianco, Gerstoft, Traer, et al. (2019) Machine learning in acoustics: theory and applications. Journal of the Acoustical Society of America 146(5): 3590–3628. https://doi.org/10.1121/1.5133944

Cha, Mostafavi, Benipal (2023) DNoiseNet: deep learning-based feedback active noise control in various noisy environments. Engineering Applications of Artificial Intelligence 121: 105971.

Fujimoto, van Hoof, Meger (2018) Addressing function approximation error in actor-critic methods. In: Proceedings of the 35th international conference on machine learning (ICML), Stockholm, Sweden, 10–15 July 2018, Vol. 80, pp. 1587–1596. PMLR.

Haarnoja, Zhou, Abbeel, et al. (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th international conference on machine learning (ICML), Stockholm, Sweden, 10–15 July 2018, Vol. 80, pp. 1861–1870. PMLR.

Kingma (2015) Adam: a method for stochastic optimization. In: 3rd International conference on learning representations (ICLR), San Diego, CA, USA, 7–9 May 2015.

Kuo, Morgan (1996) Active Noise Control Systems: Algorithms and DSP Implementations. John Wiley & Sons, Inc.

Kwon, Kim, Park (2022) Active noise reduction with filtered least-mean-square algorithm improved by long short-term memory models for radiation noise of diesel engine. Applied Sciences 12(20): 10248.

Mai (2024) Efficient implementation of the functional links artificial neural networks with cross-terms for nonlinear active noise control. International Journal of Electrical and Computer Engineering 14(4): 3922–3930.

Bai (2025) Reinforcement learning algorithm for secondary path identification in active noise control systems. AIP Advances 15(8): 085021.

Yin, de Lamare, et al. (2021) A survey on active noise control in the past decade–Part II: nonlinear systems. Signal Processing 181: 107929.

Luo, Shi, Shen, et al. (2023) Deep generative fixed-filter active noise control. In: ICASSP 2023–2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023, pp. 1–5. IEEE.

Luo, Shi, et al. (2024a) GFANC-RL: reinforcement learning-based generative fixed-filter active noise control. Neural Networks 180: 106687.

Luo, Shi, Gan, et al. (2024b) Delayless generative fixed-filter active noise control based on deep learning and Bayesian filter. IEEE/ACM Transactions on Audio, Speech and Language Processing 32: 1048–1060.

Morgan (1980) An analysis of multiple correlation cancellation loops with a filter in the auxiliary path. IEEE Transactions on Acoustics, Speech, & Signal Processing 28(4): 454–467.

Mostafavi, Cha (2023) Deep learning-based active noise control on construction sites. Automation in Construction 151: 104885.

Nelson, Elliott (1991) Active Control of Sound. Academic Press.

Park, Patterson, Baum (2019) Long short-term memory and convolutional neural networks for active noise control. In: 2019 5th International conference on frontiers of signal processing (ICFSP), Marseille, France, 18–20 September 2019, pp. 121–125. IEEE.

Schaul, Quan, Antonoglou, et al. (2016) Prioritized experience replay. In: 4th International conference on learning representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.

Shi, Lam, Ooi, et al. (2022) Selective fixed-filter active noise control based on convolutional neural network. Signal Processing 190: 108317.

Tan, Jiang (2001) Adaptive Volterra filters for active control of nonlinear noise processes. IEEE Transactions on Signal Processing 49(8): 1667–1676.

Woo, Park, Lee, et al. (2018) CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018, pp. 3–19. Springer.

Yao, Yuan, Chen, et al. (2026) Adaptive neural control for a class of random nonlinear Markov jump multi-agent systems with full state constraints. Journal of Vibration and Control 32(3–4): 630–645.

Zhang, Wang (2020) A deep learning approach to active noise control. In: Interspeech 2020, 25–29 October 2020, pp. 1141–1145. ISCA.