Abstract
Although Deep Reinforcement Learning (DRL) has shown potential in Active Noise Control (ANC), its effectiveness is often limited by the random, non-stationary sound field inside vehicles. Current methods generally extract features insufficiently and fail to exploit important training samples effectively. To address these shortcomings, this paper proposes an enhanced generative fixed-filter ANC method that combines a Channel-Spatial Attention Mechanism with Prioritized Experience Replay (GFANC-RL-CP). Specifically, the Channel-Spatial Attention Mechanism (CBAM) is integrated to extract key channel and temporal features, which improves the system's ability to identify complex noise characteristics. In addition, a Prioritized Experience Replay (PER) mechanism is deployed so that high-value samples are purposefully reused during learning. Validation on real-vehicle data shows clear performance gains. The proposed method reduces training time by about 50%. Compared with the traditional FxLMS and GFANC-RL, it improves noise reduction by 8–16 dB and 4–8 dB, respectively, and provides effective noise suppression across the entire 0–8000 Hz frequency range.
Keywords
1. Introduction
With the advancement of automobile technology and passengers' rising expectations for ride comfort, noise control has become an important issue in vehicle engineering. Traditional passive control techniques have inherent physical limitations, such as bulky structures, high implementation cost, and poor low-frequency performance. This makes it difficult for them to meet the requirements of modern vehicles for lightweight design and acoustic comfort. Active Noise Control (ANC) is therefore regarded as an effective solution (Kuo and Morgan, 1996; Nelson and Elliott, 1991). ANC is based on the principle of acoustic superposition. It generates a secondary anti-noise signal with the same magnitude as the primary noise but opposite phase, so that the two cancel. This technique has proven particularly effective at suppressing low-frequency noise.
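The superposition principle can be illustrated numerically. The sketch below assumes a single pure tone and an anti-noise signal of exactly equal magnitude and opposite phase that arrives with a small fixed timing error. The 0.1 ms delay is a hypothetical value, not taken from the measurements. It shows that a given timing error becomes a larger phase error, and so costs more attenuation, as frequency rises. This is one common explanation for why ANC works best at low frequencies.

```python
import numpy as np

def residual_attenuation_db(freq_hz, delay_s, fs=16000, duration_s=1.0):
    """Attenuation (dB) of a pure tone cancelled by an inverted copy that
    arrives delay_s seconds late (a pure timing error, equal magnitude)."""
    t = np.arange(int(fs * duration_s)) / fs
    primary = np.sin(2 * np.pi * freq_hz * t)               # primary noise at the ear
    anti = -np.sin(2 * np.pi * freq_hz * (t - delay_s))     # same magnitude, opposite phase, delayed
    residual = primary + anti                                # acoustic superposition
    return 10 * np.log10(np.sum(primary ** 2) / np.sum(residual ** 2))

for f in (100, 500, 2000):
    print(f"{f} Hz: {residual_attenuation_db(f, delay_s=1e-4):.1f} dB")
```

With the hypothetical 0.1 ms error, attenuation drops from about 24 dB at 100 Hz to about 10 dB at 500 Hz. At 2 kHz the inverted signal slightly amplifies the tone. This matches the closed form -20·log10(2·|sin(π·f·τ)|).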
In adaptive ANC systems, the Filtered-x Least Mean Square (FxLMS) algorithm (Morgan, 1980) has become the standard solution owing to its simple structure and high computational efficiency. However, its fixed step size creates a trade-off between convergence speed and steady-state error, which limits its ability to track non-stationary noise. A series of modified algorithms has been proposed to address this problem. For example, the Variable Step Size (VSS) FxLMS algorithm (Aboulnasr and Mayyas, 1997) improves convergence while maintaining system stability, and Volterra-series-based controllers (Tan and Jiang, 2001) have been used to handle nonlinear distortion. However, most of these methods rely on linear or shallow nonlinear models. In rapidly changing, strongly nonlinear sound fields, their parameters generally cannot be updated to the optimum in real time, so satisfactory noise reduction is difficult to achieve (Lu et al., 2021).
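For reference, the fixed-step FxLMS loop can be sketched as below. This is a minimal single-channel simulation under stated assumptions. The secondary path is a short FIR filter, its estimate is taken as exact, and the signals are a hypothetical 200 Hz tone. The step size and filter length are illustrative values, not the settings used later in the paper.

```python
import numpy as np

def fxlms(x, d, s, s_hat, n_taps=32, mu=1e-3):
    """Single-channel FxLMS simulation.
    x: reference signal; d: primary noise at the error microphone;
    s: true secondary-path FIR (simulates the plant);
    s_hat: secondary-path estimate used to filter the reference."""
    w = np.zeros(n_taps)            # adaptive control filter
    x_buf = np.zeros(n_taps)        # latest reference samples, newest first
    xf_buf = np.zeros(n_taps)       # latest filtered-reference samples
    y_buf = np.zeros(len(s))        # latest anti-noise samples fed to the plant
    xs_buf = np.zeros(len(s_hat))   # latest reference samples fed to s_hat
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = w @ x_buf                   # anti-noise y(n)
        e[n] = d[n] + s @ y_buf                # residual: primary + anti-noise via secondary path
        xs_buf = np.roll(xs_buf, 1)
        xs_buf[0] = x[n]
        xf_buf = np.roll(xf_buf, 1)
        xf_buf[0] = s_hat @ xs_buf             # filtered reference x'(n)
        w -= mu * e[n] * xf_buf                # fixed-step LMS update
    return e

# Hypothetical test signals: a 200 Hz tone and a short secondary path
fs = 16000
n = np.arange(fs)
x = np.sin(2 * np.pi * 200 * n / fs)
d = 0.8 * np.sin(2 * np.pi * 200 * n / fs + 0.5)
s = np.array([0.0, 0.9, 0.2])
e = fxlms(x, d, s, s_hat=s)
print(f"residual power, last 0.1 s: {np.mean(e[-1600:] ** 2):.2e}")
```

A larger `mu` speeds convergence but raises the risk of instability and steady-state misadjustment. This is the fixed-step trade-off described above, and it is what VSS variants try to relax.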
Owing to their strong nonlinear modeling ability (Bianco et al., 2019), deep neural networks (DNNs) are increasingly used in ANC to handle complex, nonlinear, and stochastic control environments that traditional algorithms cannot handle well (Yao et al., 2026). Zhang and Wang (2020) proposed a deep ANC framework that uses a Convolutional Recurrent Network (CRN) to model the secondary path and noise signals, achieving notable results in broadband noise cancellation. Because acoustic signals exhibit strong temporal correlation, Long Short-Term Memory (LSTM) architectures have also attracted considerable attention. Their memory units handle time-varying in-vehicle noise well (Kwon et al., 2022; Park et al., 2019). Building on these capabilities, recent studies have further extended deep learning to complex and non-stationary acoustic environments. For instance, to mitigate highly nonlinear and transient noise, Mostafavi and Cha (2023) proposed a high-performance feedforward ANC controller incorporating LSTM and attention mechanisms, while Cha et al. (2023) developed DNoiseNet, a robust deep-learning-based feedback ANC system. However, although these models improve nonlinear fitting, most of them impose a heavy computational burden and require large storage capacity (Le and Mai, 2024). For latency-sensitive embedded ANC systems, deploying such computationally intensive algorithms on resource-limited hardware therefore remains a major challenge, and strict real-time performance is hard to guarantee.
To balance real-time performance and nonlinear control on resource-limited hardware, fixed-filter strategies are increasingly adopted owing to their computational efficiency and stability. Because conventional fixed filters often fail in complex acoustic fields, Shi et al. (2022) proposed the Selective Fixed-Filter ANC (SFANC) framework. It uses a Convolutional Neural Network (CNN) to select the most suitable control filter from a set of pre-trained filters, thereby reducing latency. However, the effectiveness of SFANC is limited by the finite number of pre-trained filters. Its performance therefore degrades rapidly when the actual noise differs significantly from the training noise. To solve this problem, Luo et al. (2023, 2024b) proposed the Generative Fixed-Filter ANC (GFANC) method. Rather than selecting from fixed pre-trained filters, GFANC generates control filters tailored to different noise types. It also adopts Bayesian filtering and a delayless structure to improve generation accuracy and real-time performance. However, GFANC still relies on CNN-based supervised learning, so inaccurate noise labels degrade the accuracy of filter generation and, in turn, the noise reduction. Inspired by the strong adaptability of Reinforcement Learning (RL) in dynamic, nonlinear acoustic environments (Li et al., 2025), Luo et al. (2024a) proposed an RL-based GFANC (GFANC-RL) strategy. This method exploits the exploration property of RL to remove the need for labeling. It also effectively resolves the non-differentiability caused by the binary combination weights in the original GFANC.
Although the GFANC-RL strategy effectively removes the dependence on labels, problems remain in complex acoustic environments. First, the lightweight CNN feature extraction module alone cannot adaptively capture the key characteristics of noise signals. Second, the conventional random (uniform) experience replay mechanism cannot distinguish the different learning values of samples. Training indiscriminately on a large amount of easy data inevitably reduces the convergence efficiency of the algorithm.
This paper addresses the poor adaptability, weak anti-interference capability, and slow convergence of GFANC-RL in complex in-vehicle acoustic environments. It proposes an enhanced method that combines the Channel-Spatial Attention Mechanism (CBAM) (Woo et al., 2018) with Prioritized Experience Replay (PER) (Schaul et al., 2016). Specifically, a CBAM module is added to the front end of the network to focus on important channel and temporal features, improving the model's ability to perceive and represent non-stationary noise. PER is adopted in the training stage. Unlike uniform random replay, it prioritizes learning from key samples, namely those that play a decisive role in policy updates. This accelerates convergence and enhances anti-interference capability in complex environments. During noise suppression, the neural network takes the CBAM-enhanced feature representations as input and outputs a binary weight vector. This vector is used to weight and combine the pre-trained sub-filters into a control filter suited to the current operating conditions, thereby suppressing non-stationary in-vehicle noise. Finally, real-vehicle data are used in simulations to systematically verify the noise reduction performance, robustness, and generalization ability of the proposed method.
2. Algorithm architecture design of GFANC-RL-CP
Figure 1 shows the overall structure of the proposed GFANC-RL-CP method. The reference signal is first processed by an initial 1D convolution layer to generate multiple feature channels, which are then fed into the CBAM module. The resulting feature representations are passed to the 1D CNN, which generates a binary weight vector. This vector is then used to weight and combine the pre-trained sub-filters into a control filter suited to the current noise environment.
Figure 1. Overall architecture of the GFANC-RL-CP method.
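The final weighting step can be sketched as a binary-weighted sum of the sub-filters. This minimal sketch assumes that the control filter is the sum of the sub-filters whose weight is 1, as in GFANC-style generation. The random sub-filters and weight vector are hypothetical stand-ins for the pre-trained filters and the network output. The length of 1024 taps is taken from the filter order in Section 2.1, assuming "order" means tap count.

```python
import numpy as np

def generate_control_filter(binary_weights, sub_filters):
    """Control filter as the binary-weighted sum of pre-trained sub-filters.
    binary_weights: shape (M,), entries in {0, 1}
    sub_filters:    shape (M, L), one FIR sub-filter per row"""
    binary_weights = np.asarray(binary_weights, dtype=float)
    return binary_weights @ sub_filters        # shape (L,)

rng = np.random.default_rng(0)
M, L = 15, 1024                                   # 15 sub-filters, 1024 taps (assumed)
sub_filters = 0.01 * rng.standard_normal((M, L))  # hypothetical stand-ins for pre-trained sub-filters
weights = rng.integers(0, 2, size=M)              # hypothetical stand-in for the binary network output
w_ctrl = generate_control_filter(weights, sub_filters)
```

Because the weights are binary, generating a new filter costs only a selection and a sum, with no per-sample adaptation.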
A frame-level processing strategy is employed with a sampling rate
2.1. Construction of sub-filters
The construction of sub-filters is the foundation of the GFANC-RL-CP method. In this study, a filter decomposition technique based on perfect-reconstruction filter bank theory is adopted. For the broadband primary noise
Figure 2(a) and (b) illustrate the frequency responses of the broadband control filter and sub-filters, respectively. The order of the control filters is set to 1024, with a sampling rate of 16 kHz. Furthermore, bandpass filters with a frequency range of 20 Hz–7980 Hz are used to model the primary and secondary paths.
Figure 2. Frequency responses of the pre-trained broadband control filter and sub-filters.
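The defining property of the decomposition is that the sub-filters sum back to the broadband filter. The sketch below demonstrates this property with a deliberately simple substitute. It partitions the FFT bins of the control filter into contiguous, non-overlapping bands and inverse-transforms each band. The uniform band edges and the FFT-mask approach are assumptions for illustration, not the filter bank design used in the paper.

```python
import numpy as np

def decompose_into_subfilters(w, n_bands=15):
    """Split a broadband FIR control filter w into n_bands sub-filters whose
    sum reproduces w exactly. The rfft bins are partitioned into contiguous,
    non-overlapping bands whose masks sum to 1 (a simplified stand-in for a
    perfect-reconstruction filter bank)."""
    W = np.fft.rfft(w)
    edges = np.linspace(0, len(W), n_bands + 1).astype(int)  # uniform band edges (assumption)
    subs = np.zeros((n_bands, len(w)))
    for i in range(n_bands):
        mask = np.zeros(len(W))
        mask[edges[i]:edges[i + 1]] = 1.0
        subs[i] = np.fft.irfft(W * mask, n=len(w))           # band-limited sub-filter
    return subs

rng = np.random.default_rng(1)
w_broadband = rng.standard_normal(1024)   # hypothetical broadband control filter
subs = decompose_into_subfilters(w_broadband)
print(np.allclose(subs.sum(axis=0), w_broadband))
```

Because the inverse FFT is linear and the band masks sum to one, the sub-filters add up to the original filter. This is the reconstruction property the binary combination relies on.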
Using 15 sub-filters is a practical compromise between system performance and training feasibility. It provides 2^15 = 32,768 possible combinations to cover the target frequency range (for example, the distinct harmonic profiles of electric and internal combustion engine vehicles). At the same time, it keeps the action space small enough for the reinforcement learning training to converge effectively, since that space grows exponentially with the number of sub-filters.
2.2. Feature extraction based on channel-spatial attention mechanism
For an ANC system, the ability to extract effective features largely determines the final noise reduction performance. The main disadvantage of GFANC-RL is its reliance on a lightweight 1D CNN that treats all input features indiscriminately, with no mechanism for assigning higher importance to specific features. To overcome this problem, a CBAM module is introduced into the feature extraction process to enhance the input features automatically. It does so by dynamically adjusting feature weights, allowing the network to focus on important channel and temporal features. Figure 3 shows the internal structure of the CBAM used, which consists of two sub-modules: a Channel Attention Module (CAM) and a Spatial Attention Module (SAM). Here, the current noise frame vector
Figure 3. Structure of the CBAM module.
2.2.1. Channel attention module
To aggregate temporal information, Global Average Pooling and Global Max Pooling operations are first applied to the multi-channel input feature
Through this process, the CAM calculates a channel weight vector. These weights represent the relative importance of different feature channels, enabling the network to adaptively focus on critical acoustic representations.
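The channel attention computation can be sketched for 1D features as follows. This minimal NumPy version follows the standard CBAM channel attention of Woo et al. (2018). Average- and max-pooled descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid. The reduction ratio, omitted biases, and random weights are illustrative assumptions, not the trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """CBAM-style channel attention for a 1D feature map.
    F: (C, T) features; W0: (C//r, C) and W1: (C, C//r) are the weights
    of the shared two-layer MLP (biases omitted for brevity)."""
    def shared_mlp(v):
        return W1 @ np.maximum(W0 @ v, 0.0)     # FC -> ReLU -> FC
    f_avg = F.mean(axis=1)                      # global average pooling over time -> (C,)
    f_max = F.max(axis=1)                       # global max pooling over time -> (C,)
    m_c = sigmoid(shared_mlp(f_avg) + shared_mlp(f_max))  # channel weights in (0, 1)
    return m_c[:, None] * F, m_c                # reweight each channel

rng = np.random.default_rng(0)
C, T, r = 16, 128, 4                            # hypothetical channel count, frame length, ratio
F = rng.standard_normal((C, T))
W0 = 0.1 * rng.standard_normal((C // r, C))
W1 = 0.1 * rng.standard_normal((C, C // r))
F_refined, m_c = channel_attention(F, W0, W1)
```

Sharing one MLP between the two pooled descriptors keeps the parameter count small. Combining average and max pooling lets the weights respond to both overall channel energy and peak activity.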
2.2.2. Spatial attention module
Distinct from the channel-wise focus, the SAM targets the temporal distribution within the feature sequence. Its primary function is to assign weights to critical time steps, thereby precisely localizing significant noise regions.
First, Average Pooling and Max Pooling operations are applied along the channel axis of the intermediate feature
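The temporal ("spatial") attention step can be sketched in the same way. This minimal version adapts CBAM's spatial attention to a 1D time axis. Average and max pooling along the channel axis give a 2×T descriptor. A single-output 1D convolution with "same" padding and a sigmoid then produce one weight per time step. The kernel size of 7 mirrors the 7×7 kernel in Woo et al. (2018) and is an assumption here, as are the random weights.

```python
import numpy as np

def temporal_attention(F, kernel, bias=0.0):
    """CBAM-style spatial attention adapted to 1D (time) features.
    F: (C, T) channel-refined features; kernel: (2, k) weights of a
    single-output 1D convolution over the [avg; max] descriptor, k odd."""
    desc = np.stack([F.mean(axis=0), F.max(axis=0)])    # pool along channel axis -> (2, T)
    k = kernel.shape[1]
    padded = np.pad(desc, ((0, 0), (k // 2, k // 2)))    # 'same' zero padding
    T = F.shape[1]
    conv = np.array([np.sum(kernel * padded[:, t:t + k]) for t in range(T)]) + bias
    m_s = 1.0 / (1.0 + np.exp(-conv))                   # per-time-step weights in (0, 1)
    return F * m_s[None, :], m_s                         # emphasize salient time steps

rng = np.random.default_rng(0)
F = rng.standard_normal((16, 128))                       # hypothetical CAM output
kernel = 0.1 * rng.standard_normal((2, 7))               # hypothetical k = 7 kernel
F_out, m_s = temporal_attention(F, kernel)
```

In CBAM the channel module is applied first and this module second, so `F` here would normally be the channel-refined output.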
2.3. Improved RL training strategy
When facing non-stationary, complex noise, conventional DRL algorithms often suffer from low sample efficiency and unstable convergence. To systematically model the interaction between the agent and the acoustic environment, the ANC task is formulated as a Markov Decision Process (MDP).
Within this framework, the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018) is adopted as the optimization algorithm. In addition, PER replaces the original uniform sampling, which improves the speed and stability of convergence in dynamic acoustic scenarios.
2.3.1. Modeling of Markov decision process
In this study, the active noise control task is modeled as an MDP, represented by the tuple
The discount factor
Based on the MDP model, the discounted cumulative return
The optimal policy
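The discounted cumulative return referred to here follows the standard recursion G_t = r_t + γ·G_{t+1}. The paper's own equation is not legible in this text, so this small sketch assumes that standard definition over a finite episode. The reward values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over a finite episode."""
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return returns[::-1]

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

A discount factor closer to 1 gives future noise reduction more weight relative to the immediate reward.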
2.3.2. SAC algorithm
Conventional RL algorithms typically aim only to maximize the cumulative return. However, in non-stationary ANC environments, this return-only optimization, which tends to yield near-deterministic policies, often traps the agent in local optima, so it fails to adapt to rapid acoustic field variations.
To address this limitation, the SAC algorithm is adopted. Its distinctive advantage lies in incorporating the maximum entropy principle into the optimization objective.
The objective function
Based on this framework, the optimal policy
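For reference, the standard maximum-entropy objective of SAC (Haarnoja et al., 2018) is given below in its discounted form. The paper's own expressions for the objective function and the optimal policy are not legible in this text. The conventional symbols used here (temperature α, entropy 𝓗) may therefore differ from the original notation.

$$J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right], \qquad \mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\big[\log \pi(a \mid s_t)\big]$$

$$\pi^{*} = \arg\max_{\pi} J(\pi)$$

The temperature α > 0 trades off reward against policy entropy. As α → 0, the conventional return-maximization objective is recovered.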
2.3.3. Network update and prioritized experience replay
To obtain the optimal policy
To alleviate the issue of value function overestimation common in DRL, the Clipped Double Q-learning technique (Fujimoto et al., 2018) is incorporated. Specifically, two Critic networks,
Critic parameter update.
Actor parameter update.

First, to calculate the target value
Subsequently, the target value is computed by incorporating the target Critic network parameters
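This sketch of the target computation assumes the standard SAC form with clipped double Q-learning. The target takes the smaller of the two target-Critic estimates, subtracts the entropy term, and applies discounting with a terminal mask. The exact expression in the paper is not legible here, and the default γ and α values are illustrative only.

```python
import numpy as np

def soft_clipped_double_q_target(r, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).
    q1_next, q2_next: target-Critic values at (s', a') with a' ~ pi(.|s');
    logp_next: log-probability of a' under the current policy."""
    min_q = np.minimum(q1_next, q2_next)        # clipped double Q: keep the smaller estimate
    soft_value = min_q - alpha * logp_next      # entropy-regularized next-state value
    return r + gamma * (1.0 - done) * soft_value
```

Taking the minimum of two independently trained Critics counteracts the upward bias that a single maximized estimate tends to accumulate.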
The objective of the Critic network is to estimate the action value by minimizing the discrepancy between the predicted value and the target value. By introducing the importance sampling weight
Finally, the Critic parameters
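This minimal sketch shows how PER sampling and importance-sampling weights are commonly combined with the Critic loss. It uses proportional prioritization from Schaul et al. (2016). Sampling probabilities are P(i) = p_i^α / Σ_k p_k^α, and the weights w_i = (N·P(i))^(-β) correct the resulting bias. The weights are normalized by the batch maximum, a common implementation choice. The α and β values are the usual illustrative defaults, not settings reported in the paper.

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha,
    IS weights w_i = (N * P(i))^(-beta), normalized by the batch maximum."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)  # high-priority samples drawn more often
    w = (len(probs) * probs[idx]) ** (-beta)                 # down-weight over-sampled transitions
    return idx, w / w.max()

def weighted_critic_loss(q_pred, y, is_weights):
    """Importance-weighted mean squared TD error; also returns the TD errors."""
    td = y - q_pred
    return np.mean(is_weights * td ** 2), td

idx, w = per_sample([1.0, 2.0, 3.0, 4.0], batch_size=8, rng=np.random.default_rng(0))
```

The annealing exponent β is usually increased toward 1 over training, so that the bias correction is complete by the time the policy converges.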
After each update of the Critic network parameters
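In PER, a common step after each Critic update is to refresh the priorities of the sampled transitions with their new TD errors. The exact rule used in the paper is not legible here. This sketch assumes the proportional rule p_i = |δ_i| + ε from Schaul et al. (2016), with a hypothetical ε.

```python
import numpy as np

def update_priorities(priorities, idx, td_errors, eps=1e-6):
    """Refresh priorities of the sampled transitions with their new TD errors:
    p_i = |delta_i| + eps (eps keeps every transition sampleable)."""
    priorities = np.asarray(priorities, dtype=float).copy()  # do not mutate the caller's array
    priorities[idx] = np.abs(td_errors) + eps
    return priorities
```

Newly stored transitions are typically inserted with the current maximum priority so that each is replayed at least once before its TD error is known.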
The Actor network comprises a sigmoid layer and a binarization layer to generate discrete binary weight vectors
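The sigmoid-plus-binarization output stage can be sketched as below. Deterministic thresholding at 0.5 is an assumption for inference. Sampling each bit from a Bernoulli distribution is a common alternative during training, and the text does not state which variant the paper uses. The function name and its parameters are hypothetical.

```python
import numpy as np

def actor_binary_weights(logits, threshold=0.5, sample=False, rng=None):
    """Sigmoid layer followed by a binarization layer.
    Returns per-sub-filter probabilities and the binary weight vector."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    if sample:                                   # stochastic bits, e.g. for exploration
        rng = np.random.default_rng() if rng is None else rng
        binary = (rng.random(probs.shape) < probs).astype(int)
    else:                                        # deterministic thresholding (assumed)
        binary = (probs >= threshold).astype(int)
    return probs, binary

probs, bits = actor_binary_weights([-2.0, 0.0, 3.0])
print(bits)  # [0 1 1]
```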
Finally, the Actor parameters
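For illustration, the standard SAC Actor objective, E[α·log π(a|s) − min(Q1, Q2)(s, a)], can be evaluated for a binary-weight policy. This sketch assumes the policy factorizes into independent Bernoulli bits, one per sub-filter. It only computes the loss value. Obtaining gradients through discrete bits requires automatic differentiation plus a relaxation such as a straight-through estimator, and the text does not specify which the paper uses.

```python
import numpy as np

def bernoulli_log_prob(probs, actions, eps=1e-8):
    """log pi(a|s) for a factorized Bernoulli policy over binary weights."""
    probs = np.clip(probs, eps, 1 - eps)
    return np.sum(actions * np.log(probs) + (1 - actions) * np.log(1 - probs), axis=-1)

def actor_loss(probs, actions, q1, q2, alpha=0.2):
    """SAC Actor objective: mean of alpha * log pi(a|s) - min(Q1, Q2)(s, a)."""
    logp = bernoulli_log_prob(probs, actions)
    return np.mean(alpha * logp - np.minimum(q1, q2))
```

Minimizing this loss raises the clipped Q-value of the chosen filter combination while the α·log π term keeps the policy from collapsing prematurely onto a single combination.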
2.3.4. Network training
To enable the agent to learn the optimal control policy in complex noise environments, the proposed GFANC-RL-CP method is trained offline using the synthetic noise dataset from Luo et al. (2024a).
Based on the MDP formulation, the agent receives the reference signal vector
To verify the improvement in convergence efficiency brought by the PER mechanism, model training performance is compared with and without this mechanism. For both methods, a learning rate that ensured stable convergence was selected by search. Figure 6 shows the results; solid lines are smoothed convergence curves and faint traces are raw training data. Both methods eventually converge to a high reward level, owing to the effective feature extraction of the CBAM module. The proposed algorithm stabilizes rapidly, reaching a steady state at approximately 100,000 steps. Its uniform-sampling counterpart lags behind and requires roughly 190,000 steps. Furthermore, without PER, the uniform-sampling method exhibits visible fluctuations at high learning rates and therefore requires a more conservative setting. In contrast, the proposed GFANC-RL-CP method sustains higher learning rates without significant fluctuations, which directly accelerates convergence. The number of steps to convergence falls from about 190,000 to about 100,000, a reduction of roughly 47%. This confirms that the PER mechanism improves training efficiency by nearly 50%.
Figure 6. Training reward convergence curves.
Upon completion of training, the Actor network with the optimal parameters
In this framework, the computationally intensive training processes (including SAC iterations and the PER mechanism) are conducted entirely offline. During the online control phase, the system only performs forward inference of the Actor network to compute equation (19). Because this inference runs once per frame rather than once per sample, the online execution frequency remains low. Furthermore, the 1D convolutions used by the network consist primarily of basic multiply-accumulate operations. In terms of algorithmic complexity, the online computational load of this method is therefore relatively limited, and it is expected to meet future deployment requirements on resource-constrained hardware.
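The complexity argument can be made concrete by counting multiply-accumulate operations (MACs) per frame. The layer sizes, the 1024-sample frame length, and the resulting numbers below are hypothetical. The network dimensions and frame length are not stated legibly in this text, so the sketch only illustrates how such an estimate is formed.

```python
def conv1d_macs(c_in, c_out, kernel, out_len):
    """Multiply-accumulate operations of one 1D convolution layer (no bias)."""
    return c_in * c_out * kernel * out_len

# Hypothetical layer sizes (c_in, c_out, kernel, output length), for illustration only
layers = [(1, 16, 3, 256), (16, 16, 3, 128), (16, 32, 3, 64)]
macs_per_frame = sum(conv1d_macs(*layer) for layer in layers)
frames_per_second = 16000 / 1024          # assumes 16 kHz sampling and 1024-sample frames
print(macs_per_frame, macs_per_frame * frames_per_second)
```

Because inference runs once per frame, the per-second cost scales with the frame rate rather than the sampling rate. This is the source of the savings the paragraph describes.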
3. Performance simulation and analysis
In this section, an offline ANC simulation environment is established based on real in-vehicle noise data. The proposed GFANC-RL-CP is compared with GFANC-RL and the FxLMS algorithm to show that the proposed algorithm outperforms these methods in noise reduction performance, robustness, and generalization ability.
3.1. In-vehicle noise data acquisition
To ensure that the simulation experiments closely reflect the real acoustic field, a standard passenger vehicle was selected as the test vehicle and a high-precision acoustic data acquisition system was established. The experimental setup and hardware are illustrated in Figure 7. The core hardware consists of a microphone array of eight Ono Sokki MI-1271 microphones, an acoustic data acquisition unit, and an experimental computer. The software platform is HEAD Recorder 9.1.
Figure 7. Experimental setup. (a) Experimental vehicle; (b) Experimental microphone; (c) HEADlab V12 data acquisition system; (d) Laptop with HEAD Recorder software.
A total of eight microphones were installed on both sides of the seat headrests to collect noise data. The specific measurement positions included FL-in, FL-out, FR-in, FR-out, RL-in, RL-out, RR-in, and RR-out. Here, FL, FR, RL, and RR denote the front-left, front-right, rear-left, and rear-right seats. The suffixes “in” and “out” refer to the inner and outer sides of the headrest.
To investigate the influence of different road surfaces on vehicle noise, data were acquired at a typical urban cruising speed of 60 km/h on both smooth and rough urban roads. This produced a complex acoustic environment with highly coupled noise sources. The sampling rate of the noise signals was 16 kHz. The collected raw data were normalized to the range [0, 1] via Min-Max normalization:
$$x_{\mathrm{norm}}(n) = \frac{x(n) - \min_{k} x(k)}{\max_{k} x(k) - \min_{k} x(k)}$$
3.2. Simulation with real-world noise
Using the policy network obtained in the offline training stage, the GFANC-RL-CP method is applied to real in-vehicle noise for ANC simulation. Four typical noise recordings from the front-right (FR) seat were selected for testing, to assess noise reduction performance, stability, and generalization under complex non-stationary conditions. The real-world data exhibit complex acoustic characteristics that differ substantially from the synthetic noise dataset used in the training stage.
The GFANC-RL-CP method is compared with GFANC-RL and FxLMS to verify its performance. For the FxLMS algorithm, the step size is set to 0.0001.
The results in Figures 8–11 and Table 1 show that the GFANC-RL-CP method achieves good control of complex, non-stationary noise. While maintaining stability, it converges faster than FxLMS. GFANC-RL exhibits sawtooth-like oscillations under complex conditions (such as rough road surfaces). The improved method instead uses the PER mechanism to prioritize key samples during learning, which effectively suppresses this instability. The proposed method maintains a consistent average noise reduction (NR) of 25–30 dB under all operating conditions, 4–8 dB higher than GFANC-RL. Its consistent behavior across road surfaces and microphone positions demonstrates the effectiveness and generalization ability of the improved method.
Figure 8. Comparison of noise suppression effects at the FR-in position for smooth road scenarios.
Figure 9. Comparison of noise suppression effects at the FR-out position for smooth road scenarios.
Figure 10. Comparison of noise suppression effects at the FR-in position for rough road scenarios.
Figure 11. Comparison of noise suppression effects at the FR-out position for rough road scenarios.
Table 1. Comparison of average noise reduction levels (dB) under different operating conditions.
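The text does not restate how NR is computed. A common definition is the ratio of primary-noise energy to residual-noise energy in decibels, NR = 10·log10(Σd² / Σe²), evaluated over the whole signal or frame by frame. This sketch assumes that definition. The frame length in the usage line is hypothetical.

```python
import numpy as np

def noise_reduction_db(d, e):
    """NR = 10*log10(sum d^2 / sum e^2): primary-noise energy over residual energy."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum(e ** 2))

def framewise_nr_db(d, e, frame_len):
    """NR per non-overlapping frame, e.g. for plotting NR over time."""
    n_frames = len(d) // frame_len
    return np.array([
        noise_reduction_db(d[i * frame_len:(i + 1) * frame_len],
                           e[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ])

nr = framewise_nr_db(np.ones(1000), 0.1 * np.ones(1000), frame_len=100)  # hypothetical signals
```

An average NR of 25–30 dB under this definition means the residual energy is roughly 0.1–0.3% of the primary-noise energy.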



Figure 12 presents the power spectral density (PSD) comparisons across the four test scenarios, providing a detailed view of the control performance and energy distribution in the frequency domain.
Figure 12. Comparison of power spectral density (PSD) under different operating conditions.
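PSD comparisons such as these are typically produced with Welch's method. This NumPy-only sketch averages Hann-windowed periodograms with 50% overlap and density scaling, similar in spirit to `scipy.signal.welch`. The segment length and test tone are assumptions, not the settings behind Figure 12.

```python
import numpy as np

def welch_psd(x, fs, nperseg=1024):
    """One-sided PSD estimate: Hann-windowed segments, 50% overlap,
    mean-removed, averaged periodograms (nperseg assumed even)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    step = nperseg // 2
    scale = 1.0 / (fs * np.sum(win ** 2))                 # density scaling (power per Hz)
    segs = [x[i:i + nperseg] for i in range(0, len(x) - nperseg + 1, step)]
    pxx = np.mean([np.abs(np.fft.rfft(win * (s - s.mean()))) ** 2 for s in segs], axis=0) * scale
    pxx[1:-1] *= 2.0                                      # fold negative frequencies; DC and Nyquist stay single
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, pxx

fs = 16000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)                       # hypothetical 1 kHz test tone
f, p = welch_psd(tone, fs)
```

Comparing `10 * np.log10(pxx_residual / pxx_primary)` bin by bin gives the per-frequency attenuation that the PSD curves show visually.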
Although the traditional FxLMS algorithm attenuates the low-frequency range, its PSD curve nearly coincides with that of the primary noise at high frequencies, exposing its weakness in suppressing high-frequency noise. Across the whole spectrum, GFANC-RL-CP suppresses noise effectively. Notably, the PSD of the proposed GFANC-RL-CP is generally lower than that of GFANC-RL and shows no abnormal high-frequency peaks. The reduced energy level across the entire frequency band demonstrates that the improved algorithm performs well in complex environments and attenuates in-vehicle noise energy more effectively. This suggests that the numerical noise reduction should translate into improved acoustic comfort for passengers.
4. Conclusions
This paper studies an RL-based generative fixed-filter active noise control strategy to address the complex, non-stationary acoustic environment inside vehicles, and proposes an improved GFANC-RL-CP method based on a CBAM module and a PER mechanism. The main conclusions are as follows:
(1) The integrated CBAM module improves the network's ability to extract features in complex sound fields. Its dual attention weighting over the channel and spatial dimensions better identifies key noise characteristics. This overcomes the insufficient feature extraction of traditional methods in dynamic environments and lays a solid foundation for generating optimal control filter parameters.
(2) The PER mechanism improves the training efficiency of the algorithm. By assigning larger sampling weights to samples with high TD errors, it overcomes the inefficiency of traditional uniform sampling. The experimental results show that training speed increases by about 50%.
(3) On real-vehicle data, the proposed method exhibits good noise reduction, strong robustness, and high adaptability to various environments. Across all cases, the average NR of GFANC-RL-CP is 4–8 dB and 8–16 dB higher than that of GFANC-RL and FxLMS, respectively. Its consistent performance under different conditions demonstrates good generalization ability. Frequency-domain analysis shows that GFANC-RL-CP achieves superior broadband noise suppression and robustness over the entire 0–8000 Hz band.
Footnotes
Ethical considerations
This article does not contain any studies with human or animal participants.
Author contributions
Yancui Jiang: Conceptualization, Methodology, Supervision, Writing - review & editing. Shiyao Hu: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Rongyi Li: Resources, Funding acquisition, Supervision, Project administration, Writing - review & editing. Deming Li: Investigation, Writing - review & editing.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Heilongjiang Natural Science Foundation Outstanding Youth Project [grant number YQ2024E041]; the National Natural Science Foundation of China [grant number 52575487]; the “Chunyan” Team Support Plan Project of Heilongjiang [grant number CYCX24001]; and the Opening Project of the Key Laboratory of Advanced Processing Technology and Intelligent Manufacturing (Heilongjiang Province), Harbin University of Science and Technology [grant number KFKT202201].
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
