Abstract
Vibration control in civil engineering is often challenging due to the nonlinear nature of structures. Traditional control strategies have limitations in terms of modeling accuracy and scalability, especially when analyzing complex nonlinear systems. To solve this problem, this study proposes a model-free active vibration control technique specifically for nonlinear systems, which employs deep reinforcement learning (DRL) to train a neural network controller. The effectiveness and practicality of the proposed method have been validated on a shallow, simply supported buckled beam. The results prove that DRL can significantly increase the safety margin and effectively mitigate buckling under high load levels without requiring extra energy. Compared with conventional model-based linear and polynomial controllers, the proposed control strategy demonstrates excellent adaptability and ease of implementation. This research aims to supplement and expand the existing understanding of DRL applications in structural control, pointing towards a promising direction for future technological advancements and real-world applications.
Keywords
1. Introduction
Structural control is pivotal in mitigating vibrations of civil structures (e.g., buildings and bridges) induced by external dynamic loads (such as wind and seismic loads). These vibrations may jeopardize structural serviceability and integrity and pose safety risks to the public. As architectural trends lean towards more slender structural elements, nonlinear effects, particularly geometric nonlinearities, become increasingly important (Ali and Moon, 2018). While increasing the stiffness of a structure can directly reduce displacements and mitigate nonlinear effects, it also inevitably increases structural weight and the associated costs. Achieving a balance between minimizing risk and avoiding excessive material use necessitates the development of advanced, adaptive, and scalable control strategies to manage nonlinear vibration challenges effectively.
The emergence of active control methods has marked a significant advancement in vibration control in the last few decades (Datta, 2003; Spencer and Nagarajaiah, 2003). Passive control systems are normally tuned to specific resonant frequencies and have fixed parameters, while active control systems can control multiple vibration modes simultaneously, resulting in a wide adaptive range for disturbances (Saaed et al., 2015; Soong and Costantinou, 2014). Although semi-active systems are more adaptable than passive systems, they still cannot match active systems (Casciati et al., 2012; Song and Gu, 2007). In comparison, passive systems can only react to dissipate energy; semi-active systems can merely modulate energy dissipation; active control systems can actively inject or dissipate energy (Cai and Zhu, 2022). Although concerns related to power dependency and system failure remain, the effectiveness of active control systems can be improved with advanced technologies (Li and Zhu, 2022).
While active control methods have achieved considerable success in linear systems, extending their applications to nonlinear systems still faces challenges. Nonlinear systems are inherently more complex and sensitive to initial conditions (Abdel-Rohman and Leipholz, 1978; Astolfi et al., 2008; Iqbal et al., 2017; Nayfeh et al., 1980; Xing et al., 2017). Direct implementation of active linear controllers to nonlinear systems relies on linearizing nonlinear systems around specific operating points (Agrawal et al., 1998). However, this approach has limitations because errors are introduced whenever deviations from these predetermined points occur (Tomasula et al., 1996). Historically, the methods for solving nonlinear feedback control problems can be classified into two main categories (Li and Todorov, 2004), based on Bellman’s optimality principle and Pontryagin’s maximum principle, respectively (Bryson and Ho, 1975; Lewis et al., 2012). While groundbreaking in their time, these conventional methods necessitate a precise understanding of the system’s mathematical model and parameters. These methods typically aim to minimize a polynomial performance index, and by incorporating higher-order performance indices based on the linear quadratic regulator (LQR) problem, a nonlinear control force can be deduced through cost function minimization, yielding a control force that becomes a nonlinear function of the state vector (Agrawal et al., 1998; Shefer and Breakwell, 1987; Suhardjo et al., 1992; Wu et al., 1995). However, these strategies encounter limitations when applied to high-dimensional systems, as the Hamilton–Jacobi–Bellman equation, central to their optimization process, becomes increasingly difficult to solve. 
Sliding mode control (SMC) (Mahmoudi et al., 2019) and its variant, fuzzy sliding mode control (FSMC), enrich the nonlinear control toolkit. Their robustness against external disturbances stems from their capacity to enforce a specific control law once the system’s state attains the sliding surface, thereby guaranteeing stability under diverse conditions. Nevertheless, they may cause chattering or increased computational overhead (Allen et al., 2000; Yang et al., 1996; Zhao et al., 2000). Additional nonlinear control techniques include pulse control (Pantelides and Nelson, 1995), the perturbation method (Jansen, 2008), stochastic dynamic programming (Lü et al., 2020), and iterative LQR (Li and Todorov, 2004). The performance of these controllers degrades when the modeling assumptions cannot accurately capture the nonlinear dynamics.
Development in deep reinforcement learning (DRL) presents paradigm-shifting solutions for nonlinear vibration control problems. DRL-based controllers can generate optimal control strategies through direct interaction with environments without the need for predefined mathematical models (Li, 2017). Researchers have extensively utilized DRL to solve problems across various domains, including robotics, finance, and healthcare. The deep Q-network (DQN) proved the potential of DRL in discrete control problems (Mnih et al., 2013). Later, deep deterministic policy gradients (DDPG) performed well in high-dimensional, continuous action spaces (Lillicrap et al., 2015). Trust region policy optimization (Schulman et al., 2015) and its variant, proximal policy optimization (Schulman et al., 2017), demonstrated robustness and stability in complex robotic tasks. Twin delayed DDPG (TD3) further reduced overestimation bias as an improvement of DDPG (Fujimoto et al., 2018). At the same time, soft actor-critic (Haarnoja et al., 2018) optimizes the trade-off between exploration and exploitation, making it effective for complex, unknown environments. In addition, studies on hierarchical reinforcement learning (Nachum et al., 2018) and inverse reinforcement learning (Arora and Doshi, 2021) facilitated more complex decision-making processes and enabled systems to learn from human behavior, respectively (Barto and Mahadevan, 2003).
The feasibility of using RL in structural control was validated on a full-scale tensegrity structure in the earlier days (Adam and Smith, 2008). While DQN has been proven successful in generating discrete control signals in previous applications (Rahmani et al., 2019), more research has been conducted to explore the implementation of DRL under different conditions. By considering uncertainties in seismic inputs, the Q-learning RL algorithm was utilized to tune the gains of a fuzzy proportional derivative controller in order to achieve the stability of high-rise building vibration (Khalatbarisoltani et al., 2019). An actor-only RL method was used for semi-active control of a simply-supported aluminum beam (Pisarski and Jankowski, 2023). This method showed relatively low computational complexity in real-time (Panda et al., 2024). Besides, gradient descent algorithms were used in some research for active control (Nagendra et al., 2017; Eshkevari et al., 2023). Previous work by the authors (Zhang and Zhu, 2023) has also shown promising results in controlling vibrations of a linear shear-building by adopting DDPG. However, there is still a lack of research in nonlinear vibration control applications using DRL (Xie et al., 2020).
This study proposed a model-free active control strategy for nonlinear vibration control. A neural network (NN) controller is trained by DRL to eliminate uncertainties and complexities of the target nonlinear system. The proposed strategy was tested on a shallow, simply-supported buckled beam with nonlinearities up to the 3rd order. Under certain loading conditions, the beam may lose stability at load levels below the inherent strength of the material and suffer snap-through buckling (Pinto and Gonçalves, 2000). The DRL-based methodology demonstrated superior performance through comparison with prevailing model-based polynomial controllers. The strategy not only attenuated vibrational responses but also significantly enhanced the safety margins of the structural element, effectively reducing the risk of snap-through buckling while maintaining a conservative energy consumption.
The key contributions of this research can be summarized as follows: (a) Development of an advanced active control methodology for nonlinear vibration control, which outperforms conventional model-based polynomial controllers. (b) Introduction of a highly adaptable and easily implementable active nonlinear control strategy with customizable training parameters to suit various application scenarios.
The structure of this paper is organized in the following manner: Section 2 elaborates on the DRL controller that we propose. Section 3 provides a detailed description of the nonlinear system utilized for testing. Section 4 discloses the implementation details and the simulation results under various test conditions. Lastly, Section 5 offers a summary and conclusion of the work presented.
2. Nonlinear vibration control
2.1. Classical model-based optimal control
For a fully observable and controllable dynamical system, the state-space representation can be expressed as
The efficacy of linear control gains, however, diminishes when addressing nonlinear systems unless these systems are linearized around specific operational points. This limitation necessitates the exploration of nonlinear control strategies. As detailed by Suhardjo et al. (1992), a polynomial controller offers one such alternative. Recursive equations can be formulated and solved by expressing the state equation and control force as polynomial functions. Although these equations can extend to any polynomial degree, it is essential to note that as the order increases, so does the computational complexity; at the same time, the incremental contribution to control performance becomes marginal. This research compares the polynomial controller up to the 3rd order with the DRL-based control methodology proposed in the following subsection.
2.2. DRL-based control
To address the abovementioned problems, the current study introduced DRL to train an optimal NN controller without directly solving the ARE. The control problem is systematically formalized into a DRL paradigm, wherein the system’s dynamics are discretized and encapsulated within a Markov decision process (MDP) characterized by (
For the specific problem of vibration suppression under variable excitation conditions, the step reward is formulated as a negative quadratic cost
Thus, as the expected reward represents the future control performance, maximizing the total return in equation (5) is equivalent to minimizing the quadratic cost function in equation (3). In this way, the controller’s performance is optimized.
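The link between the step reward and the return can be illustrated with a minimal sketch. The weighting matrices `Q` and `R`, the discount factor `gamma`, and the short trajectory below are hypothetical placeholders chosen for illustration; they are not the values used in this study, and the exact forms of equations (3) and (5) are not reproduced here.

```python
import numpy as np

def step_reward(x, u, Q, R):
    """Negative quadratic cost for one step: r = -(x^T Q x + u^T R u)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return -float(x @ Q @ x + u @ R @ u)

def discounted_return(rewards, gamma=0.99):
    """Discounted sum of step rewards, accumulated backwards in time."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical weights: penalize displacement most, then velocity, then force.
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

# Hypothetical three-step trajectory of states [displacement, velocity] and forces.
states = [np.array([1.0, 0.0]), np.array([0.5, -0.2]), np.array([0.1, 0.0])]
actions = [np.array([-0.5]), np.array([-0.2]), np.array([0.0])]

rewards = [step_reward(x, u, Q, R) for x, u in zip(states, actions)]
total_return = discounted_return(rewards, gamma=0.99)
print(rewards, total_return)
```

Because every step reward is non-positive, a larger return corresponds to a smaller accumulated quadratic cost, which is the equivalence stated above.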
By utilizing a model-free DRL algorithm, the study leverages the benefit of not requiring any precise knowledge of system dynamics, thereby enhancing the controller’s robustness to adapt to uncertainties prevalent in real-world control applications. The illustration of training a control policy for a target nonlinear plant by DRL is shown in Figure 1.
Figure 1. DRL structure.
In the present investigation, the system can oscillate freely or under external excitation for a predetermined duration. The control policy is subsequently updated based on the data recorded. This data-driven update mechanism utilizes the TD3 algorithm, obviating the need for solving complex nonlinear equations traditionally associated with model-based controllers. TD3 is a model-free, online, and off-policy algorithm that employs a deterministic policy for action selection. The TD3 agent comprises two key components, namely the “actor” and the “critic,” both implemented using deep neural networks (DNN). The actor is responsible for generating the control actions based on the current system state or observations. In comparison, the critic estimates the Q-value of state-action pairs, thereby guiding the policy update mechanism. TD3 represents an advanced variation of the DDPG algorithm and incorporates three key improvements to mitigate the overestimation bias in the Q-value function. First, the algorithm employs double Q-learning by instantiating two separate critic networks and adopting the lower of the two Q-value estimates for policy evaluation. Second, the policy is updated at a reduced frequency compared with the Q-function updates, contributing to more stable learning dynamics. Third, a small amount of clipped random noise is added to the target action when updating the value function, which smooths the value estimate over the mini-batch and acts as a regularizer. In TD3, exploration is facilitated by incorporating noise into the actions determined by the policy network throughout the training phase, thereby enriching the agent’s experience by promoting the discovery of a broad range of strategies within the action space.
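The three TD3 ingredients described above can be sketched in a few lines. This is an illustrative NumPy sketch, not the MATLAB implementation used in this study: the target networks are replaced by hypothetical closed-form stand-ins, and all hyperparameters (`gamma`, `noise_std`, `noise_clip`, `max_action`, `POLICY_DELAY`) are assumed values.

```python
import numpy as np

def td3_target(reward, next_state, done, actor_t, critic1_t, critic2_t, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Critic target y for a mini-batch, following the TD3 recipe."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = actor_t(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double Q-learning: keep the smaller of the two critic estimates.
    q_min = np.minimum(critic1_t(next_state, next_action),
                       critic2_t(next_state, next_action))
    # Bootstrapped target; terminal transitions (done = 1) use the reward only.
    return reward + gamma * (1.0 - done) * q_min

# Hypothetical stand-ins for the target actor and the two target critics.
W = np.array([[-0.5], [-0.1]])
actor_t = lambda s: np.tanh(s @ W)
critic1_t = lambda s, a: -(np.sum(s**2, axis=1) + a[:, 0]**2)
critic2_t = lambda s, a: 1.1 * critic1_t(s, a)

rng = np.random.default_rng(0)
next_state = rng.normal(size=(4, 2))          # batch of 4 states [w, dw/dt]
reward = -np.sum(next_state**2, axis=1)       # placeholder quadratic rewards
done = np.array([0.0, 0.0, 0.0, 1.0])
y = td3_target(reward, next_state, done, actor_t, critic1_t, critic2_t, rng)

# Delayed policy updates: the actor is updated once every POLICY_DELAY critic updates.
POLICY_DELAY = 2
actor_update_steps = [k for k in range(1, 9) if k % POLICY_DELAY == 0]
print(y, actor_update_steps)
```

Taking the minimum of the two critics biases the target downwards, which counteracts the overestimation that a single critic tends to accumulate.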
Upon completing the training process outlined in Algorithm 1, the trained actor alone will serve as an offline controller. It can generate control forces in response to observed system states, like a traditional model-based controller, but with the additional benefits of enhanced adaptability and robustness.
3. Nonlinear control problem formulation
3.1. Buckled beam
To evaluate the efficacy of the proposed DRL-based control methodology, this section investigates the nonlinear oscillations of a shallow, simply supported buckled beam subjected to transversal dynamic loads. The single-degree-of-freedom (SDOF) model derivation is aligned with the work by Pinto and Gonçalves (2002) who employed polynomial controllers up to the third order for the same problem.
Consider a homogeneous elastic simply supported beam, as shown in Figure 2. The section properties are defined by length L, Young’s modulus E, and moment of inertia I. Assuming the response
Figure 2. Buckled beam setup.

The control mechanism of the buckled beam consists of applying moment couples at specific points along the beam axis. The dimensionless system equation of motion of the controlled beam is
3.2. Static behavior
The behavior of the buckled beam under static loading is initially analyzed. Considering a time-independent transversal load v, the static equilibrium equation could be obtained from equation (6) as
For p = 1.005 and
Figure 3. Snap-through behavior of the buckled beam.
3.3. Dynamic behavior
If a linear viscous damping equivalent to 3.54% of critical damping of the linearized system is assumed, and two actuators are located at L/3 and 2L/3, the equation of motion for the uncontrolled system becomes
Figure 4. Free vibration response of the uncontrolled system.

With the weighting matrix in equation (3) chosen to be
Figure 5. Free vibration response of the system with (a) the first order controller; (b) the second and third order controllers.
By employing higher-order controllers based on the polynomial strategies proposed by Suhardjo et al. (1992), the following observations were made. For the 2nd order control, the system invariably settles to the first equilibrium point
When subjected to a sudden step load, the system may cross the energy barrier, resulting in dynamic buckling even at loads lower than the static critical load. The load needed for this dynamic transition increases with the order of the control; however, the increase becomes marginal beyond third-order control. For the uncontrolled system, the escape load is 0.31, which is 18.4% lower than the static critical load of 0.38.
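As a quick check of the quoted reduction, using only the two load values stated above, the relative difference between the uncontrolled escape load and the static critical load works out as:

```latex
\[
\frac{0.38 - 0.31}{0.38} = \frac{0.07}{0.38} \approx 0.184 = 18.4\%
\]
```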
Figure 6 shows the response time history of the beam subject to a step load of infinite duration (v = 0.42) under the regulation of the active controllers of varying orders. At this load level, the 1st order controller proves insufficient in averting snap-through buckling. The beam transitions to a post-buckling state, indicative of the limitations of linear control schemes for systems exhibiting nonlinear behaviors. In contrast, the 2nd and 3rd order controllers are competent in retaining the system within the first potential well, thereby preventing buckling. In the following section, the performance of the DRL-trained controller is presented and compared to the model-based polynomial controllers.
Figure 6. Time response of different orders of control under a step load of infinite duration (v = 0.42).
4. Results and discussions
4.1. DRL training
This section examines the performance and characteristics of the controller trained using DRL. Based on the system behavior described above, the DRL training is performed by applying a step load of infinite duration (v = 0.48), slightly higher than the escape load of the 3rd order controller (0.47). The training allows the agent to explore while interacting with the environment. Each training episode lasts for 25 s, and the TD3 training runs for 800 episodes. The training was conducted using MATLAB on an Intel i7 9700 CPU. From the onset, exploration is initiated by implementing a noise-added action policy, with the exploration rate starting at an initial action noise variance of 0.001 and a decay rate of
The observation includes the system state
During the training process, the DRL-trained controller’s performance was analyzed with an initial system state at
Figure 7. The evolution of the cost function
Notably, although the DRL training is done under a single step load (v = 0.48), the same trained DRL-based controller will be tested under different loading conditions in this section, including step loads with different amplitudes and random excitations.
4.2. Escape point
The DRL-trained controller’s performance is further assessed by monitoring the system’s free vibration response. Results obtained using the same weighting matrix are visualized in Figure 8. The initial displacement is set at
Figure 8. Time response of TD3 agent controlled free vibration of the buckled beam.
4.3. Safety region
The system’s displacement responses under the effect of different controllers are computed and compared under the step load v = 0.45 in Figure 9. After about 3 s, the 1st and 2nd order controllers could not keep the beam in the pre-buckling position, while the displacement of the TD3 agent controller remains in the safe region and converges quickly. Increasing the step load beyond the escape load will lead the system to experience dynamic buckling. The variation of escape load v_e is detailed in Table 1, where Δv_e is the improvement in comparison with the uncontrolled case. The trained agent increased the escape load of the system considerably, by 22.58%, compared to the 3rd order polynomial controller.
Figure 9. Response time histories of the controlled buckled beam system under a step load.
Table 1. Variation of the escape load.
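To make the comparison concrete, the relative improvement can be sketched as follows. The definition of Δv_e below, normalized by the uncontrolled escape load, is an assumption, since the entries of Table 1 are not reproduced here; the values 0.31 and 0.47 are the uncontrolled and 3rd order escape loads quoted earlier in the text.

```latex
\[
\Delta v_e = \frac{v_e - v_e^{\mathrm{unc}}}{v_e^{\mathrm{unc}}}, \qquad
\Delta v_e^{(3)} = \frac{0.47 - 0.31}{0.31} \approx 51.6\%
\]
```

Under this assumed definition, a gain of 22.58 percentage points over the 3rd order controller would correspond to an escape-load increase of about 0.2258 × 0.31 ≈ 0.07. This is one possible reading of the quoted figure, not a value taken from the table.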
Figures 10(a) and (b) display the control force history and the force-displacement relationship under the chosen step load. For the 1st and 2nd order controllers, the control force rapidly escalates after buckling, while the DRL-based controller requires slightly more peak force than the 3rd order controller but achieves significantly better performance in terms of the cost function J. The DRL-trained agent tends to generate a larger control force at lower displacement levels, resulting in a smaller resting displacement. This aspect is readily observable from the force-displacement plot in Figure 10(b). The peak and root-mean-square (RMS) values of displacement, velocity, and force for different controllers are compared in Table 2. In terms of system safety and robustness, the DRL-trained controller significantly outperforms the traditional controllers. Its capability to maintain the system within a safe region under varied loading conditions, along with its ability to do so more efficiently (in terms of control force applied and energy used), underscores its suitability for complex nonlinear systems.
Figure 10. Response of the controlled buckled beam under step load v = 0.45 (a) Control force history; (b) Control force-displacement relationship.
Table 2. Responses and control forces of the buckled beam system under step load v = 0.45 with different controllers.
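The peak and RMS measures used for this comparison can be computed from any sampled response as in the sketch below. The decaying oscillation is a hypothetical signal standing in for a simulated displacement history; it is not data from this study.

```python
import numpy as np

def peak_and_rms(signal):
    """Return (peak absolute value, root-mean-square) of a sampled signal."""
    s = np.asarray(signal, dtype=float)
    return float(np.max(np.abs(s))), float(np.sqrt(np.mean(s**2)))

# Hypothetical decaying oscillation over a 25 s window sampled at 100 Hz.
t = np.linspace(0.0, 25.0, 2501)
w = np.exp(-0.3 * t) * np.cos(2.0 * np.pi * t)

peak_w, rms_w = peak_and_rms(w)
print(peak_w, rms_w)
```

The same function applies unchanged to velocity and control force histories, so the three rows of such a comparison differ only in the input signal.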
4.4. Performance cost
An important measure of the controller’s performance is the performance cost
Figure 11. Performance cost
4.5. Random excitation
To evaluate the efficacy of the controller trained through DRL under uncertain conditions, an imposed random excitation lasting 1000 s was applied to the system. The displacement response under random white-noise excitation, as illustrated in Figure 12, makes it evident that neither the linear controller nor the 2nd order controller could prevent the beam from experiencing snap-through buckling. In contrast, both the 3rd order polynomial controller and the TD3 agent effectively mitigated the risk of buckling under the applied level of random excitations. This performance attests to a notable robustness in both controllers when dealing with unpredictable external influences.
Time history of random excitation.
Evaluation criteria for different controllers under random excitation.
Increasing the order of a polynomial controller can indeed enhance control performance, but this improvement is associated with significantly escalated computational demands as the controller’s complexity rises. In summary, the proposed DRL control strategy outperforms traditional model-based control methods in terms of adaptability and efficiency. Since DRL training can balance the trade-off between control performance and operation costs, the proposed control strategy shows excellent flexibility. For instance, a higher escape load and broader safety margin can be achieved by setting a larger step load into the training process. In addition, the DRL method learns from the data directly without requiring any model information. This model-free approach can avoid the problems associated with system uncertainties and model inaccuracies, leading to robust control performance and great adaptability in broad vibration scenarios.
5. Conclusions
This study proposed a model-free nonlinear active vibration control strategy based on DRL, which proved more flexible and adaptable than the traditional model-based polynomial control strategies.
An NN controller was trained by the TD3 algorithm to realize optimal control performance. The tests were conducted on a simply supported buckled beam in various vibration scenarios. Superior performance of the TD3-trained NN controller was demonstrated under both static loadings and random excitations. Compared with the system controlled by the 3rd order polynomial controller, the DRL-controlled system improved the safety margin, and the escape load was increased by nearly 20%. Moreover, this performance improvement did not increase the control force correspondingly, making the controller more energy efficient than the conventional controllers. Although trained under step loads, the NN controller could reduce displacement and control force more effectively than the 3rd order polynomial controller under random excitations. This reveals the robustness and adaptability of the proposed method.
Even though DRL-based methods can offer robust and scalable solutions for nonlinear vibration control, real-world applications involve more complex characteristics, including uncertainty, noise, and operational variability of the control target. How to train agents to adapt effectively to practical vibration scenarios remains an open question. Additional training and restructuring of the NN are required in future research to enhance the robustness of the controller and address these uncertainties.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Research Grants Council of Hong Kong (Project Nos. CRS_PolyU503/23, 15214620, PolyU R5006-23), and the Hong Kong Branch of the National Rail Transit Electrification and Automation Engineering Technology Research Center (K-BBY1).
