Semiactive suspension control using magnetorheological dampers via proximal policy optimization

Abstract

With the increasing demand for ride comfort, safety, and performance in modern vehicles, semiactive suspension systems based on magnetorheological dampers (MRDs) have attracted significant attention because of their fast response, low energy consumption, and controllable damping force. However, traditional control strategies often rely on precise mathematical models, which makes it difficult to handle the inherent nonlinearities and parametric uncertainties of such systems. To address this challenge, this study proposes a reinforcement learning (RL)-based semiactive control framework using proximal policy optimization (PPO), which enables autonomous optimization of the control policy through continuous interaction with the environment. First, an MRD model is established from a phenomenological model whose parameters are identified with an intelligent optimization algorithm. This model is integrated into a PPO control framework, allowing the agent to generate control currents through environmental interactions and thereby adjust the output damping force of the damper. The proposed method does not require an explicit system model during training and learns the control policy through trial and error. Simulation results under stochastic road excitations show that the PPO controller significantly outperforms both passive and Skyhook control in terms of vibration reduction. Moreover, when retrained under 20% parametric uncertainty, the controller maintains good stability and control performance, indicating that randomized training enhances the generalization ability and adaptability of the learned policy to system parameter variations and providing a basis for developing a truly robust control strategy in future work. This study highlights the potential of RL for semiactive suspension control with MRDs and offers a scalable control strategy for complex nonlinear systems.

Keywords

proximal policy optimization reinforcement learning magnetorheological damper semiactive suspension parametric uncertainty

Introduction

With the continuous development of the automotive industry, increasing demands have been placed on the performance, safety, and ride comfort of vehicles, imposing stricter standards on the dynamic performance of suspension systems. Semiactive control technology, first introduced by Crosby and Karnopp (1973), bridges the gap between passive and active suspension systems by offering adjustable damping with low energy consumption (Caponetto et al., 2003), and it has become a significant research focus in vehicle suspension control. As one of the most representative actuators in semiactive systems, magnetorheological dampers (MRDs) are widely used because of their rapid response and broad dynamic range. However, MRDs inherently exhibit strong nonlinearity and hysteresis, so accurate models that capture their dynamic characteristics are essential as a foundation for high-performance suspension control (McLaughlin et al., 2014). In addition, the design of MRD-based control strategies is a key approach for improving suspension performance.

To accurately describe the nonlinear and hysteretic behaviors of MRDs, researchers commonly adopt parameterized modeling approaches that combine spring–damper elements with hysteresis operators (Saufi et al., 2025). Yan et al. (2021) used the Dahl model to characterize hysteresis behavior, whereas Hu et al. (2022) developed an MRD model based on the Bouc–Wen model. Jiang et al. (2023) proposed a phenomenological model that accounts for insufficient fluid flow and captures dynamic characteristics more accurately than prior models. MRD models provide essential support for designing control strategies; however, the overall performance of a suspension system depends more critically on the effectiveness of the control method. A variety of semiactive control strategies have been proposed, including skyhook control, the linear quadratic regulator (LQR), H∞/H₂ control, and fuzzy radial basis function-based sliding mode control (FRBF-SMC) (Ding et al., 2022; Han et al., 2024; Li et al., 2024; Liu et al., 2019; Margolis et al., 1975; Savaresi and Spelta, 2007). Nevertheless, classical control strategies lack precision and adaptability in complex dynamic environments, whereas modern control strategies, although capable of explicit performance optimization, depend heavily on accurate models and offer limited robustness.

In recent years, reinforcement learning (RL) has been increasingly applied to semiactive suspension control owing to its independence from precise system models and its capability for autonomous policy learning (Li et al., 2019; Wang et al., 2019; Zhao et al., 2019). Beyond these early studies, more recent works have evaluated RL-based suspension controllers under more realistic settings. For example, Lee et al. (2022) modeled the MR damper actuator as a first-order system, thereby partially accounting for actuator dynamics, and demonstrated that their RL controller outperformed the conventional SH–ADD method in terms of ride comfort. Yong et al. (2023) proposed a switching SAC-based control framework and validated its effectiveness under real-road conditions, further confirming the practical feasibility of RL in automotive applications. Ultsch et al. (2024) incorporated actuator dynamics and conducted systematic real-vehicle experiments, providing more direct evidence of the applicability of RL methods to vertical vehicle dynamics control. In addition, DDPG-based approaches (Liu et al., 2020; Tan et al., 2023) and PPO-based strategies (Han and Liang, 2022; Kim et al., 2023) have also been explored for quarter-vehicle suspension vibration control.

Despite this progress, most existing RL-based suspension controllers are trained exclusively on nominal-parameter models and do not adequately account for the parameter uncertainties inherent in real vehicles, which may degrade policy performance under varying operating conditions. Moreover, the nonlinear current–force characteristics of MRDs are seldom explicitly incorporated into the training environment, potentially limiting the deployability of the learned policy on real actuators.

To address these limitations, this study develops a physically consistent RL training framework in which a nonlinear MRD model—constructed through experimental data and parameter identification—is integrated into the RL environment, ensuring that the agent’s output current corresponds to a realizable damping force. During the training stage, random perturbations are applied to key suspension parameters, including sprung mass, suspension stiffness, and tire stiffness, thereby constructing a training environment that reflects system parameter variations and enables the learned policy to achieve better generalization ability and potential robustness under different physical operating conditions. As a representative on-policy deep RL method, PPO offers desirable training stability and implementation simplicity through its clipped surrogate objective and relatively stable policy update mechanism (Han and Liang, 2022; Schulman et al., 2017), making it well suited for the semiactive suspension control scenario considered in this study. Accordingly, PPO is adopted as the learning framework in this work, and two agents are developed: Agent A, trained under nominal parameters, and Agent B, trained under parameter-randomized conditions. Simulation results show that, compared with passive control and Skyhook control, the proposed method significantly enhances vibration-reduction performance, and Agent B maintains stable control behavior under parameter variations, demonstrating stronger policy generalization ability and adaptability to system parameter changes, thereby exhibiting greater potential robustness under real operating conditions.

Modeling

Modeling of a magnetorheological damper

In this study, a laboratory-developed MRD, shown in Figure 1, is used as the primary research object. Mechanical characterization tests are conducted on this damper to obtain damping force data under various displacement excitations and input currents.

Figure 1.

Photograph of the magnetorheological damper.

The phenomenological model improves on the Bouc–Wen model by incorporating serially connected damping and stiffness components (Gao et al., 2023). For suspension vibration reduction, the operating conditions of primary interest involve low-frequency, large-displacement excitations, under which the influence of the stiffness term is negligible. Building upon the work of Bai et al. (2015), the original phenomenological model is therefore refined in this study by omitting the stiffness component, which simplifies both the dynamic representation and the subsequent model training and yields a structure more suitable for control design. The modified model structure is illustrated in Figure 2.

Figure 2.

Modified phenomenological model.

Accordingly, the expression for the damping force is given by equation (1), where the identifiable model parameters are simplified to [ $c_{0}$ , $c_{1}$ , $α$ , $γ$ , $β$ , $A$ , $n$ ], thereby reducing their total number from ten (Bai et al., 2015) to seven.

\begin{array}{l} F = c_{1} \dot{y} \\ \dot{z} = - γ | \dot{x} - \dot{y} | {| z |}^{n - 1} z - β (\dot{x} - \dot{y}) {| z |}^{n} + A (\dot{x} - \dot{y}) \\ \dot{y} = \frac{1}{c_{0} + c_{1}} [α z + c_{0} \dot{x}] \end{array}

(1)

where $x$ is the piston displacement, $y$ is the internal displacement, $c_{0}$ is the damping coefficient, $c_{1}$ is the high-speed damping coefficient, $z$ denotes the hysteresis displacement, $γ$ controls the width of the hysteresis loop, $n$ is the smoothness factor, $β$ is the loop height parameter, $A$ is a proportional coefficient, and $α$ is the hysteresis force scaling factor.

To identify the unknown parameters of the modified phenomenological model, the manta ray foraging optimization (MRFO) algorithm is employed to fit the model-calculated damping force to the experimental data (Omotoso et al., 2022). MRFO mimics three typical foraging behaviors of manta rays, namely chain foraging, spiral foraging, and somersault foraging, and offers strong global search capability together with effective local convergence (Abdullahi et al., 2023).

Among these parameters, $c_{0}$, $c_{1}$, and $α$ primarily affect the amplitude of the damping force, whereas $n$, $γ$, $β$, and $A$ determine the geometric characteristics of the hysteresis loop. To balance modeling accuracy and optimization efficiency, $n$, $γ$, $β$, and $A$ are treated as current-independent constants in this study, and $c_{0}$, $c_{1}$, and $α$ are assumed to vary linearly with the current, as described in equation (2). With this treatment, a single ten-dimensional parameter vector, $[c_{0a}, c_{0b}, c_{1a}, c_{1b}, α_{a}, α_{b}, γ, β, A, n]$, describes the damper over the whole current range and is the vector to be optimized.

\begin{array}{l} α = α (I) = α_{a} + α_{b} I \\ c_{0} = c_{0} (I) = c_{0 a} + c_{0 b} I \\ c_{1} = c_{1} (I) = c_{1 a} + c_{1 b} I \end{array}

(2)

where $I$ is the input current.
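To make the structure of equations (1) and (2) concrete, the following is a minimal sketch that integrates the modified phenomenological model with a forward-Euler step, using the identified values reported in Table 1. The function name `mrd_force`, the finite-difference velocity, the zero initial hysteresis state, and the constant current are illustrative assumptions. Because the units of Table 1 are not stated in this section, the numerical scale of the output should not be read as newtons or kilonewtons without checking the original identification setup.

```python
import numpy as np

# Identified parameters from Table 1 (units not stated in the text).
PARAMS = {
    "c0a": 0.0130, "c0b": 0.00496,
    "c1a": 0.0534, "c1b": 0.19953,
    "alpha_a": 0.03912, "alpha_b": 0.1034,
    "gamma": 0.0355, "beta": 0.1075, "A": 28.01, "n": 1.392,
}


def mrd_force(x, current, dt, p=PARAMS):
    """Damping force of the modified phenomenological model, eqs (1)-(2).

    x       : sampled piston displacement history (1-D array)
    current : input current I, held constant here for simplicity
    dt      : sampling interval for finite differences and Euler steps
    """
    # Equation (2): coefficients vary linearly with the current.
    alpha = p["alpha_a"] + p["alpha_b"] * current
    c0 = p["c0a"] + p["c0b"] * current
    c1 = p["c1a"] + p["c1b"] * current
    gamma, beta, A, n = p["gamma"], p["beta"], p["A"], p["n"]

    x_dot = np.gradient(x, dt)  # piston velocity estimated from displacement
    z = 0.0                     # hysteresis variable, assumed zero at start
    force = np.empty_like(x_dot)
    for k, xd in enumerate(x_dot):
        y_dot = (alpha * z + c0 * xd) / (c0 + c1)      # eq (1), third line
        v = xd - y_dot                                 # relative velocity
        z_dot = (-gamma * abs(v) * abs(z) ** (n - 1) * z
                 - beta * v * abs(z) ** n + A * v)     # eq (1), second line
        force[k] = c1 * y_dot                          # eq (1), first line
        z += z_dot * dt                                # forward-Euler update
    return force
```

In closed-loop use the current would change at every control step; holding it constant here only keeps the sketch short.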

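The root mean square (RMS) value of the difference between the model-calculated and experimentally measured damping forces is adopted as the fitness function $f_{obj}$ of the MRFO algorithm. Its expression is given by equation (3), in which $N$ is the number of data samples and $F_{i}^{\text{model}}$ and $F_{i}^{\text{exp}}$ denote the model-calculated and measured damping forces at the $i$-th sample.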

f_{obj} = {[\frac{1}{N} \sum_{i = 1}^{N} {(F_{i}^{\text{model}} - F_{i}^{\text{exp}})}^{2}]}^{0.5}

(3)

In this study, four sets of experimental data with displacement amplitudes of 20 mm and excitation frequencies of 0.5 Hz, 1 Hz, 1.5 Hz, and 2 Hz are selected for the identification of the ten-dimensional model parameters. For the MRFO algorithm, the population size is set to 200, and the search space dimensionality is 10, corresponding to the number of parameters to be identified. The maximum number of iterations is set to 5000, and the optimization process terminates either when the maximum number of iterations is reached or when no improvement is observed in the position of the best individual over 200 consecutive iterations. The identified model parameters are summarized in Table 1.
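A compact sketch of the identification loop is shown below: `rms_error` implements the fitness of equation (3), and `mrfo` follows the commonly published MRFO update rules (chain foraging, cyclone or "spiral" foraging, and somersault foraging with factor 2), with the population size, iteration limit, and 200-iteration stall criterion stated above as defaults. The function names, the clipping to search bounds, and the random seed are assumptions for illustration; this is not the authors' implementation. In practice the objective would wrap `rms_error` around the damper model evaluated at the four excitation cases.

```python
import numpy as np


def rms_error(f_model, f_exp):
    """Fitness of equation (3): RMS difference between model and measured forces."""
    diff = np.asarray(f_model, dtype=float) - np.asarray(f_exp, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mrfo(objective, lb, ub, pop=200, max_iter=5000, stall=200, seed=0):
    """Minimal manta ray foraging optimization; returns (best_position, best_fitness)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size

    def evaluate(P):
        return np.array([objective(p) for p in P])

    X = lb + rng.random((pop, dim)) * (ub - lb)
    fit = evaluate(X)
    best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    stalled = 0
    for t in range(1, max_iter + 1):
        previous_best = best_fit
        for i in range(pop):
            r = rng.random(dim)
            if rng.random() < 0.5:
                # Cyclone ("spiral") foraging around the best or a random point.
                r1 = rng.random()
                coef = 2 * np.exp(r1 * (max_iter - t + 1) / max_iter) * np.sin(2 * np.pi * r1)
                if t / max_iter < rng.random():
                    ref = lb + rng.random(dim) * (ub - lb)  # exploration
                else:
                    ref = best                              # exploitation
                lead = ref if i == 0 else X[i - 1]
                X[i] = ref + r * (lead - X[i]) + coef * (ref - X[i])
            else:
                # Chain foraging: follow the individual ahead and the best.
                ra = 1.0 - rng.random(dim)  # in (0, 1], keeps log finite
                coef = 2 * ra * np.sqrt(np.abs(np.log(ra)))
                lead = best if i == 0 else X[i - 1]
                X[i] = X[i] + r * (lead - X[i]) + coef * (best - X[i])
        X = np.clip(X, lb, ub)
        fit = evaluate(X)
        if fit.min() < best_fit:
            best, best_fit = X[fit.argmin()].copy(), float(fit.min())
        # Somersault foraging around the best position (somersault factor 2).
        X = X + 2.0 * (rng.random((pop, dim)) * best - rng.random((pop, dim)) * X)
        X = np.clip(X, lb, ub)
        fit = evaluate(X)
        if fit.min() < best_fit:
            best, best_fit = X[fit.argmin()].copy(), float(fit.min())
        # Stop after `stall` consecutive iterations without improvement.
        stalled = 0 if best_fit < previous_best else stalled + 1
        if stalled >= stall:
            break
    return best, best_fit
```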

Table 1.

Parameter identification results obtained for the phenomenological model.

Parameter	Value	Parameter	Value
$c_{0 a}$	0.0130	$α_{b}$	0.1034
$c_{0 b}$	0.00496	$γ$	0.0355
$c_{1 a}$	0.0534	$β$	0.1075
$c_{1 b}$	0.19953	$A$	28.01
$α_{a}$	0.03912	$n$	1.392

On the basis of the identified parameters, an MRD model is constructed according to the modified phenomenological model. A comparison between the model-predicted and experimentally measured damping forces is shown in Figure 3. The results indicate that the model accurately captures the dynamic response of the damper under varying displacement amplitudes and velocity conditions. The generated hysteresis loops closely match the experimental data, showing good agreement across the tested operating range. The computed fitness value is 0.241 kN, approximately 4% of the maximum measured damping force, which confirms the effectiveness and accuracy of the model in representing the mechanical behavior. This validated MRD model serves as a reliable foundation for constructing the training environment of the proposed RL-based control system.

Figure 3.

Comparison between the model-produced and experimental results.

Quarter-car suspension modeling

In the absence of coupling effects between wheels, the vertical vibration responses induced by road excitation can be analyzed using a quarter-car suspension model. As shown in Figure 4, this model primarily consists of a sprung mass, a suspension spring, a controllable damping element, an unsprung mass, and a tire stiffness element (Verros et al., 2005).

Figure 4.

Two-degree-of-freedom semiactive suspension model with MRDs.

The dynamic equations of the system can be derived from Newton’s second law, as expressed in equation (4).

{\begin{cases} m_{b} {\ddot{x}}_{b} + c_{s} ({\dot{x}}_{b} - {\dot{x}}_{w}) + k_{s} (x_{b} - x_{w}) - F = 0 \\ m_{w} {\ddot{x}}_{w} - c_{s} ({\dot{x}}_{b} - {\dot{x}}_{w}) - k_{s} (x_{b} - x_{w}) + F + k_{t} (x_{w} - x_{g}) = 0 \end{cases}

(4)

where $m_{w}$ and $m_{b}$ are the unsprung mass and sprung mass, respectively; $x_{w}$ and $x_{b}$ are the displacements of the unsprung and sprung masses, respectively; $k_{s}$ and $k_{t}$ denote the suspension stiffness and tire stiffness, respectively; $c_{s}$ is the fixed damping coefficient; $F$ is the damping force generated by the magnetorheological damper; and $x_{g}$ represents the road excitation.
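Equation (4) can be rewritten as a first-order state-space model with state $[x_{b}, \dot{x}_{b}, x_{w}, \dot{x}_{w}]$, which is the form a simulation environment would integrate. The sketch below assumes the nominal values of Table 3; the fixed damping coefficient $c_{s}$ is not listed there, so the value 0.0 is a hypothetical placeholder, and the RK4 step with force and road input held constant over one step is an illustrative integration choice. As in equation (4), gravity is absent, so displacements are measured from static equilibrium.

```python
import numpy as np

# Nominal parameters from Table 3.
M_B, M_W, K_S, K_T = 287.0, 40.0, 15800.0, 158000.0
C_S = 0.0  # hypothetical fixed damping coefficient (N*s/m); not given in Table 3


def quarter_car_derivative(state, force, x_g, m_b=M_B, m_w=M_W,
                           k_s=K_S, k_t=K_T, c_s=C_S):
    """Time derivative of [x_b, x_b_dot, x_w, x_w_dot] according to equation (4)."""
    x_b, v_b, x_w, v_w = state
    spring_damper = c_s * (v_b - v_w) + k_s * (x_b - x_w)
    a_b = (force - spring_damper) / m_b                      # sprung mass
    a_w = (spring_damper - force - k_t * (x_w - x_g)) / m_w  # unsprung mass
    return np.array([v_b, a_b, v_w, a_w])


def rk4_step(state, force, x_g, dt):
    """One Runge-Kutta step, holding the MRD force and road input constant over dt."""
    def f(s):
        return quarter_car_derivative(s, force, x_g)
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

A positive $F$ accelerates the body upward and the wheel downward, consistent with the signs in equation (4).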

Road excitation modeling

To represent the road excitation term in equation (4), a simple and efficient time-domain modeling method based on filtered Gaussian white noise is adopted in this study. According to ISO 8608 (Múčka, 2018), the statistical properties of road surface roughness can be characterized by a spatial power spectral density (PSD) function, as given in equation (5).

G_{q} (n) = G_{q} (n_{0}) {(\frac{n}{n_{0}})}^{- ω} (n_{\min} < n < n_{\max})

(5)

where $n$ is the spatial frequency, $n_{0}$ is the reference spatial frequency, $G_{q} (n_{0})$ is the road PSD value at the reference spatial frequency, and $ω$ is the frequency index (ISO 8608 uses $ω = 2$).

Based on the displacement power spectral densities of various road surfaces as defined in ISO 8608, road surface roughness is typically categorized into eight grades ranging from Class A to Class H. The geometric mean values of the road surface roughness coefficients corresponding to each class are listed in Table 2.

Table 2.

Classification standards for eight levels of road surface roughness.

Road level	$G_{q}(n_{0})$ ($10^{-6}$ m$^{3}$), $n_{0} = 0.1$ m$^{-1}$, geometric mean	RMS roughness $σ_{q}$ ($10^{-3}$ m), $0.011$ m$^{-1} < n < 2.83$ m$^{-1}$, geometric mean
A	16	3.81
B	64	7.61
C	256	15.23
D	1024	30.45
E	4096	60.90
F	16384	121.80
G	65536	243.61
H	262144	487.22

Standard Gaussian white noise is filtered to generate the road excitation model, whose time-domain expression is shown in equation (6).

{\dot{x}}_{g} (t) = 2 π n_{0} \sqrt{G_{q} (n_{0}) v} W (t) - 2 π f_{\min} x_{g} (t)

(6)

where $v$ is the vehicle speed, $f_{\min}$ is the cut-off frequency, $x_{g} (t)$ is the road displacement input, and $W (t)$ is the white noise sequence.

The model is developed in accordance with the ISO 8608 standard and serves as a time-domain road excitation model for control algorithm simulation and analysis.
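As a sketch of how equations (5) and (6) might be discretized, the code below evaluates the spatial PSD with the Table 2 geometric means and integrates the filtered-white-noise model with an Euler-Maruyama step. It assumes $f_{\min} = n_{\min} v$ with $n_{\min} = 0.011$ m$^{-1}$ (the lower bound in Table 2) and unit-intensity white noise, so each discrete noise sample is a standard normal value divided by $\sqrt{\Delta t}$; these choices and the function names are assumptions rather than details given in the text.

```python
import numpy as np

# Geometric-mean G_q(n0) values from Table 2, converted to m^3 (n0 = 0.1 m^-1).
GQ_N0 = {"A": 16e-6, "B": 64e-6, "C": 256e-6, "D": 1024e-6,
         "E": 4096e-6, "F": 16384e-6, "G": 65536e-6, "H": 262144e-6}
N0 = 0.1       # reference spatial frequency (m^-1)
N_MIN = 0.011  # assumed lower spatial frequency bound (m^-1), from Table 2


def road_psd(n, grade, w=2.0):
    """Spatial PSD of equation (5); w = 2 is the ISO 8608 frequency index."""
    return GQ_N0[grade] * (np.asarray(n, dtype=float) / N0) ** (-w)


def road_profile(grade, v, duration, dt, seed=0):
    """Road displacement x_g(t) from equation (6) via Euler-Maruyama steps."""
    rng = np.random.default_rng(seed)
    steps = int(round(duration / dt))
    f_min = N_MIN * v                              # assumed cut-off frequency (Hz)
    gain = 2 * np.pi * N0 * np.sqrt(GQ_N0[grade] * v)
    x_g = np.zeros(steps + 1)
    for k in range(steps):
        w_k = rng.standard_normal() / np.sqrt(dt)  # discrete white-noise sample
        x_g[k + 1] = x_g[k] + dt * (gain * w_k - 2 * np.pi * f_min * x_g[k])
    return x_g


profile = road_profile("C", v=20.0, duration=10.0, dt=1e-3)
```

Because the model is linear in the noise gain, reusing the same seed for a road one class rougher (four times $G_{q}(n_{0})$) simply doubles the profile, which is a convenient sanity check.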

PPO-based control design

Principle of PPO

PPO introduces a clipped surrogate objective that constrains the update step between the current and previous policies, effectively mitigating the risk of excessively large policy changes during optimization. This mechanism improves the stability and sample efficiency of training. As a policy gradient method, PPO improves the policy by maximizing the expected cumulative return, thereby continuously refining action selection throughout training. In this study, the standard PPO framework (Schulman et al., 2017) is employed. Its algorithmic architecture is illustrated in Figure 5, and its implementation is summarized as follows.

Figure 5.

Overview of the architecture of the PPO algorithm.

The agent interacts with the environment over multiple episodes under the current policy, collecting trajectories consisting of states $s_{t}$, actions $a_{t}$, immediate rewards $r_{t}$, and action probabilities $π_{θ} (a_{t} | s_{t})$. These trajectories are then used to update both the actor (policy) and critic (value) networks. Using the state values predicted by the critic, the advantage estimates are computed with the generalized advantage estimation (GAE) method, as defined in equation (7). Concurrently, the value function is updated by minimizing the loss given in equation (8).

A_{t} = r_{t} + γ V_{ϕ} (s_{t + 1}) - V_{ϕ} (s_{t}) + (γ λ) A_{t + 1}

(7)

L^{V} (ϕ) = E_{t} [{(V_{ϕ} (s_{t}) - R_{t})}^{2}]

(8)

where $A_{t}$ denotes the advantage function used to evaluate the relative quality of an action, $V_{ϕ} (s_{t})$ denotes the state value function at time step t, $γ$ is the discount factor, $λ$ is the GAE hyperparameter that is used to balance bias and variance, $r_{t}$ is the reward at time step t, and $R_{t}$ represents the cumulative reward.
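Equation (7) is evaluated backward in time, starting from the last step of a trajectory. The NumPy sketch below shows that recursion; the `dones` mask for episode boundaries and the use of "advantage plus value" as the return target $R_{t}$ in equation (8) are common conventions assumed here, not details stated in the text.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95, dones=None):
    """Advantages via the recursion of equation (7), computed backward in time.

    values has one more entry than rewards: the last element is the bootstrap
    value V(s_T). dones[t] = 1 stops the recursion at an episode end.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = rewards.size
    dones = np.zeros(T) if dones is None else np.asarray(dones, dtype=float)
    adv = np.zeros(T)
    next_adv = 0.0
    for t in reversed(range(T)):
        mask = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]  # TD error
        next_adv = delta + gamma * lam * mask * next_adv
        adv[t] = next_adv
    returns = adv + values[:-1]  # targets R_t for the value loss of equation (8)
    return adv, returns


adv, ret = gae([1.0, 1.0], [0.0, 0.0, 0.0])
```

With two unit rewards and zero values, the last advantage is 1 and the first is $1 + γλ = 1.9405$ for the discount factor and GAE parameter used in this study.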

In the policy network, to constrain the magnitudes of policy updates, PPO introduces a clipping mechanism for constructing a surrogate objective function, as shown in equation (9). The probability ratio between the new and old policies for taking the same action under the same state is defined in equation (10).

L^{CLIP} (θ) = E_{t} [\min (ρ_{t} (θ) A_{t}, \mathrm{clip} (ρ_{t} (θ), 1 - ε, 1 + ε) A_{t})]

(9)

ρ_{t} (θ) = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{old}} (a_{t} | s_{t})}

(10)

where $A_{t}$ is the advantage function, $ε$ is the clipping coefficient, $π_{θ} (a_{t} | s_{t})$ represents the action probability under the updated policy, and $π_{θ_{old}} (a_{t} | s_{t})$ is the corresponding probability under the previous policy.

In addition, to encourage exploration and prevent premature convergence, an entropy regularization term is incorporated into the PPO objective function. Its formulation is given in equation (11).

L^{ENT} (θ) = E_{t} [H (π_{θ} (a_{t} | s_{t}))]

(11)

where $H (π_{θ} (a_{t} | s_{t}))$ denotes the entropy of the policy.

By combining all the components described above, the final objective function is formulated as shown in equation (12). This objective is maximized through iterative optimization, so that the policy of the agent gradually converges and the long-term cumulative reward improves.

L (θ, ϕ) = L^{CLIP} (θ) - c_{1} L^{V} (ϕ) + c_{2} L^{ENT} (θ)

(12)

where $c_{1}$ and $c_{2}$ are weighting coefficients that balance the contributions of the value function regression and entropy regularization terms in the total objective (this $c_{1}$ is unrelated to the damping coefficient $c_{1}$ in equation (1)).
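The arithmetic of equations (8) to (12) for a discrete policy can be sketched in NumPy as follows. This only evaluates the objective for a batch; in practice its negative would be minimized with automatic differentiation. The weights `c1` and `c2` are illustrative defaults because the text does not report them, while `eps = 0.2` matches the clipping factor used in this study; the function name is hypothetical.

```python
import numpy as np


def ppo_objective(new_probs, old_probs, actions, advantages, values, returns,
                  eps=0.2, c1=0.5, c2=0.01):
    """Combined objective of equation (12) for a discrete policy (to be maximized).

    new_probs, old_probs : (batch, n_actions) action distributions
    actions              : (batch,) integer indices of the actions taken
    c1, c2               : illustrative weights (not reported in the text)
    """
    idx = np.arange(len(actions))
    ratio = new_probs[idx, actions] / old_probs[idx, actions]              # eq (10)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    l_clip = np.mean(np.minimum(ratio * advantages, clipped * advantages))  # eq (9)
    l_value = np.mean((values - returns) ** 2)                              # eq (8)
    entropy = -np.sum(new_probs * np.log(new_probs + 1e-12), axis=1)
    l_ent = np.mean(entropy)                                                # eq (11)
    return l_clip - c1 * l_value + c2 * l_ent                               # eq (12)
```

When the new and old policies coincide, the ratio is 1 and $L^{CLIP}$ reduces to the mean advantage; a ratio of 1.5 with a positive advantage is capped at $1 + ε = 1.2$.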

PPO-based semiactive control using MRDs

In the RL-based semiactive control scheme of the MRD suspension system, the PPO agent acts as the semiactive controller, and its environment consists of the suspension system integrated with the MRD model. To capture the vertical dynamic characteristics of the vehicle, the vertical acceleration of the body, the vertical velocity of the body, the vertical velocity of the wheel, and the suspension deflection are selected as state inputs. The resulting state vector is given in equation (13).

s = \{ {\ddot{x}}_{b}, {\dot{x}}_{b}, {\dot{x}}_{w}, x_{b} - x_{w} \}

(13)

The action space is defined as the control current applied to the MRD, which takes discrete values from 0 to 3 A. In the PPO algorithm, the agent optimizes the control policy by maximizing the cumulative reward, where the instantaneous reward $r_{t}$ in equation (7) quantifies system performance at each time step. To reflect the control objectives of the suspension system, this study adopts a multi-objective formulation inspired by previous work on semiactive suspension control (Han and Liang, 2022; Lee D, Jin and Lee C, 2022). The reward function is built from three critical performance indices, namely body acceleration, suspension deflection, and tire deflection (Hrovat, 1997), as given in equation (14).

r = C - K_{1} {\ddot{x}}_{b}^{2} - K_{2} {(x_{b} - x_{w})}^{2} - K_{3} {(x_{w} - x_{g})}^{2}

(14)

where $C$ is a constant reward term, and $K_{1}$, $K_{2}$, and $K_{3}$ are the weighting coefficients corresponding to the three performance metrics.

This design jointly accounts for ride comfort, suspension safety, and tire–road contact performance (International Organization for Standardization, 1997), enabling the PPO agent to balance multiple conflicting objectives.
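The observation, reward, and action definitions of equations (13) and (14) translate directly into a few helper functions. The sketch uses the reward constants reported in the training settings ($C = 5$, $K_{1} = 1$, $K_{2} = K_{3} = 0.5$). The text gives only the 0 to 3 A range of the discrete action space, so the seven-level, 0.5 A grid and all function names below are hypothetical.

```python
import numpy as np

# Reward constants from the training settings of this study.
C, K1, K2, K3 = 5.0, 1.0, 0.5, 0.5
# Hypothetical action grid: only the 0-3 A range is stated, not the step size.
CURRENTS = np.linspace(0.0, 3.0, 7)  # 0.0, 0.5, ..., 3.0 A


def observe(x_b, v_b, a_b, x_w, v_w):
    """State vector of equation (13): body acc., body vel., wheel vel., deflection."""
    return np.array([a_b, v_b, v_w, x_b - x_w])


def reward(a_b, x_b, x_w, x_g):
    """Instantaneous reward of equation (14)."""
    return C - K1 * a_b ** 2 - K2 * (x_b - x_w) ** 2 - K3 * (x_w - x_g) ** 2


def action_to_current(action_index):
    """Map a discrete action index to the MRD control current (A)."""
    return float(CURRENTS[action_index])
```

Because every penalty is quadratic, the reward attains its maximum value $C$ only when the body is not accelerating and both deflections are zero.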

The PPO-based control framework for the MRD-based semiactive suspension system is illustrated in Figure 6. The agent receives state observations from the suspension system and, on the basis of the current policy, generates a control current that is applied to the MRD to regulate its output damping force. This force, in turn, acts on the suspension system and updates its dynamic state. The value network estimates the value of the current state, which, combined with the received reward, yields the advantage used to assess the quality of the chosen action and to guide continuous policy improvement.

Figure 6.

Semiactive vibration control framework for an MRD suspension system based on PPO.

Through repeated interactions with the environment, the agent maximizes the cumulative reward rather than minimizing a single performance metric, and thus learns autonomously to balance competing performance objectives. As rewards accumulate, the agent progressively converges toward a well-performing control policy for the suspension system.

On the basis of the above principles, the software implementation of the PPO-based semiactive control algorithm for the MRD suspension system is summarized as follows.

Training procedure

The RL training environment for the MRD-based semiactive suspension control system is developed on the MATLAB/Simulink platform. The parameters of the suspension system model used in this study are listed in Table 3.

Table 3.

Parameters of the suspension system model.

Parameter	Value	Units
$m_{b}$	287	kg
$m_{w}$	40	kg
$k_{s}$	15800	$N / m$
$k_{t}$	158000	$N / m$

The PPO agent consists of two neural network structures, namely, a policy network and a value network (Schulman et al., 2017). The policy network receives a 4-dimensional state vector from the environment as input. It comprises three fully connected layers (ActorFC1, ActorFC2, and ActorFC3), each consisting of 64 neurons activated by the ReLU function. The final output layer (ActorFC4) adopts a Softmax activation to generate the probability distribution over discrete actions. The critic network shares a similar architecture: its input layer also processes a 4-dimensional state vector, followed by three fully connected layers (CriticFC1, CriticFC2, and CriticFC3) with 64 ReLU-activated neurons each. The output layer provides the estimated state value for the current state.

The training parameters are configured as follows. The learning rates of both the value network and the policy network are set to 0.0005, and the adaptive moment estimation (Adam) optimizer is used for both networks, with a gradient clipping threshold of 1. In the reward function, the constant term is C = 5, and the weighting coefficients are K1 = 1, K2 = 0.5, and K3 = 0.5. In the PPO algorithm, the clipping factor is 0.2, the discount factor is 0.99, and the GAE parameter is 0.95. The minibatch size per training epoch is 256, and the maximum number of training episodes is 5000.
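The network layout and training settings described above can be summarized in a short NumPy sketch: a 4-64-64-64 ReLU trunk with a softmax output for the actor and a single linear output for the critic. Only the forward pass is shown; the Adam updates are omitted. The number of discrete actions (seven) and the He-style initialization are hypothetical choices, and the study itself used MATLAB/Simulink rather than this code.

```python
import numpy as np

# Training settings as reported in the text.
CONFIG = {
    "learning_rate": 5e-4,  # Adam, for both actor and critic
    "grad_clip": 1.0,
    "clip_epsilon": 0.2,
    "discount": 0.99,
    "gae_lambda": 0.95,
    "minibatch_size": 256,
    "max_episodes": 5000,
}


def init_mlp(sizes, rng):
    """He-style initialized weights for a fully connected network (assumed choice)."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, s):
    """ReLU hidden layers followed by a linear output layer."""
    h = s
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    return h @ W + b


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
N_ACTIONS = 7  # hypothetical number of discrete current levels
actor = init_mlp([4, 64, 64, 64, N_ACTIONS], rng)  # ActorFC1-3 + ActorFC4
critic = init_mlp([4, 64, 64, 64, 1], rng)         # CriticFC1-3 + value output
```

A state batch of shape (batch, 4) passed through `softmax(forward(actor, s))` yields one probability distribution over current levels per row, and `forward(critic, s)` yields one value estimate per row.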

The PPO agent is trained using the framework described above, and the resulting training curve is shown in Figure 7. As illustrated in the figure, the agent begins to converge after approximately 1000 episodes, indicating that it has effectively learned a stable control policy.

Figure 7.

Convergence curve produced by the PPO agent during training.

Simulations

Simulation implemented under random road excitations

To verify the effectiveness of the proposed PPO-based control strategy, simulations were conducted using the trained agent under different road conditions, and the results were compared with those of passive control and Skyhook control. The damping coefficient of the passive suspension is set to $C_{p} = 1500 N \cdot s / m$, based on the equivalent damping parameters of a production vehicle series. This value falls within the typical passenger car damping range (1000–2000 N·s/m) reported in previous studies (Han and Liang, 2022; Savaresi et al., 2011) and thus serves as a reasonable benchmark for passive suspension modeling. The Skyhook controller follows the standard control law widely adopted in semiactive suspension research (Abebaw et al., 2021), as shown in equation (15).

F_{s k y} = {\begin{cases} C_{\max} (\dot{x_{b}} - \dot{x_{w}}), \dot{x_{b}} (\dot{x_{b}} - \dot{x_{w}}) \geq 0 \\ C_{\min} (\dot{x_{b}} - \dot{x_{w}}), \dot{x_{b}} (\dot{x_{b}} - \dot{x_{w}}) < 0 \end{cases}

(15)

where $C_{\max}$ and $C_{\min}$ denote the maximum and minimum damping coefficients, respectively.

Based on the MRD model developed in this study, the maximum and minimum damping coefficients are set to $C_{\max} = 3021 N \cdot s / m$ and $C_{\min} = 175.5 N \cdot s / m$, respectively. This setup follows the standard Skyhook control logic and reflects the damping range achievable by the MRD, thereby offering a fair and engineering-relevant benchmark for evaluating the PPO-based controller.
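The switching law of equation (15) with these coefficients reduces to a one-line decision on the sign of $\dot{x}_{b} (\dot{x}_{b} - \dot{x}_{w})$. The sketch below returns the demanded force only; how that force is converted into an MRD current (and saturated to 0 to 3 A) is not specified in this section and is left out. The function name is hypothetical.

```python
C_MAX, C_MIN = 3021.0, 175.5  # N*s/m, from the MRD model


def skyhook_force(v_b, v_w, c_max=C_MAX, c_min=C_MIN):
    """Two-state Skyhook law of equation (15): demanded damping force (N)."""
    v_rel = v_b - v_w
    c = c_max if v_b * v_rel >= 0 else c_min
    return c * v_rel
```

For example, with the body moving up at 1 m/s and the wheel at rest, the law selects $C_{\max}$; if the wheel moves up faster than the body, the product is negative and the law falls back to $C_{\min}$.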

The trained PPO agent was utilized as the suspension controller, and simulations were conducted at a constant vehicle speed of 20 m/s. Considering that grade-A road excitations are minimal and yield negligible response differences among control strategies, randomly generated grade-B, grade-C, and grade-D road profiles were used as excitations. Table 4 presents the RMS values of the body acceleration, suspension deflection, and dynamic tire load measures under the three road roughness levels. Table 5 summarizes the percentage improvements achieved by the Skyhook and PPO controllers over passive control across all performance indicators.

Table 4.

RMS values of three performance metrics produced under different road roughness levels.

Road level	Body acceleration (m/s^2)			Suspension deflection (m)			Tire dynamic load (N)
Road level	Passive	Skyhook	PPO	Passive	Skyhook	PPO	Passive	Skyhook	PPO
B	0.361	0.282	0.198	0.017	0.019	0.013	160	162	100
C	0.463	0.373	0.269	0.043	0.039	0.031	318	332	205
D	1.043	0.887	0.523	0.069	0.074	0.047	638	639	438

Table 5.

Performance improvements (%) achieved by the Skyhook and PPO controllers over passive control.

Road level	Body acceleration		Suspension deflection		Tire dynamic load
Road level	Skyhook (%)	PPO (%)	Skyhook (%)	PPO (%)	Skyhook (%)	PPO (%)
B	21.9	45.1	−11.7	23.5	−2.1	37.2
C	19.4	41.9	9.3	27.9	−4.5	35.4
D	14.9	49.8	−7.2	31.8	−0.02	31.3
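The percentages in Table 5 are relative reductions of the RMS values in Table 4, $100 (J_{\text{passive}} - J_{\text{controller}}) / J_{\text{passive}}$. The short script below recomputes them from Table 4. Because Table 4 reports rounded values, some recomputed entries differ from Table 5 in the last digit, and the Skyhook tire-load entries differ more (for example, about −1.25% rather than −2.1% on the B-class road); the unrounded simulation data would be needed to reproduce Table 5 exactly.

```python
# RMS values from Table 4: {metric: {road: (passive, skyhook, ppo)}}
TABLE4 = {
    "body_acceleration": {"B": (0.361, 0.282, 0.198), "C": (0.463, 0.373, 0.269),
                          "D": (1.043, 0.887, 0.523)},
    "suspension_deflection": {"B": (0.017, 0.019, 0.013), "C": (0.043, 0.039, 0.031),
                              "D": (0.069, 0.074, 0.047)},
    "tire_dynamic_load": {"B": (160, 162, 100), "C": (318, 332, 205),
                          "D": (638, 639, 438)},
}


def improvement(reference, controlled):
    """Percentage reduction of an RMS value relative to a reference controller."""
    return 100.0 * (reference - controlled) / reference


for metric, rows in TABLE4.items():
    for road, (passive, skyhook, ppo) in rows.items():
        print(f"{metric:22s} {road}: Skyhook {improvement(passive, skyhook):6.1f}%  "
              f"PPO {improvement(passive, ppo):6.1f}%")
```

The same `improvement` function can compare any two controllers, for example PPO against Skyhook, by passing the Skyhook value as the reference.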

The results demonstrate that the PPO-based control strategy consistently outperforms both the passive and Skyhook control schemes in terms of vibration attenuation under various road conditions.

Figure 8 shows the body acceleration, suspension deflection, and dynamic tire load responses produced under C-class road excitation. The PPO controller outperforms both the passive and Skyhook control schemes in all performance metrics, demonstrating significantly enhanced vibration attenuation.

Figure 8.

Comparison among the body acceleration, suspension deflection, and dynamic tire load results produced under C-class road excitations.

In addition, Figure 9 compares the control forces of the passive, Skyhook, and PPO control strategies under C-class road excitations, while Figure 10 depicts the variations in control current outputs of the Skyhook and PPO controllers under the same conditions.

Figure 9.

Comparison of control forces for the three control strategies under C-class road excitations.

Figure 10.

Comparison of control current responses between the Skyhook and PPO control strategies.

Simulation implemented under system uncertainty

To simulate real-world parameter variations while keeping training feasible across a wide range of operating conditions, ±20% parameter uncertainty was introduced during training. Specifically, in each training episode, key suspension parameters (sprung mass, unsprung mass, suspension stiffness, and tire stiffness) were randomly perturbed within ± 20% of their nominal values; the agent trained in this way is denoted PPOAgentB. It is compared with PPOAgentA, which was trained under nominal parameters.
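Per-episode domain randomization of this kind can be sketched as follows, using the nominal values of Table 3. The text specifies the ±20% range but not the sampling distribution, so the uniform draw and the function name are assumptions.

```python
import numpy as np

NOMINAL = {"m_b": 287.0, "m_w": 40.0, "k_s": 15800.0, "k_t": 158000.0}  # Table 3


def sample_parameters(rng, nominal=NOMINAL, spread=0.2):
    """Draw one episode's suspension parameters within +/- spread of nominal.

    The +/-20% range is from the text; the uniform distribution is an assumption.
    """
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for name, value in nominal.items()}


rng = np.random.default_rng(42)
episode_params = [sample_parameters(rng) for _ in range(5)]
```

In a training loop, one such dictionary would be drawn at the start of each episode and passed to the plant model of equation (4), while evaluation would revert to the nominal values.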

Figure 11 compares the body acceleration, suspension deflection, and dynamic tire load responses of the two agents under C-class random road excitation, and the RMS values of these three performance indices under different road classes are summarized in Table 6. All results in Figure 11 and Table 6 are obtained with the nominal-parameter plant model to ensure consistent testing conditions across the control strategies. Although PPOAgentB was trained with ± 20% parameter randomization, its performance under nominal conditions remains close to that of PPOAgentA. This suggests that randomized training prevents the learned policy from becoming specialized to a single nominal parameter set and instead improves its generalization and adaptability to parameter variations. While PPOAgentB may sacrifice some fine-tuning of its actions for the nominal plant, it is expected to adapt better to system deviations that arise in real operation, indicating improved potential for practical engineering applications.

Figure 11.

Comparison of body acceleration, suspension deflection, and dynamic tire load under C-class road excitations with parameter uncertainties considered.

Table 6.

RMS values of three performance metrics for different control strategies under various road roughness levels.

Road level	Body acceleration (m/s²)				Suspension deflection (m)				Dynamic tire load (N)
	Passive	Skyhook	PPOAgentA	PPOAgentB	Passive	Skyhook	PPOAgentA	PPOAgentB	Passive	Skyhook	PPOAgentA	PPOAgentB
B	0.361	0.282	0.198	0.145	0.017	0.019	0.013	0.012	160	162	100	105
C	0.463	0.373	0.269	0.303	0.043	0.039	0.031	0.029	318	332	205	211
D	1.043	0.887	0.523	0.524	0.069	0.074	0.047	0.046	638	639	438	425
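Each entry in Table 6 is the root-mean-square value of a simulated response time history. A minimal sketch of that metric follows, assuming uniformly sampled signals. The function name `rms` is illustrative, and any transient trimming or windowing applied in the paper is not specified in this section.

```python
import math
from typing import Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square value of a uniformly sampled response signal,
    e.g. body acceleration, suspension deflection, or dynamic tire load."""
    if len(signal) == 0:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(x * x for x in signal) / len(signal))


if __name__ == "__main__":
    # A unit-amplitude sine sampled over whole periods has RMS 1/sqrt(2).
    n = 1000
    sine = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
    print(round(rms(sine), 4))  # → 0.7071
```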

Conclusion

A PPO-based semiactive suspension control strategy for MRDs was proposed in this study. An RL training environment was constructed from an MRD model and a quarter-car suspension model, and a PPO control framework composed of policy and value networks was established within it. The control action was defined as a discrete current between 0 and 3 A: the agent selected a current according to the learned policy, and this current was applied to the MRD to adjust its damping force and attenuate vibration. Because the MRD model is integrated directly into the control loop, the method avoids the need for an inverse MRD model, which significantly simplifies the overall control structure.
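The mapping from a discrete policy action to an MRD current can be sketched as below. The 0 to 3 A range comes from the text; the number of levels `n_levels`, the even spacing, and the helper names `current_levels` and `action_to_current` are hypothetical, since this section does not state how finely the range was discretized. The returned current would then be passed to the MRD model (not reproduced here) to compute the damping force.

```python
from typing import List

I_MIN, I_MAX = 0.0, 3.0  # control current range stated in the text (A)


def current_levels(n_levels: int) -> List[float]:
    """Evenly spaced candidate currents spanning [I_MIN, I_MAX].

    `n_levels` is a hypothetical design choice, not a value from the paper.
    """
    if n_levels < 2:
        raise ValueError("need at least two current levels")
    step = (I_MAX - I_MIN) / (n_levels - 1)
    return [I_MIN + i * step for i in range(n_levels)]


def action_to_current(action: int, n_levels: int) -> float:
    """Map a discrete policy action index to an MRD coil current in amperes."""
    if not 0 <= action < n_levels:
        raise IndexError(f"action {action} outside [0, {n_levels})")
    return current_levels(n_levels)[action]


if __name__ == "__main__":
    # With 7 hypothetical levels the grid is 0.0, 0.5, ..., 3.0 A.
    print(current_levels(7))
    print(action_to_current(3, 7))
```

A discrete action space keeps the policy output inside the physically admissible current range by construction, so no clipping of a continuous output is needed.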

Simulation results under B-, C-, and D-class random road excitations indicate that, under optimal operating conditions, the proposed PPO controller outperforms the passive and Skyhook control strategies in body acceleration, suspension deflection, and dynamic tire load. Specifically, under D-class road excitation, the PPO controller reduced the RMS body acceleration by 49.8% relative to passive control and by 22.4% relative to Skyhook control. Furthermore, when 20% system parameter uncertainty was introduced during training, the resulting agent maintained strong control performance, demonstrating its potential robustness and generalizability.
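The relative reductions quoted above are read here as (baseline RMS minus controlled RMS) divided by baseline RMS; that definition is an assumption, as the formula is not stated in this section. A small helper, hypothetically named `rms_reduction`, applies it to the D-class body-acceleration values from Table 6.

```python
def rms_reduction(baseline: float, controlled: float) -> float:
    """Percentage reduction of an RMS index relative to a baseline strategy.

    Assumes reduction = 100 * (baseline - controlled) / baseline.
    """
    if baseline <= 0.0:
        raise ValueError("baseline RMS must be positive")
    return 100.0 * (baseline - controlled) / baseline


if __name__ == "__main__":
    # D-class RMS body acceleration (m/s^2) from Table 6.
    passive = 1.043
    ppo_agent_a = 0.523
    print(f"PPOAgentA vs passive: {rms_reduction(passive, ppo_agent_a):.2f}%")
```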

This study implements the standard PPO algorithm within a physically constrained training environment that incorporates an MRD model and parameter uncertainties. The proposed framework enables the agent to learn realistic semiactive suspension control strategies, demonstrating the potential of RL for complex nonlinear suspension systems. Robust control can effectively handle system uncertainties, but its performance relies on accurate modeling, which is difficult to guarantee in practical applications. The PPO controller proposed in this study shows potential robustness to parameter variations without requiring an explicit model. Future work will integrate robust control concepts with RL and focus on controller compression and hardware deployment to advance the practical implementation of RL-based semiactive suspension control.
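For reference, the clipped surrogate objective that defines standard PPO (Schulman et al., 2017) is the batch mean of min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t), where r_t is the probability ratio between the new and old policies for the action taken at step t and A_t is the advantage estimate. The plain-Python sketch below evaluates that quantity. The clip range `eps = 0.2` is a commonly used default rather than a value confirmed for this study, the function name is illustrative, and a practical implementation would compute this inside an automatic-differentiation framework together with the value-function and entropy terms.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Batch estimate of the PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates A_t for the same steps.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # r_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic of the two
    return total / len(advantages)


if __name__ == "__main__":
    # Step 1: r = 1.5, A = +1 -> the term is clipped to 1.2.
    # Step 2: r = 0.5, A = -1 -> the term is clipped to -0.8.
    new = [math.log(1.5), math.log(0.5)]
    old = [0.0, 0.0]
    adv = [1.0, -1.0]
    print(ppo_clip_objective(new, old, adv))
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the ratio outside [1 − ε, 1 + ε] in the direction favored by the advantage, which is what keeps each policy update conservative.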

Footnotes

ORCID iDs

Xinglong Jia

Longlei Dong

Jianguo Ma

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the project Application Research and Industrialization of High-Reliability Vibration Dampers Based on Magnetorheological Fluid Smart Sensing Materials, China (2019JZZY020215HZ).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Abdullahi, Hassan IH, Abdullahi MD, et al. (2023) Manta ray foraging optimization algorithm: modifications and applications. IEEE Access 11: 53315–53343.

2. Abebaw, You W-H, Lee, et al. (2021) Semi-active control of a nonlinear quarter-car model of hyperloop capsule vehicle with skyhook and mixed skyhook–acceleration driven damper controller. Advances in Mechanical Engineering 13: 168781402199952.

3. Bai, Chen, Qian (2015) Principle and validation of modified hysteretic models for magnetorheological dampers. Smart Materials and Structures 24: 085014.

4. Caponetto, Diamante, Fargione, et al. (2003) A soft computing approach to fuzzy sky-hook control of semi-active suspension. IEEE Transactions on Control Systems Technology 11(6): 786–798.

5. Crosby, Karnopp (1973) The active damper: a new concept for shock and vibration control. Shock and Vibration Bulletin 43: 119–133.

6. Ding, Wang, Meng, et al. (2022) Research on time-delay-dependent H∞/H2 optimal control of magnetorheological semi-active suspension with response delay. Journal of Vibration and Control 29: 1447–1458.

7. Gao, Wong P-K, Zhao, et al. (2023) Design of compensatory backstepping controller for nonlinear magnetorheological dampers. Applied Mathematical Modelling 114: 318–337.

8. Han S-Y, Liang (2022) Reinforcement-learning-based vibration control for a vehicle semi-active suspension system via the PPO approach. Applied Sciences 12: 3078.

9. Han, Zeng, et al. (2024) Vehicle attitude control of magnetorheological semi-active suspension based on multi-objective intelligent optimization algorithm. Actuators 13(12): 466.

10. Hrovat (1997) Survey of advanced suspension developments and related optimal control applications. Automatica 33(10): 1781–1817.

11. Gou (2022) Optimization of neural network inverse model of magnetorheological damper. Journal of Chongqing Jiaotong University (Natural Science) 41(6): 140–146.

12. International Organization for Standardization (1997) Mechanical Vibration and Shock – Evaluation of Human Exposure to Whole-Body Vibration – Part 1: General Requirements (ISO 2631-1:1997). ISO. (Confirmed 2010).

13. Jiang, Rui, Wei, et al. (2023) A phenomenological model of magnetorheological damper considering fluid deficiency. Journal of Sound and Vibration 562: 117851.

14. Kim, Shin, et al. (2023) Deep reinforcement learning for semi-active suspension: a feasibility study. In: 2023 International Conference on Electronics, Information, and Communication (ICEIC), 5–8 February 2023, Singapore, pp. 1–5. DOI: 10.1109/ICEIC57457.2023.10049850.

15. Lee, Jin, Lee (2022) Deep reinforcement learning of semi-active suspension controller for vehicle ride comfort. IEEE Transactions on Vehicular Technology 72(1): 327–339.

16. Chu, Kalabić (2019) Dynamics-enabled safe deep reinforcement learning: case study on active suspension control. In: 2019 IEEE Conference on Control Technology and Applications (CCTA), Hong Kong, China, 19–21 August 2019, pp. 585–591. DOI: 10.1109/CCTA.2019.8920696.

17. Gan, Liu, et al. (2024) Performance analysis of vehicle magnetorheological semi-active air suspension based on S-QFSMC control. Frontiers in Materials 11: 1358319.

18. Liu, Chen, Yang, et al. (2019) General theory of skyhook control and its application to semi-active suspension control strategy design. IEEE Access 7: 101552–101560.

19. Liu, Ren, et al. (2020) Semi-active suspension control based on deep reinforcement learning. IEEE Access 8: 9978–9986.

20. Margolis, Tylee, Hrovat (1975) Heave mode dynamics of a tracked air cushion vehicle with semiactive airbag secondary suspension. Journal of Dynamic Systems, Measurement, and Control 97(4): 399–407.

21. McLaughlin, Wereley (2014) Advanced magnetorheological damper with a spiral channel bypass valve. Journal of Applied Physics 115(17): 17B532.

22. Múčka (2018) Simulated road profiles according to ISO 8608 in vibration analysis. Journal of Testing and Evaluation 46: 20160265.

23. Omotoso, Al-Shamma’a, Farh HMH, et al. (2022) Parameter extraction of solar photovoltaic modules using manta ray foraging optimization (MRFO) algorithm. In: IEEE 16th International Conference on Compatibility, Power Electronics, and Power Engineering (CPE-POWERENG), 29 June–1 July 2022, Birmingham, UK, pp. 1–6. DOI: 10.1109/CPE-POWERENG54966.2022.9880899.

24. Saufi, Talib, Md Zain (2025) The application of machine learning and optimisation algorithm for magnetorheological damper dynamics behaviour: a review. Journal of Vibration Engineering and Technologies 13: 231.

25. Savaresi, Spelta (2007) Mixed sky-hook and ADD: approaching the filtering limits of a semi-active suspension. Journal of Dynamic Systems, Measurement, and Control 129(4): 382–392.

26. Savaresi, Poussot-Vassal, Spelta, et al. (2011) Semi-Active Suspension Control Design for Vehicles. Butterworth-Heinemann (Elsevier). ISBN 978-0-08-096678-6. DOI: 10.1016/C2009-0-63839-3.

27. Schulman, Wolski, Dhariwal, et al. (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. DOI: 10.48550/arXiv.1707.06347.

28. Tan, Wen, Pan, et al. (2023) Control of a nonlinear active suspension system based on deep reinforcement learning and expert demonstrations. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 238(13): 4093–4113.

29. Ultsch, Pfeiffer, Ruggaber, et al. (2024) Reinforcement learning for semi-active vertical dynamics control with real-world tests. Applied Sciences 14(16): 7066.

30. Verros, Natsiavas, Papadimitriou (2005) Design optimization of quarter-car models with passive and semi-active suspensions under random road excitation. Journal of Vibration and Control 11: 581–606.

31. Wang, Turner, Mann, et al. (2019) Constrained attractor selection using deep reinforcement learning. arXiv preprint arXiv:1909.10500.

32. Yan, Dong, Han, et al. (2021) A general inverse control model of a magnetorheological damper based on neural network. Journal of Vibration and Control 28(7–8): 952–963.

33. Yong, Seo, Kim, et al. (2023) Suspension control strategies using switched soft actor-critic models for real roads. IEEE Transactions on Industrial Electronics 70(1): 824–832.

34. Zhao, Jiang, You, et al. (2019) Setpoint tracking for the suspension system of medium-speed maglev trains via reinforcement learning. In: 2019 IEEE International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019, pp. 1620–1625. DOI: 10.1109/ICCA.2019.8900006.