Abstract
This paper presents a learning-based generalized dynamic predictive control (GDPC) approach for a permanent magnet synchronous motor (PMSM) position servo system, aiming to achieve intelligent and fine-grained operation under uncertain environments. Because a manually configured generalized predictive control (GPC) scheme with a fixed horizon copes poorly with diverse operating conditions, the investigated strategy integrates a deep reinforcement learning (DRL)-based mechanism that updates the horizon online. In particular, an extended state observer (ESO) is first constructed to estimate the lumped disturbance and thereby compensate for the deviation of the prediction model. The analytical solution of the benchmark GPC algorithm is then obtained by solving an optimization problem for the designed performance index. To optimize the prediction horizon of GPC, a DRL agent based on the deep deterministic policy gradient (DDPG) algorithm is then trained offline. Real-time horizon adjustment is finally implemented on an experimental setup that combines a digital signal processor (DSP) and a Beckhoff controller. A series of simulations and experiments validate the efficacy of the proposed control approach, which achieves improved control performance under varying operating conditions.
Introduction
Permanent magnet synchronous motors (PMSMs) play a crucial role as primary components in numerous industrial applications, demanding precise and rapid position servo response to fulfill accuracy and speed prerequisites (Ding et al., 2023; Wang et al., 2025; Xu et al., 2024). The distinct advantage of PMSMs lies in their remarkable control capabilities across a range of applications. However, achieving high-performance control of PMSMs remains challenging, primarily because of uncertainties stemming from parameter perturbations, friction torque, external disturbances, and variations in operating conditions (Mandra et al., 2019; Tian et al., 2022).
Conventional linear control strategies have been extensively employed in motion control systems. However, their capability to meet the precision and adaptability demands of modern industrial applications is often deemed insufficient. This is due to the complexities of PMSMs, which are often characterized by nonlinear behaviors and uncertainties, hindering the establishment of precise models required for traditional control techniques. Consequently, various nonlinear control theories such as model predictive control (MPC) (Cao et al., 2024; Jia et al., 2024; Zhang et al., 2025), active disturbance rejection control (ADRC) (Hou et al., 2023; Yang et al., 2025; Zuo et al., 2021), and sliding mode control (SMC) (Yue et al., 2023; Zhang et al., 2021) have been thoroughly investigated and successfully applied to high-performance PMSM control. Among these approaches, MPC stands out for its real-time optimization capability, making it a promising method to tackle nonlinear and complex systems while effectively managing multiple constraints (Belda and Vosmik, 2016; Song et al., 2023).
Generalized predictive control (GPC) is notable because it effectively handles system constraints and exhibits high robustness by employing a continuous-time model, which avoids the performance degradation associated with sampling period selection in traditional discrete MPC (Yang et al., 2015; Zheng et al., 2021). Whereas ADRC and SMC focus primarily on disturbance rejection, GPC provides a more systematic optimization framework for position servo tasks. Deep reinforcement learning (DRL) refers to a class of machine learning algorithms in which an agent learns to make decisions by interacting with an environment to maximize cumulative rewards. DRL is employed in this work because it can autonomously learn optimal control policies without requiring explicit system models, making it suitable for complex, nonlinear systems like PMSM drives.
It is well known that the effectiveness of the standard MPC strategy based on a discrete-time model is significantly influenced by the sampling period chosen in controller design (Yang et al., 2015). Generally speaking, employing a small sampling period alongside a relatively large horizon increases the computational burden, while extending the sampling period unavoidably degrades both the dynamic response and the anti-interference capability of the system to a certain extent. As an alternative, GPC based on continuous-time models stands out as a notable predictive control method, which not only features robustness, rapid dynamic response, and easy handling of system constraints but also excels in computational efficiency and straightforward engineering implementation (Zheng et al., 2021). Nonetheless, since the traditional GPC relies on nominal system modeling, it encounters challenges when dealing with load disturbances, model parameter drift, and unmodeled dynamics (Yang et al., 2017). Furthermore, a fixed control gain makes it difficult to attain optimal control performance across diverse operating conditions (Dong et al., 2024; Zhang et al., 2023). This underscores the need for a predictive control algorithm that is both robust and adaptive. The prediction horizon
As an intelligent algorithm, DRL possesses the capability of autonomous learning and solving complex problems (Li et al., 2025; Ma et al., 2023). By virtue of this promising characteristic, DRL has gained attention in traditional controller design, such as parameter optimization of proportional-integral-derivative (PID) controllers, robust stabilization strategies (Lee et al., 2020), and MPC methodologies (Cui et al., 2024; Pan et al., 2024). In contrast to adaptive dynamic programming (ADP) techniques, which generally require specific model information to accurately capture system dynamics (Liu et al., 2021), and Q-learning methods, which are better suited to discrete action spaces (Liu and Su, 2022), the deep deterministic policy gradient (DDPG) algorithm (Ouyang et al., 2025) is a well-known actor-critic approach that is effective in continuous control tasks; therefore, it is suitable for data-driven feedback control systems (Liu et al., 2024). However, although the above studies have demonstrated effective control in semi-physical simulations, their use in optimizing controller design for physical electrical drive systems remains limited.
Recently, Dong et al. (2024) proposed an adaptive GPC method for PMSM speed regulation, demonstrating effective disturbance rejection under mismatched disturbances. While their work focuses on speed control, our paper addresses the more demanding position servo control problem, which requires higher steady-state accuracy and different dynamic response characteristics. This distinction highlights the need for a control framework capable of handling the additional complexities of position tracking under uncertain dynamics. To address the limitations of fixed-horizon GPC and extend the GPC paradigm from speed regulation to precision position control, this work integrates a DRL-based adaptive horizon tuning mechanism. The DRL agent operates as a high-level parameter optimizer, dynamically adjusting the prediction horizon
Motivated by the above analysis, a DRL-based generalized dynamic predictive control (GDPC) approach is put forward for the PMSM position servo system, which seamlessly integrates the benchmark GPC and DDPG algorithm, along with a variable horizon. To implement the proposed strategy, the robust GPC is first obtained from the rotor position prediction model calibrated by an extended state observer (ESO), which is deployed in a digital signal processor (DSP). Meanwhile, the DRL training results are introduced into the horizon adjustment of GPC, and the agent is further deployed in a Beckhoff controller. Compared with the existing results, the main contributions of this paper are summarized as follows:
The proposed DRL-based GDPC method incorporates a real-time horizon adjustment mechanism, achieving an order-of-magnitude improvement in computational efficiency while maintaining comparable control performance. This enables intelligent control for PMSMs under a wide range of operating conditions.
Compared with existing observer-based robust GPC, the proposed method not only improves adaptability to diverse operating conditions via DRL-based horizon tuning, but also enhances disturbance rejection under simultaneous load torque and inertia variations, as evidenced by reduced position fluctuation.
Different from existing semi-physical simulation validations for DRL-based control algorithms, the synergy between DSP and Beckhoff controllers delivers a practical dual-processor architecture that decouples optimization from real-time control, facilitating reliable deployment in real power electronic applications without additional hardware complexity.
The remainder of this paper is organized as follows: Problem formulation is presented in section “Problem formulation.” Section “DRL-based GDPC design” provides a detailed design process of the proposed GDPC method. Simulation and experimental tests under various scenarios are introduced in sections “Simulation verification” and “Experimental verification,” respectively. Section “Conclusion” summarizes this article. Finally, the rigorous stability analysis is provided in Appendix 1.
Problem formulation
This section presents a typical physical model of the PMSM servo system, followed by an outline of the control objective.
PMSM servo model
The dynamic equations of a PMSM servo system are generally expressed as (Yan et al., 2018)
where
To proceed, the stator current
where
Starting from the PMSM mechanical dynamics and differentiating the position gives the second-order relationship. Defining
From the perspective of industrial application, it is reasonable to make the following assumption.
The assumption
In practical applications, PMSM servo systems frequently operate in uncertain environments, which involve perturbations in load torque
Control objective
Considering the application requirements of the position servo system, the control objectives of this article can be concisely summarized as follows:
Precise position control: The primary objective is to attain precise position control for the PMSM servo system. This is achieved through the implementation of a GPC approach, ensuring that the rotor position
Adaptation to operating conditions: The control system is designed to exhibit adaptability to various operating conditions and environmental factors, even in the presence of perturbations and uncertainties. To this end, a DRL strategy is introduced to facilitate parameter online updating, which can be expressed as
where μ is the policy determined by the DRL agent and
DRL-based GDPC design
This section presents the main procedure for designing the DRL-based GPC approach, which includes the design of the benchmark GPC and the horizon tuning mechanism.
Benchmark GPC design
Position prediction model construction
To proceed, the rotor position
To facilitate the implementation on DSP, the control order r is set to zero. Consequently, (5) can be simplified into
where
Using Taylor series expansion, the future position can be approximated as
setting control order
It is apparent from (6) that the rotor angular velocity
Unknown signal observation
First, by defining
For system (8), an ESO is employed to estimate the angular velocity and disturbance simultaneously as follows
where
The ESO gains are designed using the bandwidth parameterization technique, which places all observer poles at
which yields
From system (8) and observer (9), the estimation error dynamics system can be calculated as
where
It can be observed from (10) that since
where
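To make the observer structure of this subsection concrete, the following is a minimal sketch of a third-order linear ESO with bandwidth-parameterized gains. It assumes a simplified plant of the form theta'' = b0*u + d, that all three observer poles are placed at -omega_o, and a forward-Euler discretization; the names `eso_gains`, `eso_step`, `simulate`, `b0`, and `omega_o`, as well as all numerical values, are hypothetical and are not taken from this paper.

```python
def eso_gains(omega_o):
    """Bandwidth parameterization (assumed): triple observer pole at -omega_o,
    i.e. s^3 + b1*s^2 + b2*s + b3 = (s + omega_o)^3."""
    return 3.0 * omega_o, 3.0 * omega_o ** 2, omega_o ** 3


def eso_step(z, theta_meas, u, b0, omega_o, dt):
    """One forward-Euler step of a third-order linear ESO.

    z = (theta_hat, omega_hat, d_hat): estimates of position, angular
    velocity, and lumped disturbance.
    """
    beta1, beta2, beta3 = eso_gains(omega_o)
    z1, z2, z3 = z
    err = theta_meas - z1                  # position estimation error
    dz1 = z2 + beta1 * err
    dz2 = z3 + b0 * u + beta2 * err
    dz3 = beta3 * err
    return (z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3)


def simulate(d_true=5.0, b0=100.0, omega_o=200.0, dt=1e-4, steps=20000):
    """Run the ESO against the assumed plant with a constant disturbance."""
    theta, omega = 0.0, 0.0
    z = (0.0, 0.0, 0.0)
    u = 0.0                                # open loop: only the observer is tested
    for _ in range(steps):
        omega += dt * (b0 * u + d_true)    # assumed plant: theta'' = b0*u + d
        theta += dt * omega
        z = eso_step(z, theta, u, b0, omega_o, dt)
    return z, (theta, omega)


if __name__ == "__main__":
    z, state = simulate()
    print("estimated disturbance:", z[2])
    print("estimated / true velocity:", z[1], state[1])
```

With a single tuning parameter `omega_o`, a larger bandwidth speeds up the estimation at the cost of higher sensitivity to measurement noise, which is the usual trade-off behind this parameterization.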
Receding-horizon optimization
For the purpose of achieving the optimal control of the PMSM servo system, the performance index is defined as
where
The current optimal control input is calculated according to the defined performance index at each sampling time. Therefore, substituting (13) into (12) yields
where
Separate the matrix
From the cost function
Furthermore, the optimal control law
where
Note that the dimensional consistency of the gains can be verified as follows:
The GPC control law in (16) fundamentally differs from a conventional PID controller in several key aspects. First, the gains
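How the horizon shapes the feedback gains can be illustrated on a simplified case. The sketch below assumes a relative degree of two, a control order of zero, a second-order Taylor prediction of the tracking error, and an unweighted index J = 0.5 * integral over [0, T] of the squared predicted error. These simplifications are assumptions made for illustration and need not coincide with the performance index (12), which may carry additional weighting. Under them, minimizing J with respect to the error acceleration gives the gains k1 = 10/(3T^2) and k2 = 5/(2T). The function names are hypothetical.

```python
def gpc_gains(T):
    """Gains of the simplified, unweighted case (assumed, see lead-in).

    k1 has units 1/s^2 and k2 has units 1/s, so k1*e and k2*de are both
    accelerations.
    """
    k1 = 10.0 / (3.0 * T ** 2)
    k2 = 5.0 / (2.0 * T)
    return k1, k2


def tracking_cost(e, de, dde, T):
    """Closed form of 0.5 * int_0^T (e + tau*de + 0.5*tau^2*dde)^2 dtau."""
    return 0.5 * (
        e ** 2 * T
        + e * de * T ** 2
        + (de ** 2 + e * dde) * T ** 3 / 3.0
        + de * dde * T ** 4 / 4.0
        + dde ** 2 * T ** 5 / 20.0
    )


def optimal_error_accel(e, de, T):
    """Error acceleration that minimizes tracking_cost for given e and de."""
    k1, k2 = gpc_gains(T)
    return -k1 * e - k2 * de


if __name__ == "__main__":
    for T in (0.02, 0.05, 0.1):
        k1, k2 = gpc_gains(T)
        print(f"T = {T:.2f} s -> k1 = {k1:.1f} 1/s^2, k2 = {k2:.1f} 1/s")
```

In this simplified setting a shorter horizon yields larger gains, and hence a stiffer response, whereas a longer horizon yields a smoother but slower one. This is the trade-off that a horizon tuning mechanism can exploit.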
DRL-based variable horizon of GPC
In this subsection, we enhance the classical GPC by integrating the DDPG-based learning algorithm for the horizon tuning mechanism design. The detailed design framework of the proposed DRL-based GDPC is depicted in Figure 1. The system operates as follows: the encoder measures rotor position

Block diagram of the proposed DRL-based GDPC for PMSM servo system, illustrating the interaction among the ESO, GPC optimizer, current control loop, and DRL agent for adaptive horizon tuning.
The DRL (DDPG) agent is integrated as a high-level “parameter optimizer” that dynamically selects
The specific design philosophy of the state space, the action space, the reward function, and the actor and critic network are illustrated in the following steps. Furthermore, Figure 2 presents a diagram illustrating the DRL-based optimization procedure for the variable horizon.

DRL-based optimization procedure for the variable horizon.
State space
Considering the control objective of ensuring highly precise position tracking with smooth and rapid dynamic response, the fundamental state of the position servo system is characterized by the output position
Although
Action space
It is worth clarifying that the horizon
The prediction horizon
where
The DRL action space is
Reward function
Here, the tracking error
Reward coefficients:
Actor and critic network
The function approximators utilized in the reinforcement learning agent are commonly known as the actor and critic networks. Specifically, the critic network comprises three hidden layers with 64, 64, and 32 neurons distributed among the layers. Moreover, the critic network features five units in the input layer, including four for states and one for action. In addition, the actor network consists of two hidden layers with 64 and 32 neurons, respectively, and employs the ReLU function as the activation function for each hidden layer. The structure of the proposed actor-critic network in the DDPG algorithm is shown in Figure 3.

Schematic of DDPG actor-critic network.
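The layer sizes described above can be checked with a small dependency-free sketch of the two networks. The hidden-layer widths and the five critic inputs follow the text; the four-dimensional actor input, the scalar outputs, the ReLU activations in the critic, the tanh output of the actor, and the weight initialization are assumptions made for illustration. The helper names are hypothetical.

```python
import math
import random


def init_mlp(sizes, rng):
    """Build a list of (weights, biases) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)      # assumed uniform initialization
        w = [[rng.uniform(-bound, bound) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def forward(layers, x, out_act=None):
    """Forward pass with ReLU on hidden layers and an optional output activation."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]   # ReLU
    if out_act is not None:
        x = [out_act(v) for v in x]
    return x


def n_params(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
actor = init_mlp([4, 64, 32, 1], rng)        # state -> normalized action
critic = init_mlp([5, 64, 64, 32, 1], rng)   # (state, action) -> Q-value

state = [0.1, -0.2, 0.05, 0.0]               # placeholder state values
action = forward(actor, state, out_act=math.tanh)
q_value = forward(critic, state + action)

if __name__ == "__main__":
    print("actor parameters:", n_params(actor))
    print("critic parameters:", n_params(critic))
```

Under these assumptions both networks are small, a few thousand parameters each, so a single forward pass of the actor is inexpensive.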
Simulation verification
In this section, MATLAB/Simulink is employed for simulation validation. The parameters of the employed PMSM platform in the simulation and experiment are shown in Table 1. First, the impact of key parameters on the system is analyzed to prepare for reasonable training of DRL. Second, the difference in execution time between the proposed method and the traditional MPC is evaluated. In addition, the P-PI, ADRC, and GPC algorithms are used as control groups to evaluate the dynamic response and disturbance rejection capabilities of the system.
PMSM parameters.
Preparation for DRL training
This section analyzes the impact of two key parameters: the prediction horizon
As shown in Figure 4(a), as the parameter p increases, the constraints on the control energy become greater, which actually acts as a protection mechanism for the system. The results shown in Figure 4(b) and (c) present the system response curves of two operating conditions,

Response curves of the GPC with different p and
DRL training setup
The DDPG agent is trained offline using a Simulink model of the PMSM servo system. The training environment includes reference steps ranging from
Parameters of different controllers.
Efficiency comparison with classical MPC
As shown in Figure 5, within the framework of the position servo system, the proposed method exhibits comparable control performance to MPC, with noticeably different one-step computation times of 8.9 and

Simulation results: (a) position response curves under control period of
Simulation verification with different methods
As shown in Figure 5(c), the proposed controller demonstrates dynamic performance comparable to traditional GPC and ADRC. Additionally, it is worth noting that although the P-PI controller is effective, its response speed is slightly slower and the smoothness is reduced compared with the optimal control algorithm. Furthermore, an analysis of Figure 5(d) reveals the excellent capabilities of the proposed controller when confronted with applied load disturbances. It swiftly recovers after the application of external load torque and maintains the overall stability of the system. These results underscore the effectiveness of the proposed control strategy in dealing with challenging real-world conditions.
Experimental verification
In this section, a series of experiments are conducted to validate the effectiveness and performance enhancement of the proposed control scheme. The experiments involve the combination of a DSP with a Beckhoff controller, enabling rapid deployment and validation of complex control algorithms.
Experimental configuration
The experimental setup, as depicted in Figure 6, comprises several pivotal components: a DSP module equipped with the DRV8305 control driver board, a LAUNCHXL-F28379D module serving as the control core board, a

Experimental platform, showing the Beckhoff IPC, TI DSP, power inverter, PMSM, and load torque generation system.
Experimental verification with different methods
To verify the performance improvement of the proposed method, various experiments are conducted in which the traditional P-PI controller, ADRC (Zuo et al., 2021), GPC1 (
Case I—… ranges from … to … and … at the initial moment
As can be seen in Figure 7(a) and (b), it is notable that the proposed approach realizes intelligent horizon adjustment in response to changing operating conditions. The trend of variation

Experimental results: (a) Case I—
Case II—apply a constant load of 0.1 Nm at 0.5 s and 0.2 Nm at 1.0 s, respectively
Figure 7(c) and (d) shows that, under sudden disturbances, the P-PI controller exhibits the largest position drop, followed by the ADRC, while the proposed framework exhibits the smallest position drop and the fastest recovery. Furthermore, to maintain control precision as the disturbance grows, the horizon is reduced. The ESO employed in the proposed composite controller can accurately estimate the lumped disturbance, which guarantees the position tracking accuracy of the servo system when facing complex disturbances.
Case III—apply a sine wave load disturbance with a period of 1 s and an amplitude of 0.1 Nm at 0.5 s
The results in Figure 7(e) show that when the system is subjected to a time-varying disturbance, the P-PI controller exhibits the largest position oscillation, while the ADRC retains a certain disturbance rejection capability. The performance of the GPC method is predominantly influenced by the choice of horizon. In contrast, the proposed control approach exhibits a smaller amplitude of position oscillation than the other controllers. Consequently, the adaptive adjustment of the horizon allows the proposed strategy to exhibit effective disturbance rejection performance under various operating conditions.
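For reproducing the disturbance scenarios of Cases II and III in simulation, the load torque profiles can be written as simple functions of time. The sketch below uses the amplitudes and instants stated in the case descriptions. It assumes that the load in Case II rises to a total of 0.2 Nm at 1.0 s (rather than increasing by a further 0.2 Nm) and that the sine wave in Case III starts from zero phase at 0.5 s. The function names are hypothetical.

```python
import math


def load_case_ii(t):
    """Case II: stepwise constant load torque in Nm (levels assumed absolute)."""
    if t >= 1.0:
        return 0.2
    if t >= 0.5:
        return 0.1
    return 0.0


def load_case_iii(t):
    """Case III: sinusoidal load torque in Nm, period 1 s, amplitude 0.1 Nm,
    applied from t = 0.5 s with zero initial phase (assumed)."""
    if t < 0.5:
        return 0.0
    return 0.1 * math.sin(2.0 * math.pi * (t - 0.5) / 1.0)


if __name__ == "__main__":
    for t in (0.25, 0.75, 1.0, 1.25):
        print(t, load_case_ii(t), round(load_case_iii(t), 4))
```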
Case IV—robustness against parameter perturbations
To verify the efficacy of the proposed method in handling parameter perturbations, a series of experiments are conducted under different J values. The test conditions include a step change in position from
Experiments are conducted with inertia values of
To gain more insight quantitatively, the performance indices including the settling time, the position drop, and the position fluctuation are, respectively, presented in Table 3.
Performance indices.
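The three indices can be extracted from a recorded position trace in a few lines. The sketch below shows one possible set of definitions: settling time as the first instant after which the response stays within a 2% band of the reference, position drop as the largest dip below the reference, and position fluctuation as the peak-to-peak variation over a window. These definitions, the 2% band, and the function names are assumptions for illustration and may differ from those used to compile Table 3.

```python
import math


def settling_time(t, y, y_ref, band=0.02):
    """First time after which |y - y_ref| stays within band * |y_ref|.

    Returns None if the last sample is still outside the band.
    """
    tol = band * abs(y_ref)
    last_outside = -1
    for i, value in enumerate(y):
        if abs(value - y_ref) > tol:
            last_outside = i
    if last_outside == len(y) - 1:
        return None
    return t[last_outside + 1]


def position_drop(y, y_ref):
    """Largest dip of the response below the reference (zero if none)."""
    return max(0.0, max(y_ref - value for value in y))


def position_fluctuation(y):
    """Peak-to-peak variation of the response over the given window."""
    return max(y) - min(y)


if __name__ == "__main__":
    # Synthetic first-order step response used only to exercise the functions.
    t = [i * 0.001 for i in range(1001)]
    y = [1.0 - math.exp(-ti / 0.1) for ti in t]
    print("settling time [s]:", settling_time(t, y, 1.0))
```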
Conclusion
This paper has proposed a novel GDPC design scheme for PMSM servo systems that integrates a DRL-based horizon tuning mechanism. In the investigated framework, the horizon is adjusted automatically according to the operating condition rather than chosen by human experience. The comparison results have demonstrated that the proposed approach improves the dynamic response and tracking performance. Future work will focus on the multi-objective optimization of the PMSM position servo system in constrained scenarios.
Appendix 1
This appendix provides a rigorous stability analysis of the closed-loop system.
Proof: Defining variables
where
Keeping the selection guideline of the observer parameter
Construct a candidate Lyapunov function as
where
Taking the derivative of the Lyapunov function along (20) yields that
where
By defining
It can be obtained from Khalil (2002) that when the selection of the observer bandwidth factor satisfies the condition
The proof of Theorem 1 is thus completed.
The above analysis rigorously proves the uniform boundedness of the closed-loop system and identifies the admissible range of
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under grant 62203292.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
