This article proposes an optimized tracking control for high-order nonlinear dynamic systems whose controller has an unknown dead zone. A dead zone is a range of inputs over which the control output is zero, so it inevitably degrades system performance. Since the dead-zone phenomenon is frequently encountered in control systems, it is necessary to account for its effect in optimized nonlinear control. Because a high-order system contains multiple state variables, the optimized dead-zone control is designed by combining reinforcement learning (RL) with sliding-mode control. Furthermore, an adaptive term that compensates for the remainder of the dead-zone function is added to the dead-zone input, and the optimized dead-zone control is then obtained from RL under the identifier–actor–critic architecture, where the identifier removes the requirement of a known dynamic function. Finally, the effectiveness of the optimized control is demonstrated by Lyapunov stability analysis and a simulation example.
Optimal control aims to accomplish the control task with the least expenditure of control resources (Hull, 2003). It has become a prevalent design principle in contemporary control systems. Finding the optimal control of a nonlinear system requires solving the Hamilton–Jacobi–Bellman (HJB) equation (Abu-Khalaf and Lewis, 2004; Wen and Niu, 2022). However, because of its strong nonlinearity, obtaining an analytical solution of this equation is extremely challenging. To overcome this difficulty, reinforcement learning (RL), as an adaptive approximation technique, has proven to be a workable method (Mozer et al., 2005). It has since been extensively studied and developed to address optimal nonlinear control problems; see Wen et al. (2018, 2019a, 2019b), Bian and Jiang (2022), and Yang et al. (2018).
Unfortunately, these optimal nonlinear controls based on the traditional RL method, such as Wen et al. (2018, 2019a, 2019b), Bian and Jiang (2022), and Yang et al. (2018), still have shortcomings. The most typical one is high algorithmic complexity, which makes these methods difficult to extend in theory and to apply in engineering. The primary reason is that both the actor and critic training laws are derived from the negative gradient of the square of the approximated HJB equation. To reduce this complexity, a simplified RL approach was developed by Wen et al. (2020, 2021), in which the adaptive updating laws are derived from the negative gradient of a simple positive function that is equivalent to the HJB equation, rather than from the square of that equation. This approach has subsequently been extended and combined with many control techniques, as in Wen and Li (2022), Wen et al. (2022), Hua et al. (2017), Ji et al. (2022), and Shen et al. (2022).
However, the optimal control methods mentioned above still overlook the fact that system performance can be severely constrained by stubborn nonlinearities such as dead zone, hysteresis, and saturation. The dead zone of a controller, also known as the neutral zone or non-active zone, is the input range over which the control signal is zero, and it is frequently encountered in control systems. To offset the effect of the dead zone, many control methods have been proposed, such as those of Zhou et al. (2006), Wang et al. (2004), Hua and Ding (2011), Hua et al. (2017), and Tong et al. (2015). In Zhou et al. (2006), a dead-zone inversion method is developed for a nonlinear system. In Wang et al. (2004), a robust adaptive control method is explored to deal with the dead-zone problem. In Hua and Ding (2011), a control method is proposed to handle multiple dead-zone inputs in large-scale systems. In Hua et al. (2017), a high-order stochastic multi-agent system with dead-zone input is discussed. In Tong et al. (2015), an observer-based control is developed for switched nonlinear systems with dead zone. Furthermore, in Ren et al. (2021, 2022), adaptive boundary control with input dead zones and external disturbances is studied. However, optimal nonlinear control has seldom addressed the dead-zone problem. In particular, for the optimized control of high-order nonlinear dynamic systems, the dead-zone problem becomes more intractable because the control depends on multiple state variables, which significantly raises the complexity of RL.
To control a high-order system, sliding-mode control (SMC) is a particularly suitable strategy because it can steer multiple states through a single sliding-mode variable (Chiang and Yang, 2006; Li et al., 2023; Ma and Li, 2020; Wen et al., 2023). Moreover, it offers quick global convergence, a straightforward structure, low sensitivity to parameter fluctuations, and good tolerance to external disturbances. Motivated by the above, this article addresses the optimized control problem of high-order nonlinear systems with unknown dead zones. The key contributions are listed below.
(i) The optimized control method can effectively deal with dead-zone problems for high-order nonlinear systems. The dead-zone function is converted into a linear form plus a remainder, and the optimized control is then obtained by finding the dead-zone input. Since the dead-zone input contains an adaptive term that compensates for the remainder of the dead-zone function, the method is applicable to a variety of dead-zone systems.
(ii) Compared with the conventional method, this simplified optimal control algorithm is easy to implement. Because the RL laws are derived from a straightforward positive function that is equivalent to the HJB equation, rather than from the square of that equation, the algorithm is substantially simplified.
(iii) The optimized control method does not require complete knowledge of the system dynamics. Since the RL algorithm integrates an adaptive identifier, the requirement of a known dynamic function is removed.
Neural network (NN)
In the control community, NNs are a popular function approximator. Over a compact set , an NN is capable of approximating a continuous function in the following form:
where is the NN weight with neurons; is the basis function vector, defined using the Gaussian function , in which is the width and is the center of the receptive field.
For the NN approximation (1), there must exist an ideal weight vector , which is defined as
Consequently, the function can be rewritten as
where represents the approximation error, which satisfies , and is a constant.
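As a rough illustration of the approximation form above, the following sketch fits the weights of a Gaussian basis network to a sample function by least squares. The function names, the target `np.sin`, the number of neurons, the centers, and the width are all assumptions chosen for the example, and the basis is assumed to take the common form exp(-(x - c_i)^2 / w^2). The offline least-squares fit merely stands in for the ideal weight; it is not the adaptive law used in this article.

```python
import numpy as np

def gaussian_basis(x, centers, width):
    """Basis matrix with one row per sample and one column per neuron.

    Assumed (illustrative) form: S_i(x) = exp(-(x - c_i)^2 / width^2).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)               # (n_samples, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)   # (1, n_neurons)
    return np.exp(-((x - centers) ** 2) / width ** 2)

def fit_weights(x, y, centers, width):
    """Least-squares weights, used here as a stand-in for the ideal weight."""
    S = gaussian_basis(x, centers, width)
    W, *_ = np.linalg.lstsq(S, y, rcond=None)
    return W

def nn_output(W, x, centers, width):
    """Network output: weighted sum of the Gaussian basis functions."""
    return gaussian_basis(x, centers, width) @ W

# Hypothetical setup: 13 neurons spread over the compact set [-3, 3].
centers = np.linspace(-3.0, 3.0, 13)
width = 1.0
x_train = np.linspace(-3.0, 3.0, 200)
W = fit_weights(x_train, np.sin(x_train), centers, width)

# Evaluate on points that were not used for fitting.
x_test = np.linspace(-3.0, 3.0, 57)
max_error = float(np.max(np.abs(nn_output(W, x_test, centers, width) - np.sin(x_test))))
print(f"max approximation error on the compact set: {max_error:.2e}")
```

The residual that remains after the fit plays the role of the bounded approximation error described above.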
Main result
Problem statement
Consider a high-order nonlinear system in canonical dynamic form with relative degree ,
where is a continuous nonlinear function, is the state vector, and is the control input variable, which is treated as the output of the following unknown dead zone.
The dead zone is modeled as
where is the input of the dead zone, and are the dead-zone slopes, and and are the right and left break-points of the dead zone, respectively (Figure 1).
Figure 1. Dead-zone model.
Assumption 1 (Wang et al., 2004; Yoo et al., 2009). The dead-zone slopes in the positive and negative regions are the same, that is, .
Under Assumption 1, model (5) can be rewritten as
where
From (7), the function is bounded, that is, there exists a constant such that , where .
Remark 1. The control of system (4) cannot be designed directly because of the unknown dead zone. However, through the output–input relation (5) between the system control and the dead zone, the control problem of system (4) is converted into finding the dead-zone input. Furthermore, the dead zone (5) has a practical engineering background, for example in hydraulic servo valves and servo motors (Wang et al., 2004); hence, dead-zone control is a significant research topic.
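The decomposition can be checked numerically. The sketch below assumes the usual symmetric-slope dead zone (slope `m`, right break-point `b_r > 0`, left break-point `b_l < 0`); these names and the numeric values are illustrative assumptions, not values from the article. It splits the dead-zone output into a linear part `m * v` and a remainder, and confirms that the remainder never exceeds `max(m * b_r, -m * b_l)`.

```python
import numpy as np

def dead_zone(v, m, b_r, b_l):
    """Symmetric-slope dead zone: zero output for b_l < v < b_r."""
    v = np.asarray(v, dtype=float)
    return np.where(v >= b_r, m * (v - b_r),
                    np.where(v <= b_l, m * (v - b_l), 0.0))

def remainder(v, m, b_r, b_l):
    """Remainder d(v) in the decomposition dead_zone(v) = m * v + d(v)."""
    return dead_zone(v, m, b_r, b_l) - m * np.asarray(v, dtype=float)

# Hypothetical dead-zone parameters chosen only for this example.
m, b_r, b_l = 1.5, 0.8, -0.6
v = np.linspace(-5.0, 5.0, 1001)
d = remainder(v, m, b_r, b_l)
bound = max(m * b_r, -m * b_l)

print("largest |d(v)| on the grid:", float(np.max(np.abs(d))))
print("bound max(m*b_r, -m*b_l):  ", bound)
```

Because the remainder is bounded by a constant, it can be treated as a bounded disturbance acting through the linear part, which is what makes an adaptive compensation term feasible.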
The control objective
The objective is to find a dead-zone input that yields the optimized tracking control for the high-order nonlinear dynamic system (4), such that (i) all control signals are semi-globally uniformly ultimately bounded (SGUUB) and (ii) the output state tracks the reference signal .
Lemma 1 (Wen et al., 2015). Let be a positive continuous function whose initial value is bounded. If it satisfies , where both and are constants, then the following inequality holds:
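Lemmas of this type are commonly stated as follows: if dV/dt <= -a V + c with constants a > 0 and c > 0, then V(t) <= exp(-a t) V(0) + (c / a)(1 - exp(-a t)). Assuming that form (the symbols `a`, `c`, `V0`, and the test signal below are assumptions for illustration, not notation taken from the article), a quick numerical check is:

```python
import numpy as np

def simulate_V(V0, a, c, t_end=5.0, dt=1e-3):
    """Forward-Euler integration of dV/dt = -a*V + c*cos(t)**2.

    Since cos(t)**2 <= 1, this V satisfies dV/dt <= -a*V + c.
    """
    t = np.arange(0.0, t_end + dt, dt)
    V = np.empty_like(t)
    V[0] = V0
    for k in range(len(t) - 1):
        V[k + 1] = V[k] + dt * (-a * V[k] + c * np.cos(t[k]) ** 2)
    return t, V

def lemma_bound(t, V0, a, c):
    """Assumed comparison bound: exp(-a t) V(0) + (c/a)(1 - exp(-a t))."""
    return np.exp(-a * t) * V0 + (c / a) * (1.0 - np.exp(-a * t))

# Hypothetical constants for the example.
a, c, V0 = 2.0, 1.0, 5.0
t, V = simulate_V(V0, a, c)
bound = lemma_bound(t, V0, a, c)

print("bound respected on the whole horizon:", bool(np.all(V <= bound + 1e-9)))
print("ultimate bound c/a:", c / a)
```

The bound decays exponentially toward c / a, which is the sense in which a signal governed by such an inequality is ultimately bounded.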
Optimal control description
For the state tracking task, the desired reference trajectory is denoted as , and its derivatives are assumed to be known and bounded. Define as the tracking errors. From (4), the error dynamics are given as follows:
Define a sliding-mode variable as
where the coefficients are chosen to make the polynomial Hurwitz, that is, all its roots lie in the open left-half plane.
Using (9), the time derivative of (10) is calculated as
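The Hurwitz condition is easy to test numerically. The sketch below assumes the common sliding-mode form s = lam_1 z_1 + ... + lam_{n-1} z_{n-1} + z_n, whose associated polynomial is p^(n-1) + lam_{n-1} p^(n-2) + ... + lam_1; this form, the function names, and the sample coefficients are assumptions for illustration.

```python
import numpy as np

def is_hurwitz(lams):
    """Check that p^(n-1) + lam_{n-1} p^(n-2) + ... + lam_1 has all roots
    strictly in the open left-half plane. lams = [lam_1, ..., lam_{n-1}]."""
    coeffs = [1.0] + list(reversed(lams))   # highest power first for np.roots
    return bool(np.all(np.roots(coeffs).real < 0.0))

def sliding_variable(z, lams):
    """Assumed form: s = lam_1*z_1 + ... + lam_{n-1}*z_{n-1} + z_n."""
    z = np.asarray(z, dtype=float)
    return float(np.dot(lams, z[:-1]) + z[-1])

# Hypothetical third-order example: p^2 + 3p + 2 = (p + 1)(p + 2).
lams = [2.0, 3.0]
print("Hurwitz:", is_hurwitz(lams))
print("s =", sliding_variable([1.0, 2.0, 3.0], lams))
```

With a Hurwitz choice, driving the single variable s to a small neighborhood of zero also drives every tracking error to a small neighborhood of zero, which is why one scalar variable suffices for a high-order system.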
Substituting (6) in (11) yields
For the dynamics (11), the infinite-horizon integral performance index is defined as
where is the cost function.
Definition 1 (Admissible control; Vamvoudakis and Lewis, 2010). A control of (9) is said to be admissible on the set , denoted by , if is continuous with , stabilizes (9) on , and makes (13) finite.
The goal is to find an admissible control for (9) that minimizes (13) while accomplishing the control task.
In view of (13), the performance index function is defined as
Let denote the optimal dead-zone input. Replacing with in (14) yields the optimal performance index function:
where is an already established compact set.
Differentiating both sides of (15) along (11) gives the HJB equation:
As stated previously, is uniquely associated with (15), so it is the unique solution of (16). Hence, can be obtained from as
From (6), the dead-zone input corresponding to can be expressed as
Nevertheless, the unknown terms and prevent the dead-zone input (18) from being implemented directly. To achieve the optimal control, on the one hand, the unknown term must be compensated in this optimal control; on the other hand, the term is expected to be obtained by solving the following HJB equation, formed by substituting (17) into (16):
However, owing to the strong nonlinearity, the analytical solution of (19) is very difficult to obtain. Therefore, the RL strategy is used to determine an approximate solution of the HJB equation, and adaptive approximation is used to compensate .
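To make the role of the HJB equation concrete, it helps to look at a case where it can be solved in closed form. The sketch assumes a hypothetical scalar linear dynamic `ds/dt = a*s + u` and the quadratic cost `s**2 + u**2`; both are assumptions for illustration and are not the system or cost of this article. For that case the optimal value function is `p*s**2` with `p = a + sqrt(a**2 + 1)`, the optimal control is minus one half of the value gradient, and the HJB residual vanishes.

```python
import math

def value_gain(a):
    """Positive root of p^2 - 2*a*p - 1 = 0, so that J*(s) = p * s^2."""
    return a + math.sqrt(a * a + 1.0)

def value_gradient(s, a):
    """dJ*/ds for the assumed scalar problem."""
    return 2.0 * value_gain(a) * s

def optimal_control(s, a):
    """Minimizer of the Hamiltonian: u* = -(1/2) * dJ*/ds."""
    return -0.5 * value_gradient(s, a)

def hjb_residual(s, a):
    """cost(s, u*) + dJ*/ds * (a*s + u*); zero when J* solves the HJB equation."""
    u = optimal_control(s, a)
    return s * s + u * u + value_gradient(s, a) * (a * s + u)

# Hypothetical open-loop-unstable example (a > 0).
a, s = 0.7, 1.3
print("HJB residual:", hjb_residual(s, a))
print("closed-loop pole a - p:", a - value_gain(a))
```

For the nonlinear error dynamics considered here no such closed form is available, which is why an approximate solution is sought through RL.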
RL design
To facilitate the analysis and achieve the tracking goal, is split into the following form:
where is a design parameter, and , in which .
According to (6), the dead-zone input associated with is
Because the unknown terms and are continuous, they can be expressed with NN approximations as follows:
where and are the ideal NN weight matrices, and are the basis function vectors, and are two bounded approximation errors, and and are the numbers of neurons.
Inserting (22) and (23) into (20) and (21) yields
where .
Nevertheless, the ideal weights , and the term are unknown, so the dead-zone input (25) is not available. To solve this problem, this article first establishes the RL based on equations (22), (24), and (25) in the identifier–critic–actor architecture, and then constructs an adaptive approximation for the bound of the term .
In accordance with (22), the identifier NN for estimating the unknown dynamics is constructed as
where is the identifier output and is the identifier NN weight, which is updated by the following law:
where is a design constant.
In accordance with (24), the critic NN for evaluating the control performance is constructed as
where is the estimate of and is the critic NN weight, whose update law is given below:
where , , and represent the critic design parameter, the design parameter and the identity matrix, respectively.
In accordance with (25), the actor NN for executing the control action is constructed as
where is the actor NN weight, is an adaptive estimate of the unknown constant , and and are updated by the laws:
where is the actor design parameter and is a design constant.
Theory with proof
Theorem 1. Consider the high-order nonlinear system (4) with bounded initial values. If the dead-zone input is used to obtain the optimized control through the proposed identifier–critic–actor RL of (26), (28), and (30) with the update laws (27), (29), and (31), and the design parameters are chosen to satisfy the conditions listed below:
then the following goals are accomplished:
(i) all errors , , , , and are SGUUB, where , , , , and ;
(ii) all tracking errors converge to a small neighborhood of zero.
Proof. Consider the Lyapunov function candidate
□
Along (12), (27), (29), (31), and (32), the time derivative of (34) is computed as
Inserting (30) into (35) yields
Based on the NN approximation (22) of function , (36) is rewritten as
From and , the following relations hold:
Using and inserting (38) and (39) into (37) yields
Using Young’s inequality, we obtain the following facts:
Using the above results, (40) becomes
Based on , (46) is rewritten as
From , and , we obtain
Furthermore, based on Young’s inequality, we can derive
Using (48), (49), (50), and (51), (47) becomes
In view of (33), inequality (52) becomes
where is bounded by a constant , that is, .
Subsequently, letting , (53) can be rewritten as
By using Lemma 1, we obtain
Inequality (55) implies that all error signals are SGUUB; moreover, by choosing the design parameters large enough, the tracking error converges to a small neighborhood of zero.
Simulation example
The second-order nonlinear system used for simulation in this section is as follows:
where , , and represent the position state, velocity state, and dead-zone output, respectively. The initial states are and . The dead-zone parameters in this simulation are , , and .
With the reference trajectory defined by the time function , the tracking errors are
In accordance with (10), the sliding-mode variable is designed as .
The identifier, critic, and actor NNs are each designed with neurons; the centers of the Gaussian functions in the basis function vectors are uniformly distributed from to , and the width is . Furthermore, the parameters of the identifier, critic, actor, and adaptive training laws corresponding to (27), (29), (31), and (32) are set to , , , , and . For the optimized dead-zone input corresponding to (30), the parameter is set to . The initial values are set as , , , and .
Figures 2 to 6 show the simulation results. Figure 2 displays the tracking performance. Figure 3 shows that the two tracking errors converge to zero, which indicates that the system states track the reference signal . Figures 4 and 5 show that the identifier, critic, and actor NN weights and the adaptive parameter are bounded. The cost function is shown in Figure 6. On the basis of Figures 2 to 6, all errors , , , , and are SGUUB, which indicates that the control objectives are accomplished by the proposed optimized method.
Figure 2. The tracking performance.
Figure 3. The tracking errors.
Figure 4. The norms of the critic and actor NN weights.
Figure 5. The identifier NN weight and the adaptive dead-zone parameter.
Figure 6. The cost function.
Conclusion
This article develops an optimized tracking control that handles the dead-zone problem for high-order nonlinear canonical systems. To find the dead-zone input that yields this optimized control, the RL and SMC strategies are combined. By adding an adaptive identifier to the RL design, the optimized control can be applied to systems with unknown dynamics. Meanwhile, to handle the dead zone more effectively, an adaptive approximation term is introduced into the RL to compensate for the unknown remainder of the dead-zone function. Finally, the proposed control approach is shown, by Lyapunov stability theory, to achieve the control goals. Simulation results further validate the effectiveness of the approach.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported in part by the National Natural Science Foundation of China under grant nos. 62073045 and 61973185, in part by the Natural Science Foundation of Shandong Province under grant nos. ZR2021MF088 and ZR2020MF097, and in part by the Development Plan of Young Innovation Team in Colleges and Universities of Shandong Province under grant no. 2019KJN011.
Data availability
All data included in this study are available upon request from the corresponding author.
ORCID iDs
Shuaihua Ma
Guoxing Wen
References
Abu-Khalaf M, Lewis FL (2004) Nearly optimal state feedback control of constrained nonlinear systems using a neural networks HJB approach. Annual Reviews in Control 28: 239–251.
Bian T, Jiang ZP (2022) Reinforcement learning and adaptive optimal control for continuous-time nonlinear systems: A value iteration approach. IEEE Transactions on Neural Networks and Learning Systems 33: 2781–2790.
Chiang CC, Yang CC (2006) Robust adaptive fuzzy sliding mode control for a class of uncertain nonlinear systems with unknown dead-zone. In: 2006 IEEE international conference on fuzzy systems, Vancouver, BC, Canada, pp. 492–497.
Hua C, Zhang L, Guan X (2017) Distributed adaptive neural network output tracking of leader-following high-order stochastic nonlinear multiagent systems with unknown dead-zone input. IEEE Transactions on Cybernetics 47: 177–185.
Hua CC, Ding SX (2011) Model following controller design for large-scale systems with time-delay interconnections and multiple dead-zone inputs. IEEE Transactions on Automatic Control 56: 962–968.
Hull DG (2003) Optimal Control Theory for Applications. New York: Springer.
Ji W, Pan Y, Zhao M (2022) Adaptive fault-tolerant optimized formation control for perturbed nonlinear multiagent systems. International Journal of Robust and Nonlinear Control 32: 3386–3407.
Li Z, Song Y, Wen G (2023) Reinforcement learning based optimized sliding-mode consensus control of high-order nonlinear canonical dynamic multiagent system. IEEE Systems Journal 17: 6302–6311.
Ma H, Li Y (2020) A novel dead zone reaching law of discrete-time sliding mode control with disturbance compensation. IEEE Transactions on Industrial Electronics 67: 4815–4825.
Mozer S, C M, Hasselmo M (2005) Reinforcement learning: An introduction. IEEE Transactions on Neural Networks 16: 285–286.
Ren Y, Zhao Z, Zhang C, et al. (2021) Adaptive neural-network boundary control for a flexible manipulator with input constraints and model uncertainties. IEEE Transactions on Cybernetics 51: 4796–4807.
Ren Y, Zhu P, Zhao Z, et al. (2022) Adaptive fault-tolerant boundary control for a flexible string with unknown dead zone and actuator fault. IEEE Transactions on Cybernetics 52: 7084–7093.
Shen F, Wang X, Li H, et al. (2022) Adaptive output-feedback control for a class of nonlinear systems based on optimized backstepping technique. International Journal of Adaptive Control and Signal Processing 36: 1077–1097.
Tong S, Sui S, Li Y (2015) Observed-based adaptive fuzzy tracking control for switched nonlinear systems with dead-zone. IEEE Transactions on Cybernetics 45: 2816–2826.
Vamvoudakis KG, Lewis FL (2010) Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46: 878–888.
Wang XS, Su CY, Hong H (2004) Robust adaptive control of a class of nonlinear systems with unknown dead-zone. Automatica 40: 407–413.
Wen G, Chen CLP, Ge SS (2021) Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions. IEEE Transactions on Cybernetics 51: 4567–4580.
Wen G, Chen CLP, Ge SS, et al. (2019a) Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy. IEEE Transactions on Industrial Informatics 15: 4969–4977.
Wen G, Philip Chen C, Li WN (2020) Simplified optimized control using reinforcement learning algorithm for a class of stochastic nonlinear systems. Information Sciences 517: 230–243.
Wen G, Dou H, Li B (2023) Adaptive fuzzy leader–follower consensus control using sliding mode mechanism for a class of high-order unknown nonlinear dynamic multi-agent systems. International Journal of Robust and Nonlinear Control 33: 545–558.
Wen G, Ge SS, Tu F (2018) Optimized backstepping for tracking control of strict-feedback systems. IEEE Transactions on Neural Networks and Learning Systems 29: 3850–3862.
Wen G, Ge SS, Chen CLP, et al. (2019b) Adaptive tracking control of surface vessel using optimized backstepping technique. IEEE Transactions on Cybernetics 49: 3420–3431.
Wen G, Hao W, Feng W, et al. (2022) Optimized backstepping tracking control using reinforcement learning for quadrotor unmanned aerial vehicle system. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52: 5004–5015.
Wen G, Li B (2022) Optimized leader–follower consensus control using reinforcement learning for a class of second-order nonlinear multiagent systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52: 5546–5555.
Wen G, Niu B (2022) Optimized tracking control based on reinforcement learning for a class of high-order unknown nonlinear dynamic systems. Information Sciences 606: 368–379.
Wen GX, Chen CLP, Liu YJ, et al. (2015) Neural-network-based adaptive leader-following consensus control for second-order non-linear multi-agent systems. IET Control Theory & Applications 9: 1927–1934.
Yang X, He H, Wei Q, et al. (2018) Reinforcement learning for robust adaptive control of partially unknown nonlinear systems subject to unmatched uncertainties. Information Sciences 463–464: 307–322.
Yoo SJ, Park JB, Choi YH (2009) Decentralized adaptive stabilization of interconnected nonlinear systems with unknown non-symmetric dead-zone inputs. Automatica 45: 436–443.
Zhou J, Wen C, Zhang Y (2006) Adaptive output control of nonlinear systems with uncertain dead-zone nonlinearity. IEEE Transactions on Automatic Control 51: 504–511.