Hierarchical approximate optimal coordinated control of modular robot manipulators through Nash-Stackelberg differential game

Abstract

A hierarchical approximate optimal coordinated control scheme for modular robot manipulators (MRMs) modeled with joint torque feedback is developed via adaptive dynamic programming (ADP). Within a Nash-Stackelberg differential game framework, the coordinated control problem is transformed into a Stackelberg differential game in which the MRM, itself formulated as a Nash differential game among its joint modules, acts as the leader and the coordinated operation object acts as the follower. Critic neural networks (NNs) are constructed to approximate the nonlinear cost functions, and the approximate optimal control strategies are derived from them. Theoretical analysis and experiments verify the effectiveness of the proposed control algorithm.

Keywords

Nash differential game; Stackelberg differential game; Nash-Stackelberg equilibrium; coordinated control; adaptive dynamic programming

Introduction

A modular robot manipulator (MRM)^1,2 is a robot built from standard modules and interfaces, which allows it to recombine and reconfigure its configuration according to diverse task requirements. Following the design concepts of modularity and reconfigurability, each joint module of an MRM integrates communication, sensing, actuation, and control units. By reconfiguring these modules, the robot can adopt multiple assembly configurations to accomplish various tasks, an advantage that traditional robots do not possess. Furthermore, coordinated robot operation combines the precision and efficiency of the robotic system with the flexibility and intelligence of the human operator.^3,4 Consequently, collaborative task execution in which humans and MRMs jointly handle an operation object has advanced steadily and has been applied in fields such as medical care, industry, and elderly care.

During a coordinated operation task, both humans and robots make decisions according to each other’s strategies. Game theory has therefore been employed in recent years to describe the coordinated operation process and to quantify interaction behavior that is otherwise difficult to characterize.^5 The well-known Nash equilibrium for zero-sum^6,7 and non-zero-sum^8,9 games and the Pareto equilibrium for cooperative games^10,11 both arise from the differing strategic behaviors of the participants. A key premise of the Nash/Pareto equilibrium framework is that every participant has equal access to information within the dynamic system. In coordinated operation tasks, however, humans and robots neither hold strictly equal roles nor perceive all of each other’s information.^12 Instead, the follower can only respond to the leader’s actions. Consequently, the aforementioned game approaches, which assume equal roles among participants, are not directly applicable in this context.

The theoretical framework of the Stackelberg game originates from the 1934 work of the German economist Stackelberg and has been applied to a wide range of economic problems.^13–15 It has since been extended beyond economics to fields such as nonlinear processes,^16 biology,^17 and decision-making.^18 A Stackelberg game involves two distinct types of agents: leaders and followers. Its core characteristic is sequential decision-making, in which the leader formulates and implements its strategy before the follower. The follower then chooses its optimal response to the leader’s strategy, while the leader, anticipating this response from the information it perceives about the follower’s strategy, selects the strategy that is optimal for itself. When humans and MRMs perform coordinated operation tasks, the human assists the robot; hence the human/coordinated operation object is regarded as the follower and the robot as the leader. To obtain the optimal control strategies of the coordinated operation system, a crucial step is to find the Stackelberg equilibrium of the dynamic system. However, for complex problems such as coordinated operation, the curse of dimensionality often makes it infeasible to derive the optimal solution analytically.^19,20 Researchers have therefore adopted learning methods based on adaptive dynamic programming (ADP)^21–23 to drive the system to an approximate equilibrium.
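To make the sequential structure concrete, the Python sketch below solves Stackelberg’s classical quantity-competition duopoly by backward induction and contrasts it with the simultaneous-move Nash (Cournot) equilibrium. This is a textbook illustration, not the MRM game studied here; the linear inverse demand $P = a - (q_{1} + q_{2})$, the marginal cost $c$, and all function names are assumptions made for the example.

```python
def follower_best_response(q_leader, a, c):
    """Follower's optimal quantity given the leader's quantity.

    Maximizes q_f * (a - q_leader - q_f - c); the stationary point is
    q_f = (a - c - q_leader) / 2 (clipped at zero).
    """
    return max((a - c - q_leader) / 2.0, 0.0)


def profit(q_own, q_other, a, c):
    """Profit under the linear inverse demand P = a - (q_own + q_other)."""
    return q_own * (a - q_own - q_other - c)


def stackelberg_equilibrium(a, c, steps=100_000):
    """Backward induction: the leader searches its quantity on a grid while
    anticipating the follower's best response to each candidate."""
    best_q, best_val = 0.0, float("-inf")
    for k in range(steps + 1):
        q_leader = (a - c) * k / steps
        val = profit(q_leader, follower_best_response(q_leader, a, c), a, c)
        if val > best_val:
            best_q, best_val = q_leader, val
    return best_q, follower_best_response(best_q, a, c)


def nash_equilibrium(a, c, iters=200):
    """Simultaneous-move (Cournot-Nash) equilibrium by best-response iteration."""
    q1 = q2 = 0.0
    for _ in range(iters):
        q1, q2 = follower_best_response(q2, a, c), follower_best_response(q1, a, c)
    return q1, q2
```

With $a = 10$ and $c = 2$, the leader commits to 4 and the follower responds with 2, whereas the simultaneous-move equilibrium gives 8/3 to each firm; the leader’s profit rises from 64/9 to 8. This first-mover advantage is what distinguishes the Stackelberg solution from the Nash solution.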

Building on this foundation, an ADP-based hierarchical approximate optimal coordinated control strategy is proposed for MRMs modeled with joint torque feedback. Leveraging the Nash-Stackelberg differential game framework, the coordinated control problem is converted into a Stackelberg differential game in which the MRM, formulated as a Nash differential game, acts as the leader and the coordinated operation object serves as the follower. A critic NN is constructed to handle the nonlinear cost function, and the approximate optimal control strategy is then obtained from this critic NN. The performance of the proposed control algorithm is assessed through theoretical analysis and experiments.

The main contributions of this article are as follows:

A Nash-Stackelberg differential game framework is developed for the coordinated operation task, in which a Stackelberg game is played between the MRM, which acts as the leader and is itself formulated as a Nash game among its joint modules, and the operation object, which acts as the follower.

The Stackelberg differential game formulation explicitly captures the sequential decision-making process. In addition, the proposed method is validated both theoretically and experimentally, rather than only in simulation as in existing work.^24,25

MRM’s dynamic model analysis

Formulation of the MRM dynamics

Following the framework in Ref. 26, we employ the joint torque feedback technique to describe the $i$th MRM subsystem with the following dynamic model:

\begin{matrix} I_{imo} γ_{ira} {\overset{\cdot\cdot}{q}}_{i} + \frac{τ_{ico}}{γ_{ira}} + f_{ihu} (q_{i}, {\overset{\cdot}{q}}_{i}) + I_{idc} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}) \\ = τ_{i} + {[J_{iac}^{T} f_{co}]}_{i}, \end{matrix}

(1)

where the subscript $i$ denotes the $i$th joint subsystem, $I_{imo}$ is the motor’s moment of inertia, $γ_{ira}$ is the gear ratio, $q_{i}$ is the joint position, $τ_{ico}$ is the coupled joint torque, $f_{ihu} (q_{i}, {\overset{\cdot}{q}}_{i})$ is the joint friction, $I_{idc} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q})$ represents the interconnected dynamic coupling (IDC) effect, $τ_{i}$ is the control torque, $J_{iac}$ is the Jacobian matrix, $f_{co}$ is the interaction force between the MRM and the object of the coordinated operation, and ${[.]}_{i}$ denotes the $i$th element of a vector.

Assumption 1: The rotor is assumed to be symmetric with respect to its axis of rotation. Joint flexibility is treated as negligible, and torque transmission through the speed reducer is considered to be failure-free. Moreover, the inertia between the torque sensor and the speed reducer is deemed insignificant.

1) Joint friction

The joint friction $f_{ihu} (q_{i}, {\overset{\cdot}{q}}_{i})$ takes the form:

\begin{matrix} f_{ihu} (q_{i}, {\overset{\cdot}{q}}_{i}) = {\hat{f}}_{ibv} {\overset{\cdot}{q}}_{i} + ({\hat{f}}_{ist} e^{(- {\hat{f}}_{i τ s} {\overset{\cdot}{q}}_{i}^{2})} + {\hat{f}}_{ico}) sgn ({\overset{\cdot}{q}}_{i}) \\ + f_{ipd} (q_{i}, {\overset{\cdot}{q}}_{i}) + K ({\overset{\cdot}{q}}_{i}) {\tilde{F}}_{ipu}, \end{matrix}

(2)

in which

K ({\overset{\cdot}{q}}_{i}) = {[f_{ibv} - {\hat{f}}_{ibv}, f_{ico} - {\hat{f}}_{ico}, f_{ist} - {\hat{f}}_{ist}, f_{i τ s} - {\hat{f}}_{i τ s}]}^{T},

(3)

where $f_{ipd} (q_{i}, {\overset{\cdot}{q}}_{i})$ denotes the position-dependent friction, $f_{ibv}$ and $f_{i τ s}$ are the viscous friction and Stribeck parameters, and $f_{ist}$ and $f_{ico}$ capture the static and Coulomb friction effects, respectively. ${\hat{f}}_{ibv}, {\hat{f}}_{ico}, {\hat{f}}_{ist}, {\hat{f}}_{i τ s}$ are the estimates of $f_{ibv}, f_{ico}, f_{ist}, f_{i τ s}$.

Remark 1. The parameters $f_{ibv}, f_{ico}, f_{ist}, f_{i τ s}$ and their estimates are bounded, and ${\tilde{F}}_{ipu}$ satisfies $| {\tilde{F}}_{ipu} | \leq b_{ipu}$, where $b_{ipu}$ is a known constant. It follows directly that $| K ({\overset{\cdot}{q}}_{i}) {\tilde{F}}_{ipu} | \leq ∥ K ({\overset{\cdot}{q}}_{i}) ∥ b_{ipu}$. In addition, $| f_{ipd} (q_{i}, {\overset{\cdot}{q}}_{i}) | \leq b_{ipd}$, where $b_{ipd}$ is a constant bound.
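The nominal part of the friction model (2) can be evaluated with the minimal Python sketch below. It uses only the estimated coefficients; the parameter values in the usage example are hypothetical, and the unknown terms $f_{ipd}$ and $K ({\overset{\cdot}{q}}_{i}) {\tilde{F}}_{ipu}$ are deliberately left out because only their bounds are available.

```python
import math


def sgn(v):
    """Sign function with sgn(0) = 0."""
    return (v > 0) - (v < 0)


def nominal_friction(qdot, f_bv, f_st, f_ts, f_co):
    """Estimated (nominal) joint friction from (2).

    f_bv: viscous coefficient, f_st: static-friction level, f_ts: Stribeck
    shape parameter, f_co: Coulomb level (all estimated, i.e. hatted, values).
    The position-dependent term f_ipd and the uncertainty term K(qdot) F_ipu
    are omitted because only their bounds are known (Remark 1).
    """
    stribeck = f_st * math.exp(-f_ts * qdot ** 2)  # decays as |qdot| grows
    return f_bv * qdot + (stribeck + f_co) * sgn(qdot)
```

Near zero velocity the friction magnitude approaches ${\hat{f}}_{ist} + {\hat{f}}_{ico}$, while at higher speeds the Stribeck term vanishes and only the viscous and Coulomb parts remain.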

2) IDC effect

The IDC effect among the joint modules is characterized by:

\begin{matrix} I_{idc} = I_{imo} \sum_{j = 1}^{i - 1} v_{mi}^{T} v_{lj} {\overset{\cdot\cdot}{q}}_{j} + I_{imo} \sum_{j = 2}^{i - 1} \sum_{k = 1}^{j - 1} v_{mi}^{T} (v_{lk} \times v_{lj}) {\overset{\cdot}{q}}_{k} {\overset{\cdot}{q}}_{j} \\ = I_{imo} \sum_{j = 1}^{i - 1} D_{j}^{i} {\overset{\cdot\cdot}{q}}_{j} + I_{imo} \sum_{j = 2}^{i - 1} \sum_{k = 1}^{j - 1} Θ_{kj}^{i} {\overset{\cdot}{q}}_{k} {\overset{\cdot}{q}}_{j} \\ = \sum_{j = 1}^{i - 1} [I_{imo} {\hat{D}}_{j}^{i}, I_{imo}] {[{\overset{\cdot\cdot}{q}}_{j}, {\tilde{D}}_{j}^{i} {\overset{\cdot\cdot}{q}}_{j}]}^{T} \\ + \sum_{j = 2}^{i - 1} \sum_{k = 1}^{j - 1} [I_{imo} {\hat{Θ}}_{kj}^{i}, I_{imo}] {[{\overset{\cdot}{q}}_{k} {\overset{\cdot}{q}}_{j}, {\tilde{Θ}}_{kj}^{i} {\overset{\cdot}{q}}_{k} {\overset{\cdot}{q}}_{j}]}^{T}, \end{matrix}

(4)

where $v_{mi}, v_{lj}, v_{lk}$ denote the unit vectors along the rotational axes of the $i$th motor, the $j$th link, and the $k$th link, respectively. Then $D_{j}^{i} = v_{mi}^{T} v_{lj}$ and $Θ_{kj}^{i} = v_{mi}^{T} (v_{lk} \times v_{lj})$. Moreover, ${\hat{D}}_{j}^{i} = D_{j}^{i} - {\tilde{D}}_{j}^{i}$ and ${\hat{Θ}}_{kj}^{i} = Θ_{kj}^{i} - {\tilde{Θ}}_{kj}^{i}$, where ${\hat{D}}_{j}^{i}, {\hat{Θ}}_{kj}^{i}$ are the estimates of $D_{j}^{i}, Θ_{kj}^{i}$ and ${\tilde{D}}_{j}^{i}, {\tilde{Θ}}_{kj}^{i}$ are the alignment errors.

Remark 2. Since $v_{mi}, v_{lk}, v_{lj}$ are unit vectors, the definitions give $| D_{j}^{i} | = | v_{mi}^{T} v_{lj} | \leq 1$ and $| Θ_{kj}^{i} | = | v_{mi}^{T} (v_{lk} \times v_{lj}) | \leq 1$. Provided that the joint velocities and accelerations are bounded, it follows from (4) that $I_{idc} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q})$ is bounded, that is, $| I_{idc} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}) | \leq b_{idc}$ for some bound $b_{idc}$.
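The coupling coefficients $D_{j}^{i}$ and $Θ_{kj}^{i}$ and the IDC term (4) follow directly from the axis unit vectors, as in the Python sketch below. The axis directions, inertia, and joint motions in the usage example are hypothetical, and joints are indexed from 0 rather than 1.

```python
def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def idc_coefficients(v_m, v_links):
    """D[j] = v_m . v_j and Theta[(k, j)] = v_m . (v_k x v_j) for k < j.

    v_m is the motor axis of joint i; v_links lists the axes of the
    upstream links (links 1..i-1 in the paper, 0-based here).
    """
    D = [dot(v_m, v_j) for v_j in v_links]
    Theta = {(k, j): dot(v_m, cross(v_links[k], v_links[j]))
             for j in range(len(v_links)) for k in range(j)}
    return D, Theta


def idc_torque(I_mo, D, Theta, qdot, qddot):
    """IDC term (4): I_mo * (sum_j D_j qdd_j + sum_{k<j} Theta_kj qd_k qd_j)."""
    accel_part = sum(D[j] * qddot[j] for j in range(len(D)))
    vel_part = sum(th * qdot[k] * qdot[j] for (k, j), th in Theta.items())
    return I_mo * (accel_part + vel_part)
```

When the motor axis is parallel to an upstream link axis, as for the first link in the test example, $D_{j}^{i}$ reaches magnitude 1, so these coefficient bounds can be attained.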

Define the state vector $x_{i} = [x_{i 1}, x_{i 2}]^{T} = [q_{i}, {\overset{\cdot}{q}}_{i}]^{T}$, the control input $u_{i} = τ_{i}$, and $c_{i} = {[J_{iac}^{T} f_{co}]}_{i}$. The dynamics of the $i$th subsystem are then described in state-space form by:

\begin{matrix} {\begin{matrix} {\overset{\cdot}{x}}_{i 1} = x_{i 2} \\ {\overset{\cdot}{x}}_{i 2} = ℓ_{i} (x) + g_{i} u_{i} + c_{i} \end{matrix}, \end{matrix}

(5)

where

\begin{matrix} g_{i} = {(I_{imo} γ_{ira})}^{- 1} \\ ℓ_{i} = - g_{i} (\begin{matrix} ({\hat{f}}_{ist} e^{(- {\hat{f}}_{i τ s} x_{i 2}^{2})} + {\hat{f}}_{ico}) sgn (x_{i 2}) + f_{ipd} (x_{i 1}, x_{i 2}) \\ + {\hat{f}}_{ibv} x_{i 2} + K (x_{i 2}) {\tilde{F}}_{ipu} + \frac{τ_{ico}}{γ_{ira}} + I_{idc} (x, \overset{\cdot}{x}, \overset{\cdot\cdot}{x}) \end{matrix}) . \end{matrix}

(6)
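The state-space form (5) can be simulated with a simple forward-Euler scheme, sketched below in Python. The drift $ℓ_{i}$ is passed in as a function; the viscous-damping drift used in the usage example is a hypothetical simplification of (6) that ignores the friction nonlinearities, coupled torque, and IDC terms.

```python
def simulate_joint(x1, x2, u, c, g, ell, dt=1e-3, steps=1000):
    """Forward-Euler integration of the joint subsystem (5):
        x1' = x2,   x2' = ell(x1, x2) + g * u + c
    with constant input u and interaction term c over steps * dt seconds.
    Returns the final state (x1, x2).
    """
    for _ in range(steps):
        # The tuple assignment evaluates both derivatives at the old state.
        x1, x2 = x1 + dt * x2, x2 + dt * (ell(x1, x2) + g * u + c)
    return x1, x2
```

For instance, with the hypothetical drift $ℓ_{i} = -5 x_{i 2}$, $g_{i} = 1$, and a constant input $u_{i} = 1$, the velocity settles at 0.2.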

The primary control objective of this work is to guarantee, in an optimal manner, that the tracking error of the MRM system in the coordinated operation task is uniformly ultimately bounded (UUB). To this end, the next section proposes a hierarchical approximate optimal coordinated control approach based on the Nash-Stackelberg differential game.

Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control approach of MRM system

Derivation of Nash-Stackelberg differential game

This section leverages the Nash-Stackelberg differential game to derive an approximate optimal coordinated control under which the MRM optimally keeps the system tracking error UUB. The Nash-Stackelberg differential game features a leader-follower hierarchy: the leader selects its strategy first, the follower then determines its optimal strategy in reaction to the leader’s decision, and the leader, in turn, formulates its best response by anticipating the follower’s strategic behavior. In the MRM coordinated operation task, the MRM is the leader and the coordinated operation object is the follower. In addition, the MRM system consists of $n$ modules, which act as players in a Nash differential game that optimizes the overall performance of the leader. A joint Nash-Stackelberg differential game is therefore developed in this section.

The diagram of the Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control is shown in Figure 1. Each joint of the MRM system (the leader) is a player in a Nash differential game, while the MRM as a whole forms a Stackelberg game with the manipulated object (the follower). Consider a scenario in which a robot and a human must transport a large object that neither could handle alone. A desired trajectory is prescribed for the robot, and the manipulated object is designed to follow the robot’s position, which establishes a hierarchical control framework (Stackelberg game). Using the ADP algorithm, critic NNs approximate the cost functions of the leader and the follower, and the residuals of the corresponding Hamiltonian functions are used to tune the critic update laws, ultimately yielding approximate optimal control laws for both the leader and the follower.

Figure 1.

The diagram of the Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control.

According to (5), the overall state-space equation is:

\begin{matrix} {\begin{matrix} {\overset{\cdot}{x}}_{1} = x_{2} \\ {\overset{\cdot}{x}}_{2} = L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) \end{matrix}, \end{matrix}

(7)

where $x = [x_{1}^{T}, x_{2}^{T}]^{T} \in R^{2 n}$ is the global state, with $x_{1} = [x_{11}, \dots, x_{i 1}, \dots, x_{n 1}]^{T} \in R^{n}$ and $x_{2} = [x_{12}, \dots, x_{i 2}, \dots, x_{n 2}]^{T} \in R^{n}$. Moreover, $L (x) = [ℓ_{1} (x), \dots, ℓ_{i} (x), \dots, ℓ_{n} (x)]^{T}$, $G_{v} = [0, \dots, 0, g_{v}, 0, \dots, 0]^{T}$ with $g_{v}$ in the $v$th entry, and $C (x) = [c_{1}, \dots, c_{i}, \dots, c_{n}]^{T}$, where $g_{v} = {(I_{vmo} γ_{vra})}^{- 1}, v = 1, \dots, n$.
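Because each $G_{v}$ has a single nonzero entry, $\sum_{v} G_{v} u_{v}$ in (7) reduces to the element-wise product $g_{v} u_{v}$. The Python sketch below assembles the right-hand side of (7) explicitly to show this structure; the numerical values and function names are hypothetical.

```python
def input_gain(I_mo, gamma):
    """g_v = (I_vmo * gamma_vra)^(-1), as defined below (7)."""
    return 1.0 / (I_mo * gamma)


def stacked_input(g, u):
    """sum_v G_v u_v, where G_v has g_v in its v-th entry and zeros elsewhere."""
    n = len(g)
    total = [0.0] * n
    for v in range(n):
        G_v = [0.0] * n
        G_v[v] = g[v]  # only the v-th entry of G_v is nonzero
        total = [t + G * u[v] for t, G in zip(total, G_v)]
    return total


def global_acceleration(L, g, u, C):
    """Right-hand side of x2' = L(x) + sum_v G_v u_v + C(x) in (7)."""
    return [l + gu + c for l, gu, c in zip(L, stacked_input(g, u), C)]
```

Each joint torque therefore enters only its own acceleration channel, while the subsystems remain coupled through $L (x)$ and $C (x)$.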

The cost functions of the coordinated operation object (the follower) and the MRM system (the leader) are defined as:

\begin{matrix} V_{iF} (\overset{\cdot}{E}, u_{i}, C) = \int_{t}^{\infty} r_{iF} (\overset{\cdot}{E}, u_{i}, C) d τ \\ = \int_{t}^{\infty} ({\overset{\cdot}{E}}^{T} Q_{iF} \overset{\cdot}{E} + \sum_{v = 1}^{n} u_{v}^{T} R_{iFv} u_{v} + C^{T} R_{iF} C) d τ, \end{matrix}

(8)

\begin{matrix} V_{iL} (\overset{\cdot}{E}, u_{i}, C) = \int_{t}^{\infty} r_{iL} (\overset{\cdot}{E}, u_{i}, C) d τ \\ = \int_{t}^{\infty} ({\overset{\cdot}{E}}^{T} Q_{iL} \overset{\cdot}{E} + \sum_{v = 1}^{n} u_{v}^{T} R_{iLv} u_{v} + C^{T} R_{iL} C) d τ, \end{matrix}

(9)

where $r_{iF}$ and $r_{iL}$ are the utility functions of the follower and the leader, respectively; $e_{i} = x_{i 1} - x_{id}$ and ${\overset{\cdot}{e}}_{i} = x_{i 2} - {\overset{\cdot}{x}}_{id}$ are the position and velocity errors, where $x_{id}$ is the desired position in the coordinated operation; $Q_{iF}, Q_{iL}, R_{iFv}, R_{iLv}, R_{iF}, R_{iL}$ are prescribed positive definite weighting matrices; and $E = [e_{1}, e_{2}, \dots, e_{n}]^{T}$.
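For a recorded trajectory, the utilities in (8) and (9) can be evaluated and the infinite-horizon integral approximated by a finite Riemann sum, as in the Python sketch below. It assumes scalar joint inputs $u_{v}$ (so each $R_{iFv}$ or $R_{iLv}$ is a scalar) and that the utility has decayed to nearly zero by the end of the record; the signals in the usage example are hypothetical.

```python
def quad(x, M):
    """Quadratic form x^T M x for a vector x and a square matrix M (nested lists)."""
    n = len(x)
    return sum(x[a] * M[a][b] * x[b] for a in range(n) for b in range(n))


def utility(Edot, u, C, Q, R_u, R_c):
    """r = Edot^T Q Edot + sum_v R_u[v] * u_v^2 + C^T R_c C, as in (8)/(9),
    with scalar joint inputs u_v so that each input weight R_u[v] is a scalar."""
    return (quad(Edot, Q)
            + sum(R_u[v] * u[v] ** 2 for v in range(len(u)))
            + quad(C, R_c))


def truncated_cost(samples, dt, Q, R_u, R_c):
    """Left Riemann-sum approximation of the infinite-horizon cost over a finite
    record of (Edot, u, C) samples spaced dt apart; accurate only if the
    utility has essentially decayed to zero by the end of the record."""
    return dt * sum(utility(Ed, uk, Ck, Q, R_u, R_c) for Ed, uk, Ck in samples)
```

For a single velocity error $e^{-t}$ with unit weight, the exact cost is $\int_{0}^{\infty} e^{-2t} dt = 0.5$, and the sampled sum with 1 ms steps over 20 s gives approximately 0.5005.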

The coordinated operation object (the follower) seeks its optimal policy in response to the current system state and the MRM’s strategy, which is formulated as minimizing the cost function:

V_{iF}^{*} (\overset{\cdot}{E}, u_{i}, C) = \min_{C} \int_{t}^{\infty} r_{iF} (\overset{\cdot}{E}, u_{i}, C) d τ .

(10)

Combining the infinitesimal version of (10) with the error dynamics obtained from (7), the follower’s Hamiltonian function is constructed as:

\begin{matrix} H_{iF} (\overset{\cdot}{E}, u_{i}, C, \nabla V_{iF}) = r_{iF} (\overset{\cdot}{E}, u_{i}, C) \\ + \nabla V_{iF} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d}) . \end{matrix}

(11)

From the stationarity condition $\frac{\partial H_{iF}}{\partial c_{i}} = 0$, the follower’s local optimal control policy is determined by the minimum principle as:

c_{i}^{*} = - \frac{1}{2} R_{iF}^{- 1} \nabla V_{iF}^{*} .

(12)
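The stationarity condition behind (12) can be checked numerically on a scalar version of the follower’s Hamiltonian (11). The Python sketch below keeps only the terms that depend on $c_{i}$; the costate value, weight, and drift are hypothetical numbers.

```python
def follower_hamiltonian(c, p, R, drift):
    """Scalar version of (11), keeping only the terms that depend on c:
        H(c) = R * c^2 + p * (drift + c),
    where p is the costate (gradient of V_iF), R = R_iF > 0, and drift stands
    for the remaining error dynamics L(x) + sum_v G_v u_v - xdd_d."""
    return R * c ** 2 + p * (drift + c)


def follower_policy(p, R):
    """Stationary point dH/dc = 2 R c + p = 0, i.e. c* = -(1/2) R^{-1} p, cf. (12)."""
    return -0.5 * p / R
```

Since $R_{iF} > 0$, the Hamiltonian is strictly convex in $c_{i}$, so the stationary point (12) is indeed a minimum.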

Subsequently, a costate $ℏ$ is defined to incorporate the follower’s future effects, enabling the leader to respond strategically in its decision-making. This ensures that the MRM takes the coordinated operation object’s cost function into account. The costate evolves as:

\begin{matrix} {\overset{\cdot}{ℏ}}_{i 1} = - \frac{\partial H_{iF}}{\partial {\overset{\cdot}{e}}_{i}} \\ = - \nabla ℓ_{i}^{T} \nabla V_{iF}^{*} - 2 Q_{iF} {\overset{\cdot}{e}}_{i} - g_{i}^{T} \nabla V_{iF}^{*} u_{i} - \nabla V_{iF}^{*} c_{i} . \end{matrix}

(13)

Based on the costate (13) and ${\overset{\cdot}{x}}_{2} = L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C^{*} (x)$, the leader’s optimal cost function is derived as:

V_{iL}^{*} (\overset{\cdot}{E}, u_{i}, C) = \min_{u_{i}} \int_{t}^{\infty} (r_{iL} (\overset{\cdot}{E}, u_{i}, C) + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1}) d τ,

(14)

where $λ_{i}$ is the Lagrange multiplier associated with (13).

By an analogous procedure, the leader’s Hamiltonian function and optimal control policy are formulated as:

\begin{matrix} H_{iL} (\overset{\cdot}{E}, u_{i}, C, \nabla V_{iL}) = r_{iL} (\overset{\cdot}{E}, u_{i}, C) \\ + \nabla V_{iL}^{T} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d}) \\ + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1}, \end{matrix}

(15)

\frac{\partial H_{iL}}{\partial u_{i}} = 0 \to u_{i}^{*} = - \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla V_{iL}^{*} - \nabla V_{iF}^{*} λ_{i}) .

(16)

The dynamics of the costate and the multiplier are characterized by:

{\overset{\cdot}{ℏ}}_{i 2} = - {(\frac{\partial H_{iL}}{\partial {\overset{\cdot}{e}}_{i}})}^{T} = - \nabla ℓ_{i}^{T} \nabla V_{iL}^{*} - 2 Q_{iL} {\overset{\cdot}{e}}_{i} + \frac{\partial {\overset{\cdot}{ℏ}}_{i 1}}{\partial {\overset{\cdot}{e}}_{i}} λ_{i},

(17)

\begin{matrix} {\overset{\cdot}{λ}}_{i} = - {(\frac{\partial H_{iL}}{\partial V_{iF}^{*}})}^{T} = - \frac{1}{2} g_{i} (x_{i}) R_{iF}^{- 1} R_{iLi} R_{iF}^{- 1} g_{i}^{T} \nabla V_{iF}^{*} \\ + \frac{1}{2} g_{i} (x_{i}) R_{iF}^{- 1} g_{i}^{T} (x_{i}) \nabla V_{iL}^{*} + \nabla ℓ_{i} λ_{i} + g_{i} u_{i} λ_{i} + c_{i} λ_{i} . \end{matrix}

(18)

The coupled Hamilton-Jacobi equations are obtained by substituting policies (12) and (16) into their respective Hamiltonian functions (11) and (15), leading to:

\begin{matrix} 0 = r_{iF} (\overset{\cdot}{E}, u_{i}^{*}, C^{*}) \\ + \nabla V_{iF} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d}), \end{matrix}

(19)

\begin{matrix} 0 = r_{iL} (\overset{\cdot}{E}, u_{i}^{*}, C^{*}) \\ + \nabla V_{iL} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d}) \\ + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1} . \end{matrix}

(20)

Solving the coupled Hamilton-Jacobi equations (19) and (20) for $\nabla V_{iF}$ and $\nabla V_{iL}$ yields the Nash-Stackelberg equilibrium and hence the optimal coordinated control that guarantees the MRM tracking performance in coordinated operations. Because these coupled nonlinear equations are difficult to solve directly, a critic NN approximation approach is adopted.

Approximate solution of the Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control via the implementation of critic NN

The optimal control (16) of joint $i$ is decomposed as:

u_{i}^{*} = u_{i 1} + u_{i 2}^{*},

(21)

where $u_{i 1}$ compensates for the estimated model terms in $ℓ_{i} (x)$ together with the desired acceleration, and $u_{i 2}^{*}$ provides the optimal compensation for the coordinated operation task.

According to (6), $u_{i 1}$ can be designed as:

u_{i 1} = - (\begin{matrix} - ({\hat{f}}_{ist} e^{(- {\hat{f}}_{i τ s} x_{i 2}^{2})} + {\hat{f}}_{ico}) sgn (x_{i 2}) \\ - {\hat{f}}_{ibv} x_{i 2} - g_{i}^{- 1} {\overset{\cdot\cdot}{x}}_{id} - \frac{τ_{ico}}{γ_{ira}} \end{matrix}) .

(22)

The optimal compensation control problem is thereby recast as a hierarchical approximate optimal control problem based on the Nash-Stackelberg differential game. To solve it, critic NNs are employed to approximate the corresponding cost functions (10) and (14):

V_{im}^{*} (\overset{\cdot}{E}) = W_{imcr}^{T} ϕ_{imcr} (\overset{\cdot}{E}) + ε_{imcr}, m = F, L,

(23)

where $W_{imcr}$ is the ideal critic NN weight vector, $ε_{imcr}$ is the approximation error, and $ϕ_{imcr} (\overset{\cdot}{E})$ is the activation function.

The gradient of the expression in (23) is obtained as follows:

\nabla V_{im}^{*} (\overset{\cdot}{E}) = \nabla ϕ_{imcr} (\overset{\cdot}{E}) W_{imcr} + \nabla ε_{imcr}, m = F, L,

(24)

where $\nabla ϕ_{imcr} (\overset{\cdot}{E}) = \partial ϕ_{imcr} (\overset{\cdot}{E}) / \partial \overset{\cdot}{E}$ is the gradient of the activation function and $\nabla ε_{imcr}$ is the gradient of the approximation error.
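A minimal Python sketch of the critic approximation (23) and its gradient (24) is given below, assuming Gaussian RBF activations of the kind used in the experiments; the centers, width, and weights are hypothetical, and the residual $ε_{imcr}$ is dropped. The analytic gradient is checked against central finite differences in the test.

```python
import math


def rbf_features(E, centers, width):
    """Gaussian RBF activations phi_k(E) = exp(-||E - psi_k||^2 / width)."""
    return [math.exp(-sum((e - p) ** 2 for e, p in zip(E, psi)) / width)
            for psi in centers]


def critic_value(W, E, centers, width):
    """Critic estimate V_hat(E) = W^T phi(E), cf. (23) with epsilon dropped."""
    return sum(w * f for w, f in zip(W, rbf_features(E, centers, width)))


def critic_gradient(W, E, centers, width):
    """Gradient (24) without epsilon: sum_k W_k * dphi_k/dE, where
    dphi_k/dE = -2 (E - psi_k) phi_k / width."""
    grad = [0.0] * len(E)
    for w, f, psi in zip(W, rbf_features(E, centers, width), centers):
        for a in range(len(E)):
            grad[a] += w * (-2.0 * (E[a] - psi[a]) * f / width)
    return grad
```

The gradient is available in closed form for RBF activations, so no numerical differentiation is needed inside the control loop.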

By substituting (24) into (12) and (16), the optimal control torques are given by:

c_{i}^{*} = - \frac{1}{2} R_{iF}^{- 1} (\nabla ϕ_{iFcr} (\overset{\cdot}{E}) W_{iFcr} + \nabla ε_{iFcr}),

(25)

\begin{matrix} u_{i 2}^{*} = - \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iLcr} (\overset{\cdot}{E}) W_{iLcr} + \nabla ε_{iLcr}) \\ + \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iFcr} (\overset{\cdot}{E}) W_{iFcr} + \nabla ε_{iFcr}) λ_{i} . \end{matrix}

(26)

Substituting (25) and (26) into (11) and (15) gives:

\begin{matrix} H_{iF} (\overset{\cdot}{E}, u_{i}, C, W_{iFcr}) = r_{iF} (\overset{\cdot}{E}, u_{i}, C) + \nabla ϕ_{iFcr} (\overset{\cdot}{E}) \\ W_{iFcr} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d}) = e_{iHF}, \end{matrix}

(27)

\begin{matrix} H_{iL} (\overset{\cdot}{E}, u_{i}, C, W_{iLcr}) = r_{iL} (\overset{\cdot}{E}, u_{i}, C) + \nabla ϕ_{iLcr} (\overset{\cdot}{E}) W_{iLcr} \\ (L (x) + C (x) - {\overset{\cdot\cdot}{x}}_{d} + \sum_{v = 1}^{n} G_{v} u_{v}) \\ + (- \nabla ℓ_{i}^{T} \nabla ϕ_{iFcr} (\overset{\cdot}{E}) W_{iFcr} \\ - 2 Q_{iF} {\overset{\cdot}{e}}_{i} - g_{i}^{T} \nabla ϕ_{iFcr} (\overset{\cdot}{E}) W_{iFcr} u_{i} \\ - \nabla ϕ_{iFcr} (\overset{\cdot}{E}) W_{iFcr} c_{i}) \\ λ_{i} = e_{iHL}, \end{matrix}

(28)

where $e_{iHF}$ and $e_{iHL}$ are the residual errors induced by the NN approximation errors.

The estimated optimal cost function is:

{\hat{V}}_{im}^{*} (\overset{\cdot}{E}) = {\hat{W}}_{imcr}^{T} ϕ_{imcr} (\overset{\cdot}{E}), m = F, L .

(29)

According to (25), (26), and (29), the approximate optimal control policies are:

{\hat{c}}_{i}^{*} = - \frac{1}{2} R_{iF}^{- 1} (\nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr}),

(30)

\begin{matrix} {\hat{u}}_{i 2}^{*} = - \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iLcr} (\overset{\cdot}{E}) {\hat{W}}_{iLcr}) \\ + \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr}) λ_{i} . \end{matrix}

(31)

Substituting (30) and (31) into (27) and (28), the approximate Hamiltonian functions are given by:

\begin{matrix} {\hat{H}}_{iF} (\overset{\cdot}{E}, {\hat{u}}_{i}, \hat{C}, {\hat{W}}_{iFcr}) = r_{iF} (\overset{\cdot}{E}, {\hat{u}}_{i}, \hat{C}) + \nabla ϕ_{iFcr} (\overset{\cdot}{E}) \\ {\hat{W}}_{iFcr} (L (x) + \sum_{v = 1}^{n} G_{v} {\hat{u}}_{v} + \hat{C} (x) - {\overset{\cdot\cdot}{x}}_{d}) = e_{iF}, \end{matrix}

(32)

\begin{matrix} {\hat{H}}_{iL} (\overset{\cdot}{E}, {\hat{u}}_{i}, \hat{C}, {\hat{W}}_{iLcr}) = r_{iL} (\overset{\cdot}{E}, {\hat{u}}_{i}, \hat{C}) + \nabla ϕ_{iLcr} (\overset{\cdot}{E}) {\hat{W}}_{iLcr} \\ (L (x) + \hat{C} (x) - {\overset{\cdot\cdot}{x}}_{d} + \sum_{v = 1}^{n} G_{v} {\hat{u}}_{v}) + (- \nabla ℓ_{i}^{T} \nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr} \\ - 2 Q_{iF} {\overset{\cdot}{e}}_{i} - g_{i}^{T} \nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr} {\hat{u}}_{i} - \nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr} {\hat{c}}_{i}) \\ λ_{i} = e_{iL}, \end{matrix}

(33)

The Hamiltonian approximation errors $e_{iF}$ and $e_{iL}$ are defined as:

e_{im} = {\hat{H}}_{im} - H_{im}, m = F, L,

(34)

where $H_{im} = 0$ holds along the optimal solution by the HJ equations (19) and (20), so that $e_{im} = {\hat{H}}_{im}$ follows from (27), (28), (32), and (33).

Defining ${\tilde{W}}_{imcr} = W_{imcr} - {\hat{W}}_{imcr}$, it follows from (27), (28), (32), (33), and (34) that $e_{im} = e_{iHm} - {\tilde{W}}_{imcr}^{T} \nabla ϕ_{imcr} (\overset{\cdot}{E}) \overset{\cdot\cdot}{E}$. To apply gradient descent, define the squared residual $E_{im} = \frac{1}{2} e_{im}^{2}$. To minimize $E_{im}$, the critic NN weights are adjusted according to the following update law:

{\overset{\cdot}{\hat{W}}}_{imcr} = - ς_{imcr} e_{im} \nabla ϕ_{imcr} (\overset{\cdot}{E}) \overset{\cdot\cdot}{E},

(35)

where $ς_{imcr} > 0$ is the learning rate of the critic NN. Let $ξ_{imcr} = \nabla ϕ_{imcr} (\overset{\cdot}{E}) \overset{\cdot\cdot}{E}$, and assume that $∥ ξ_{imcr} ∥ \leq ξ_{iLm}$ for a positive constant $ξ_{iLm}$. Then

\begin{matrix} {\overset{\cdot}{\tilde{W}}}_{imcr} = - {\overset{\cdot}{\hat{W}}}_{imcr} = ς_{imcr} e_{im} \nabla ϕ_{imcr} (\overset{\cdot}{E}) \overset{\cdot\cdot}{E} \\ = ς_{imcr} (e_{iHm} - {\tilde{W}}_{imcr}^{T} ξ_{imcr}) ξ_{imcr} . \end{matrix}

(36)

Remark 3. The convergence of the critic NN depends on the persistent excitation (PE) condition, which ensures adequate exploration of the system dynamics. In trajectory tracking tasks, this condition is typically supported by the continuously time-varying reference path, which keeps the system excited.
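The behavior of the weight-error dynamics (36) under persistent excitation can be illustrated by forward-Euler integration, as in the Python sketch below. The regressor $ξ_{imcr} (t) = [\sin t, \cos t]^{T}$ (so $∥ ξ_{imcr} ∥ = 1$), the learning rate, and the constant residual $e_{iHm}$ are hypothetical choices, not values from the experiments.

```python
import math


def simulate_weight_error(W_tilde, rate, dt, steps, e_H=0.0):
    """Forward-Euler integration of the weight-error dynamics (36):
        dW_tilde/dt = rate * (e_H - W_tilde^T xi) * xi,
    i.e. the update law (35) rewritten in terms of W_tilde = W - W_hat.
    The regressor xi(t) = [sin t, cos t] is a hypothetical, persistently
    exciting stand-in for grad(phi) * Eddot; e_H is a constant residual.
    """
    W_tilde = list(W_tilde)
    for k in range(steps):
        t = k * dt
        xi = [math.sin(t), math.cos(t)]
        e = e_H - sum(w * x for w, x in zip(W_tilde, xi))  # residual e_im
        W_tilde = [w + dt * rate * e * x for w, x in zip(W_tilde, xi)]
    return W_tilde
```

With $e_{iHm} = 0$ the weight error decays toward zero, whereas a constant nonzero residual leaves it at a small but nonzero level, which is the bounded (rather than asymptotic) convergence asserted for the critic weights.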

Theorem 1. Consider the cost function $V_{im}$ approximated by (23) with the ideal weights $W_{imcr}$, and its estimate ${\hat{V}}_{im}$ given by (29) with the weights ${\hat{W}}_{imcr}$. If the critic NN weights are updated by (35), the weight approximation error ${\tilde{W}}_{imcr}$ is UUB.

Proof. The following Lyapunov function candidate is selected for stability analysis:

L_{im} (t) = \frac{1}{2 ς_{imcr}} {\tilde{W}}_{imcr}^{T} {\tilde{W}}_{imcr}, m = F, L .

(37)

The time derivative of $L_{im} (t)$ is:

\begin{matrix} {\overset{\cdot}{L}}_{im} (t) = \frac{1}{ς_{imcr}} {\tilde{W}}_{imcr}^{T} {\overset{\cdot}{\tilde{W}}}_{imcr} \\ = {\tilde{W}}_{imcr}^{T} (e_{iHm} - {\tilde{W}}_{imcr}^{T} ξ_{imcr}) ξ_{imcr} \\ = {\tilde{W}}_{imcr}^{T} e_{iHm} ξ_{imcr} - ∥ {\tilde{W}}_{imcr}^{T} ξ_{imcr} ∥^{2} \\ \leq \frac{1}{2} e_{iHm}^{2} - \frac{1}{2} ∥ {\tilde{W}}_{imcr}^{T} ξ_{imcr} ∥^{2}, m = F, L, \end{matrix}

(38)

where ${\overset{\cdot}{L}}_{im} (t) \leq 0$ when ${\tilde{W}}_{imcr}$ lies outside $Ω_{im} = \{ {\tilde{W}}_{imcr} : ∥ {\tilde{W}}_{imcr} ∥ \leq \frac{| e_{iHm} |}{ξ_{iLm}} \}$. This completes the proof.

Combining (22) with the approximate optimal compensation (31), the hierarchical control policy ${\hat{u}}_{i}^{*}$ is:

\begin{matrix} {\hat{u}}_{i}^{*} = u_{i 1} + {\hat{u}}_{i 2}^{*} \\ = - (\begin{matrix} - ({\hat{f}}_{ist} e^{(- {\hat{f}}_{i τ s} x_{i 2}^{2})} + {\hat{f}}_{ico}) sgn (x_{i 2}) \\ - {\hat{f}}_{ibv} x_{i 2} - g_{i}^{- 1} {\overset{\cdot\cdot}{x}}_{id} - \frac{τ_{ico}}{γ_{ira}} \end{matrix}) \\ - \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iLcr} (\overset{\cdot}{E}) {\hat{W}}_{iLcr}) \\ + \frac{1}{2} R_{iLi}^{- 1} g_{i}^{T} (\nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr}) λ_{i} . \end{matrix}

(39)
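For a single joint, the hierarchical policy (39) combines the model-based feedforward $u_{i 1}$ of (22) with the critic-based compensation ${\hat{u}}_{i 2}^{*}$ of (31). The Python sketch below assumes a scalar joint, so $g_{i}^{T} = g_{i}$ and $R_{iLi}$ is a scalar, and takes the joint-$i$ components of the critic gradients as given numbers; all numerical values in the test are hypothetical.

```python
import math


def sgn(v):
    """Sign function with sgn(0) = 0."""
    return (v > 0) - (v < 0)


def feedforward_u1(x2, xdd_d, g, tau_co, gamma, f_bv, f_st, f_ts, f_co):
    """Model-based term u_i1 in (22): cancels the estimated friction and the
    coupled-torque term, and injects g^{-1} times the desired acceleration."""
    friction = (f_st * math.exp(-f_ts * x2 ** 2) + f_co) * sgn(x2) + f_bv * x2
    return friction + xdd_d / g + tau_co / gamma


def compensation_u2(grad_L, grad_F, lam, g, R_L):
    """Critic-based term (31) for a scalar joint:
    -(1/2) R_L^{-1} g (grad_L - grad_F * lam), where grad_L and grad_F are the
    joint-i components of grad(phi_L)^T W_hat_L and grad(phi_F)^T W_hat_F."""
    return -0.5 * g * (grad_L - grad_F * lam) / R_L


def hierarchical_control(fb, comp):
    """Control (39): u_hat_i = u_i1 + u_hat_i2 (fb, comp are keyword dicts)."""
    return feedforward_u1(**fb) + compensation_u2(**comp)
```

Setting $λ_{i} = 0$ removes the follower-critic term, so the multiplier is what carries the follower’s cost into the leader’s control.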

Theorem 2. Under the hierarchical control policy (39) derived from the Nash-Stackelberg differential game, the position tracking error of the closed-loop MRM system (1) is UUB during coordinated operations.

Proof. The following Lyapunov function candidate is selected for stability analysis:

V_{Ly} = \sum_{i = 1}^{n} \sum_{m \in \{ L, F \}} V_{im}^{*} .

(40)

Differentiating (40) along the system trajectories gives, for the $i$th subsystem:

\begin{matrix} {\overset{\cdot}{V}}_{iLy} (t) = \\ \sum_{m \in \{ L, F \}} ({(\nabla V_{im}^{*})}^{T} (L (x) + \sum_{v = 1}^{n} G_{v} u_{v} + C (x) - {\overset{\cdot\cdot}{x}}_{d})) . \end{matrix}

(41)

Based on the HJ equations (19) and (20), one has:

\begin{matrix} {(\nabla V_{iF}^{*})}^{T} (ℓ_{i} (x) - {\overset{\cdot\cdot}{x}}_{id}) = - {\overset{\cdot}{E}}^{T} Q_{iF} \overset{\cdot}{E} + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} \\ g_{i} R_{iF}^{- 1} g_{i}^{T} (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} c_{i} R_{iFi}^{- 1} c_{i}^{T} (\nabla V_{iF}^{*}) . \end{matrix}

(42)

\begin{matrix} {(\nabla V_{iL}^{*})}^{T} (ℓ_{i} (x) - {\overset{\cdot\cdot}{x}}_{id}) = - {\overset{\cdot}{E}}^{T} Q_{iL} \overset{\cdot}{E} + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} g_{i} \\ R_{iL}^{- 1} g_{i}^{T} (\nabla V_{iL}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} c_{i} R_{iLi}^{- 1} c_{i}^{T} (\nabla V_{iL}^{*}) + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1} . \end{matrix}

(43)

Substituting (42) and (43) into (41), we obtain:

\begin{matrix} {\overset{\cdot}{V}}_{iLy} (t) = - {\overset{\cdot}{E}}^{T} (Q_{iF} + Q_{iL}) \overset{\cdot}{E} + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} g_{i} R_{iF}^{- 1} g_{i}^{T} \\ (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} c_{i} R_{iFi}^{- 1} c_{i}^{T} (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} \\ g_{i} R_{iL}^{- 1} g_{i}^{T} (\nabla V_{iL}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} c_{i} R_{iLi}^{- 1} c_{i}^{T} (\nabla V_{iL}^{*}) \\ + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1} + {(\nabla V_{iF}^{*} + \nabla V_{iL}^{*})}^{T} (\sum_{v = 1}^{n} G_{v} u_{v} + C (x)) . \end{matrix}

(44)

Considering (44), one obtains:

\begin{matrix} {\overset{\cdot}{V}}_{iLy} (t) = - {\overset{\cdot}{E}}^{T} (Q_{iF} + Q_{iL}) \overset{\cdot}{E} - {(\nabla V_{iF}^{*})}^{T} (g_{i} (u_{i}^{*} - {\hat{u}}_{i})) \\ - {(\nabla V_{iL}^{*})}^{T} (c_{i}^{*} - {\hat{c}}_{i}) + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} g_{i} R_{iF}^{- 1} g_{i}^{T} (\nabla V_{iF}^{*}) \\ + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} c_{i} R_{iFi}^{- 1} c_{i}^{T} (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} g_{i} R_{iL}^{- 1} g_{i}^{T} \\ (\nabla V_{iL}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} c_{i} R_{iLi}^{- 1} c_{i}^{T} (\nabla V_{iL}^{*}) + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1} . \end{matrix}

(45)

Then, substituting (41) into (45), we have:

\begin{matrix} {\overset{\cdot}{V}}_{iLy} (t) = - {\overset{\cdot}{E}}^{T} (Q_{iF} + Q_{iL}) \overset{\cdot}{E} + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} g_{i} R_{iF}^{- 1} g_{i}^{T} \\ (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iF}^{*})}^{T} c_{i} R_{iFi}^{- 1} c_{i}^{T} (\nabla V_{iF}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} \\ g_{i} R_{iL}^{- 1} g_{i}^{T} (\nabla V_{iL}^{*}) + \frac{1}{4} {(\nabla V_{iL}^{*})}^{T} c_{i} R_{iLi}^{- 1} c_{i}^{T} (\nabla V_{iL}^{*}) \\ + λ_{i}^{T} {\overset{\cdot}{ℏ}}_{i 1} + \frac{1}{2} {(\nabla ϕ_{iFcr} (\overset{\cdot}{E}) {\hat{W}}_{iFcr} + \nabla ε_{iFcr})}^{T} (g_{i} R_{iF}^{- 1} \\ (g_{i}^{T} \nabla ϕ_{iFcr} {\tilde{W}}_{iFcr} + g_{i}^{T} \nabla ε_{iFcr})) + \frac{1}{2} (\nabla ϕ_{iLcr} (\overset{\cdot}{E}) {\hat{W}}_{iLcr} \\ + \nabla ε_{iLcr})^{T} (R_{iLi}^{- 1} (\nabla ϕ_{iLcr} {\tilde{W}}_{iLcr} + \nabla ε_{iLcr})) \\ = - {\overset{\cdot}{E}}^{T} (Q_{iF} + Q_{iL}) \overset{\cdot}{E} + Π_{iJ}, \end{matrix}

(46)

where $Π_{iJ}$ is upper bounded as:

Π_{iJ} \leq π_{iJ},

(47)

where $π_{iJ}$ is a constant.

From (46) and (47), ${\overset{\cdot}{V}}_{iLy} (t)$ satisfies:

\begin{matrix} {\overset{\cdot}{V}}_{iLy} (t) \leq - {\overset{\cdot}{E}}^{T} (Q_{iF} + Q_{iL}) \overset{\cdot}{E} + π_{iJ} \\ \leq - λ_{\min} (Q_{iF} + Q_{iL}) ∥ \overset{\cdot}{E} ∥^{2} + π_{iJ} . \end{matrix}

(48)

If $\overset{\cdot}{E}$ lies outside:

Ω = \{ \overset{\cdot}{E} : ∥ \overset{\cdot}{E} ∥ \leq \sqrt{\frac{π_{iJ}}{λ_{\min} (Q_{iF} + Q_{iL})}} \},

(49)

then the right-hand side of (48) is negative. Therefore, ${\overset{\cdot}{V}}_{Ly} (t) < 0$ whenever $\overset{\cdot}{E}$ lies outside the set Ω in (49), and the position tracking error in the coordinated operation task is UUB under the hierarchical optimal coordinated control (39).

In practical terms, UUB means that the robot's tracking error does not diverge, even in the presence of disturbances and modeling errors, provided their effect remains bounded as in (47). After a finite time, the error is confined to the compact set Ω in (49), whose size is determined by $π_{iJ}$ and $λ_{\min} (Q_{iF} + Q_{iL})$.
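As a sanity check on the algebra in (48) and (49), the Python sketch below plugs in the Table 1 values ($Q_{iF} = 0.86 I$, $Q_{iL} = 0.90 I$, $π_{iJ} = 10.06$, $n = 7$), computes the radius of Ω, and samples points outside Ω to confirm that the bound in (48) is negative there. It checks only the scalar inequality, not the closed-loop MRM dynamics; the names `N_JOINTS`, `LAM_MIN`, `RADIUS` and `lyapunov_bound` are illustrative.

```python
import numpy as np

# Values from Table 1 (n = 7 joints): Q_iF = 0.86 I, Q_iL = 0.90 I, pi_iJ = 10.06
N_JOINTS = 7
Q_F = 0.86 * np.eye(N_JOINTS)
Q_L = 0.90 * np.eye(N_JOINTS)
PI_J = 10.06

# Smallest eigenvalue of the symmetric matrix Q_iF + Q_iL (eigvalsh sorts ascending)
LAM_MIN = np.linalg.eigvalsh(Q_F + Q_L)[0]

# Radius of the compact set Omega defined in (49)
RADIUS = float(np.sqrt(PI_J / LAM_MIN))


def lyapunov_bound(e_dot: np.ndarray) -> float:
    """Right-hand side of (48): -lambda_min * ||e_dot||^2 + pi_iJ."""
    return -LAM_MIN * float(e_dot @ e_dot) + PI_J


# Sample random points strictly outside Omega and check the bound is negative there
rng = np.random.default_rng(seed=0)
outside_ok = True
for _ in range(1000):
    direction = rng.normal(size=N_JOINTS)
    direction /= np.linalg.norm(direction)
    e_dot = direction * RADIUS * rng.uniform(1.01, 3.0)
    outside_ok = outside_ok and lyapunov_bound(e_dot) < 0.0

print(f"lambda_min = {LAM_MIN:.2f}, radius of Omega = {RADIUS:.3f}, bound < 0 outside: {outside_ok}")
```

With these values $λ_{\min} = 1.76$ and the radius of Ω is about 2.39; on the boundary of Ω the bound is exactly zero, which is why only points strictly outside are sampled.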

Remark 4. In (8) and (9), the weighting matrices $Q_{iF}$ and $Q_{iL}$ determine how tight the uniform ultimate bound is. According to (49), for a fixed $π_{iJ}$, the larger the eigenvalues of $Q_{iF}$ and $Q_{iL}$, the smaller the compact set Ω, and vice versa. Furthermore, the weighting matrices in (8) and (9) should be chosen as close to the identity matrix as possible, so that the controllers are weighted approximately equally.
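The scaling behind Remark 4 can be made concrete with a small Python sketch. It scales both Table 1 matrices by a hypothetical factor `q_scale` and recomputes the radius of Ω from (49), holding $π_{iJ} = 10.06$ fixed; that is an assumption, since retuning $Q_{iF}$ and $Q_{iL}$ may also change $π_{iJ}$ in practice. Because (49) depends on $λ_{\min}$, the sketch also shows, as a direct consequence of (49), that one small diagonal weight sets $λ_{\min}$ for the whole matrix.

```python
import numpy as np


def omega_radius(q_scale: float, pi_j: float = 10.06, n_joints: int = 7) -> float:
    """Radius of Omega in (49) when Q_iF and Q_iL from Table 1 are both scaled by q_scale.

    pi_j is held at its Table 1 value (assumption: in practice it may change with Q).
    """
    q_sum = q_scale * (0.86 + 0.90) * np.eye(n_joints)
    lam_min = np.linalg.eigvalsh(q_sum)[0]
    return float(np.sqrt(pi_j / lam_min))


scales = [0.5, 1.0, 2.0, 4.0]
radii = [omega_radius(s) for s in scales]
for s, r in zip(scales, radii):
    print(f"Q scale {s:>3}: radius of Omega = {r:.3f}")

# Hypothetical non-uniform weighting: one weak joint determines lambda_min
q_uneven = np.diag([1.76] * 6 + [0.2])
lam_uneven = np.linalg.eigvalsh(q_uneven)[0]
print("lambda_min with one weakly weighted joint:", lam_uneven)
```

The radius shrinks as $1/\sqrt{\text{scale}}$: quadrupling the weights halves the radius of Ω.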

Experiments

Experiment 1 setup

This study evaluates the proposed hierarchical approximate optimal control strategy, grounded in the Nash-Stackelberg differential game, on a coordinated vertical moving task (Figure 2) using a 7-DOF MRM platform (Figure 3). The experiment has two control objectives: high-fidelity position tracking and minimal control effort during coordinated manipulation. Although the proposed algorithm is derived in continuous time, it must be implemented in discrete time for real-time control. The robot control system runs within the Robot Operating System (ROS) framework on a host computer, which interfaces with the Sawyer controller and handles experimental data processing. In this experiment, the sampling frequency is set to 1000 Hz, that is, the controller processes one sample every 1 ms. This sampling configuration supports efficient online processing while meeting real-time requirements.
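The section does not state how the continuous-time laws are discretized. As a hedged illustration of what a 1000 Hz sampling rate means for a continuous-time update, the Python sketch below assumes a forward-Euler (zero-order-hold) step with $T_s = 1$ ms and applies it to a hypothetical scalar test system $\dot{x} = -a x$; `euler_step` and `simulate_decay` are illustrative names, not the authors' implementation.

```python
import math

FS_HZ = 1000.0      # sampling frequency used in Experiment 1
TS = 1.0 / FS_HZ    # sampling period: 1 ms


def euler_step(x: float, x_dot: float, ts: float = TS) -> float:
    """Forward-Euler update x[k+1] = x[k] + Ts * x_dot[k] (assumed discretization scheme)."""
    return x + ts * x_dot


def simulate_decay(a: float = 2.0, x0: float = 1.0, t_end: float = 1.0) -> tuple[int, float]:
    """Integrate the hypothetical test system x_dot = -a * x for t_end seconds at FS_HZ."""
    steps = round(t_end * FS_HZ)   # 1000 control cycles per simulated second
    x = x0
    for _ in range(steps):
        x = euler_step(x, -a * x)
    return steps, x


steps, x_final = simulate_decay()
print(steps, x_final, math.exp(-2.0))
```

One simulated second takes 1000 updates, and the Euler result stays within about $3 \times 10^{-4}$ of the exact value $e^{-2}$, indicating that a 1 ms step is small relative to this test system's time constant.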

Figure 2.

Experiment with the coordinated operation task.

Figure 3.

Experimental platform setup.

The relevant system dynamics and controller parameters are listed in Table 1. The MRM has seven degrees of freedom. Because ADP computes the control policy forward in time with a compact NN approximation instead of tabulating the value function over the state space, it mitigates the curse of dimensionality. Each neural network has a single hidden layer with five neurons, so one forward evaluation requires $O (5 s_{d})$ operations, where $s_{d}$ denotes the input dimension (e.g. 7 for the joint positions). The critic NN is implemented as a radial basis function neural network (RBFNN) whose activation function in (23) is $ϕ_{c} = e^{\frac{- {(\overset{\cdot}{E} - ψ)}^{T} (\overset{\cdot}{E} - ψ)}{ℓ}}$, with a predefined center vector $ψ$ and bandwidth $ℓ$.
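The activation in (23) and the $O(5 s_{d})$ cost can be illustrated with a short Python sketch. It uses the centers $ψ = [-3, -1.5, 0, 1.5, 3]$, bandwidth $ℓ = 3$ and zero initial weights reported for Experiment 1; the function names are hypothetical. Since the text lists only scalar center values, the sketch assumes that for an $s_{d}$-dimensional input the $j$-th center repeats $ψ_{j}$ in every component; with $s_{d} = 1$ it reduces to a scalar per-joint critic.

```python
import numpy as np

# RBF settings reported in Experiment 1: five centres psi_j and bandwidth ell
PSI = np.array([-3.0, -1.5, 0.0, 1.5, 3.0])
ELL = 3.0


def rbf_features(e_dot: np.ndarray) -> np.ndarray:
    """Activation (23): phi_j = exp(-(e_dot - c_j)^T (e_dot - c_j) / ell), j = 1..5.

    Assumption: centre c_j repeats PSI[j] in all s_d components of the input.
    """
    e_dot = np.atleast_1d(np.asarray(e_dot, dtype=float))
    centres = np.outer(PSI, np.ones(e_dot.size))   # shape (5, s_d)
    diff = e_dot[None, :] - centres                # 5 * s_d differences -> O(5 s_d) work
    return np.exp(-np.sum(diff ** 2, axis=1) / ELL)


def critic_value(weights: np.ndarray, e_dot: np.ndarray) -> float:
    """Critic output V_hat = W^T phi(e_dot)."""
    return float(weights @ rbf_features(e_dot))


w0 = np.zeros(PSI.size)                      # critic weights initialised to 0
phi_joint = rbf_features(np.array([0.0]))    # scalar per-joint input (s_d = 1)
phi_arm = rbf_features(np.zeros(7))          # 7-dimensional joint-space input
print(np.round(phi_joint, 4), critic_value(w0, np.zeros(7)))
```

The neuron centered at 0 always returns 1 for a zero input, and with all weights initialized to 0 the critic output starts at 0 regardless of the input.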

Table 1.

Parameter definition.

Parameter type	Name	Value	Name	Value	Name	Value
Model parameters	${\hat{f}}_{ibv}$	$0.012\ \mathrm{N\,m\,s/rad}$	${\hat{f}}_{ico}$	$0.03\ \mathrm{N\,m}$	${\hat{f}}_{ist}$	$0.04\ \mathrm{N\,m}$
Model parameters	${\hat{f}}_{i τ s}$	$20\ \mathrm{s^{2}/rad^{2}}$	$γ_{ira}$	$100$	$I_{imo}$	$0.12\ \mathrm{kg\,cm^{2}}$
Control parameters	$b_{ip 1}$	$0.08\ \mathrm{N\,m\,s/rad}$	$b_{ip 2}$	$0.11\ \mathrm{mN\,m}$	$b_{ip 3}$	$0.18\ \mathrm{mN\,m}$
Control parameters	$b_{ip 4}$	$63\ \mathrm{s^{2}/rad^{2}}$	$b_{ipd}$	$0.71$	$b_{idc}$	$6.83$
Control parameters	$ς_{iLcr}$	$0.99$	$n$	$7$	$ς_{iFcr}$	$0.96$
Control parameters	$ξ_{iLL}$	$8.63$	$π_{iJ}$	$10.06$	$ξ_{iLF}$	$7.23$
Control parameters	$R_{iF}$	$0.89 I$	$R_{iFv}$	$0.82 I$	$Q_{iF}$	$0.86 I$
Control parameters	$R_{iL}$	$0.92 I$	$R_{iLv}$	$0.98 I$	$Q_{iL}$	$0.90 I$

Experiment 1 results

To validate the proposed method, it is compared with an existing learning-based optimal control approach that does not use the Nash-Stackelberg differential game (e.g. distributed optimal control via ADP^27,28).

(1) Position tracking

The experimental results in Figures 4 to 6 show a significant improvement in tracking accuracy in both Cartesian and joint space under the proposed Nash-Stackelberg game-theoretic coordinated control framework. Compared with the existing optimal control method, the error magnitudes are markedly lower. Although the trajectory errors increase at sharp corners, the proposed strategy confines these deviations within a narrow margin during steady-state operation. This performance is largely attributed to the compensation provided by the Stackelberg game.

Figure 4.

Position tracking in Cartesian space via the existing optimal control (right) and the developed control approach (left).

Figure 5.

Position tracking of the first joint via the existing optimal control (lower) and the developed control approach (upper).

Figure 6.

Joint position tracking error via the existing optimal control (lower) and the developed control approach (upper).

(2) Control torque

Figure 7 shows the control torque responses during the coordinated operation task under the existing optimal control and under the proposed Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control. The proposed hierarchical framework keeps control torque transients within predefined safe operating limits. This follows from its approximate optimal control mechanism, which is designed to optimize the output torques, thereby supporting both stability and efficiency during task execution.

Figure 7.

Control torque via the existing optimal control (lower) and the developed control approach (upper).

(3) Interaction torque

Figure 8 presents the interaction torque. The proposed hierarchical control, based on the Nash-Stackelberg differential game, keeps the interaction torque within a stable, reasonable bound and attenuates chattering. This improvement stems from the game-theoretic mechanism, which ensures operational safety and reconciles the dynamic coupling between the MRM and its task. Moreover, the integrated approximate optimal control further reduces the torque magnitude, particularly during sudden trajectory changes.

Figure 8.

Interaction torque curves via the proposed control approach.

(4) Critic NN weight

As depicted in Figure 9, the evolution of the critic neural network weights closely follows the control torque. This correspondence indicates that the critic NN captures the approximate optimal control policy online. All critic NN weights are initialized to 0, and the learning rates are 0.99 for the leader and 0.96 for the followers ($ς_{iLcr}$ and $ς_{iFcr}$ in Table 1). The RBF centers are $ψ = [- 3, - 1.5, 0, 1.5, 3]$ and the bandwidth is $ℓ = 3$.
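The critic tuning law itself is derived earlier in the paper and is not reproduced in this section. To illustrate how the zero initialization and the reported learning rates enter a critic update, the Python sketch below swaps in a standard normalized-gradient critic tuning rule from the ADP literature (an assumption, not necessarily the authors' law), discretized by forward Euler at 1 ms. The regressor `sigma` and cost `r` are hypothetical and held constant, so only the Hamiltonian residual is shown converging; in closed loop the regressor varies with $\overset{\cdot}{E}$, and convergence of the weights themselves requires persistent excitation.

```python
import numpy as np

TS = 1e-3                                   # 1 kHz sampling period (Experiment 1)
ALPHA_LEADER, ALPHA_FOLLOWER = 0.99, 0.96   # learning rates reported for leader / followers


def critic_step(w: np.ndarray, sigma: np.ndarray, r: float, alpha: float, ts: float = TS):
    """One Euler step of a normalized-gradient critic update (assumed form).

    delta = w^T sigma + r is the Hamiltonian residual for regressor sigma and
    instantaneous cost r; the step descends 0.5 * delta^2 / (1 + sigma^T sigma)^2.
    Returns the updated weights and the residual evaluated before the update.
    """
    m = 1.0 + float(sigma @ sigma)
    delta = float(w @ sigma) + r
    return w - ts * alpha * sigma * delta / m ** 2, delta


def run(alpha: float, n_steps: int = 20_000) -> list:
    """Run n_steps updates from zero initial weights with a fixed hypothetical regressor."""
    sigma = np.array([0.2, 0.5, 1.0, 0.5, 0.2])   # hypothetical, held constant
    r = 0.8                                         # hypothetical instantaneous cost
    w = np.zeros(5)                                 # all critic weights start at 0
    residuals = []
    for _ in range(n_steps):
        w, delta = critic_step(w, sigma, r, alpha)
        residuals.append(abs(delta))
    return residuals


leader = run(ALPHA_LEADER)
follower = run(ALPHA_FOLLOWER)
print(f"|delta| leader: {leader[0]:.3f} -> {leader[-1]:.4f}, follower: {follower[0]:.3f} -> {follower[-1]:.4f}")
```

With a constant regressor the residual contracts by the same factor at every step, so the slightly larger leader learning rate yields a slightly faster decay than the follower rate.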

Figure 9.

Critic NN weight curves of the first joint via the proposed control approach.

Experiment 2

To further validate the Nash-Stackelberg differential game-based approximate optimal coordinated control, a more demanding experiment with a complex trajectory is conducted. To test the robustness of the proposed algorithm, a collision is applied to the MRM at approximately 22 s.

(1) Position tracking

The experimental results in Figures 10 and 11 show high tracking precision in Cartesian and joint space under the proposed Nash-Stackelberg game-theoretic coordinated control framework. The error amplitude remains small: although the errors increase when the collision occurs, the proposed scheme confines them within a tight bound during the steady-state phases, owing to the robustness and compensation of the IDC dynamics.

Figure 10.

Position tracking in Cartesian space via the developed control approach.

Figure 11.

Joint position tracking error via the developed control approach.

(2) Control torque

Figure 12 shows the control torque during the coordinated operation task under the proposed Nash-Stackelberg differential game-based hierarchical approximate optimal coordinated control. The proposed strategy keeps control torque transients within safe operational bounds even when the collision occurs. This is achieved through its approximate optimal control mechanism, which explicitly addresses the optimization of the output torques.

Figure 12.

Control torque via the developed control approach.

(3) Critic NN weight

Figure 13 presents the evolution of the critic neural network weights associated with joint 0 (the first joint). The weights converge and remain within an acceptable range, which supports the feasibility of the proposed control approach.

Figure 13.

Critic NN weight curves of the first joint via the proposed control approach.

Conclusion

This study develops a hierarchical approximate optimal coordinated control scheme for MRMs modeled via joint torque feedback, using ADP. Within the Nash-Stackelberg differential game framework, the coordinated control problem is reformulated as a Stackelberg differential game in which the MRM, formulated as a Nash differential game, acts as the leader and the coordination objective acts as the follower. A critic neural network approximates the nonlinear cost function, from which the approximate optimal control strategy is derived. Both the theoretical analysis and the experimental results confirm the efficacy of the proposed control scheme. Computational complexity and scalability constraints remain substantial obstacles to deploying adaptive dynamic programming and reinforcement learning methods in real-world applications. This challenge is particularly pronounced in MRM systems performing coordinated manipulation tasks, where the demand for real-time computation is higher. Balancing high-precision task execution against computational efficiency in MRM systems therefore remains a critical open challenge.

Footnotes

Handling Editor: Aarthy Esakkiappan

ORCID iD

Tianjiao An

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work is supported by the Scientific Technological Development Plan Project in Jilin Province of China (20260602029RC).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Whitman, Travers, Choset. Learning modular robot control policies. IEEE Trans Robot 2023; 39(5): 4095–4113.

2. Morphology transformation of underwater self-reconfigurable modular robots via heterogeneous decomposition and distributed control. IEEE Trans Autom Sci Eng 2025; 22: 10698–10712.

3. Dong, et al. Event-triggered mixed Nonzero-Sum game optimal control for modular robotic manipulator performing coordinated operation tasks. IEEE Trans Neural Netw Learn Syst 2025; 36(12): 20371–20385.

4. Zhou, Wang, et al. A hybrid control strategy for grinding and polishing robot based on adaptive impedance control. Adv Mech Eng 2021; 13: 16878140211004034.

5. Zahedi, Sengupta, Kambhampati. A game-theoretic model of trust in human–robot teaming: guiding human observation strategy for monitoring robot behavior. IEEE Trans Hum Mach Syst 2025; 55(1): 37–47.

6. Lin, Beling, Cogill. Multiagent inverse reinforcement learning for two-person zero-sum games. IEEE Trans Games 2018; 10(1): 56–68.

7. Song, Lewis. Robust optimal control for disturbed nonlinear zero-sum differential games based on single NN and least squares. IEEE Trans Syst Man Cybern Syst 2020; 50(11): 4009–4019.

8. Chen, Ding. Finite-time control for double-layer Peltier system based on finite-time observer. Adv Mech Eng 2019; 11: 1687814019836852.

9. Zheng, Zhang, Yuan. Nonzero-sum pursuit-evasion game control for spacecraft systems: a Q-learning method. IEEE Trans Aerosp Electron Syst 2023; 59(4): 3971–3981.

10. Wang, et al. Cooperative differential game-based optimal control and its application to power systems. IEEE Trans Ind Inform 2020; 16(8): 5169–5179.

11. Zhang, Cai. Value iteration-based cooperative adaptive optimal control for multi-player differential games with incomplete information. IEEE/CAA J Automat Sin 2024; 11(3): 690–697.

12. Dong, et al. User-led modular robot manipulator systems interaction tasks-oriented hierarchical approximate optimal control: a Stackelberg-Pareto differential game perspective. IEEE Trans Autom Sci Eng 2025; 22: 17801–17813.

13. Chen, Lei, et al. A Stackelberg game framework for energy internet system by operator approach. IEEE Trans Netw Sci Eng 2025; 12(4): 2942–2956.

14. Yuwono, Schwung. Distributed Stackelberg strategies in state-based potential games for autonomous decentralized learning manufacturing systems. IEEE Trans Syst Man Cybern Syst 2025; 55: 8112–8125. https://doi.org/10.1109/TSMC.2025.3602958

15. Yang, Xie, Vasilakos. Noncooperative and cooperative optimization of electric vehicle charging under demand uncertainty: a robust Stackelberg game. IEEE Trans Vehicular Technol 2016; 65(3): 1043–1058.

16. Jiang, et al. Demand response flexibility potential trading in smart grids: a multileader multifollower Stackelberg game approach. IEEE Trans Syst Man Cybern Syst 2023; 53(5): 2664–2675.

17. Belgana, Rimal, Maier. Open energy market strategies in microgrids: a Stackelberg game approach based on a hybrid multiobjective evolutionary algorithm. IEEE Trans Smart Grid 2015; 6(3): 1243–1252.

18. Halabi, Wahab, Al Mallah, et al. Protecting the internet of vehicles against advanced persistent threats: a Bayesian Stackelberg game. IEEE Trans Reliab 2021; 70(3): 970–985.

19. Shen, Wang, Zhu, et al. Data-driven event-triggered adaptive dynamic programming control for nonlinear systems with input saturation. IEEE Trans Cybern 2024; 54(2): 1178–1188.

20. Qiao, Zhao, Wang, et al. Action-dependent heuristic dynamic programming with experience replay for wastewater treatment processes. IEEE Trans Ind Inform 2024; 20(4): 6257–6265.

21. Dong, Zhou, et al. Model-free optimal decentralized sliding mode control for modular and reconfigurable robots based on adaptive dynamic programming. Adv Mech Eng 2019; 11: 1687814019896923.

22. Gong, Yang, Liu, et al. Distributed dynamic event-triggered control for multiagent systems under FDI attack via ESN-based adaptive dynamic programming. IEEE Trans Cybern 2025; 55(8): 3663–3674.

23. Zhang, Z-X, Xie, et al. An unknown multiplayer nonzero-sum game: prescribed-time dynamic event-triggered control via adaptive dynamic programming. IEEE Trans Autom Sci Eng 2025; 22: 8317–8328.

24. Zhang, Zhao, et al. Distributed adaptive dynamic programming for consensus control of multiagent systems within hierarchical Stackelberg–Nash game framework. IEEE Trans Syst Man Cybern Syst 2025; 55(6): 4286–4300.

25. Lin, Zhao, Liu. Event-triggered robust adaptive dynamic programming for multiplayer Stackelberg-Nash games of uncertain nonlinear systems. IEEE Trans Cybern 2024; 54(1): 273–286.

26. Liu, Goldenberg. Design, analysis, and control of a spring-assisted modular and reconfigurable robot. IEEE/ASME Trans Mechatron 2011; 16(4): 695–706.

27. Zhang, Zhao, Liu. Distributed optimal containment control of wheeled mobile robots via adaptive dynamic programming. IEEE Trans Syst Man Cybern Syst 2025; 55(9): 5876–5886.

28. Jiang, et al. Event-triggered-based optimal control for reconfigurable robot via mixed nonzero-sum game. IFAC-PapersOnLine 2025; 59(35): 332–337.