Abstract
Deep learning has been widely used within learning algorithms for robotics. One disadvantage of deep networks is that they are black-box representations. Therefore, the learned approximations ignore the existing knowledge of physics or robotics. Especially for learning dynamics models, these black-box models are not desirable, as the underlying principles are well understood and standard deep networks can learn dynamics that violate these principles. To learn dynamics models with deep networks that guarantee physically plausible dynamics, we introduce physics-inspired deep networks that combine first principles from physics with deep learning. We incorporate Lagrangian mechanics within the model learning such that all approximated models adhere to the laws of physics and conserve energy. Deep Lagrangian Networks (DeLaN) parametrize the system energy using two networks. The parameters are obtained by minimizing the squared residual of the Euler–Lagrange differential equation. Therefore, the resulting model does not require specific knowledge of the individual system, is interpretable, and can be used as a forward, inverse, and energy model. Previously, these properties were only obtained when using system identification techniques that require knowledge of the kinematic structure. We apply DeLaN to learning dynamics models and apply these models to control simulated and physical rigid body systems. The results show that the proposed approach obtains dynamics models that can be applied to physical systems for real-time control. Compared to standard deep networks, the physics-inspired models learn better models and capture the underlying structure of the dynamics.
Keywords
1. Introduction
During the last 5 years, deep learning has shown the potential to fundamentally change the use of learning in robotics. Currently, many robot learning approaches involve a deep network as part of their algorithm. The network either represents a policy that selects the actions, a dynamics model that predicts the next state, or a state estimator that extracts the relevant features from unstructured observations. Initially, many of these approaches were only applicable to simulated environments due to the large amounts of data required to train the networks. When using massively parallelized simulations, these methods achieved astonishing results (Heess et al., 2017). By now, these learning algorithms have been improved and have started to be applied to real-world systems (Akkaya et al., 2019; Haarnoja et al., 2018). On physical systems, the deep network approaches have not yet surpassed classical robotics techniques but have shown very promising results that are comparable to those of classical methods.
Within many proposed algorithms, deep networks have replaced analytic models and other function approximators due to their simplicity, generic applicability, scalability, high model capacity, and the widespread availability of GPUs enabling fast training and evaluation. The generic applicability of these black-box models combined with the high model capacity is both a blessing and a curse. On the one hand, this combination enables the learning of arbitrary functions with high fidelity. On the other hand, this combination is also susceptible to overfitting the data without retrieving the underlying structure. Furthermore, the black-box nature of standard deep networks prevents including prior knowledge from first principles. This limitation is especially problematic for robotics, as overfitting to spurious data can lead to unpredictable behaviors that damage the physical system. The problem is also made unnecessarily harder, as all existing knowledge of robotics and mechanics is ignored.
In this article, we propose a new approach that combines existing knowledge with deep networks. This combination enables learning better representations for robotics while retaining the advantages of deep networks. To learn physically plausible continuous-time dynamics models of rigid body systems, we combine Lagrangian mechanics with deep networks. The proposed Deep Lagrangian Networks (DeLaN) use two deep networks to parameterize the kinetic and potential energy (Lutter et al., 2019). The network parameters are learned by minimizing the squared residual of the Euler–Lagrange differential equation. The resulting dynamics models are guaranteed to evolve as a mechanical system and conserve the system energy. Therefore, these models achieve better long-term predictions and control performance than the standard black-box models. The resulting physics-inspired models share many of the characteristics of analytic models without requiring specific knowledge about the individual system. For example, DeLaN models are interpretable and enable the computation of the gravitational forces, the momentum, and the system energy. Previously, computing this decomposition was only possible using analytic models with known system parameters. DeLaN also enables the computation of the forward and inverse models with the same parameters. These characteristics are in stark contrast to standard black-box models. Such black-box models only obtain either the forward or the inverse model and cannot compute the different physical quantities, as these would need to be learned without supervision. Due to these advantages of physics-inspired dynamics models, many variants have been proposed (Greydanus et al., 2019; Gupta et al., 2019; Zhong et al., 2019; Saemundsson et al., 2020; Cranmer et al., 2020).
1.1. Contribution
The contribution of this article is the presentation of a model learning framework that combines the existing knowledge of mechanics with deep networks. To highlight the possibilities of this approach for learning dynamics models, we describe Deep Lagrangian Networks (DeLaN) (Lutter et al., 2019). This model learning approach combines deep learning with Lagrangian mechanics to learn a physically plausible model by minimizing the residual of the Euler–Lagrange ordinary differential equation. In contrast to our previous papers (Lutter et al., 2019; Lutter and Peters 2019), which mainly focused on specific algorithmic ideas, this article
1. Consolidates the existing literature on physics-inspired model learning, which has been introduced since the initial presentation of DeLaN. We summarize the individual contributions and merge the variants into a single big picture.
2. Extends the previous experimental evaluation and provides in-depth comparisons of the different variants of physics-inspired networks. We evaluate the control performance of the learned models on the physical system using inverse dynamics control and energy control. In addition, the performance is compared to system identification and black-box model learning.
3. Provides an elaborate discussion of the current shortcomings of physics-inspired networks and highlights possibilities to overcome these limitations.
1.2. Outline
To provide a self-contained overview of physics-inspired deep networks for learning dynamics models, we briefly summarize the related work (Section 2), prior approaches for learning dynamics models of rigid body systems (Section 3.1), as well as the basics of Lagrangian and Hamiltonian mechanics (Sections 3.2 and 3.3). Subsequently, we introduce physics-inspired networks derived from Lagrangian and Hamiltonian mechanics as well as the existing variants (Section 4). Section 5 presents the experimental results of applying these models to model-based control and compares the performance to system identification as well as deep network dynamics models. Finally, Section 6 discusses the experimental results, highlights the limitations of physics-inspired networks, and summarizes the contributions of this article.
2. Related Work
In the main part of this article, we focus on learning continuous-time dynamics models of mechanical systems. However, physics-inspired networks and continuous-time deep networks have been utilized for different application areas. In this section, we briefly summarize the existing work on both topics outside the domain of rigid body systems and highlight the differences.
2.1. Physics-inspired deep networks
Incorporating knowledge of physics within deep networks has been approached by introducing conservation laws or symmetries within the network architecture. Both approaches are tightly coupled due to Noether’s theorem showing that symmetries induce conservation laws. In the case of conservation laws, these laws can be incorporated by minimizing the residual of the corresponding differential equation to obtain the optimal network parameters. The combination of deep learning and differential equations has been well known for a long time and investigated in more abstract forms (Lee and Kang 1990; Meade Jr and Fernandez 1994; Lagaris et al., 1998, 2000). Using this approach, various authors proposed to use the Navier–Stokes equation (Raissi et al., 2017a; Chu et al., 2021), Schroedinger equation (Raissi et al., 2017b), Burgers equation (Holl et al., 2020), Hamilton’s equation (Greydanus et al., 2019; Zhong et al., 2019; Chen et al., 2020; Toth et al., 2019), or the Euler–Lagrange equation (Lutter et al., 2019; Qin 2020; Cranmer et al., 2020; Gupta et al., 2019).
Symmetries can be integrated within the network architecture by selecting a non-linear transformation that is either equivariant, that is, preserves the symmetry, or invariant to specific transformations of the input. Using this approach, one can derive layers that are translation-, rotation-, scale-, and gauge-equivariant (Cohen and Welling 2016; Bekkers 2019; Wang et al., 2020b; Cohen et al., 2019). These architectures are frequently used for computer vision, as image classification is invariant to translation and rotation (Cohen et al., 2019; Weiler and Cesa 2019; Lenc and Vedaldi 2015). Up to now, only very few papers have applied this approach to model physical phenomena (Wang et al., 2020b; Anderson et al., 2019). A different approach to symmetries was proposed by Huh et al. (2020). To obtain time translation invariance, which is equivalent to conservation of energy, this work optimized time-reversibility. Therefore, the symmetry is not incorporated in the network architecture but in the optimization loss.
Besides these generic approaches utilizing symmetries and conservation laws, various authors also proposed specific architectures for individual problems. In this case, the known spatial structure of the problem is embedded within the network architecture. For example, Wang et al. (2020a) proposed a network architecture for turbulent flow predictions that incorporates multiple spatial and temporal scales. Sanchez-Gonzalez et al. (2018) used a graph network to encode the known kinematic structure and the local interactions between two links within the network structure. Similarly, Schütt et al. (2017) incorporate the local structure of molecules within the network architecture.
2.2. Continuous-time models and neural ODEs
The work on neural ordinary differential equations (ODEs) by Chen et al. (2018) initiated a large surge of research on continuous-time models. The original work on neural ODEs proposed a deep network with infinite depth to improve classification and density estimation. While these algorithms were not meant for modeling dynamical systems, the explicit integration step within the neural ODE led to the rediscovery of continuous-time models for dynamical systems. Since then, neural ODEs have frequently been mentioned as inspiration to learn continuous-time models (Saemundsson et al., 2020; Huh et al., 2020; Botev et al., 2021; Hochlehnert et al., 2021). Frequently, the term neural ODE is used interchangeably for a continuous-time model with a deep network. In this work, we will only use the term continuous-time model. One technical difference between the original neural ODE and continuous-time models is that the neural ODE uses a variable time step integrator, most commonly the Dormand–Prince method, whereas the continuous-time models use a fixed time step integrator. For the fixed time step integrator, different authors have used the explicit Euler method, the Runge–Kutta method, or symplectic integrators. For dynamics models, the fixed time step is convenient, as the data is observed at a fixed time step determined by the sampling rate of digital sensors.
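The fixed time step schemes mentioned above can be sketched in a few lines: a continuous-time model only has to return the state derivative, and the integrator turns it into a next-state prediction at the sampling interval. The linear oscillator, its frequency, and the step size below are illustrative assumptions, chosen because the exact solution is known, so the integration error of explicit Euler and classical fourth-order Runge–Kutta can be compared directly.

```python
import numpy as np

OMEGA = 2.0  # assumed natural frequency of the illustrative oscillator

def oscillator(x, u):
    """Continuous-time forward model x_dot = f(x, u) with state x = [q, q_dot]."""
    q, qd = x
    return np.array([qd, -OMEGA**2 * q + u])

def euler_step(f, x, u, dt):
    # Explicit Euler: first-order accurate in dt.
    return x + dt * f(x, u)

def rk4_step(f, x, u, dt):
    # Classical Runge-Kutta 4; the input is held constant over the step.
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def rollout(step, f, x0, u, dt, n_steps):
    # Repeatedly predict the next state at the fixed sampling interval dt.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = step(f, x, u, dt)
    return x

dt, n_steps = 0.01, 100  # 100 steps of 10 ms, i.e., t = 1 s
# Exact solution for q(0) = 1, q_dot(0) = 0 and u = 0.
x_exact = np.array([np.cos(OMEGA * 1.0), -OMEGA * np.sin(OMEGA * 1.0)])
err_euler = np.linalg.norm(rollout(euler_step, oscillator, [1.0, 0.0], 0.0, dt, n_steps) - x_exact)
err_rk4 = np.linalg.norm(rollout(rk4_step, oscillator, [1.0, 0.0], 0.0, dt, n_steps) - x_exact)
```

Because the model is evaluated only through `f(x, u)`, any continuous-time model, including a network, can be plugged into either integrator; the integrator choice then only affects how the continuous-time prediction is discretized.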
3. Preliminaries
In this section, we introduce the standard model learning techniques for dynamical systems and briefly summarize the relevant theory of Lagrangian and Hamiltonian mechanics.
3.1. Learning dynamics models
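Models describing system dynamics, that is, the coupling of the system input $\mathbf{u}$ and the system state $\mathbf{x}$, are essential for model-based control. Depending on the control approach, one requires either the forward model, which maps the state and the input to the change of the system state, $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u}; \theta)$, or the inverse model, which maps the state and its change to the system input, $\mathbf{u} = f^{-1}(\mathbf{x}, \dot{\mathbf{x}}; \theta)$, where $\theta$ denotes the model parameters.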
The continuous-time model can be combined with an integrator, for example, the explicit Euler method, the Runge–Kutta method, or a symplectic integrator, to predict the next state instead of the change of the system state. Therefore, the continuous-time model is independent of the time discretization. Depending on the chosen representation, the transfer function f and the model parameters are obtained differently, as described in the following subsections.
The requirements and assumptions of the different approaches to obtain the dynamics model of mechanical systems. The physics-inspired networks bridge the gap between classical system identification and black-box model learning. While system identification requires knowledge of the kinematic chain, the physics-inspired networks do not require any knowledge of the specific system but obtain comparable characteristics as system identification. Physics-inspired networks also guarantee energy-conserving models and obtain the forward, inverse, and energy model simultaneously.
3.1.1. Model engineering
The most classical approach is model engineering, which is predominantly used in industry. In this case, the transfer function f consists of the equations of motion, and the model parameters are the physical parameters of the robot, consisting of the masses, centers of gravity, lengths, and inertias. The equations of motion have to be manually derived for each system. Frequently, one assumes perfect rigid bodies connected by ideal joints and uses Newtonian, Lagrangian, or Hamiltonian mechanics and the known structure of the system to derive the equations. The model parameters can be either inferred using CAD software or measured by disassembling the individual system. The latter is more precise, as it incorporates the deviations due to the manufacturing process (Albu-Schäffer 2002). Furthermore, the parameters are identical for the forward and inverse model. Therefore, this approach yields the forward and inverse model simultaneously. To summarize, this approach can yield very precise forward and inverse models for rigid body systems but is labor-intensive, as the parameters need to be manually inferred.
3.1.2. Data-driven system identification
Similar to model engineering, data-driven system identification uses the analytic equations of motion as the transfer function. However, the model parameters are learned from observed data rather than measured. Therefore, the equations of motion need to be manually derived, but the model parameters are learned. In 1985, four different groups showed concurrently that the dynamics parameters of rigid body kinematic chains can be obtained by linear regression using hand-designed features (Khosla and Kanade 1985; Mukerjee and Ballard 1985; Atkeson et al., 1986; Gautier 1986). This approach is commonly referenced as the standard system identification technique for robot manipulators by the textbooks (Siciliano and Khatib 2016). However, this approach cannot guarantee physically plausible parameters, as the dynamics parameters have additional constraints. For example, this approach can yield negative masses or an inertia matrix that is not positive definite, or violate the parallel axis theorem (Ting et al., 2006). Further disadvantages of this approach are that one can only infer linear combinations of the dynamics parameters, that it cannot be applied to closed-loop kinematics (Siciliano and Khatib 2016), and that it can only be applied to inverse dynamics. The inverse dynamics formulation is problematic, as the inverse dynamics do not necessarily have a unique solution due to friction (Ratliff et al., 2016). To overcome these shortcomings, Ting et al. (2006) proposed a projection-based approach, while many others (Traversaro et al., 2016; Wensing et al., 2017; Ledezma and Haddadin 2017; Sutanto et al., 2020; Lutter et al., 2020, 2021b; Geist and Trimpe 2021) used virtual parametrizations that guarantee physically plausible parameters. For the latter, the optimization does not simplify to linear regression but can be solved using gradient descent. To summarize, this approach only requires the analytic equations of motion and can learn the dynamics parameters from data.
Therefore, this approach is not as labor-intensive as model engineering, but one needs to ensure that "good" data, that is, data that sufficiently excites the dynamics, is collected for the learning.
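The linear-regression idea can be illustrated on a single pendulum with viscous friction, whose inverse dynamics $\tau = I\ddot{q} + b\dot{q} + mgl\sin q$ are linear in the parameters $[I, b, mgl]$ once the hand-designed features $[\ddot{q}, \dot{q}, \sin q]$ are used. The parameter values and the noise-free, uniformly sampled data are assumptions for illustration; real data is noisy, and for multi-link arms only linear combinations of the parameters are identifiable.

```python
import numpy as np

def regressor(q, qd, qdd):
    """Hand-designed features: the inverse dynamics are tau = regressor @ [I, b, m*g*l]."""
    return np.stack([qdd, qd, np.sin(q)], axis=1)

rng = np.random.default_rng(0)
theta_true = np.array([0.25, 0.05, 2.0])  # hypothetical [inertia, viscous friction, m*g*l]
n_samples = 200
q = rng.uniform(-np.pi, np.pi, n_samples)
qd = rng.uniform(-3.0, 3.0, n_samples)
qdd = rng.uniform(-10.0, 10.0, n_samples)
tau = regressor(q, qd, qdd) @ theta_true  # observed torques (noise-free here)

Phi = regressor(q, qd, qdd)
# "Good" data must excite all parameters, i.e., the regressor needs full column rank.
if np.linalg.matrix_rank(Phi) < theta_true.size:
    raise ValueError("data is not sufficiently exciting")
theta_hat, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
```

Nothing in this least-squares problem constrains the estimate: with noisy or poorly exciting data, the estimated inertia can become negative, which is the lack of physical plausibility discussed above.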
3.1.3. Black-box model learning
While the previous approaches required knowledge about the individual kinematic chain to derive the equations of motion, the black-box approaches do not require any knowledge of the system. These approaches use any black-box function approximator as a transfer function and optimize the model parameters to fit the observed data. For example, the existing literature used Local Linear Models (Schaal et al., 2002; Haruno et al., 2001), Gaussian Mixture Models (Calinon et al., 2010; Khansari-Zadeh and Billard 2011), Gaussian Processes (Kocijan et al., 2004; Nguyen-Tuong et al., 2009; Nguyen-Tuong and Peters 2010; Romeres et al., 2016, 2019; Camoriano et al., 2016), Support Vector Machines (Choi et al., 2007; Ferreira et al., 2007), feedforward (Jansen 1994; Lenz et al., 2015; Ledezma and Haddadin 2017; Sanchez-Gonzalez et al., 2018), or recurrent neural networks (Rueckert et al., 2017; Ha and Schmidhuber 2018; Hafner et al., 2019a, 2019b) to learn the dynamics model. The black-box models obtain either the forward or the inverse model, and the learned model is only valid on the training data distribution. The previous methods based on the analytic equations of motion obtain both models simultaneously and generalize beyond the data distribution, as the learned physical parameters are globally valid. However, the black-box models do not require assumptions about the system and can learn systems including contacts. The previous approaches rely on the rigid body assumption and can only learn the dynamics of articulated bodies using reduced coordinates without contacts. Therefore, black-box models can be more accurate for real-world systems where the underlying assumptions are not valid, but they are limited to the training domain and rarely extrapolate.
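A toy illustration of the limited extrapolation: below, a degree-5 polynomial stands in for a black-box function approximator and is fit to the gravitational torque $\sin q$ of a pendulum (unit parameters assumed) using only joint angles in $[-1, 1]$ rad. The polynomial degree and the data range are arbitrary choices; the point is only that an accurate in-distribution fit says little about the model outside the training domain, whereas the analytic term $\sin q$ is valid for all angles.

```python
import numpy as np

q_train = np.linspace(-1.0, 1.0, 200)           # training domain (rad)
tau_train = np.sin(q_train)                      # gravity torque with m*g*l = 1 (assumed)
coeffs = np.polyfit(q_train, tau_train, deg=5)   # black-box stand-in: degree-5 polynomial

# Accurate inside the training domain ...
in_domain_error = np.max(np.abs(np.polyval(coeffs, q_train) - tau_train))
# ... but not at q = 3 rad, which lies outside the observed data.
out_of_domain_error = abs(np.polyval(coeffs, 3.0) - np.sin(3.0))
```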
3.2. Lagrangian mechanics
One approach to derive the equations of motion is Lagrangian mechanics. In the following, we summarize this approach, as we will use it in Section 4 to propose a physics-inspired network for learning dynamics models. More specifically, we use the Euler–Lagrange formulation with non-conservative forces and generalized coordinates. For more information and the formulation using Cartesian coordinates, please refer to the textbooks (Greenwood 2006; De Wit et al., 2012; Featherstone 2007). Generalized coordinates $\mathbf{q}$ are a minimal set of coordinates that describe the configuration of the system, for example, the joint angles of a robot arm.
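For reference, the standard Euler–Lagrange formulation can be stated compactly as follows. The notation is our own choice for this sketch and assumes a kinetic energy that is quadratic in the generalized velocities: $\mathbf{H}(\mathbf{q})$ denotes the mass matrix, $V(\mathbf{q})$ the potential energy, and $\boldsymbol{\tau}$ the generalized (non-conservative) forces. Expanding the time derivative yields the familiar decomposition into inertial, Coriolis and centrifugal, and gravitational forces.

```latex
\mathcal{L}(\mathbf{q}, \dot{\mathbf{q}}) = T(\mathbf{q}, \dot{\mathbf{q}}) - V(\mathbf{q}),
\qquad
T(\mathbf{q}, \dot{\mathbf{q}}) = \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\,\dot{\mathbf{q}}

\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}}
- \frac{\partial \mathcal{L}}{\partial \mathbf{q}} = \boldsymbol{\tau}

\underbrace{\mathbf{H}(\mathbf{q})\,\ddot{\mathbf{q}}}_{\text{inertial}}
+ \underbrace{\dot{\mathbf{H}}(\mathbf{q})\,\dot{\mathbf{q}}
  - \tfrac{1}{2}\Big(\frac{\partial}{\partial \mathbf{q}}
    \big(\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\,\dot{\mathbf{q}}\big)\Big)^{\top}}_{\text{Coriolis and centrifugal}}
+ \underbrace{\frac{\partial V}{\partial \mathbf{q}}}_{\text{gravitational}}
= \boldsymbol{\tau}
```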
3.3. Hamiltonian mechanics
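A different approach to deriving the equations of motion is Hamiltonian mechanics. In this case, the system dynamics are described using the state consisting of the generalized coordinates $\mathbf{q}$ and the generalized momentum $\mathbf{p}$ instead of the generalized velocity $\dot{\mathbf{q}}$.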
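The Hamiltonian can be computed by applying the Legendre transformation to the Lagrangian, which is described by
$$\mathcal{H}(\mathbf{q}, \mathbf{p}) = \dot{\mathbf{q}}^{\top}\mathbf{p} - \mathcal{L}(\mathbf{q}, \dot{\mathbf{q}}).$$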
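Using the generalized momentum $\mathbf{p} = \partial \mathcal{L} / \partial \dot{\mathbf{q}}$, the equations of motion are described by Hamilton's equations with the generalized forces $\boldsymbol{\tau}$,
$$\dot{\mathbf{q}} = \frac{\partial \mathcal{H}}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial \mathcal{H}}{\partial \mathbf{q}} + \boldsymbol{\tau}.$$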
The Euler–Lagrange equation (equation (6)) can be easily derived from Hamilton's equations by substituting equation (7) into equation (9) and using the definition of the generalized momentum.
Many textbooks omit the generalized forces within Hamilton's equations, but adding these generalized forces is straightforward, as shown in the previous derivation.
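The derivation can be sketched in two steps, assuming the Hamiltonian is obtained from the Legendre transformation $\mathcal{H} = \dot{\mathbf{q}}^{\top}\mathbf{p} - \mathcal{L}$ and the generalized forces $\boldsymbol{\tau}$ enter the momentum equation. First, because $\mathbf{p} = \partial\mathcal{L}/\partial\dot{\mathbf{q}}$, the terms that depend on $\dot{\mathbf{q}}(\mathbf{q}, \mathbf{p})$ cancel when $\mathcal{H}$ is differentiated with respect to $\mathbf{q}$ at fixed $\mathbf{p}$. Second, substituting this result into the momentum equation yields the Euler–Lagrange equation.

```latex
\frac{\partial \mathcal{H}}{\partial \mathbf{q}}
= \Big(\frac{\partial \dot{\mathbf{q}}}{\partial \mathbf{q}}\Big)^{\top}\mathbf{p}
- \frac{\partial \mathcal{L}}{\partial \mathbf{q}}
- \Big(\frac{\partial \dot{\mathbf{q}}}{\partial \mathbf{q}}\Big)^{\top}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}}
= -\frac{\partial \mathcal{L}}{\partial \mathbf{q}}

\dot{\mathbf{p}} = -\frac{\partial \mathcal{H}}{\partial \mathbf{q}} + \boldsymbol{\tau}
\quad\Longrightarrow\quad
\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}}
- \frac{\partial \mathcal{L}}{\partial \mathbf{q}} = \boldsymbol{\tau}
```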
4. Physics-inspired deep networks
An alternative to black-box model learning is to combine black-box models with physics to guarantee a physically plausible dynamics model. One combination is to use deep networks to represent the system energy and use the resulting Lagrangian to derive the equations of motion using the Euler–Lagrange differential equation. This approach was initially proposed by Lutter et al. (2019) with the presentation of Deep Lagrangian Networks (DeLaN). Since then, many papers exploring variations of this approach have been proposed, including approaches that use Hamiltonian mechanics instead of Lagrangian mechanics (Greydanus et al., 2019; Cranmer et al., 2020; Gupta et al., 2019; Zhong et al. 2019, 2020, 2021; Sanchez-Gonzalez et al., 2019; Saemundsson et al., 2020; Hochlehnert et al., 2021).
All of these models have in common that the learned dynamics models conserve energy when the non-conservative forces can be modeled properly and the generalized coordinates are observed. Therefore, the learned model is guaranteed to adhere to one of the fundamental concepts of physics. This property is beneficial, as prior research has shown that naive deep network dynamics models frequently increase or decrease the system energy even when the energy should be conserved (Greydanus et al., 2019; Zhong et al., 2019; Hochlehnert et al., 2021; Saemundsson et al., 2020).
In the following, we present DeLaN (Section 4.1) and the combination of Hamiltonian mechanics and deep networks (Section 4.2). Afterwards, Section 4.3 describes the proposed extensions of DeLaN and Hamiltonian Neural Networks (HNN). Therefore, this section provides the big picture of existing physics-inspired deep networks for learning dynamics models. The flowcharts of the variants are shown in Figure 2.
Figure 2. The flowcharts of a continuous-time forward model using a deep network and of the physics-inspired network forward models. (a) Standard deep model learning approach, where a network is used to directly predict the change in position and velocity. (b–c) Deep Lagrangian Networks, which use deep networks to predict the Lagrangian
4.1. Deep Lagrangian networks (DeLaN)
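DeLaN is one instantiation of these physics-inspired deep networks. DeLaN parametrizes the mass matrix $\mathbf{H}(\mathbf{q})$ and the potential energy $V(\mathbf{q})$ with two separate deep networks, such that the Lagrangian is described by
$$\mathcal{L}(\mathbf{q}, \dot{\mathbf{q}}) = \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\,\dot{\mathbf{q}} - V(\mathbf{q}).$$
Using this parametrization, the forward and inverse model can be derived from the Euler–Lagrange equation. With the Coriolis and centrifugal forces $\mathbf{c}(\mathbf{q}, \dot{\mathbf{q}}) = \dot{\mathbf{H}}(\mathbf{q})\,\dot{\mathbf{q}} - \tfrac{1}{2}\big(\partial(\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\,\dot{\mathbf{q}})/\partial \mathbf{q}\big)^{\top}$, where $\dot{\mathbf{H}} = \sum_i (\partial \mathbf{H}/\partial q_i)\,\dot{q}_i$, and the gravitational forces $\mathbf{g}(\mathbf{q}) = \partial V / \partial \mathbf{q}$, the forward model is described by
$$\ddot{\mathbf{q}} = \mathbf{H}(\mathbf{q})^{-1}\big(\boldsymbol{\tau} - \mathbf{c}(\mathbf{q}, \dot{\mathbf{q}}) - \mathbf{g}(\mathbf{q})\big).$$
The inverse model is described by
$$\boldsymbol{\tau} = \mathbf{H}(\mathbf{q})\,\ddot{\mathbf{q}} + \mathbf{c}(\mathbf{q}, \dot{\mathbf{q}}) + \mathbf{g}(\mathbf{q}).$$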
The partial derivatives within the forward and inverse model can be computed using automatic differentiation or symbolic differentiation. See Lutter et al. (2019) for the symbolic differentiation of the mass matrix and the deep networks.
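The following NumPy sketch derives the inverse and forward model from any mass-matrix function $\mathbf{H}(\mathbf{q})$ and potential $V(\mathbf{q})$. Central finite differences stand in for automatic or symbolic differentiation, and the hand-written two-link mass matrix `H_fn` and potential `V_fn` are hypothetical stand-ins for the two networks. Because both models share the same $\mathbf{H}$, $\mathbf{c}$, and $\mathbf{g}$, passing the output of the inverse model to the forward model recovers the acceleration.

```python
import numpy as np

EPS = 1e-6  # finite-difference step (stand-in for automatic differentiation)

def mass_matrix_derivative(H_fn, q):
    """Returns dH with dH[k] = dH/dq_k, shape (n, n, n)."""
    n = q.size
    dH = np.empty((n, n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = EPS
        dH[k] = (H_fn(q + e) - H_fn(q - e)) / (2.0 * EPS)
    return dH

def gradient(V_fn, q):
    n = q.size
    g = np.empty(n)
    for k in range(n):
        e = np.zeros(n)
        e[k] = EPS
        g[k] = (V_fn(q + e) - V_fn(q - e)) / (2.0 * EPS)
    return g

def coriolis_and_gravity(H_fn, V_fn, q, qd):
    dH = mass_matrix_derivative(H_fn, q)
    H_dot = np.einsum("kij,k->ij", dH, qd)             # dH/dt = sum_k dH/dq_k * qd_k
    dT_dq = 0.5 * np.einsum("i,kij,j->k", qd, dH, qd)  # 1/2 d(qd^T H qd)/dq
    c = H_dot @ qd - dT_dq                             # Coriolis and centrifugal forces
    g = gradient(V_fn, q)                              # gravitational forces dV/dq
    return c, g

def inverse_model(H_fn, V_fn, q, qd, qdd):
    c, g = coriolis_and_gravity(H_fn, V_fn, q, qd)
    return H_fn(q) @ qdd + c + g

def forward_model(H_fn, V_fn, q, qd, tau):
    c, g = coriolis_and_gravity(H_fn, V_fn, q, qd)
    return np.linalg.solve(H_fn(q), tau - c - g)

# Hypothetical stand-ins for the two networks (two-link structure, made-up constants).
def H_fn(q):
    a, b, d = 3.0, 0.5, 1.0
    c2 = np.cos(q[1])
    return np.array([[a + 2.0 * b * c2, d + b * c2], [d + b * c2, d]])

def V_fn(q):
    return 9.81 * (1.5 * np.sin(q[0]) + 0.5 * np.sin(q[0] + q[1]))

q = np.array([0.3, -0.7])
qd = np.array([1.2, 0.4])
qdd = np.array([-0.5, 2.0])
tau = inverse_model(H_fn, V_fn, q, qd, qdd)
qdd_recovered = forward_model(H_fn, V_fn, q, qd, tau)
```

With a deep network, `mass_matrix_derivative` and `gradient` would be replaced by automatic differentiation through the network; the structure of the two models stays the same.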
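The system energy cannot be learned using supervised learning, as the system energy cannot be observed. Therefore, the network weights of the kinetic and potential energy are learned unsupervised using the temporal consequences of the actions and system energy. One approach to learn the network parameters is to minimize the residual of the Euler–Lagrange differential equation over the observed samples $(\mathbf{q}_i, \dot{\mathbf{q}}_i, \ddot{\mathbf{q}}_i, \boldsymbol{\tau}_i)$. This optimization problem is described by
$$\min \sum_i \Big\| \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}}\Big|_i - \frac{\partial \mathcal{L}}{\partial \mathbf{q}}\Big|_i - \boldsymbol{\tau}_i \Big\|^2,$$
where the minimization is over the weights of both networks.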
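A minimal sketch of this optimization for a pendulum, assuming a constant mass $m$ and a potential $V(q) = k(1 - \cos q)$, so that two scalar parameters stand in for the two networks (both values are hypothetical). The Euler–Lagrange equation then gives the predicted torque $\hat{\tau} = m\ddot{q} + k\sin q$; the velocity drops out because a constant mass matrix has no Coriolis terms. Plain gradient descent on the mean squared residual recovers the parameters; with deep networks, the gradient would instead be obtained by automatic differentiation.

```python
import numpy as np

def predicted_torque(theta, q, qdd):
    """Euler-Lagrange inverse model for L = 0.5*m*qd**2 - k*(1 - cos q), theta = [m, k]."""
    m, k = theta
    return m * qdd + k * np.sin(q)

def residual_loss_and_grad(theta, q, qdd, tau):
    r = predicted_torque(theta, q, qdd) - tau
    features = np.stack([qdd, np.sin(q)], axis=1)  # d(tau_hat)/d(theta)
    loss = np.mean(r ** 2)
    grad = 2.0 * features.T @ r / r.size
    return loss, grad

rng = np.random.default_rng(1)
theta_true = np.array([0.8, 3.0])  # hypothetical [m, k]
q = rng.uniform(-np.pi, np.pi, 500)
qdd = rng.uniform(-5.0, 5.0, 500)
tau = predicted_torque(theta_true, q, qdd)  # noise-free observations

theta = np.array([0.1, 0.1])  # initial guess
for _ in range(2000):
    loss, grad = residual_loss_and_grad(theta, q, qdd, tau)
    theta -= 0.05 * grad
```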
4.1.1. Positive-definite mass matrix
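To obtain a physically plausible kinetic energy, the mass matrix has to be positive definite, that is,
$$\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\,\dot{\mathbf{q}} > 0 \quad \forall\, \dot{\mathbf{q}} \neq \mathbf{0},\ \forall\, \mathbf{q}.$$
This constraint ensures that all non-zero velocities have positive kinetic energy for all joint configurations. We obtain a positive-definite mass matrix by predicting the Cholesky decomposition of the mass matrix with a small positive offset ϵ on the diagonal instead of the mass matrix directly. Therefore, the mass matrix is described by
$$\mathbf{H}(\mathbf{q}) = \mathbf{L}(\mathbf{q})\,\mathbf{L}(\mathbf{q})^{\top} + \epsilon\,\mathbf{I},$$
where the lower-triangular matrix $\mathbf{L}(\mathbf{q})$ is predicted by the network.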
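A minimal sketch of this parametrization, assuming the network output is an unconstrained vector holding the $n$ diagonal and $n(n-1)/2$ strictly lower-triangular entries of $\mathbf{L}(\mathbf{q})$. The softplus on the diagonal is a common choice rather than a requirement, since the offset $\epsilon\mathbf{I}$ alone already guarantees positive definiteness; the value of $\epsilon$ is an assumption.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def mass_matrix(raw, n, eps=1e-2):
    """Map an unconstrained vector of length n*(n+1)/2 to H = L L^T + eps * I."""
    L = np.zeros((n, n))
    L[np.diag_indices(n)] = softplus(raw[:n])  # non-negative diagonal
    L[np.tril_indices(n, k=-1)] = raw[n:]      # free strictly-lower entries
    return L @ L.T + eps * np.eye(n)

rng = np.random.default_rng(2)
n, eps = 3, 1e-2
# Any raw output, however large, yields a symmetric positive-definite matrix.
H = mass_matrix(rng.normal(scale=5.0, size=n * (n + 1) // 2), n, eps)
```

Because every eigenvalue of $\mathbf{H}$ is at least $\epsilon$, the spectral norm of $\mathbf{H}^{-1}$ is at most $1/\epsilon$, which bounds how strongly the forward model can amplify a force into an acceleration.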
4.1.2. Advantages of DeLaN
In contrast to the black-box model, this parametrization of the dynamics has three advantages: (1) this approach yields a physically plausible model that conserves energy, (2) it is interpretable, and (3) it can be used as a forward, inverse, and energy model. The DeLaN model is guaranteed to evolve like a mechanical system and is passive (Spong 1987), as the forward dynamics are derived from the physics prior and the mass matrix is positive definite for all model parameters. If the system is uncontrolled, that is, $\boldsymbol{\tau} = \mathbf{0}$, the system energy $E = T + V$ is conserved, as the change of energy equals the power of the generalized forces, $\dot{E} = \dot{\mathbf{q}}^{\top}\boldsymbol{\tau}$.
The model is interpretable, as one can disambiguate between the different forces, for example, the inertial, centrifugal, Coriolis, and gravitational forces. This decomposition is beneficial, as some model-based control approaches require the explicit computation of the mass matrix, the gravitational force, or the system energy. Furthermore, the same model parameters can be used for the forward, inverse, and energy model. Therefore, the forward and inverse models are consistent. In contrast, black-box models need to learn separate parameters for the inverse and forward model, which might not be consistent, and cannot obtain the system energy, as it cannot be observed.
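The energy argument can be checked in a few lines for a Lagrangian with kinetic energy $\tfrac{1}{2}\dot{\mathbf{q}}^{\top}\mathbf{H}(\mathbf{q})\dot{\mathbf{q}}$, as used by DeLaN. The derivation substitutes $\mathbf{H}\ddot{\mathbf{q}}$ from the Euler–Lagrange equation and uses the identity $\dot{\mathbf{q}}^{\top}\big(\partial(\dot{\mathbf{q}}^{\top}\mathbf{H}\dot{\mathbf{q}})/\partial\mathbf{q}\big)^{\top} = \dot{\mathbf{q}}^{\top}\dot{\mathbf{H}}\dot{\mathbf{q}}$, which holds because $\dot{\mathbf{H}} = \sum_k (\partial\mathbf{H}/\partial q_k)\,\dot{q}_k$. Hence the energy is constant for $\boldsymbol{\tau} = \mathbf{0}$, and dissipative forces with $\dot{\mathbf{q}}^{\top}\boldsymbol{\tau} \leq 0$ can only remove energy.

```latex
E = \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\mathbf{H}\,\dot{\mathbf{q}} + V,
\qquad
\dot{E} = \dot{\mathbf{q}}^{\top}\mathbf{H}\,\ddot{\mathbf{q}}
+ \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\dot{\mathbf{H}}\,\dot{\mathbf{q}}
+ \dot{\mathbf{q}}^{\top}\frac{\partial V}{\partial \mathbf{q}}

\dot{\mathbf{q}}^{\top}\mathbf{H}\,\ddot{\mathbf{q}}
= \dot{\mathbf{q}}^{\top}\boldsymbol{\tau}
- \dot{\mathbf{q}}^{\top}\dot{\mathbf{H}}\,\dot{\mathbf{q}}
+ \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\dot{\mathbf{H}}\,\dot{\mathbf{q}}
- \dot{\mathbf{q}}^{\top}\frac{\partial V}{\partial \mathbf{q}}

\Longrightarrow\quad
\dot{E} = \dot{\mathbf{q}}^{\top}\boldsymbol{\tau}
```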
4.2. Hamiltonian neural networks (HNN)
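Instead of using Lagrangian mechanics as a model prior for deep networks, Greydanus et al. (2019) proposed to use Hamiltonian mechanics. In this case, HNN parametrize the Hamiltonian with two deep networks, one for the inverse mass matrix and one for the potential energy, described by
$$\mathcal{H}(\mathbf{q}, \mathbf{p}) = \tfrac{1}{2}\,\mathbf{p}^{\top}\mathbf{H}^{-1}(\mathbf{q})\,\mathbf{p} + V(\mathbf{q}).$$
It is important to note that HNN predict the inverse of the mass matrix instead of the mass matrix as in DeLaN. Similar to DeLaN, the forward and inverse models can be derived. The forward model follows from Hamilton's equations,
$$\dot{\mathbf{q}} = \mathbf{H}^{-1}(\mathbf{q})\,\mathbf{p}, \qquad \dot{\mathbf{p}} = \boldsymbol{\tau} - \tfrac{1}{2}\Big(\frac{\partial}{\partial \mathbf{q}}\big(\mathbf{p}^{\top}\mathbf{H}^{-1}(\mathbf{q})\,\mathbf{p}\big)\Big)^{\top} - \frac{\partial V}{\partial \mathbf{q}}.$$
The inverse model solves the momentum equation for the generalized force,
$$\boldsymbol{\tau} = \dot{\mathbf{p}} + \tfrac{1}{2}\Big(\frac{\partial}{\partial \mathbf{q}}\big(\mathbf{p}^{\top}\mathbf{H}^{-1}(\mathbf{q})\,\mathbf{p}\big)\Big)^{\top} + \frac{\partial V}{\partial \mathbf{q}}.$$
The network parameters of the kinetic and potential energy can be obtained by minimizing the squared residual of Hamilton's equations using the observed data consisting of $(\mathbf{q}_i, \mathbf{p}_i, \dot{\mathbf{q}}_i, \dot{\mathbf{p}}_i, \boldsymbol{\tau}_i)$, that is,
$$\min \sum_i \Big\| \dot{\mathbf{q}}_i - \frac{\partial \mathcal{H}}{\partial \mathbf{p}}\Big|_i \Big\|^2 + \Big\| \dot{\mathbf{p}}_i + \frac{\partial \mathcal{H}}{\partial \mathbf{q}}\Big|_i - \boldsymbol{\tau}_i \Big\|^2.$$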
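The HNN forward and inverse model can be sketched in the same way, again with central finite differences standing in for automatic differentiation. The functions `M_inv_fn` (a made-up, positive-definite inverse mass matrix) and `V_fn` are hypothetical stand-ins for the two networks. Note that the forward model only evaluates the predicted inverse mass matrix and never inverts a matrix numerically.

```python
import numpy as np

EPS = 1e-6  # finite-difference step (stand-in for automatic differentiation)

def hamiltonian(M_inv_fn, V_fn, q, p):
    # H(q, p) = 1/2 p^T M^{-1}(q) p + V(q)
    return 0.5 * p @ M_inv_fn(q) @ p + V_fn(q)

def gradient(fn, x):
    g = np.empty(x.size)
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = EPS
        g[k] = (fn(x + e) - fn(x - e)) / (2.0 * EPS)
    return g

def hnn_forward(M_inv_fn, V_fn, q, p, tau):
    """Hamilton's equations with generalized forces; returns (q_dot, p_dot)."""
    q_dot = gradient(lambda p_: hamiltonian(M_inv_fn, V_fn, q, p_), p)
    p_dot = -gradient(lambda q_: hamiltonian(M_inv_fn, V_fn, q_, p), q) + tau
    return q_dot, p_dot

def hnn_inverse(M_inv_fn, V_fn, q, p, p_dot):
    """Generalized force that produces the observed change of momentum."""
    return p_dot + gradient(lambda q_: hamiltonian(M_inv_fn, V_fn, q_, p), q)

# Hypothetical stand-ins for the two networks.
def M_inv_fn(q):
    return np.array([[1.0 + 0.2 * np.cos(q[1]), 0.1], [0.1, 0.5]])

def V_fn(q):
    return 9.81 * (1.5 * np.sin(q[0]) + 0.5 * np.sin(q[0] + q[1]))

q = np.array([0.3, -0.7])
p = np.array([0.8, -0.2])
tau = np.array([0.5, 1.0])
q_dot, p_dot = hnn_forward(M_inv_fn, V_fn, q, p, tau)
```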
4.2.1. Differences to DeLaN
DeLaN and HNN share the same advantages, as both models are derived from the same principle. Therefore, HNN conserve energy, are interpretable, and provide a forward, inverse, and energy model. The main difference is that DeLaN uses position and velocity, while HNN uses position and momentum. Depending on which quantities are observed, one model is a better fit than the other. A minor difference is that minimizing the residual of the Euler–Lagrange equation is identical to the inverse model loss, while minimizing the residual of Hamilton's equations is identical to the forward model loss.
From a numerical perspective, the Hamiltonian mechanics prior is slightly beneficial, as the forward and inverse model only rely on the inverse of the mass matrix. Therefore, one does not need to numerically compute the inverse of the predicted matrix. Avoiding the explicit inversion makes the learning and the model rollout a bit more stable. The Lagrangian mechanics prior relies on the mass matrix as well as its inverse. Therefore, the inverse of the predicted matrix has to be computed numerically. When the eigenvalues of the mass matrix approach ϵ and ϵ ≪ 1, the model rollout and the optimization of the forward model can become numerically sensitive. Therefore, it is important to choose ϵ as large as possible for the corresponding system, as this limits the amplification of the acceleration.
4.3. Variations of DeLaN and HNN
Since the introduction of DeLaN (Lutter et al., 2019) and HNN (Greydanus et al., 2019), many other variants and extensions have been proposed within the literature. We provide an overview of the existing work and highlight the differences.
4.3.1. Parametrization of $\mathcal{H}$ and $\mathcal{L}$
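In the previous sections, the Hamiltonian and the Lagrangian were structured as the sum of a kinetic energy, which is quadratic in the generalized velocity or momentum, and a potential energy.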
One benefit of using a black-box
Most existing work uses standard feed-forward networks to model the system energy, the Hamiltonian, or the Lagrangian (Lutter et al., 2019; Greydanus et al., 2019; Zhong et al., 2019; Gupta et al., 2020; Saemundsson et al., 2020; Finzi et al., 2020). Other variants have combined the physics-inspired networks with graph neural networks (Sanchez-Gonzalez et al., 2019; Cranmer et al., 2020; Botev et al., 2021). Such graph neural networks incorporate additional structure within the network architecture when the system dynamics consist of multiple identical particles without additional constraints. Therefore, these methods exhibit improved performance for modeling N-body problems.
4.3.2. Loss functions and integrators
The loss functions of DeLaN (equation (13)) and HNN (equation (17)) express the loss in terms of the time derivative of the state, that is, the acceleration or the change of momentum. Alternatively, the loss can be expressed in terms of the next state, $\hat{\mathbf{x}}_{t+1} = \mathbf{x}_t + \int_{t}^{t+\Delta t} f(\mathbf{x}(s), \mathbf{u}_t)\, ds$, where the integral is approximated
using any numerical integration approach. In the case of the explicit Euler integration, this approach is identical to the loss of equation (13) and equation (17). A common choice to compute the next step is the Runge–Kutta 4 (RK4) fixed time step integrator (Gupta et al., 2019; Greydanus et al., 2019). This loss formulation also enables a multi-step loss, which has been shown to improve the performance of model predictive control for deterministic models (Lutter et al., 2021a).
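The multi-step variant of this loss can be sketched as follows: starting from each observed state, the model is rolled out open loop with RK4 for a fixed horizon, and the squared errors to the observed successor states are averaged. The pendulum model, its parameters, and the synthetic trajectory are assumptions for illustration; with a horizon of 1, the function reduces to the one-step loss.

```python
import numpy as np

def pendulum(x, u, gravity_over_length=9.81):
    """Continuous-time model with state x = [q, q_dot]; u is an (assumed) normalized torque."""
    q, qd = x
    return np.array([qd, -gravity_over_length * np.sin(q) + u[0]])

def rk4_step(f, x, u, dt):
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def multi_step_loss(f, xs, us, dt, horizon):
    """Mean squared state error of open-loop RK4 rollouts of length `horizon`.

    xs has shape (T + 1, d) and us has shape (T, m); every start index with
    `horizon` observed successors is used.
    """
    errors = []
    for t0 in range(us.shape[0] - horizon + 1):
        x = xs[t0]
        for h in range(horizon):
            x = rk4_step(f, x, us[t0 + h], dt)
            errors.append(np.sum((x - xs[t0 + h + 1]) ** 2))
    return float(np.mean(errors))

def wrong_model(x, u):
    # Same structure with a wrong (assumed) gravity constant.
    return pendulum(x, u, gravity_over_length=8.0)

# Synthetic trajectory generated by the "true" system.
dt, T = 0.01, 200
rng = np.random.default_rng(3)
us = rng.uniform(-1.0, 1.0, size=(T, 1))
xs = [np.array([0.5, 0.0])]
for t in range(T):
    xs.append(rk4_step(pendulum, xs[-1], us[t], dt))
xs = np.stack(xs)

loss_true = multi_step_loss(pendulum, xs, us, dt, horizon=10)
loss_wrong = multi_step_loss(wrong_model, xs, us, dt, horizon=10)
```

Because the synthetic data were generated with the same model and integrator, the loss of the true model is zero, while the model with the wrong gravity constant accumulates error over the horizon, which the multi-step loss penalizes more strongly than the one-step loss.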
A more elaborate approach has been proposed by Saemundsson et al. (2020) that combines discrete mechanics with variational integrators. This combination guarantees that even the discrete-time system conserves momentum and does not exhibit energy drift, whereas RK4 integration might leak or add energy due to the discrete-time approximation. The main disadvantage of the variational integrator networks is that this approach assumes a constant mass matrix. Therefore, the Coriolis and centrifugal forces vanish (equation (6)), and the acceleration only depends on the position. Within the discrete mechanics literature, extensions exist to apply the variational integrator to multi-body systems with a non-constant mass matrix. However, these extensions are non-trivial and involve solving a root-finding problem within each integration step (Lee et al., 2020).
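The difference in energy behavior can be illustrated with the simplest variational integrator for a constant mass matrix, the Störmer–Verlet scheme, which follows from the trapezoidal discrete Lagrangian. This is a textbook integrator and not the variational integrator network of Saemundsson et al. (2020); the pendulum with unit mass and $g/l = 1$, the initial state, and the step size are assumptions. The energy error of explicit Euler grows over time, while that of Störmer–Verlet stays bounded.

```python
import numpy as np

def accel(q):
    return -np.sin(q)  # pendulum with g / l = 1 and unit (constant) mass matrix, assumed

def energy(q, v):
    return 0.5 * v ** 2 + (1.0 - np.cos(q))

def explicit_euler(q, v, dt):
    # Both updates use the old state.
    return q + dt * v, v + dt * accel(q)

def stormer_verlet(q, v, dt):
    # Half kick, drift, half kick (velocity Verlet form).
    v_half = v + 0.5 * dt * accel(q)
    q_next = q + dt * v_half
    return q_next, v_half + 0.5 * dt * accel(q_next)

def max_energy_error(step, q0=1.0, v0=0.0, dt=0.05, n_steps=2000):
    q, v = q0, v0
    e0 = energy(q, v)
    worst = 0.0
    for _ in range(n_steps):
        q, v = step(q, v, dt)
        worst = max(worst, abs(energy(q, v) - e0))
    return worst

euler_drift = max_energy_error(explicit_euler)
verlet_drift = max_energy_error(stormer_verlet)
```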
4.3.3. Feature transformation
The previous sections always used generalized coordinates or momentum to describe the system dynamics. However, this formulation can be problematic when these coordinates are unknown or unsuitable for function approximation. For example, continuous revolute joints without angular limits are problematic for function approximation due to the wrapping of the angle at ±π. This problem is commonly mitigated using sine/cosine feature transformations. Such feature transformations can be included in physics-inspired networks if the feature transform mapping from the generalized coordinates to the features is differentiable.
Let $g$ be the feature transform mapping the generalized coordinates to the features $\mathbf{z} = g(\mathbf{q})$. The networks then take $\mathbf{z}$ as input, and the partial derivatives with respect to the generalized coordinates that appear in the Euler–Lagrange equation or Hamilton's equations follow from the chain rule, for example,
$$\frac{\partial V}{\partial \mathbf{q}} = \Big(\frac{\partial g}{\partial \mathbf{q}}\Big)^{\top} \frac{\partial V}{\partial \mathbf{z}}.$$
This approach is identical to adding an input layer with the hand-crafted transformations to the neural network. The feature transformation was previously introduced by Zhong et al. (2019). However, the authors only manually derived the special case of continuous angles, while this approach can be easily generalized to arbitrary differentiable feature transformations.
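A minimal sketch of the chain rule for a single continuous revolute joint, assuming the feature transform $g(q) = [\cos q, \sin q]$ and a linear potential on the features with hypothetical weights `W` as a stand-in for the potential network. Because the model only sees the features, the potential and its gradient are automatically $2\pi$-periodic, so the angle wrapping at ±π causes no discontinuity.

```python
import numpy as np

W = np.array([-2.0, 0.5])  # hypothetical weights of a (linear) potential "network" on the features

def features(q):
    """Feature transform z = g(q) = [cos q, sin q] for a continuous revolute joint."""
    return np.array([np.cos(q), np.sin(q)])

def features_jacobian(q):
    """dg/dq for the scalar joint angle q."""
    return np.array([-np.sin(q), np.cos(q)])

def potential(q):
    # The model only sees the features, so V is 2*pi-periodic by construction.
    return W @ features(q)

def potential_gradient(q):
    # Chain rule: dV/dq = (dg/dq)^T dV/dz, with dV/dz = W for the linear stand-in.
    return features_jacobian(q) @ W
```

With a deep network on the features, $\partial V / \partial \mathbf{z}$ would come from automatic differentiation, and the same Jacobian product maps it back to the generalized coordinates.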
4.3.4. Actuator models and friction
The physics-inspired networks cannot model friction directly, as the learned dynamics are conservative. Incorporating friction within this model learning approach in a non-black-box fashion is non-trivial because friction is an abstraction that combines various physical effects. For robot arms in free space, the friction of the motors dominates; for mechanical systems dragging along a surface, the friction at the surface dominates; and for legged locomotion, the friction between the feet and the floor dominates but also varies with time. Therefore, defining a general case for all types of friction in compliance with Lagrangian and Hamiltonian mechanics is challenging. Various approaches to incorporate friction models analytically can be found in Lurie (2013) and Wells (1967).
Most existing works on physics-inspired networks only focus on friction caused by the actuators, which dominates for robot arms (Lutter and Peters 2019; Gupta et al., 2019; Lutter et al., 2020). In this case, the friction can be expressed using generalized coordinates and is a non-conservative force. Incorporating types of friction other than actuator friction is non-trivial, as these cannot be easily expressed using the generalized force. In this case, one requires the contact point and the contact Jacobian to map the contact force to the generalized force. For the actuator model, the generalized force required for DeLaN and HNN is expressed using an additional function that modulates the system input and adds friction. This function maps the system input $\mathbf{u}$, the generalized coordinates, and the generalized velocities to the generalized force, $\boldsymbol{\tau} = f_{\text{act}}(\mathbf{u}, \mathbf{q}, \dot{\mathbf{q}})$.
It is important to note that the system is not time-reversible when stiction is added to the dynamics, as multiple motor torques can generate the same joint acceleration (Ratliff et al., 2016).
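As an illustration only, one simple white-box actuator model scales the input and subtracts viscous and Coulomb friction per joint. The function name, its gain and friction coefficients, and the tanh smoothing of the sign function are assumptions for this sketch (stiction is not modeled); the smoothing keeps the model differentiable for gradient-based learning.

```python
import numpy as np

def actuator_model(u, qd, gain=1.0, viscous=0.1, coulomb=0.3, smoothing=0.01):
    """Hypothetical actuator model: tau = gain*u - viscous*qd - coulomb*tanh(qd/smoothing).

    All arguments act per joint; tanh(qd / smoothing) is a smooth surrogate for sign(qd).
    """
    return gain * u - viscous * qd - coulomb * np.tanh(qd / smoothing)

u = np.array([1.0, -2.0])
qd = np.array([0.5, -1.5])
tau = actuator_model(u, qd)  # generalized force passed to the DeLaN or HNN dynamics
```

Both friction terms satisfy $\dot{\mathbf{q}}^{\top}\boldsymbol{\tau}_{\text{friction}} \leq 0$, so with zero input they can only remove energy from the learned conservative dynamics.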
In contrast to these white-box approaches, Gupta et al. (2019) and Zhong et al. (2019) proposed to add a black-box actuator model. For example, Gupta et al. (2019) proposed to use a black-box control matrix
Both matrices are predicted by a deep network, and the network parameters of the actuator model are optimized using gradient descent. These black-box actuator models can represent more complex actuator dynamics and even system dynamics that violate the assumptions of Lagrangian and Hamiltonian mechanics. However, such an actuator model can also cause the potential and kinetic energy to be ignored, so that the black-box model dominates the predicted dynamics. To prevent the actuator model from predicting the complete system dynamics, it is beneficial to penalize the magnitude of the actuator model's output during optimization. The existing grey-box model learning literature (Lutter et al., 2020; Hwangbo et al., 2019; Allevato et al., 2020) has shown that such penalties improve performance.
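A minimal sketch of such a penalized objective, assuming the predicted generalized force is the sum of a physics part and a black-box actuator part and that the penalty is a squared L2 norm. The function name and `penalty_weight` are hypothetical, and the cited works may weight or structure the penalty differently.

```python
def grey_box_loss(tau_true, tau_physics, tau_actuator, penalty_weight):
    """Squared residual of a grey-box model plus a penalty on its black-box part.

    Each argument except penalty_weight is a list of per-sample torque vectors.
    The prediction is assumed to be tau_physics + tau_actuator, where tau_physics
    comes from the Lagrangian/Hamiltonian model and tau_actuator from the
    black-box actuator network.
    """
    n = len(tau_true)
    residual = 0.0
    penalty = 0.0
    for t, p, a in zip(tau_true, tau_physics, tau_actuator):
        residual += sum((ti - pi - ai) ** 2 for ti, pi, ai in zip(t, p, a))
        penalty += sum(ai ** 2 for ai in a)
    return residual / n + penalty_weight * penalty / n


# Both explanations fit the data exactly, but the penalty prefers the physics model.
physics_explains = grey_box_loss([[1.0, -2.0]], [[1.0, -2.0]], [[0.0, 0.0]], 0.1)
actuator_explains = grey_box_loss([[1.0, -2.0]], [[0.0, 0.0]], [[1.0, -2.0]], 0.1)
print(physics_explains, actuator_explains)
```

Without the penalty both solutions would have zero loss, which is exactly the ambiguity that lets the actuator network absorb the whole dynamics.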
5. Experiments
In the experiments, we apply physics-inspired deep network models to learn the non-linear dynamics of simulated and physical systems. In the simulation experiments, we test whether the different physics-inspired networks learn the underlying structure and highlight the empirical differences between the existing approaches. On the physical systems, we compare the model-based control performance of DeLaN with a structured Lagrangian on fully actuated and under-actuated systems to standard system identification techniques and black-box model learning. We only use DeLaN for the physical systems because the momentum is not observed on these systems; hence, only the Lagrangian physics prior is applicable. One could treat the Hamiltonian prior as a latent-space problem with the momentum as the latent representation. However, this approach would effectively reduce to the Lagrangian prior. Using these experiments, we want to answer the following questions:
5.1. Experimental setup
To answer these questions, we apply the different variants of physics-inspired models to four different systems and compare their performance to three baselines. In the experiments, we refer to physics-inspired networks that use a single network to represent the Lagrangian or Hamiltonian as black-box DeLaN/HNN. When two separate networks represent the mass matrix and the potential energy, we refer to the approach as structured DeLaN/HNN. The detailed differences between both approaches are described in Section 4.3.1. For each experiment, the dynamics models are learned from a fixed dataset and trained until convergence. All evaluations are performed on a test dataset that is disjoint from the training dataset.
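To make the structured variant concrete, the following sketch computes the generalized force from the two structured components, a mass matrix H(q) and a potential energy V(q), via the Euler–Lagrange equation. Assumptions: `mass_matrix` and `potential` stand in for the two networks, and central finite differences replace the automatic differentiation one would use in practice. A black-box variant would instead differentiate a single scalar network L(q, q̇) twice.

```python
import math


def inverse_dynamics(mass_matrix, potential, q, dq, ddq, h=1e-6):
    """Euler-Lagrange inverse dynamics for a structured Lagrangian
    L = 0.5 * dq^T H(q) dq - V(q):

        tau = H ddq + Hdot dq - 0.5 * [dq^T (dH/dq_i) dq]_i + dV/dq

    mass_matrix(q) -> n x n nested list, potential(q) -> float.
    Partial derivatives are approximated by central differences.
    """
    n = len(q)

    def shifted(i, delta):
        qs = list(q)
        qs[i] += delta
        return qs

    H = mass_matrix(q)
    dH, dV = [], []
    for i in range(n):
        Hp, Hm = mass_matrix(shifted(i, h)), mass_matrix(shifted(i, -h))
        dH.append([[(Hp[r][c] - Hm[r][c]) / (2 * h) for c in range(n)] for r in range(n)])
        dV.append((potential(shifted(i, h)) - potential(shifted(i, -h))) / (2 * h))
    # Time derivative of the mass matrix: Hdot = sum_k (dH/dq_k) * dq_k.
    Hdot = [[sum(dH[k][r][c] * dq[k] for k in range(n)) for c in range(n)] for r in range(n)]
    tau = []
    for i in range(n):
        inertial = sum(H[i][j] * ddq[j] for j in range(n))
        hdot_term = sum(Hdot[i][j] * dq[j] for j in range(n))
        quadratic = sum(dq[r] * dH[i][r][c] * dq[c] for r in range(n) for c in range(n))
        tau.append(inertial + hdot_term - 0.5 * quadratic + dV[i])
    return tau


# Example: pendulum with unit mass and length, H = [[1]], V = -9.81 cos q.
tau = inverse_dynamics(lambda q: [[1.0]], lambda q: -9.81 * math.cos(q[0]), [0.3], [0.5], [0.2])
print(tau)  # approximately [0.2 + 9.81 * sin(0.3)]
```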
In the following, we briefly introduce the systems and baselines. The code of Deep Lagrangian Networks (DeLaN) and Hamiltonian Neural Networks (HNN) is available at https://github.com/milutter/deep_lagrangian_networks.
5.1.1. Plants
The hyperparameters of DeLaN used for the different dynamical systems. All physical systems have a fixed sampling time of 2.0 ms. The number of training samples for the 2-DoF robot differs by dataset. See Table 2 for more details about the different datasets.
5.1.1.1. Two-link pendulum
The two-link pendulum has two continuous revolute joints, is fully actuated, and acts in the vertical x-z plane with gravity. The pendulum is simulated using Bullet (Coumans and Bai 2016).
5.1.1.2. Cartpole
The physical cartpole (Figure 3(a)) is an under-actuated system manufactured by Quanser (2018). The pendulum is passive, and the cart is voltage controlled at up to 500 Hz. The linear actuator consists of a plastic cogwheel drive with high stiction.
Figure 3. The (a) cartpole, (b) Furuta pendulum, and (c) Barrett WAM used for the evaluation. The Furuta pendulum and cartpole perform a swing-up using the energy controller. The Barrett WAM executes a cosine trajectory with a different frequency per joint.
5.1.1.3. Furuta pendulum
The physical Furuta pendulum (Figure 3(b)) is an under-actuated system manufactured by Quanser (2018). Instead of the linear actuator of the cartpole, the Furuta pendulum has an actuated revolute joint and a passive pendulum. The revolute joint is voltage controlled at up to 500 Hz. The main challenges of this system are its small masses and length scales, which make the control problem very sensitive.
5.1.1.4. Barrett WAM
The Barrett WAM (Figure 3(c)) has four actuated degrees of freedom that are torque controlled at 500 Hz. The actuators are back-drivable and consist of cable drives with low gear ratios, enabling fast accelerations. The joint angles are sensed on the motor side. Therefore, any deviation between the motor position and the joint position due to cable slack cannot be observed. We only use the 4-degree-of-freedom version, as the wrist and end-effector joints cannot be excited due to their limited range of motion and acceleration.
5.1.2. Baselines
We use the analytic dynamics model, system identification, and a feed-forward deep network as baselines.
5.1.2.1. Analytic model
The analytic model uses the equations of motion derived from rigid body dynamics together with the system parameters provided by the manufacturer, that is, masses, centers of gravity, and inertias. In addition to the rigid body dynamics, these models are augmented with a viscous friction model.
5.1.2.2. System identification
This approach requires knowledge of the analytic equations of motion and infers the system parameters from data. More specifically, we use the technique described by Atkeson et al. (1986), who showed that for rigid body kinematic trees, the inverse dynamics model is linear in the system parameters and is described by
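The linear-in-parameters structure turns identification into a least-squares problem. A minimal sketch, assuming a single pendulum joint with viscous friction rather than the Barrett WAM or Quanser systems: the regressor, the parameter values, and the synthetic data are all illustrative. Solving the normal equations squares the condition number of the regressor, which hints at why such regressions become sensitive to noise when the features are poorly excited; QR-based solvers are usually preferred in practice.

```python
import math


def regressor(q, dq, ddq):
    """Hypothetical regressor for a 1-DoF pendulum with viscous friction.

    tau = theta[0] * ddq + theta[1] * sin(q) + theta[2] * dq, where theta[0] is the
    inertia about the pivot, theta[1] = m * g * l, and theta[2] a viscous coefficient.
    """
    return [ddq, math.sin(q), dq]


def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def identify(samples):
    """Linear least squares via the normal equations (Y^T Y) theta = Y^T tau."""
    k = 3
    YtY = [[0.0] * k for _ in range(k)]
    Yt_tau = [0.0] * k
    for q, dq, ddq, tau in samples:
        y = regressor(q, dq, ddq)
        for i in range(k):
            Yt_tau[i] += y[i] * tau
            for j in range(k):
                YtY[i][j] += y[i] * y[j]
    return solve(YtY, Yt_tau)


# Synthetic, noise-free samples; the columns use unrelated sinusoids so the
# regression is well conditioned (they are not a physically consistent trajectory).
true_theta = [0.25, 2.45, 0.1]
samples = []
for t in range(200):
    q, dq, ddq = math.sin(0.05 * t), math.cos(0.07 * t), math.sin(0.11 * t + 1.0)
    tau = sum(y * th for y, th in zip(regressor(q, dq, ddq), true_theta))
    samples.append((q, dq, ddq, tau))
theta_hat = identify(samples)
print(theta_hat)  # close to true_theta
```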
5.1.2.3. Feed-Forward Network
The deep network baseline uses two separate networks, where one describes the forward dynamics and the other the inverse dynamics. This model does not necessarily generate coherent predictions as the parameters of the forward and inverse model are decoupled. Therefore, it is not guaranteed that
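The lack of coherence can be quantified by composing the two models: feeding the inverse model's torque into the forward model should return the original acceleration. A minimal sketch with hypothetical scalar toy models standing in for the two networks:

```python
def consistency_error(forward_model, inverse_model, states):
    """Mean squared mismatch between ddq and forward(q, dq, inverse(q, dq, ddq)).

    For a single coherent model this is zero; two independently trained networks
    carry no such guarantee.
    """
    total = 0.0
    for q, dq, ddq in states:
        tau = inverse_model(q, dq, ddq)
        total += (forward_model(q, dq, tau) - ddq) ** 2
    return total / len(states)


def inverse(q, dq, ddq):
    """Hypothetical inverse model: tau = 2 ddq + q."""
    return 2.0 * ddq + q


def coherent_forward(q, dq, tau):
    """Exact inverse of the toy inverse model above."""
    return (tau - q) / 2.0


def decoupled_forward(q, dq, tau):
    """Slightly mismatched forward model, as separately trained networks might be."""
    return (tau - q) / 2.1


states = [(0.1 * k, 0.0, 1.0 - 0.2 * k) for k in range(10)]
print(consistency_error(coherent_forward, inverse, states))   # zero up to rounding
print(consistency_error(decoupled_forward, inverse, states))  # clearly nonzero
```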
5.2. Model prediction experiments
For the simulated experiments, we evaluate whether the physics-inspired networks can learn the underlying system dynamics and recover the structure from ideal observations. Therefore, we evaluate the data fit as well as the long-term forward predictions. Furthermore, we differentiate between two datasets: (1) a large dataset with 100k samples spanning the state domain uniformly and (2) a small dataset with only 2.5k samples consisting of trajectories of drawing characters. For the character dataset, the test set contains different characters than the training set. This dataset spans only a small sub-domain of the state space. The character dataset was introduced by Williams et al. (2008) and is available in the UCI Machine Learning Repository (Dheeru and Karra Taniskidou 2017). For training, the datasets are split into a training and a test set. All results are reported on the test set and averaged over 5 seeds.
5.2.1. Inverse model
Table 2. The normalized mean squared error (nMSE) and mean valid prediction time (VPT) with the corresponding confidence intervals, averaged over 10 seeds.
On average, the structured Hamiltonian and Lagrangian approaches obtain better forward and inverse models than their black-box counterparts and the standard feed-forward neural network. When the corresponding phase-space coordinates are observed, the Hamiltonian and Lagrangian approaches perform comparably.
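For reference, one common definition of the normalized mean squared error divides the MSE by the variance of the targets, so that a model that always predicts the mean scores 1. The exact normalization used for these results (for example, per joint or over all dimensions) is not stated here, so this is a sketch under that assumption.

```python
def nmse(predictions, targets):
    """Normalized MSE for one output dimension: MSE divided by the variance of the targets.

    A value of 1.0 corresponds to always predicting the target mean.
    """
    n = len(targets)
    mean = sum(targets) / n
    variance = sum((t - mean) ** 2 for t in targets) / n
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    return mse / variance


targets = [1.0, 2.0, 3.0, 4.0]
print(nmse(targets, targets))    # 0.0, perfect prediction
print(nmse([2.5] * 4, targets))  # 1.0, predicting the mean
```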

(a) The learned inverse model on the character dataset, averaged over 10 seeds. The test characters “e,” “v,” and “q” are not contained in the training set. The remaining columns show the predicted force decomposition. (b) Plots the inertial force
When comparing the torque decomposition into inertial, centrifugal, Coriolis, and gravitational forces, all models learn a good decomposition. For the unstructured models, this decomposition can be evaluated by assuming the underlying structure and evaluating the inverse model. For example, the gravitational component is obtained by evaluating
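A minimal sketch of this evaluation, assuming the inverse model follows the structure τ = H(q)q̈ + c(q, q̇) + g(q) without friction (with friction, velocity-dependent friction would be lumped into c). `toy_inverse_model` is a hypothetical stand-in for a learned model; separating the centrifugal from the Coriolis terms would require further evaluations and is not shown.

```python
import math


def decompose(inverse_model, q, dq, ddq):
    """Split an inverse model into force components by zeroing inputs.

    Assumes tau = H(q) ddq + c(q, dq) + g(q), so that
    g = f(q, 0, 0), H ddq = f(q, 0, ddq) - g, and c = f(q, dq, 0) - g.
    The Coriolis and centrifugal terms are returned together in c.
    """
    zero = [0.0] * len(q)
    gravity = inverse_model(q, zero, zero)
    inertial = [a - g for a, g in zip(inverse_model(q, zero, ddq), gravity)]
    coriolis_centrifugal = [a - g for a, g in zip(inverse_model(q, dq, zero), gravity)]
    return inertial, coriolis_centrifugal, gravity


def toy_inverse_model(q, dq, ddq):
    """Hypothetical 1-DoF model with H = 2 + cos q, c = -0.5 sin(q) dq^2, g = 9.81 sin q."""
    return [(2.0 + math.cos(q[0])) * ddq[0]
            - 0.5 * math.sin(q[0]) * dq[0] ** 2
            + 9.81 * math.sin(q[0])]


parts = decompose(toy_inverse_model, [0.3], [0.5], [0.2])
print(parts)
```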
5.2.2. Forward model
The results of the forward model are summarized in Table 2 and visualized in Figure 5. For the forward model, too, the physics-inspired networks obtain a lower state error than the feed-forward network. All models perform better on the small character dataset than on the large dataset, as the state domain of the character dataset is much smaller than the uniform domain of the large dataset. For the large dataset, the black-box Lagrangian/Hamiltonian approaches are much worse than their structured counterparts. This is especially visible for the black-box Lagrangian. Its average error and variance are so large because the mass matrix becomes nearly singular for some samples. The nearly singular mass matrix amplifies small differences, yielding a very large error.
Figure 5. The model rollouts of (a) the position, (b) velocity, (c) momentum, and (d) energy of the forward models for two test trajectories of the uniform dataset, averaged over 10 seeds. The structured physics-inspired networks perform best compared to the standard feed-forward network and the black-box counterparts. In particular, the rollout of the black-box Lagrangian commonly diverges, as the Hessian of the Lagrangian, which is required for computing the acceleration, becomes close to singular. This nearly singular Hessian causes exploding velocities and consequently also divergence of the estimated momentum.
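The amplification effect can be reproduced with a small, self-contained sketch, assuming a 2-DoF system and arbitrary illustrative matrices that are not taken from the experiments. The forward model solves H(q)q̈ = τ − c − g, so when H is nearly singular a tiny change in the input produces a very large change in the predicted acceleration.

```python
import math


def forward_dynamics(H, tau, bias):
    """ddq = H^{-1} (tau - bias) for a 2-DoF system, solved with Cramer's rule.

    bias collects the Coriolis, centrifugal, and gravitational terms.
    """
    b = [t - c for t, c in zip(tau, bias)]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - H[1][0] * b[0]) / det]


def amplification(H, tau, delta):
    """Change in predicted acceleration per unit change of the input force."""
    bias = [0.0, 0.0]
    a = forward_dynamics(H, tau, bias)
    b = forward_dynamics(H, [tau[0], tau[1] + delta], bias)
    return math.hypot(b[0] - a[0], b[1] - a[1]) / delta


well_conditioned = [[2.0, 0.0], [0.0, 1.0]]
nearly_singular = [[1.0, 1.0], [1.0, 1.0 + 1e-4]]
print(amplification(well_conditioned, [1.0, 1.0], 1e-3))  # about 1
print(amplification(nearly_singular, [1.0, 1.0], 1e-3))   # roughly 1.4e4
```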
To compare the long-term predictions of the models, we use the valid prediction time (VPT) (Botev et al., 2021), which is defined as the duration until the error of the predicted rollout exceeds a predefined threshold. We define the MSE threshold to be 10⁻², which corresponds to an angular error of
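A minimal sketch of this metric, assuming states are sampled at a fixed step dt and the MSE is averaged over the state dimensions at each step. Whether the exceeding step itself counts toward the duration is a convention; here the VPT equals the number of valid steps times dt.

```python
def valid_prediction_time(predicted, target, dt, threshold=1e-2):
    """Time until the per-step MSE between predicted and target states first exceeds threshold.

    predicted, target: lists of state vectors sampled every dt seconds.
    Returns the number of valid steps multiplied by dt (full horizon if never exceeded).
    """
    for k, (p, t) in enumerate(zip(predicted, target)):
        mse = sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)
        if mse > threshold:
            return k * dt
    return len(target) * dt


target_states = [[0.0, 0.0]] * 4
predicted_states = [[0.0, 0.0], [0.05, 0.0], [0.12, 0.12], [0.3, 0.2]]
print(valid_prediction_time(predicted_states, target_states, dt=0.002))  # 0.004
```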
5.2.3. Conclusion
The simulated experiments show that the physics-inspired networks learn the underlying structure of the dynamical system. These models can accurately predict the force decomposition, momentum, and system energy. Furthermore, the physics-inspired models can learn better forward and inverse models than a standard feed-forward deep network. The structured DeLaN and HNN perform better than the black-box counterparts. The forward and inverse models of the structured DeLaN and HNN do not show any empirical differences when the corresponding phase-space coordinates are observed.
5.3. Model-based control experiments
With the experiments on the physical systems, we evaluate the control performance of the learned models with noisy real-world data. Evaluating the control performance rather than the MSE on static datasets is the more relevant performance measure, as the models are intended for control. Furthermore, it has been shown that the MSE is not a good proxy for the control performance of a learned model and commonly overestimates it (Hobbs and Hepenstal 1989; Lambert et al., 2020; Lutter et al., 2021a). To evaluate the models for control, we apply them to inverse dynamics control and energy control. We only apply DeLaN with the structured Lagrangian to the physical systems, as the potentially singular mass matrix of the black-box variant risks damaging the physical system. HNNs are not applicable to these systems, as the momentum cannot be retrieved from the position observations, whereas the velocity can be obtained using finite differences.
5.3.1. Inverse dynamics control
Evaluating learned models by comparing the tracking error of a model-based control law is a well-established benchmark for the control performance of models (Nguyen-Tuong et al., 2008, 2009). In this experiment, we use inverse dynamics control as the model-based control law. This feedback controller augments the PD control law with an additional feed-forward torque that compensates for the non-linear dynamics of the system. Therefore, given a sufficiently accurate model, inverse dynamics control achieves a lower tracking error than standard PD control. The resulting control law is described by
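One common form of this control law is τ = f⁻¹(q_d, q̇_d, q̈_d) + K_p(q_d − q) + K_d(q̇_d − q̇), where the feed-forward term is the model's inverse dynamics evaluated on the desired trajectory. The minimal sketch below assumes that form with scalar gains; the pendulum plant, the gains, and the trajectory are illustrative and do not reproduce the Barrett WAM setup. It compares tracking with the model-based feed-forward term against plain PD control.

```python
import math


def inverse_dynamics_control(q, dq, q_des, dq_des, ddq_des, kp, kd, inverse_model):
    """PD feedback plus a feed-forward torque from the model, evaluated on the desired trajectory."""
    feed_forward = inverse_model(q_des, dq_des, ddq_des)
    return [ff + kp * (q_d - q_i) + kd * (dq_d - dq_i)
            for ff, q_i, dq_i, q_d, dq_d in zip(feed_forward, q, dq, q_des, dq_des)]


def pendulum_inverse(q, dq, ddq):
    """Hypothetical learned model: unit mass and length, g = 9.81, no friction."""
    return [ddq[0] + 9.81 * math.sin(q[0])]


def no_model(q, dq, ddq):
    """Zero feed-forward, which reduces the controller to plain PD control."""
    return [0.0]


def tracking_mse(model, kp=25.0, kd=10.0, dt=1e-3, steps=2000):
    """Track q_des(t) = 0.5 sin(2t) on a simulated pendulum (semi-implicit Euler)."""
    q, dq = [0.0], [1.0]  # start on the desired trajectory
    squared_error = 0.0
    for k in range(steps):
        t = k * dt
        q_des = [0.5 * math.sin(2.0 * t)]
        dq_des = [math.cos(2.0 * t)]
        ddq_des = [-2.0 * math.sin(2.0 * t)]
        squared_error += (q_des[0] - q[0]) ** 2
        tau = inverse_dynamics_control(q, dq, q_des, dq_des, ddq_des, kp, kd, model)
        ddq = tau[0] - 9.81 * math.sin(q[0])  # true plant: the same pendulum
        dq = [dq[0] + dt * ddq]
        q = [q[0] + dt * dq[0]]
    return squared_error / steps


print(tracking_mse(pendulum_inverse), tracking_mse(no_model))
```

Evaluating the feed-forward term on the desired rather than the measured state keeps sensor noise out of the model input, which is one reason this variant is popular on physical robots.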
The results for the simulated and physical Barrett WAM are summarized in Figure 6. In simulation, DeLaN and system identification perform equally well at the training velocity. When comparing generalization, the system identification approach generalizes better than DeLaN to higher velocities. This behavior is expected, as system identification obtains the global system parameters while DeLaN only learns a local approximation of the mass matrix and potential energy. Compared to the feed-forward deep network, DeLaN performs worse at the training velocity but generalizes better to higher velocities, which indicates that the deep network overfits to the training velocity. The analytic model and the system identification show a large performance gap in simulation because we use the same analytic model for simulation and the physical system, but the analytic model is optimized for the physical system.
Figure 6. (a, b) The mean squared tracking error of inverse dynamics control following cosine trajectories on the simulated Barrett WAM. (c, d) The mean squared tracking error on the physical Barrett WAM. The system identification approach, feed-forward neural network, and DeLaN are trained offline using only the trajectories at a velocity scale of 1×. Afterward, the models are tested on the same trajectories with increased velocities to evaluate extrapolation to new velocities.
On the physical system, the feed-forward network performs best on the training domain but deteriorates when the velocity is increased. DeLaN performs worse than the deep network but better than the analytic model and system identification. The analytic model and the system identification model perform nearly identically, with system identification only marginally better. Both approaches generalize better than DeLaN and the deep network. This better generalization is expected, as the system parameters are global while the other approaches use local approximations. When the velocity increases, the performance of the DeLaN and deep network models degrades. In contrast to the simulation results, where DeLaN extrapolates better than the deep network, the black-box deep network generalizes better on the physical system. The worse performance and generalization of DeLaN on this physical system can be explained by the assumption of rigid body dynamics, which is not fulfilled due to the cable drives and the motor-side sensing. Therefore, DeLaN cannot model every phenomenon with high fidelity. However, DeLaN learns a good approximation that is better than the system identification approach, which relies on the same rigid body assumption.
5.3.2. Energy control
A different approach to test the control performance of the learned models is to use them for controlling under-actuated systems with an energy controller. More specifically, we apply an energy controller to swing up the Furuta pendulum and the cartpole. This energy controller regulates the system energy rather than the positions and velocities. The control law is described by
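A classic energy-based swing-up law has the form u = k_E (E − E_des) sign(θ̇ cos θ) − k_p x − k_d ẋ; whether this matches the exact law used here is an assumption. The sketch shows where a learned model enters: the energy E is computed from the learned mass matrix and potential energy, which a black-box forward model does not provide. The gains, the sign conventions, and the toy model are hypothetical.

```python
import math


def system_energy(mass_matrix, potential, q, dq):
    """Total energy E = 0.5 dq^T H(q) dq + V(q).

    With DeLaN, mass_matrix and potential would be the learned networks.
    """
    H = mass_matrix(q)
    n = len(q)
    kinetic = 0.5 * sum(dq[i] * H[i][j] * dq[j] for i in range(n) for j in range(n))
    return kinetic + potential(q)


def energy_control(energy, energy_des, theta, dtheta, x, dx, k_e, k_p, k_d):
    """Hypothetical swing-up law: pump energy toward energy_des while a PD term keeps x near 0.

    theta is the pendulum angle, x the actuated coordinate (cart position or arm angle).
    The sign conventions depend on how theta and the actuation direction are defined.
    """
    s = dtheta * math.cos(theta)
    direction = 0.0 if s == 0.0 else math.copysign(1.0, s)
    return k_e * (energy - energy_des) * direction - k_p * x - k_d * dx


# Toy 2-DoF example (x, theta) with a constant diagonal mass matrix and V = 9.81 cos(theta).
E = system_energy(lambda q: [[2.0, 0.0], [0.0, 0.5]], lambda q: 9.81 * math.cos(q[1]),
                  [0.0, 0.0], [1.0, 2.0])
u = energy_control(E, 9.81, 0.5, 1.0, 0.0, 0.0, k_e=0.2, k_p=1.0, k_d=0.5)
print(E, u)
```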
The results for the simulated and physical experiments are summarized in Figure 7. Videos of the physical experiments are available at [Link]. In simulation, the analytic model, the system identification model, and DeLaN all achieve a successful swing-up of the cartpole and the Furuta pendulum. On the physical cartpole, all approaches achieve the swing-up despite the large stiction of the linear actuator. On the physical Furuta pendulum, only the analytic model and DeLaN achieve the swing-up; the system identification model does not. The system identification model fails because the linear regression is very sensitive to observation noise and to the poor conditioning (large condition number) of the features.
5.3.3. Conclusion
The non-linear control experiments on the physical systems show that DeLaN with a structured Lagrangian can learn a good model despite noisy observations. The resulting model can be used for real-time closed-loop feedback control of fully actuated and under-actuated systems, and DeLaN achieves good control performance for both system categories. Notably, DeLaN is the first model learning approach using no prior knowledge of the equations of motion that can be applied to energy control. Previous black-box model learning approaches could not be applied, as the system energy is not observed and can only be learned unsupervised.
6. Conclusion
Coming back to the initial questions of the experiments, the experimental results have shown that
Similar empirical results have also been reported (Greydanus et al., 2019; Cranmer et al., 2020; Gupta et al., 2019, 2020; Saemundsson et al., 2020; Zhong et al., 2019, 2020). When comparing the different physics priors, Hamiltonian and Lagrangian priors yield comparable models when the corresponding phase-space coordinates are observed, that is, velocities for DeLaN and momenta for HNN. When comparing structured DeLaN and HNN to black-box DeLaN and HNN, we find that the structured approaches achieve better dynamics models (Table 2). The black-box variants struggle to obtain non-singular network Hessians for all possible system states. For the structured approaches, singular Hessians can be prevented by parametrizing the kinetic energy such that the eigenvalues of the Hessian are lower bounded and cannot become singular.
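One way to realize such a lower bound, sketched for a 2-DoF system under stated assumptions: a network outputs the entries of a lower-triangular matrix L(q) with a softplus on the diagonal, and the mass matrix, which is also the Hessian of the Lagrangian with respect to q̇, is H = LLᵀ + εI. Since LLᵀ is positive semi-definite, every eigenvalue of H is at least ε. The raw outputs, the softplus choice, and ε = 10⁻² are illustrative.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); keeps the Cholesky diagonal positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mass_matrix_from_outputs(raw, eps=1e-2):
    """Build H = L L^T + eps * I from three raw outputs of a (hypothetical) network.

    L = [[softplus(raw[0]), 0], [raw[1], softplus(raw[2])]] is lower triangular.
    """
    l11, l21, l22 = softplus(raw[0]), raw[1], softplus(raw[2])
    h11 = l11 * l11 + eps
    h12 = l11 * l21
    h22 = l21 * l21 + l22 * l22 + eps
    return [[h11, h12], [h12, h22]]


def eigenvalues_sym2(H):
    """Closed-form eigenvalues (ascending) of a symmetric 2x2 matrix."""
    mean = 0.5 * (H[0][0] + H[1][1])
    radius = math.hypot(0.5 * (H[0][0] - H[1][1]), H[0][1])
    return mean - radius, mean + radius


random.seed(0)
smallest = min(
    eigenvalues_sym2(mass_matrix_from_outputs([random.uniform(-5, 5) for _ in range(3)]))[0]
    for _ in range(1000)
)
print(smallest >= 1e-2 - 1e-12)  # True: eigenvalues never drop below eps
```

Setting `eps=0.0` removes the guarantee: with raw outputs such as `[-50.0, 3.0, -50.0]`, the smallest eigenvalue collapses to essentially zero, which is the near-singular case described for the black-box variants.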
Despite the advantages of physics-inspired models over standard deep network models, physics-inspired approaches have drawbacks that limit their general applicability compared to feed-forward networks. In the following, we discuss these limitations.
6.1. Open challenges
Physics-inspired deep networks have two main shortcomings that have not been solved yet. First, the current approaches can only simulate articulated rigid bodies without contact; second, they rely on knowing and observing the generalized coordinates. Therefore, most of the existing work only showcased these networks for simple n-link pendulums (Zhong et al., 2021) and n-body problems (Sanchez-Gonzalez et al., 2019). For most real-world robotic tasks, these assumptions are not fulfilled: one frequently does not know or observe the system state, and most interesting robotic tasks include contacts. In contrast to physics-inspired networks, black-box dynamics models work with arbitrary observations and contacts. These models have been extensively used for model predictive control and are sufficient for complex control tasks (Hafner et al. 2019a, 2019b; Lutter et al., 2021a). Therefore, these challenges need to be addressed to enable the widespread use of physics-inspired methods for robot control. In the following, we highlight the challenges of both limitations and the initial steps towards applying these models to contact-rich tasks with arbitrary observations.
6.1.1. Contacts
Analytically, contact forces can be incorporated by adding generalized contact forces to the Euler–Lagrange equation. In this case, the differential equation is described by
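In standard notation (the symbols here are assumptions), each active contact i with contact force λ_i and contact Jacobian J_{c,i}(q) = ∂p_i/∂q of its contact point p_i(q) adds the generalized force J_{c,i}(q)ᵀλ_i to the right-hand side of the Euler–Lagrange equation:

```latex
\frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{q}}}
- \frac{\partial L}{\partial \mathbf{q}}
= \boldsymbol{\tau} + \sum_{i} \mathbf{J}_{c,i}(\mathbf{q})^{\top} \boldsymbol{\lambda}_i
```

For a physics-inspired network, neither the set of active contacts, the Jacobians J_{c,i}, nor the forces λ_i are known a priori, which is what makes this term hard to learn.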
Within the physics-inspired deep network literature, only Hochlehnert et al. (2021) and Zhong et al. (2021) have included contacts. However, both works only consider special cases with strong assumptions. For example, Hochlehnert et al. (2021) only consider elastic collisions of simple geometric shapes, namely circles. In this case, the contact forces can be computed and the contact Jacobian is the identity matrix. Therefore, one only needs to learn an indicator function
A different approach was proposed by Zhong et al. (2021), who augment the physics-inspired network with a differentiable physics simulator to handle the contacts. In this case, a collision detection algorithm determines all active contacts and the contact Jacobians, and the contact forces are computed by solving a linear complementarity problem (LCP). Only the coefficients of the contact model, for example, friction and restitution, are learned from data. Therefore, this approach is similar to the white-box friction models described in Section 4.3.4. It also implicitly assumes that the meshes and kinematics are known, because without them the collision detection algorithm cannot compute the active contacts and Jacobians. If these quantities are known, however, the analytic equations of motion can be computed and many physical parameters can be approximated from the meshes. Therefore, these assumptions are identical to the knowledge required for system identification with differentiable physics simulators (Werling et al., 2021; Degrave et al., 2019; Heiden et al., 2021). The advantage of physics-inspired networks over system identification with differentiable simulators remains unclear, as the experiments only applied the proposed approach to bouncing disks and a multi-link pendulum with a ground plane.
To summarize, no general way to add contacts to physics-inspired networks has been proposed and shown to work for multi-contact physics with complex geometries. The naive approach of adding a single network to model the generalized contact forces is problematic, as it reduces the physics-inspired approach to a black-box model learning technique without proper regularization. Therefore, an important open challenge for physics-inspired networks in robotics is a generic approach to include multiple contacts.
6.1.2. Generalized coordinates
The second limiting assumption is the observation of the generalized coordinates
To overcome this limitation, existing work combined physics-inspired networks with variational autoencoders (VAEs) to learn a latent space that resembles the generalized coordinates. In this case, the Lagrangian- and Hamiltonian-inspired networks are applied in the latent space. Using this approach, the dynamics of single-link pendulums and N-body problems have been learned from artificial images (Greydanus et al., 2019; Zhong et al., 2019; Toth et al., 2019; Saemundsson et al., 2020; Allen-Blanchette et al., 2020). However, these approaches have not been demonstrated on more complex systems or on realistically rendered systems. Botev et al. (2021) also showed that this approach does not necessarily obtain better results than using a standard deep network continuous-time model within the latent space. Therefore, extending physics-inspired networks to arbitrary observations remains an important open challenge. The main difficulty is to learn a latent space that resembles the generalized coordinates, and the naive approach of using a VAE does not seem to be sufficient.
6.2. Summary
We introduced physics-inspired networks that combine Lagrangian and Hamiltonian mechanics with deep networks. This combination yields physically plausible dynamics models that are guaranteed to conserve energy. The resulting models are also interpretable and can be used as forward, inverse, or energy models with the same parameters, which was previously not possible with standard deep network dynamics models. Furthermore, we presented all the existing extensions of physics-inspired networks, which include different representations of the Hamiltonian and Lagrangian, different loss functions, and different actuation and friction models. We elaborated on the shortcomings of the current approaches, as these techniques are limited to mechanical systems without contacts and require the observation of generalized positions, velocities, momenta, and forces. Therefore, this summary provides the big picture of physics-inspired networks for learning continuous-time dynamics models of rigid body systems.
In the experiments, we showed that Deep Lagrangian Networks (DeLaN) and Hamiltonian Neural Networks (HNN) learn the underlying structure of the dynamical system for simulated and physical systems. When the corresponding phase-space coordinates of each model are observed, both models perform nearly identically. On average, the structured Hamiltonian and Lagrangian perform better than their black-box counterparts. Especially for the Lagrangian, the black-box approach can lead to high prediction errors due to inverting the Hessian of a deep network. Furthermore, we showed that these physics-inspired techniques can be applied to physical systems despite observation noise. The resulting DeLaN models can be used for real-time control and achieve good performance for inverse dynamics control as well as energy control. The latter is especially noteworthy, as DeLaN is the first model learning technique that utilizes deep networks and can learn the system energy. Previously, this was only possible using system identification, which requires knowledge of the kinematic structure to derive the equations of motion.
Acknowledgements
This project has received funding from ABB and NVIDIA. Furthermore, we want to thank the open-source projects NumPy (Harris et al., 2020), PyTorch (Paszke et al., 2019) and JAX (Bradbury et al., 2018).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the H2020 European Research Council; #640554, Nvidia, ABB Corporate Research.
