Architecturally constrained neural energy representations for variationally well-posed higher-order continua

Abstract

Neural constitutive models have recently emerged as powerful tools for data-driven continuum mechanics, yet their integration into higher-order theories remains largely empirical and often lacks structural guarantees. In this work, we develop a rigorous variational framework for invariant neural representations of stored energy densities in strain-gradient continuum models. The energy is expressed as a neural mapping of isotropic invariants of the infinitesimal strain tensor and its gradient, thereby enforcing objectivity and isotropy at the representation level. We prove thermodynamic admissibility through potential structure, establish coercivity in $H^{2} (Ω)$ , and demonstrate weak lower semicontinuity and existence of minimizers via the direct method of the calculus of variations. A detailed Legendre–Hadamard analysis yields explicit sufficient conditions for preservation of strong ellipticity expressed directly in invariant space, clarifying how neural parametrizations influence stability and strain localization. We further discuss the relation of these conditions to convexity, rank-one convexity, and quasiconvexity and outline a polyconvex neural extension suitable for finite-strain settings. The proposed framework embeds neural energy representations into a functional-analytic structure consistent with generalized continuum mechanics, thereby providing a mathematically controlled alternative to black-box constitutive learning in higher-order elasticity.

Keywords

Neural constitutive models; strain-gradient continuum mechanics; variational analysis; strong ellipticity; constitutive learning

1. Introduction

Higher-order continuum theories, including second-gradient (strain-gradient) elasticity, Cosserat–micropolar continua, micromorphic media, and related generalized continuum frameworks, provide a mathematically rigorous and physically consistent methodology for incorporating intrinsic length scales, microstructural interactions, curvature effects, and nonclassical boundary phenomena into continuum mechanics. Their importance is now well established in situations where classical Cauchy elasticity becomes insufficient, singular, or incapable of capturing experimentally observed responses. This includes, in particular: (1) size-dependent behavior in microstructured and nanostructured solids, (2) strain localization and the regularization of softening-induced ill-posedness, (3) energetic descriptions of microstructural rearrangements and curvature-dependent mechanisms, and (4) boundary layer effects associated with higher-order tractions, moments, and double forces [1–8]. In contrast to classical local theories, higher-order continua naturally introduce additional energetic penalties on gradients of deformation or microstructural descriptors, thereby providing both enhanced mathematical regularity and improved physical fidelity in heterogeneous materials.

Discrete microstructured systems such as pantographic lattices, articulated beam networks, modular truss assemblies, and architected metamaterials provide concrete mechanical realizations of media whose effective macroscopic behavior inherently involves higher-gradient effects and internal characteristic lengths. In these systems, the underlying geometry and connectivity of the microstructure generate nonlocal interactions and curvature-dependent responses that cannot be adequately represented within standard first-gradient elasticity. Early foundational work demonstrated that modular truss-beam systems may possess deformation energies depending explicitly on higher displacement gradients, thereby furnishing a discrete mechanical justification for second-gradient continuum theories and related generalized elastic models [9].

Among such systems, pantographic lattices and beam-based metamaterials have emerged as particularly important experimental and numerical platforms for studying the emergence of higher-order kinematics at the continuum scale. Through heuristic homogenization procedures, large-deformation experiments, and detailed numerical simulations, these structures have been shown to exhibit equilibrium configurations, bending-dominated responses, and wave propagation phenomena that are accurately described by strain-gradient and second-gradient continuum models [10,11]. These results provide strong evidence that generalized continuum theories are not merely abstract mathematical extensions of classical elasticity, but rather effective macroscopic descriptions of physically realizable microstructured media.

Recent theoretical developments have further clarified the continuum mechanics of three-dimensional pantographic lattices within the framework of second-gradient elasticity. In particular, rigorous formulations have been developed to characterize their nonlinear equilibrium behavior, stability properties, and higher-order constitutive structure, thereby strengthening the connection between discrete architected systems and generalized continuum theories [12]. Such developments reinforce the interpretation of higher-gradient elasticity as an effective continuum limit of mechanically rich lattice-type microstructures possessing complex internal kinematics.

More broadly, articulated mechanisms and zigzagged lattice geometries also exhibit highly nontrivial deformation patterns involving rotational couplings, constrained motions, and geometric compatibility effects whose effective macroscopic description frequently requires additional kinematical fields or higher-order deformation measures [13]. These examples further motivate the development of mathematically robust generalized continuum frameworks capable of capturing the interplay between geometry, microstructure, and nonlocal mechanical interactions in modern metamaterials and architected solids.

From a variational viewpoint, strain-gradient terms, under suitable growth conditions, yield coercivity in $H^{2} (Ω)$ and suppress the arbitrarily fine microstructure that may form when the first-gradient energy loses ellipticity or quasiconvexity [14,15].

In parallel, the last decade has seen rapid growth of data-driven and machine-learning–based constitutive modeling in computational mechanics. Neural networks and related regressors have been used to approximate stress–strain responses, discover constitutive operators, and build fast surrogates for multiscale simulations [16,17]. Physics-informed learning has further emphasized embedding differential constraints into training, improving extrapolation and physical consistency [18]. Despite their empirical success, however, many neural constitutive models remain “black-box” mappings from kinematic inputs to stresses (or tangent moduli) and therefore lack structural guarantees that are routine in classical mechanics: material frame indifference (objectivity), thermodynamic admissibility (existence of a potential and satisfaction of the dissipation inequality), and well-posedness of the induced boundary value problem. When such models are inserted into finite element solvers, even small violations of these structural constraints can manifest as nonintegrable stress fields, loss of symmetry, spurious mesh dependence, or catastrophic loss of robustness under Newton iterations.

This work addresses this structural gap by embedding neural constitutive representations into a rigorous variational and functional-analytic framework tailored to higher-order continua. The central idea is conceptually simple but mathematically decisive: instead of learning stresses directly, we represent the stored energy density and derive the stress measures as variational derivatives, thereby inheriting thermodynamic consistency and compatibility with the calculus of variations. Concretely, in the small-strain second-gradient setting, one considers a displacement field $u : Ω \to ℝ^{3}$ with infinitesimal strain $ε (u) = sym \nabla u$ and strain-gradient $\nabla ε (u)$ and defines the total potential

Π_{θ} (u) = \int_{Ω} W_{θ} (ε (u), \nabla ε (u)) \, dx - ℓ (u), \qquad (1)

over an admissible space $V \subset H^{2} (Ω; ℝ^{3})$ compatible with the higher-order kinematics. The manuscript then proposes an invariant neural ansatz

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}), \qquad I_{1} = \operatorname{tr} ε, \quad I_{2} = ε : ε, \quad J_{1} = ∥ \nabla ε ∥^{2}, \qquad (2)

so that objectivity and isotropy are enforced structurally by representation theory, independently of the parametrization of the scalar map $Φ_{θ}$ . This reduction to invariant space separates (a) the symmetry constraints, which are non-negotiable for physics, from (b) the expressive capacity of the neural parametrization, which can be made as rich as needed.

The motivation for enforcing a variational structure is multifold:

Objectivity and isotropy without penalties. Frame indifference is a structural requirement: for superposed rigid motions $Q \in SO (3)$, the strain transforms as $ε \mapsto Qε Q^{T}$ and $\nabla ε$ transforms tensorially. Invariant representations guarantee that $W_{θ}$ is objective by construction [19–21]. This avoids the common practice of adding penalty terms to the training loss, which cannot guarantee invariance outside the training set.

Thermodynamic admissibility by potential structure. In an isothermal elastic setting, defining stresses via $σ = \partial W_{θ} ∕ ∂ε$ and hyperstresses via $M = \partial W_{θ} ∕ \partial (\nabla ε)$ ensures that the local Clausius–Duhem inequality is satisfied identically (zero dissipation). This is not merely a modeling preference: it guarantees that the constitutive response is integrable and that work-conjugacy relations are consistent with the balance laws [2,22].

Well-posedness and existence of minimizers. For generalized continua, the natural weak formulation lives in $H^{2}$ (or related spaces), and the existence of minimizers is typically proved by the direct method under coercivity and weak lower semicontinuity [14]. Neural parametrizations can easily violate growth conditions or convexity-type conditions unless they are imposed at the architectural level. A main goal is therefore to identify verifiable conditions on $Φ_{θ}$ (regularity, growth, convexity in appropriate arguments) that imply coercivity and existence for $Π_{θ}$ in $H^{2} (Ω)$ , consistent with second-gradient Korn inequalities [23].

Stability: strong ellipticity and localization control. A constitutive model is of limited use in computation if it is prone to spurious instabilities. Local stability in elasticity is governed by the Legendre–Hadamard condition, that is, positive semidefiniteness of the acoustic tensor $Q (n)$ for all unit vectors $n$; its strict form, strong ellipticity, requires positive definiteness [21,24]. In higher-order models, loss of ellipticity of the first-gradient part is often precisely the mechanism behind localization, while the gradient term regularizes the resulting patterns [7,8]. A key contribution of this work is to express sufficient strong-ellipticity conditions in terms of the Hessian of the scalar invariant map $Φ_{θ}$, thereby enabling stability constraints to be enforced directly in invariant space rather than by manipulating the full fourth-order tangent tensor.

Beyond convexity: polyconvex neural energies. While convexity in $(ε, \nabla ε)$ is sufficient for existence, it is frequently too restrictive. In finite-strain elasticity, polyconvexity provides a physically meaningful relaxation that guarantees existence while remaining compatible with nonlinear kinematics [21,25]. Recent work has demonstrated that polyconvexity can be enforced within neural architectures (e.g., via convex neural networks and constrained parametrizations), enabling learned hyperelastic laws that remain variationally well-posed and robust in computation [26–28]. This motivates the polyconvex extension developed here as a natural bridge from small-strain higher-order models to finite-strain settings where microstructure, instability, and large deformation coexist.

Problem statement. The overarching problem is to design neural constitutive models for higher-order continua that are (1) invariant and objective, (2) thermodynamically admissible by construction, (3) variationally well-posed in appropriate Sobolev spaces, and (4) stable in the strong-ellipticity sense, with a clear relationship to classical convexity notions. The difficulty is that expressive neural parametrizations can easily destroy coercivity, lower semicontinuity, or ellipticity unless structural constraints are built in at the representation level. The present study develops a framework that makes these properties checkable and enforceable at the level of the scalar invariant map $Φ_{θ}$ while remaining compatible with generalized continuum mechanics.

Contributions. The main contributions of this work are as follows:

We introduce a structure-preserving invariant neural representation of strain-gradient energies, in which material symmetry and objectivity are enforced at the representation level through dependence on scalar isotropic invariants.

We formulate the associated variational problem in $H^{2} (Ω)$ and establish the existence of minimizers under explicit growth, convexity, and quasiconvexity assumptions, thereby embedding neural parametrizations into a rigorous functional-analytic framework.

We derive a detailed Legendre–Hadamard stability analysis for the invariant neural energy, obtaining explicit sufficient conditions for strong ellipticity expressed directly in invariant space.

We clarify the distinction between convexity, rank-one convexity, quasiconvexity, and polyconvexity within the present framework and propose a polyconvex neural extension suitable for finite-strain elasticity.

These contributions collectively establish a mathematically controlled foundation for data-driven higher-order continuum modeling that preserves invariance, thermodynamic admissibility, existence, and stability.

Outline of the paper

Section 2 introduces the kinematics of second-gradient continua, including the displacement space $V \subset H^{2} (Ω; ℝ^{3})$ , the strain and strain-gradient measures, the total potential functional, and the associated Euler–Lagrange equations with classical stress and hyperstress.

Section 3 develops the invariant neural energy representation $W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1})$ based on isotropic scalar invariants. Objectivity and isotropy are proved using representation theory, and the structural separation between invariant encoding and neural parametrization is clarified.

Section 4 embeds the neural energy into continuum thermodynamics. The Clausius–Duhem inequality is verified in the purely elastic case, energetic conjugacy is established, and admissible extensions to inelastic internal-variable formulations are discussed.

Section 5 establishes the existence of minimizers for the variational problem in $H^{2} (Ω)$ via the direct method of the calculus of variations. Coercivity, weak lower semicontinuity, and second-order Korn-type estimates are derived under explicit structural assumptions.

Section 6 provides a detailed Legendre–Hadamard analysis of the invariant neural energy, deriving sufficient conditions for strong ellipticity expressed directly in invariant space and characterizing stability at and near the reference configuration.

Section 7 introduces polyconvex neural energy representations for finite-strain elasticity. Convex neural architectures (including input convex neural network (ICNN)-based constructions and positive-semidefinite Hessian parametrizations) are shown to enforce polyconvexity structurally and guarantee the existence of minimizers in $W^{1, p}$ .

Section 8 translates the functional-analytic stability and convexity conditions into explicit neural architectural constraints, demonstrating how ellipticity, convexity, and coercive growth can be enforced independently of training data.

Section 9 clarifies the hierarchy between convexity, polyconvexity, quasiconvexity, rank-one convexity, and Legendre–Hadamard ellipticity and analyzes the regularizing role of second-gradient contributions in preventing fine-scale microstructure formation.

Section 10 discusses the mathematical implications, limitations, and scope of the framework, highlighting the distinction between local ellipticity and global variational well-posedness.

Section 11 concludes by summarizing the structural integration of neural parametrizations into generalized continuum mechanics and outlining open problems, including sharper ellipticity criteria and finite-strain gradient extensions.

2. Kinematics of second-gradient continua

Let $Ω \subset ℝ^{3}$ be a bounded Lipschitz domain with boundary $\partial Ω$ . We denote by $M_{sym}^{3 \times 3}$ the space of symmetric second-order tensors.

2.1. Displacement field and strain measures

We consider small deformations described by a displacement field

u : Ω \to ℝ^{3} .

The infinitesimal strain tensor is defined by

ε (u) := \operatorname{sym} (\nabla u) = \frac{1}{2} \big( \nabla u + (\nabla u)^{T} \big) \in M_{sym}^{3 \times 3} .

In a second-gradient theory, the higher-order kinematical measure is

\nabla ε (u),

a third-order tensor belonging to $ℝ^{3 \times 3 \times 3}$, with components ${(\nabla ε (u))}_{ijk} = \partial_{k} ε_{ij} (u)$.

2.2. Admissible function space

Let $Γ_{D} \subset \partial Ω$ be a nonempty portion of the boundary with positive measure. We prescribe homogeneous Dirichlet boundary conditions:

u = 0 on Γ_{D} .

The admissible displacement space is

V := \{ u \in H^{2} (Ω; ℝ^{3}) \mid u = 0 \text{ on } Γ_{D} \} .

The $H^{2}$ regularity is natural in second-gradient elasticity, since $\nabla ε (u)$ involves second derivatives of u.

2.3. Energy functional

Let

W_{θ} : M_{sym}^{3 \times 3} \times ℝ^{3 \times 3 \times 3} \to ℝ

be the stored energy density.

The total potential energy is defined by

Π_{θ} (u) = \int_{Ω} W_{θ} (ε (u), \nabla ε (u)) dx - ℓ (u),

where ℓ is a bounded linear functional on $V$ , representing external loads.

2.4. Variational problem

We consider the minimization problem:

Find u \in V such that Π_{θ} (u) = \inf_{v \in V} Π_{θ} (v) .

The second-gradient contribution provides coercivity in the $H^{2}$-norm under appropriate growth conditions on $W_{θ}$. In particular, if

W_{θ} (ε, \nabla ε) \geq c_{1} (| ε |^{2} + | \nabla ε |^{2}) - c_{2} \quad \text{for some } c_{1} > 0, \; c_{2} \geq 0,

then the functional is coercive on $V$; this follows from the second-order Korn inequality established in section 5.
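For orientation, a hypothetical quadratic gradient energy (chosen here only for illustration, with $λ \geq 0$, $μ > 0$ and an assumed gradient modulus $a > 0$) satisfies this bound explicitly, using $I_{2} = | ε |^{2}$ and $J_{1} = | \nabla ε |^{2}$:

```latex
W_{θ} = \tfrac{λ}{2} I_{1}^{2} + μ I_{2} + a J_{1}
\;\geq\; μ \, | ε |^{2} + a \, | \nabla ε |^{2}
\;\geq\; \min (μ, a) \big( | ε |^{2} + | \nabla ε |^{2} \big),
```

so the growth condition holds with $c_{1} = \min (μ, a)$ and $c_{2} = 0$.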

2.5. Euler–Lagrange equations

Formally, the first variation yields

δ Π_{θ} (u) [v] = \int_{Ω} \big( σ (u) : ε (v) + M (u) : : \nabla ε (v) \big) \, dx - ℓ (v),

where the stress measures are defined by

\begin{matrix} σ (u) = \frac{\partial W_{θ}}{∂ε} (ε (u), \nabla ε (u)), \\ M (u) = \frac{\partial W_{θ}}{\partial (\nabla ε)} (ε (u), \nabla ε (u)) . \end{matrix}

Integration by parts leads to fourth-order equilibrium equations with classical and higher-order boundary terms. This highlights the structural role of second gradients in regularizing strain localization.

3. Invariant neural energy representation

We introduce a structure-preserving representation of the stored energy based on isotropic invariants. The objective is to embed the neural parametrization into a representation-theoretic framework that enforces material frame indifference and isotropy at the structural level.

3.1. Representation-theoretic framework

Let $S^{3} := M_{sym}^{3 \times 3}$ denote the space of symmetric second-order tensors. Under a superposed rigid body motion $Q \in SO (3)$, the infinitesimal strain tensor transforms as

ε \mapsto Qε Q^{T} .

Similarly, the third-order tensor $\nabla ε$ transforms as

{(\nabla ε)}_{ijk} \mapsto Q_{ia} Q_{jb} Q_{kc} {(\nabla ε)}_{abc} .

A scalar function

W : S^{3} \times ℝ^{3 \times 3 \times 3} \to ℝ

is objective if

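W (Qε Q^{T}, \mathcal{R}_{Q} \nabla ε) = W (ε, \nabla ε) \quad \forall Q \in SO (3),

where $\mathcal{R}_{Q} \nabla ε$ denotes the rotated third-order tensor with components ${(\mathcal{R}_{Q} \nabla ε)}_{ijk} = Q_{ia} Q_{jb} Q_{kc} {(\nabla ε)}_{abc}$.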

By classical representation theory of isotropic tensor functions, a scalar isotropic function of tensor arguments must depend only on scalar invariants generated by contraction operations. We therefore construct the neural energy through invariant arguments.

3.2. Restricted invariant modeling class

We restrict attention to stored energy densities that depend on the strain tensor through its linear and quadratic invariants

I_{1} = \operatorname{tr} ε, \quad I_{2} = ε : ε,

and on its gradient through

J_{1} = | \nabla ε |^{2} = {(\nabla ε)}_{ijk} {(\nabla ε)}_{ijk} .

Accordingly, we consider the modeling class

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}),

where

Φ_{θ} : ℝ^{3} \to ℝ

is twice continuously differentiable.

Remark 1. The pair $(I_{1}, I_{2})$ does not constitute a complete integrity basis for all smooth isotropic functions of ε; higher-order invariants such as $tr (ε^{3})$ would be required for full generality. The present framework therefore describes the class of quadratic-invariant small-strain energies, which already includes linear elasticity and a broad class of nonlinear but polynomially bounded models.

This restriction is intentional: it provides a minimal yet structurally consistent function class suitable for neural parametrization while remaining compatible with existence and ellipticity analysis.
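A minimal illustration of Remark 1, using two strains chosen here for that purpose: the deviatoric strains below share $I_{1}$ and $I_{2}$ but have opposite $\operatorname{tr} (ε^{3})$, so at fixed $J_{1}$ every energy of the form $Φ_{θ} (I_{1}, I_{2}, J_{1})$ assigns them the same value, although one is contractive and the other extensive along $e_{3}$.

```latex
ε_{a} = \operatorname{diag} (1, 1, -2), \qquad ε_{b} = - ε_{a}:
\qquad I_{1} = 0, \quad I_{2} = 6 \quad \text{for both}, \qquad
\operatorname{tr} (ε_{a}^{3}) = -6 \neq 6 = \operatorname{tr} (ε_{b}^{3}) .
```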

3.3. Invariance of the generating set

Lemma 1. The quantities $(I_{1}, I_{2}, J_{1})$ are invariant under orthogonal transformations.

Proof. For $Q \in SO (3)$ , orthogonality implies $Q^{T} Q = I$ . Hence

\begin{matrix} tr (Qε Q^{T}) = tr (ε), \\ (Qε Q^{T}) : (Qε Q^{T}) = ε : ε, \end{matrix}

and

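| Q_{ia} Q_{jb} Q_{kc} {(\nabla ε)}_{abc} |^{2} = Q_{ia} Q_{id} \, Q_{jb} Q_{je} \, Q_{kc} Q_{kf} \, {(\nabla ε)}_{abc} {(\nabla ε)}_{def} = {(\nabla ε)}_{abc} {(\nabla ε)}_{abc} = | \nabla ε |^{2},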

since the Euclidean tensor norms are invariant under orthogonal maps. □
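Lemma 1 can also be spot-checked numerically. The sketch below assumes NumPy; the helper names `random_rotation`, `invariants` and `rotate` are hypothetical. It draws a random proper rotation, applies the transformation rules of section 3.1 with `np.einsum`, and compares the invariants before and after. It is a sanity check on random samples, not a substitute for the proof.

```python
import numpy as np


def random_rotation(rng):
    """Draw a proper rotation Q in SO(3) from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # fix the sign ambiguity of the factorization
    if np.linalg.det(q) < 0.0:            # reflect one column to obtain det(Q) = +1
        q[:, 0] = -q[:, 0]
    return q


def invariants(eps, grad_eps):
    """Return (I1, I2, J1) = (tr eps, eps : eps, |grad eps|^2)."""
    return np.array([
        np.trace(eps),
        np.einsum("ij,ij->", eps, eps),
        np.einsum("ijk,ijk->", grad_eps, grad_eps),
    ])


def rotate(Q, eps, grad_eps):
    """eps -> Q eps Q^T and (grad eps)_ijk -> Q_ia Q_jb Q_kc (grad eps)_abc."""
    return Q @ eps @ Q.T, np.einsum("ia,jb,kc,abc->ijk", Q, Q, Q, grad_eps)


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3))
eps = 0.5 * (a + a.T)                          # symmetric strain
g = rng.standard_normal((3, 3, 3))
grad_eps = 0.5 * (g + g.transpose(1, 0, 2))    # symmetric in its first two indices
Q = random_rotation(rng)
print(np.allclose(invariants(eps, grad_eps), invariants(*rotate(Q, eps, grad_eps))))
```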

3.4. Neural invariant Ansatz

We define the stored energy density as

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}),

with $Φ_{θ} \in C^{2} (ℝ^{3})$ .
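As an illustration only, the sketch below shows one possible parametrization of $Φ_{θ}$: a one-hidden-layer network with softplus activation, which is smooth and therefore gives $Φ_{θ} \in C^{2}$. The width, the activation, the random initialization and the names `init_params`, `phi_theta` and `W_theta` are assumptions, not the architecture of this work, and nothing here enforces the growth or convexity conditions of later sections. Because the network only receives $(I_{1}, I_{2}, J_{1})$, the energy is unchanged under a rotation of its tensor arguments for any θ.

```python
import numpy as np


def softplus(x):
    """Smooth (C-infinity) activation log(1 + e^x), in a numerically stable form."""
    return np.logaddexp(0.0, x)


def init_params(rng, width=16):
    """Hypothetical parameter vector theta of a one-hidden-layer network R^3 -> R."""
    return {
        "W1": 0.5 * rng.standard_normal((width, 3)),
        "b1": np.zeros(width),
        "w2": 0.5 * rng.standard_normal(width),
        "b2": 0.0,
    }


def phi_theta(params, I):
    """Scalar invariant map Phi_theta(I1, I2, J1)."""
    hidden = softplus(params["W1"] @ I + params["b1"])
    return float(params["w2"] @ hidden + params["b2"])


def W_theta(params, eps, grad_eps):
    """Stored energy W_theta(eps, grad eps) = Phi_theta(I1, I2, J1)."""
    I = np.array([
        np.trace(eps),
        np.einsum("ij,ij->", eps, eps),
        np.einsum("ijk,ijk->", grad_eps, grad_eps),
    ])
    return phi_theta(params, I)


rng = np.random.default_rng(1)
params = init_params(rng)
a = rng.standard_normal((3, 3))
eps = 0.05 * (a + a.T)
g = rng.standard_normal((3, 3, 3))
grad_eps = 0.05 * (g + g.transpose(1, 0, 2))

c, s = np.cos(0.7), np.sin(0.7)
Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about e3
w_ref = W_theta(params, eps, grad_eps)
w_rot = W_theta(params, Q @ eps @ Q.T, np.einsum("ia,jb,kc,abc->ijk", Q, Q, Q, grad_eps))
print(np.isclose(w_ref, w_rot))
```

In practice θ would be identified from data; the invariance shown here holds before and after training because it is a property of the input layer, not of the weights.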

The parameter vector θ represents internal coefficients (e.g., weights in a neural network). Although $Φ_{θ}$ may be identified from data, its exclusive dependence on invariant arguments ensures that structural symmetry constraints are enforced independently of the learning procedure.

3.5. Objectivity and isotropy

Theorem 1 (Frame Indifference). If $Φ_{θ}$ depends only on $(I_{1}, I_{2}, J_{1})$ , then $W_{θ}$ is objective and isotropic.

Proof. Under a superposed rigid body motion,

ε \mapsto Qε Q^{T}, \qquad {(\nabla ε)}_{ijk} \mapsto {(\mathcal{R}_{Q} \nabla ε)}_{ijk} := Q_{ia} Q_{jb} Q_{kc} {(\nabla ε)}_{abc} .

By the preceding lemma, the scalar invariants remain unchanged. Therefore

W_{θ} (Qε Q^{T}, \mathcal{R}_{Q} \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}) = W_{θ} (ε, \nabla ε) .

□

Remark 2. Objectivity refers to invariance under superposed rigid motions, whereas isotropy refers to material symmetry. Because the invariant arguments are rotation-invariant, both properties are satisfied simultaneously.

3.6. Scope of the representation

The above ansatz characterizes the class of quadratic-invariant isotropic small-strain energies. More general isotropic energies may depend on additional invariants of ε and on mixed invariants involving ε and $\nabla ε$ . Such extensions can be incorporated systematically by enlarging the invariant argument set, without altering the structural logic of the framework.

3.7. Regularity

The assumption $Φ_{θ} \in C^{2} (ℝ^{3})$ implies

W_{θ} \in C^{2} (S^{3} \times ℝ^{3 \times 3 \times 3}),

ensuring the existence of first and second variations. This regularity is required for the derivation of Euler–Lagrange equations and for the Legendre–Hadamard analysis performed in section 6.

3.8. Extension to anisotropy

Anisotropic models can be obtained by introducing structural tensors. Let $A$ encode material symmetry. One may then define

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}, ε : A : ε, \dots) .

The invariant neural framework therefore extends naturally to transversely isotropic or orthotropic materials by augmenting the invariant argument list.
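To illustrate, take the hypothetical structural tensor $A = a \otimes a \otimes a \otimes a$ for a unit fiber direction $a$, so that $ε : A : ε = (a \cdot ε a)^{2}$. The sketch below (assuming NumPy; `rotation_about` and `fiber_invariant` are illustrative names) checks that this invariant is preserved by rotations about $a$, which belong to the transversely isotropic symmetry group, but not by a generic rotation.

```python
import numpy as np


def rotation_about(axis, angle):
    """Rodrigues formula: rotation by `angle` (radians) about the direction `axis`."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def fiber_invariant(eps, a):
    """eps : A : eps for the assumed structural tensor A = a (x) a (x) a (x) a."""
    A = np.einsum("i,j,k,l->ijkl", a, a, a, a)
    return np.einsum("ij,ijkl,kl->", eps, A, eps)


a = np.array([0.0, 0.0, 1.0])                            # preferred material direction
eps = np.array([[0.10, 0.02, 0.03],
                [0.02, -0.05, 0.04],
                [0.03, 0.04, 0.20]])
R_sym = rotation_about(a, 1.1)                           # rotation about a: a symmetry
R_gen = rotation_about(np.array([1.0, 0.0, 0.0]), 0.6)   # generic rotation
I4 = fiber_invariant(eps, a)
print(np.isclose(I4, fiber_invariant(R_sym @ eps @ R_sym.T, a)))   # True
print(np.isclose(I4, fiber_invariant(R_gen @ eps @ R_gen.T, a)))   # False
```

Appending such an invariant to the argument list of $Φ_{θ}$ restricts the symmetry group from $SO (3)$ to rotations that preserve $a$, which is exactly the intended loss of isotropy.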

Remark 3. The invariant formulation separates structural constraints (symmetry, objectivity) from parametric flexibility (choice of θ). This separation is essential for well-posedness and stability, as an admissible constitutive structure is fixed prior to parameter identification.

4. Thermodynamic structure and dissipation inequality

We now embed the invariant neural energy representation into the framework of continuum thermodynamics for second-gradient media.

4.1. Balance laws

Let $Ω \subset ℝ^{3}$ be occupied by a body. We assume quasi-static conditions and neglect inertial effects.

The balance of linear momentum reads

\operatorname{Div} σ - \operatorname{Div}^{2} M + b = 0, \qquad (3)

where

σ is the Cauchy stress tensor,

M is the third-order hyperstress tensor,

b denotes the body forces,

$\operatorname{Div}^{2} M$ denotes the double divergence, $(\operatorname{Div}^{2} M)_{i} := \partial_{j} \partial_{k} M_{ijk}$.

The associated virtual power identity is

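\int_{Ω} \big( σ_{ij} \, δ u_{i, j} + M_{ijk} \, δ u_{i, jk} \big) \, dx = \int_{Ω} b_{i} \, δ u_{i} \, dx + \int_{\partial Ω} t_{i} \, δ u_{i} \, ds + \int_{\partial Ω} m_{i} \, δ u_{i, n} \, ds, \qquad (4)

where $t$ and $m$ denote the prescribed boundary tractions and double forces, respectively, and $δ u_{i, n} := δ u_{i, k} n_{k}$ is the normal derivative of the virtual displacement.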

4.2. Free-energy density

We assume the Helmholtz free-energy density

ψ (ε, \nabla ε) = W_{θ} (ε, \nabla ε),

with

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}) .

We define constitutive relations through energetic conjugacy:

σ = \frac{∂ψ}{∂ε}, \qquad M = \frac{∂ψ}{\partial (\nabla ε)} . \qquad (5)
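For the invariant ansatz, the chain rule turns these relations into $σ = (\partial Φ_{θ} / \partial I_{1}) \, \mathbf{1} + 2 (\partial Φ_{θ} / \partial I_{2}) \, ε$ and $M = 2 (\partial Φ_{θ} / \partial J_{1}) \, \nabla ε$. The sketch below checks these expressions against central finite differences for an illustrative closed-form Φ; its form and parameter values (`LAM`, `MU`, `BETA`, `A_GRAD`) are assumptions chosen for the test, not a calibrated model. Derivatives are taken with respect to all tensor components independently, consistent with $\partial I_{2} / \partial ε_{ij} = 2 ε_{ij}$.

```python
import numpy as np

LAM, MU, BETA, A_GRAD = 1.0, 0.5, 2.0, 0.01   # hypothetical material parameters


def invariants(eps, G):
    """(I1, I2, J1) for strain eps and strain gradient G."""
    return np.trace(eps), np.einsum("ij,ij->", eps, eps), np.einsum("ijk,ijk->", G, G)


def W(eps, G):
    """Illustrative energy Phi = lam/2 I1^2 + mu I2 + beta I2^2 + a J1."""
    I1, I2, J1 = invariants(eps, G)
    return 0.5 * LAM * I1**2 + MU * I2 + BETA * I2**2 + A_GRAD * J1


def stresses(eps, G):
    """Chain rule: sigma = Phi_1 * 1 + 2 Phi_2 * eps and M = 2 Phi_3 * G."""
    I1, I2, _ = invariants(eps, G)
    phi1, phi2, phi3 = LAM * I1, MU + 2.0 * BETA * I2, A_GRAD
    return phi1 * np.eye(3) + 2.0 * phi2 * eps, 2.0 * phi3 * G


def fd_gradient(f, X, h=1e-6):
    """Central finite-difference gradient of a scalar function f with respect to array X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        dX = np.zeros_like(X)
        dX[idx] = h
        grad[idx] = (f(X + dX) - f(X - dX)) / (2.0 * h)
    return grad


rng = np.random.default_rng(2)
a = rng.standard_normal((3, 3))
eps = 0.01 * (a + a.T)
g = rng.standard_normal((3, 3, 3))
G = 0.05 * (g + g.transpose(1, 0, 2))
sigma, M = stresses(eps, G)
print(np.allclose(sigma, fd_gradient(lambda E: W(E, G), eps), atol=1e-7))
print(np.allclose(M, fd_gradient(lambda H: W(eps, H), G), atol=1e-7))
print(np.allclose(sigma, sigma.T))  # symmetry of the Cauchy stress
```

For a neural $Φ_{θ}$ the partial derivatives would typically come from automatic differentiation rather than closed-form expressions; the chain-rule structure is the same.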

4.3. Local dissipation inequality

The local form of the Clausius–Duhem inequality under isothermal conditions is

D = σ : \overset{\cdot}{ε} + M : : \nabla \overset{\cdot}{ε} - \overset{\cdot}{ψ} \geq 0 . \qquad (6)

4.4. Elastic case

Theorem 2 (Thermodynamic Admissibility of Elastic Neural Energies). Let $ψ = W_{θ} (ε, \nabla ε)$ be continuously differentiable with constitutive relations

σ = \frac{∂ψ}{∂ε}, M = \frac{∂ψ}{\partial (\nabla ε)} .

Then, the local dissipation inequality is satisfied identically, i.e.,

D = 0 .

Proof. By the chain rule,

\overset{\cdot}{ψ} = \frac{∂ψ}{∂ε} : \overset{\cdot}{ε} + \frac{∂ψ}{\partial (\nabla ε)} : : \nabla \overset{\cdot}{ε} .

Substituting the constitutive relations yields

\overset{\cdot}{ψ} = σ : \overset{\cdot}{ε} + M : : \nabla \overset{\cdot}{ε} .

Hence

D = σ : \overset{\cdot}{ε} + M : : \nabla \overset{\cdot}{ε} - \overset{\cdot}{ψ} = 0 .

□

4.5. Energetic conjugacy and boundary contributions

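Integrating the local identity $D = 0$ over $Ω$ yields

\frac{d}{dt} \int_{Ω} ψ \, dx = \int_{Ω} \big( σ : \overset{\cdot}{ε} + M : : \nabla \overset{\cdot}{ε} \big) \, dx . \qquad (7)

Integration by parts, using the symmetries of σ and M and applying the divergence theorem twice to the hyperstress term, gives

\int_{Ω} \big( σ : \overset{\cdot}{ε} + M : : \nabla \overset{\cdot}{ε} \big) \, dx = - \int_{Ω} (\operatorname{Div} σ - \operatorname{Div}^{2} M) \cdot \overset{\cdot}{u} \, dx + \int_{\partial Ω} T \cdot \overset{\cdot}{u} \, ds, \qquad (8)

where $T \cdot \overset{\cdot}{u}$ is shorthand for the generalized boundary power, comprising traction and double-force contributions.

By the balance law (3), the volume integrand in (8) equals $b \cdot \overset{\cdot}{u}$, so the rate of stored energy equals the power of the external loads. Thus, the energetic formulation is fully consistent with the balance equations.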

4.6. Extension to inelastic internal variables

Let the free energy depend additionally on an internal variable α:

ψ = ψ (ε, \nabla ε, α) .

We define the thermodynamic force conjugate to α:

A = - \frac{∂ψ}{∂α} .

Since σ and M are still given by (5) as partial derivatives of ψ, the chain rule reduces the dissipation (6) to

D = A \overset{\cdot}{α} . \qquad (9)

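Theorem 3 (Dissipation via Convex Potential). Let $Φ^{*}$ be a convex dual dissipation potential that attains its minimum at $A = 0$ (for instance, $Φ^{*} \geq 0$ with $Φ^{*} (0) = 0$). If the evolution law is derived from it as

\overset{\cdot}{α} \in \partial Φ^{*} (A),

then $D \geq 0$.

Proof. By the subgradient inequality for $Φ^{*}$ at $A$, evaluated at $0$,

Φ^{*} (0) \geq Φ^{*} (A) + \overset{\cdot}{α} (0 - A), \quad \text{i.e.,} \quad D = A \overset{\cdot}{α} \geq Φ^{*} (A) - Φ^{*} (0) \geq 0 .

□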
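A simple instance of Theorem 3, assuming a Perzyna-type overstress potential chosen here for illustration (scalar α, hypothetical threshold `TAU` and viscosity `ETA`): $Φ^{*} (A) = \max (| A | - τ, 0)^{2} / (2 η)$ is convex, continuously differentiable and equal to zero at $A = 0$, and the evolution law $\overset{\cdot}{α} = Φ^{*\prime} (A)$ produces non-negative dissipation. The sketch evaluates $D$ on a grid of forces.

```python
import numpy as np

TAU, ETA = 1.0, 0.1   # hypothetical threshold and viscosity


def phi_star(A):
    """Convex dual potential max(|A| - tau, 0)^2 / (2 eta); minimal (zero) at A = 0."""
    return np.maximum(np.abs(A) - TAU, 0.0) ** 2 / (2.0 * ETA)


def alpha_dot(A):
    """Evolution law alpha_dot = d phi_star / dA (phi_star is continuously differentiable)."""
    return np.sign(A) * np.maximum(np.abs(A) - TAU, 0.0) / ETA


A = np.linspace(-3.0, 3.0, 601)
D = A * alpha_dot(A)                                  # dissipation D = A * alpha_dot
print(D.min() >= 0.0)
# Subgradient inequality of phi_star at A, evaluated at 0: A * alpha_dot >= phi_star(A) - phi_star(0)
print(np.all(D >= phi_star(A) - phi_star(0.0) - 1e-12))
```

Inside the elastic range $| A | \leq τ$ the rate vanishes and $D = 0$; outside it $D = | A | (| A | - τ) / η > 0$.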

4.7. Structural implications for neural parametrizations

Thermodynamic admissibility is not an auxiliary modeling preference but a structural requirement dictated by continuum thermodynamics. Within the present invariant neural framework, admissibility must hold independently of the training data and therefore be enforced at the representation level. In the isothermal setting considered here, this translates into the following constitutive requirements:

Regularity of the stored energy. The invariant scalar map $Φ_{θ}$ must possess sufficient smoothness (at least $C^{1}$ , and $C^{2}$ for stability analysis) to ensure that stresses and hyperstresses are well-defined as variational derivatives. This guarantees the existence of the first and second variations of the total potential.

Symmetry of the Cauchy stress. The stress tensor

σ = \frac{\partial W_{θ}}{∂ε}

must be symmetric, reflecting the balance of angular momentum. Because the energy depends only on the symmetric strain tensor and is derived from a scalar potential, this symmetry follows automatically.

Energetic origin of higher-order stresses. The hyperstress tensor

M = \frac{\partial W_{θ}}{\partial (\nabla ε)}

must arise from the same free-energy density. This ensures integrability of the constitutive response, compatibility with the principle of virtual power, and exact satisfaction of the local Clausius–Duhem inequality in the purely elastic case.

Convexity of dissipation mechanisms for inelastic extensions. When internal variables are introduced, their evolution must be governed by a convex dissipation potential. Convexity guarantees non-negative dissipation and preserves consistency with the Clausius–Duhem inequality.

Together, these conditions ensure that the neural parametrization remains embedded within a thermodynamically consistent constitutive class, rather than merely approximating stress responses in a data-driven manner. They can be enforced at the level of the neural parametrization by using smooth activation functions, which provide the required regularity, and, for inelastic processes, convex architectures for the dissipation potential.

Remark 4. The key advantage of the invariant neural representation is that thermodynamic consistency follows automatically from its potential structure. This stands in contrast to stress-driven neural mappings which may violate dissipation principles.

5. Existence of minimizers in $H^{2} (Ω)$

We establish the existence of minimizers for the second-gradient neural energy functional using the direct method of the calculus of variations.

5.1. Variational setting

Let $Ω \subset ℝ^{3}$ be a bounded Lipschitz domain. We impose Dirichlet boundary conditions on a portion $Γ_{D} \subset \partial Ω$ of positive measure.

Define the admissible space

V = \{ u \in H^{2} (Ω; ℝ^{3}) \mid u = ū \text{ on } Γ_{D} \} .

The total potential energy is

Π_{θ} (u) = \int_{Ω} W_{θ} (ε (u), \nabla ε (u)) dx - ℓ (u),

where ℓ is a bounded linear functional on $H^{2}$ .

The minimization problem reads:

Find u \in V such that Π_{θ} (u) = \inf_{v \in V} Π_{θ} (v) .

5.2. Structural assumptions

We assume:

(A1) Regularity.

W_{θ} \in C^{1} (S^{3} \times ℝ^{3 \times 3 \times 3}) .

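(A2) Coercive growth. There exist constants $c_{1} > 0$, $c_{2} \geq 0$ such that

W_{θ} (ε, \nabla ε) \geq c_{1} (| ε |^{2} + | \nabla ε |^{2}) - c_{2} .

(A3) Convexity in the higher-order variable. $W_{θ}$ is convex with respect to $\nabla ε$.

(A4) Quasiconvexity in strain. $W_{θ}$ is quasiconvex with respect to ε.

Assumption (A2), combined with the second-order Korn inequality below, provides coercivity in $H^{2}$, while (A3)–(A4) ensure weak lower semicontinuity.

5.3. Second-order Korn inequality

Lemma 2 (Second-order Korn inequality). There exists $C > 0$ such that for all $u \in H^{2} (Ω; ℝ^{3})$ with $u = 0$ on $Γ_{D}$,

∥ u ∥_{H^{2} (Ω)} \leq C (∥ ε (u) ∥_{L^{2} (Ω)} + ∥ \nabla ε (u) ∥_{L^{2} (Ω)}) .

Proof. The classical Korn inequality for displacements vanishing on $Γ_{D}$ gives $∥ u ∥_{H^{1} (Ω)} \leq C ∥ ε (u) ∥_{L^{2} (Ω)}$. The second derivatives are controlled pointwise by the strain gradient through the identity $\partial_{j} \partial_{k} u_{i} = \partial_{k} ε_{ij} + \partial_{j} ε_{ik} - \partial_{i} ε_{jk}$, which yields $| \nabla^{2} u | \leq 3 | \nabla ε (u) |$ and hence $∥ \nabla^{2} u ∥_{L^{2} (Ω)} \leq 3 ∥ \nabla ε (u) ∥_{L^{2} (Ω)}$. □

5.4. Coercivity

Proposition 1 (Coercivity). Under Assumption (A2), the functional $Π_{θ}$ is coercive on $V$.

Proof. We assume that the Dirichlet datum $ū$ is the trace of a function in $H^{2} (Ω; ℝ^{3})$, still denoted $ū$, and write $u = ū + w$ with $w = 0$ on $Γ_{D}$. From Assumption (A2),

Π_{θ} (u) \geq c_{1} \big( ∥ ε (u) ∥_{L^{2}}^{2} + ∥ \nabla ε (u) ∥_{L^{2}}^{2} \big) - c_{2} | Ω | - | ℓ (u) | .

Since ℓ is continuous in $H^{2}$,

| ℓ (u) | \leq C ∥ u ∥_{H^{2}} .

Applying the second-order Korn inequality to $w$, together with the triangle inequality, gives $∥ ε (u) ∥_{L^{2}}^{2} + ∥ \nabla ε (u) ∥_{L^{2}}^{2} \geq c ∥ u ∥_{H^{2}}^{2} - C_{ū}$ for constants $c > 0$ and $C_{ū} \geq 0$ depending only on $Ω$, $Γ_{D}$ and $ū$. Combining the three estimates yields

Π_{θ} (u) \geq {\tilde{c}}_{1} ∥ u ∥_{H^{2}}^{2} - {\tilde{c}}_{2} (1 + ∥ u ∥_{H^{2}}),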

which implies coercivity. □
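Control of the second derivatives in the second-order Korn inequality can be obtained from the classical pointwise identity $\partial_{j} \partial_{k} u_{i} = \partial_{k} ε_{ij} + \partial_{j} ε_{ik} - \partial_{i} ε_{jk}$, which implies $| \nabla^{2} u | \leq 3 | \nabla ε (u) |$. The sketch below (assuming NumPy and the index convention ${(\nabla ε)}_{ijk} = \partial_{k} ε_{ij}$) checks both statements for a random quadratic displacement field, whose Hessian is a constant tensor `H`.

```python
import numpy as np

rng = np.random.default_rng(3)

# Quadratic displacement u_i(x) = 1/2 H_ijk x_j x_k (an affine part would not change
# second derivatives); its Hessian H_ijk = d_j d_k u_i is constant and symmetric in (j, k).
H = rng.standard_normal((3, 3, 3))
H = 0.5 * (H + H.transpose(0, 2, 1))

# Strain gradient with the convention (grad eps)_ijk = d_k eps_ij = 1/2 (H_ijk + H_jik)
grad_eps = 0.5 * (H + H.transpose(1, 0, 2))

# Identity d_j d_k u_i = d_k eps_ij + d_j eps_ik - d_i eps_jk, written with index permutations
recovered = (grad_eps                        # d_k eps_ij
             + grad_eps.transpose(0, 2, 1)   # d_j eps_ik
             - grad_eps.transpose(2, 0, 1))  # d_i eps_jk
print(np.allclose(recovered, H))
# Pointwise Korn-type bound |grad^2 u| <= 3 |grad eps|
print(np.linalg.norm(H) <= 3.0 * np.linalg.norm(grad_eps))
```

The bound follows from the triangle inequality, since each of the three permuted tensors has the same Frobenius norm as $\nabla ε$; integrating it over $Ω$ gives the $L^{2}$ estimate.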

5.5. Weak lower semicontinuity

Proposition 2 (Weak lower semicontinuity). Under Assumptions (A3) and (A4), $Π_{θ}$ is weakly sequentially lower semicontinuous in $H^{2}$ .

Proof. Let $u_{k} ⇀ u$ in $H^{2}$ . Then

\begin{matrix} ε (u_{k}) ⇀ ε (u) in H^{1}, \\ \nabla ε (u_{k}) ⇀ \nabla ε (u) in L^{2} . \end{matrix}

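By the Rellich–Kondrachov theorem, the first convergence implies $ε (u_{k}) \to ε (u)$ strongly in $L^{2}$, so that only the higher-order argument $\nabla ε (u_{k})$ converges weakly but possibly not strongly. Convexity in $\nabla ε$ (A3), together with continuity (A1) and the lower bound (A2), then implies weak lower semicontinuity of the higher-order part, in line with classical lower semicontinuity results for integrands convex in the highest-order variable. Quasiconvexity in ε (A4) controls the strain-dependent part. The load functional ℓ is bounded and linear, hence weakly continuous. □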

5.6. Existence of minimizers

Theorem 4 (Existence). Under Assumptions (A1)–(A4), the functional $Π_{θ}$ admits at least one minimizer in $V$ .

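Proof. Let $(u_{k}) \subset V$ be a minimizing sequence. By coercivity, $(u_{k})$ is bounded in $H^{2}$. Since $H^{2}$ is a Hilbert space, bounded sequences have weakly convergent subsequences, and V, being convex and closed (the trace operator is continuous), is weakly closed. Hence, there exists $u \in V$ such that, up to a subsequence,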

u_{k} ⇀ u in H^{2} .

Weak lower semicontinuity yields

Π_{θ} (u) \leq \underset{k \to \infty}{\lim \inf} Π_{θ} (u_{k}),

so u is a minimizer. □

5.7. Remarks

Remark 5. Full convexity in $(ε, \nabla ε)$ is not required. Convexity in the higher-order term combined with quasiconvexity in the strain variable is sufficient for existence.

Remark 6. The second-gradient term serves a dual purpose: it regularizes localization phenomena and supplies coercivity in $H^{2}$ , ensuring functional well-posedness even when the first-gradient part is not convex.

6. Strong ellipticity and the Legendre–Hadamard condition

We analyze stability of the invariant neural energy representation through the Legendre–Hadamard condition, which characterizes strong ellipticity of the second variation.

6.1. Second variation and tangent operator

Consider the strain-dependent part of the stored energy

W_{θ} (ε) = Φ_{θ} (I_{1}, I_{2}),

with invariants

I_{1} = tr ε, I_{2} = ε : ε .

We define the fourth-order tangent tensor

ℂ_{ijkl} = \frac{\partial^{2} W_{θ}}{\partial ε_{ij} \partial ε_{kl}} .

The derivatives of the invariants are

\frac{\partial I_{1}}{\partial ε_{ij}} = δ_{ij}, \frac{\partial I_{2}}{\partial ε_{ij}} = 2 ε_{ij},

and

\frac{\partial^{2} I_{1}}{\partial ε_{ij} \partial ε_{kl}} = 0, \frac{\partial^{2} I_{2}}{\partial ε_{ij} \partial ε_{kl}} = 2 δ_{ik} δ_{jl} .

Applying the chain rule yields

ℂ_{ijkl} = Φ_{11} δ_{ij} δ_{kl} + 2 Φ_{2} δ_{ik} δ_{jl} + 4 Φ_{22} ε_{ij} ε_{kl} + 2 Φ_{12} (δ_{ij} ε_{kl} + ε_{ij} δ_{kl}),

where subscripts denote the partial derivatives of $Φ_{θ}$ with respect to $(I_{1}, I_{2})$ .
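The chain-rule tangent can be cross-checked against finite differences of the stress $\partial W_{θ} / \partial ε_{ij} = Φ_{1} δ_{ij} + 2 Φ_{2} ε_{ij}$. The sketch below assumes a hypothetical polynomial invariant map $Φ = \frac{λ}{2} I_{1}^{2} + μ I_{2} + β I_{1} I_{2} + γ I_{2}^{2}$ (coefficients `LAM`, `MU`, `BETA`, `GAMMA` chosen arbitrarily) in place of a trained network and, like the text, treats the nine strain components as independent variables.

```python
import random

# Hypothetical coefficients of an illustrative invariant map
# Phi(I1, I2) = LAM/2 I1^2 + MU I2 + BETA I1 I2 + GAMMA I2^2.
LAM, MU, BETA, GAMMA = 1.2, 0.8, 0.3, 0.5

def phi_derivatives(I1, I2):
    """Return (Phi_1, Phi_2, Phi_11, Phi_12, Phi_22) of the illustrative Phi."""
    return (LAM * I1 + BETA * I2,
            MU + BETA * I1 + 2.0 * GAMMA * I2,
            LAM, BETA, 2.0 * GAMMA)

def invariants(eps):
    I1 = eps[0][0] + eps[1][1] + eps[2][2]
    I2 = sum(eps[i][j] ** 2 for i in range(3) for j in range(3))
    return I1, I2

def kron(a, b):
    return 1.0 if a == b else 0.0

def stress(eps):
    """dW/d eps_ij = Phi_1 delta_ij + 2 Phi_2 eps_ij (components treated as independent)."""
    p1, p2, _, _, _ = phi_derivatives(*invariants(eps))
    return [[p1 * kron(i, j) + 2.0 * p2 * eps[i][j] for j in range(3)] for i in range(3)]

def tangent_chain_rule(eps):
    """C_ijkl from the chain-rule formula of Section 6.1."""
    _, p2, p11, p12, p22 = phi_derivatives(*invariants(eps))
    return [[[[p11 * kron(i, j) * kron(k, l) + 2.0 * p2 * kron(i, k) * kron(j, l)
               + 4.0 * p22 * eps[i][j] * eps[k][l]
               + 2.0 * p12 * (kron(i, j) * eps[k][l] + eps[i][j] * kron(k, l))
               for l in range(3)] for k in range(3)] for j in range(3)] for i in range(3)]

def tangent_finite_difference(eps, h=1e-5):
    """Central differences of the stress, perturbing one strain component at a time."""
    C = [[[[0.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(3)]
    for k in range(3):
        for l in range(3):
            ep = [row[:] for row in eps]
            em = [row[:] for row in eps]
            ep[k][l] += h
            em[k][l] -= h
            sp, sm = stress(ep), stress(em)
            for i in range(3):
                for j in range(3):
                    C[i][j][k][l] = (sp[i][j] - sm[i][j]) / (2.0 * h)
    return C

def random_strain(rng, scale=0.2):
    e = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            e[i][j] = e[j][i] = rng.uniform(-scale, scale)
    return e

eps = random_strain(random.Random(0))
A, B = tangent_chain_rule(eps), tangent_finite_difference(eps)
idx = [(i, j, k, l) for i in range(3) for j in range(3) for k in range(3) for l in range(3)]
print(max(abs(A[i][j][k][l] - B[i][j][k][l]) for i, j, k, l in idx))
```

The printed discrepancy is at the level of finite-difference error, so the four terms of the chain-rule expression, including the coupling term in $Φ_{12}$, are consistent with direct differentiation of the stress.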

6.2. Legendre–Hadamard condition

Definition 1. The stored energy is strongly elliptic at strain ε, i.e., it satisfies the strict Legendre–Hadamard condition, if for all nonzero $a, n \in ℝ^{3}$ ,

ℂ_{ijkl} a_{i} n_{j} a_{k} n_{l} > 0 .

In terms of the acoustic tensor

Q_{ik} (n) = ℂ_{ijkl} n_{j} n_{l},

strong ellipticity is equivalent to positive definiteness of $Q (n)$ for every unit vector n.

6.3. Ellipticity at the reference configuration

At the reference configuration $ε = 0$ , the tangent tensor reduces to

ℂ_{ijkl}^{(0)} = Φ_{11} (0) δ_{ij} δ_{kl} + 2 Φ_{2} (0) δ_{ik} δ_{jl} .

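This has the standard isotropic elasticity structure: acting on symmetric strains, $ℂ^{(0)}$ coincides with the isotropic elasticity tensor whose first Lamé constant is $Φ_{11} (0)$ and whose shear modulus is $Φ_{2} (0)$. The corresponding acoustic tensor is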

Q_{ik}^{(0)} (n) = Φ_{11} (0) n_{i} n_{k} + 2 Φ_{2} (0) δ_{ik} .

The eigenvalues of $Q^{(0)} (n)$ are:

Longitudinal mode:

λ_{L} = Φ_{11} (0) + 2 Φ_{2} (0),

Shear modes (multiplicity two):

λ_{S} = 2 Φ_{2} (0) .

Theorem 5 (Strong ellipticity at reference configuration). If

Φ_{2} (0) > 0, Φ_{11} (0) + 2 Φ_{2} (0) > 0,

then the Legendre–Hadamard condition holds at $ε = 0$ .

Proof. Strong ellipticity requires positivity of all acoustic eigenvalues. The two shear eigenvalues are positive if $Φ_{2} (0) > 0$ . The longitudinal eigenvalue is positive if $Φ_{11} (0) + 2 Φ_{2} (0) > 0$ . □
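The eigenstructure used in Theorem 5 can be verified directly: n itself is the longitudinal eigenvector and every unit vector orthogonal to n is a shear eigenvector. The following is a minimal sketch with arbitrarily chosen values of $Φ_{11} (0)$ and $Φ_{2} (0)$; it follows the non-symmetrized tangent of the text. Projecting the tangent onto symmetric tensors would change the shear eigenvalue to $Φ_{2} (0)$ but leave the longitudinal eigenvalue and both sign conditions unchanged.

```python
import math

def acoustic_reference(phi11, phi2, n):
    """Q^(0)_ik(n) = Phi_11(0) n_i n_k + 2 Phi_2(0) delta_ik for a unit vector n
    (non-symmetrized tangent, as in the text)."""
    return [[phi11 * n[i] * n[k] + (2.0 * phi2 if i == k else 0.0) for k in range(3)]
            for i in range(3)]

def matvec(Q, v):
    return [sum(Q[i][k] * v[k] for k in range(3)) for i in range(3)]

def reference_eigenvalues(phi11, phi2):
    """Longitudinal eigenvalue and (double) shear eigenvalue of Q^(0)(n)."""
    return phi11 + 2.0 * phi2, 2.0 * phi2

def strongly_elliptic_at_reference(phi11, phi2):
    lam_L, lam_S = reference_eigenvalues(phi11, phi2)
    return lam_L > 0.0 and lam_S > 0.0

n = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]                    # unit vector
t = [2.0 / math.sqrt(5.0), -1.0 / math.sqrt(5.0), 0.0]  # unit vector orthogonal to n
Q = acoustic_reference(-0.5, 0.4, n)
print(matvec(Q, n), reference_eigenvalues(-0.5, 0.4))
print(strongly_elliptic_at_reference(-0.5, 0.4), strongly_elliptic_at_reference(-1.0, 0.4))  # True False
```

The first parameter pair shows that $Φ_{11} (0)$ may be negative as long as $Φ_{11} (0) + 2 Φ_{2} (0) > 0$; the second violates the longitudinal condition.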

6.4. Local preservation of ellipticity

Proposition 3 (Local Preservation). Assume $Φ_{θ} \in C^{2}$ and the reference configuration satisfies

Φ_{2} (0) > 0, Φ_{11} (0) + 2 Φ_{2} (0) > 0 .

Then there exists $ε_{0} > 0$ such that for all strains with $| ε | < ε_{0}$ , the Legendre–Hadamard condition remains satisfied.

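Proof. Since $Φ_{θ} \in C^{2}$ and the invariants are polynomial in ε, the tangent tensor ℂ, and hence the acoustic tensor, depends continuously on ε. Consequently, the function $m (ε) = \min_{| a | = | n | = 1} ℂ_{ijkl} (ε) a_{i} n_{j} a_{k} n_{l}$ is continuous, the minimum being taken over a compact set. Since $m (0) > 0$ by Theorem 5, continuity implies $m (ε) > 0$ in a sufficiently small neighborhood of $ε = 0$. □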
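The neighborhood in Proposition 3 can be made quantitative. For unit vectors a and n, the Cauchy–Schwarz inequality gives $| X_{ijkl} a_{i} n_{j} a_{k} n_{l} | \leq | X |$ for any fourth-order tensor X with Frobenius norm $| X |$, and $m (0)$ is the smallest reference acoustic eigenvalue. Combined with the chain-rule tangent, this yields the following estimate, in which derivatives of $Φ_{θ}$ without argument are evaluated at $(I_{1} (ε), I_{2} (ε))$:

```latex
m(ε) := \min_{|a| = |n| = 1} ℂ_{ijkl}(ε)\, a_{i} n_{j} a_{k} n_{l}
  \;\geq\; m(0) - \big| ℂ(ε) - ℂ^{(0)} \big| ,
\qquad
m(0) = \min \{ Φ_{11}(0) + 2 Φ_{2}(0),\; 2 Φ_{2}(0) \} ,

ℂ_{ijkl}(ε) - ℂ^{(0)}_{ijkl}
  = \big[ Φ_{11} - Φ_{11}(0) \big] δ_{ij} δ_{kl}
  + 2 \big[ Φ_{2} - Φ_{2}(0) \big] δ_{ik} δ_{jl}
  + 4 Φ_{22}\, ε_{ij} ε_{kl}
  + 2 Φ_{12} \big( δ_{ij} ε_{kl} + ε_{ij} δ_{kl} \big) .
```

Any radius $ε_{0}$ on which the Frobenius norm of this difference stays below $m (0)$ is therefore admissible in Proposition 3.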

6.5. Interpretation in invariant space

The above result shows that the ellipticity at the reference state is governed entirely by the invariant derivatives $Φ_{11}$ and $Φ_{2}$ evaluated at $(I_{1}, I_{2}) = (0, 0)$. Therefore, stability constraints can be enforced directly at the level of the neural parametrization by guaranteeing $Φ_{2} (0) > 0$ and $Φ_{11} (0) + 2 Φ_{2} (0) > 0$; positivity of $Φ_{11} (0)$ itself is not required.

Remark 7. Global ellipticity for arbitrary strain states requires additional constraints on the full tangent tensor and cannot be inferred solely from the positivity of the invariant Hessian. The present conditions are local but sufficient for stability in small-strain regimes.

7. Polyconvex neural energy representations

Convexity of $W_{θ}$ in $(ε, \nabla ε)$, together with coercive growth, is sufficient for the existence of minimizers, but may be overly restrictive and physically unrealistic, especially in finite-strain elasticity. Polyconvexity provides a weaker yet mathematically robust condition that, under suitable growth conditions, guarantees existence while remaining compatible with nonlinear kinematics.

7.1. Polyconvexity in finite strain

Let $F = \nabla u$ denote the deformation gradient. An energy density $W : ℝ^{3 \times 3} \to ℝ$ is said to be polyconvex if there exists a convex function

G : ℝ^{3 \times 3} \times ℝ^{3 \times 3} \times ℝ \to ℝ

such that

W (F) = G (F, cof F, \det F) .

Polyconvexity implies quasiconvexity, and hence weak lower semicontinuity of the functional

Π (u) = \int_{Ω} W (\nabla u) dx,

under suitable growth conditions. Classical existence results are due to Ball.

7.2. Neural polyconvex ansatz

We define a neural polyconvex energy as

W_{θ} (F) = G_{θ} (F, cof F, \det F),

where

G_{θ} : ℝ^{3 \times 3} \times ℝ^{3 \times 3} \times ℝ \to ℝ

is convex with respect to all its arguments.

Definition 2. For twice differentiable $G_{θ}$, the neural mapping is convex if, for every fixed θ, its Hessian with respect to the variables $(F, cof F, \det F)$ is positive semidefinite.

This construction enforces polyconvexity structurally, independently of parameter identification.

7.3. Existence result in finite strain

We impose the following growth condition: there exist constants $c_{1}, c_{2} > 0$ and $p > 3$ such that

W_{θ} (F) \geq c_{1} | F |^{p} - c_{2} .

Theorem 6 (Existence under polyconvexity). Let $W_{θ}$ be polyconvex and satisfy the above growth condition. Then, the functional

Π_{θ} (u) = \int_{Ω} W_{θ} (\nabla u) dx

admits at least one minimizer in

\{ u \in W^{1, p} (Ω; ℝ^{3}) ∣ u |_{Γ_{D}} = ū \} .

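Proof. Polyconvexity implies quasiconvexity. The growth condition, combined with the Poincaré inequality on the Dirichlet part $Γ_{D}$, yields coercivity in $W^{1, p}$, so minimizing sequences are bounded and admit weakly convergent subsequences $u_{k} ⇀ u$ in $W^{1, p}$. For $p > 3$, the minors are weakly continuous: $cof \nabla u_{k} ⇀ cof \nabla u$ in $L^{p / 2}$ and $\det \nabla u_{k} ⇀ \det \nabla u$ in $L^{p / 3}$. Weak sequential lower semicontinuity then follows from the convexity of $G_{θ}$, and the direct method of the calculus of variations applies. □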

7.4. Relation to strong ellipticity

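Polyconvexity implies rank-one convexity. For $C^{2}$ energies, rank-one convexity is equivalent to the Legendre–Hadamard condition in its non-strict form, $ℂ_{ijkl} a_{i} n_{j} a_{k} n_{l} \geq 0$. Hence, smooth polyconvex neural energies are automatically elliptic in this weak sense. Strict (strong) ellipticity follows if, in addition, the Hessian of $G_{θ}$ is positive definite, because F, cof F, and det F depend affinely on t along rank-one lines $F + t a \otimes n$.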

The converse is false: strong ellipticity does not imply polyconvexity. Therefore, polyconvexity constitutes a strictly stronger, global stability requirement.
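The link between polyconvexity and rank-one convexity rests on the fact that det and cof are affine along rank-one lines $F + t a \otimes n$, so the convex function $G_{θ}$ is composed with an affine map of t. A minimal numerical sketch (pure Python, arbitrary random data) checks this and contrasts it with a rank-two direction, along which the determinant is genuinely nonlinear.

```python
import random

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def cof3(A):
    """Cofactor matrix: cof(A)_ij = (-1)^(i+j) times the (i, j) minor of A."""
    C = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [x for x in range(3) if x != i]
            c = [y for y in range(3) if y != j]
            minor = A[r[0]][c[0]] * A[r[1]][c[1]] - A[r[0]][c[1]] * A[r[1]][c[0]]
            C[i][j] = (-1) ** (i + j) * minor
    return C

def along(F, B, t):
    """F + t B."""
    return [[F[i][j] + t * B[i][j] for j in range(3)] for i in range(3)]

def is_affine_in_t(f, ts=(-2.0, -0.5, 0.7, 3.0), tol=1e-9):
    """f is a polynomial of degree <= 3 in t. Agreement with the line through
    (0, f(0)) and (1, f(1)) at four further points forces f to be affine."""
    f0, f1 = f(0.0), f(1.0)
    return all(abs(f(t) - (f0 + t * (f1 - f0))) <= tol * (1.0 + abs(f(t))) for t in ts)

rng = random.Random(0)
F = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
a = [rng.uniform(-1.0, 1.0) for _ in range(3)]
n = [rng.uniform(-1.0, 1.0) for _ in range(3)]
rank_one = [[a[i] * n[j] for j in range(3)] for i in range(3)]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
rank_two = [[1.0 if (i == j and i < 2) else 0.0 for j in range(3)] for i in range(3)]

print(is_affine_in_t(lambda t: det3(along(F, rank_one, t))))   # True
print(is_affine_in_t(lambda t: det3(along(I3, rank_two, t))))  # False: (1 + t)^2
```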

7.5. Neural implementation strategies

In order to embed neural constitutive parametrizations into a mathematically admissible class, convexity (or polyconvexity) must be enforced at the architectural level rather than learned empirically.

We describe three structurally controlled constructions.

1. Input convex neural networks (ICNNs)

Let $z \in ℝ^{m}$ denote the vector of arguments in which convexity is required (e.g., scalar invariants, or $z = (F, cof F, \det F) \in ℝ^{19}$ in the finite-strain setting). An ICNN is a mapping $G_{θ} : ℝ^{m} \to ℝ$ of the form

h_{0} = z, h_{k + 1} = σ_{k} (W_{k} h_{k} + U_{k} z + b_{k}),

with output

G_{θ} (z) = w_{L}^{⊤} h_{L} + c,

subject to the structural constraint

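W_{k} \geq 0 \quad (k \geq 1), \qquad w_{L} \geq 0 \quad (entrywise) .

If the activation functions $σ_{k}$ are convex and nondecreasing, and the weight matrices $W_{k}$ ($k \geq 1$) and the output weights $w_{L}$ have non-negative entries, then $G_{θ}$ is convex in z. No sign constraint is needed on $W_{0}$ or on the passthrough matrices $U_{k}$, since they act on z affinely. This follows from the closure of convex functions under non-negative linear combinations, addition of affine terms, and composition with convex nondecreasing functions.

Proposition 4. If each layer satisfies $W_{k} \geq 0$ entrywise for $k \geq 1$, $w_{L} \geq 0$, and $σ_{k}$ is convex and nondecreasing, then $G_{θ}$ is convex.

Proof. Each component of $h_{1} = σ_{0} (W_{0} z + U_{0} z + b_{0})$ is a convex nondecreasing function of an affine map of z and hence convex. If the components of $h_{k}$ are convex, then $W_{k} h_{k}$ is a non-negative combination of convex functions, adding $U_{k} z + b_{k}$ preserves convexity, and composition with the convex nondecreasing $σ_{k}$ yields convex components of $h_{k + 1}$. By induction, $h_{L}$ has convex components, and $w_{L} \geq 0$ makes $G_{θ}$ a non-negative combination of them plus a constant. □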

Thus, the convexity of $G_{θ}$ is guaranteed independently of training.
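A minimal pure-Python sketch of such a network is given below, assuming softplus activations (convex and nondecreasing) and enforcing non-negativity of the hidden-to-hidden and output weights by passing raw parameters through softplus; clipping would be an equally valid choice. Sizes, initialization, and the class name `TinyICNN` are hypothetical. Because $h_{0} = z$, the first layer merges $W_{0} h_{0}$ and $U_{0} z$ into one affine map. The convexity inequality is then spot-checked on random convex combinations.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + e^x); convex and nondecreasing."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def affine(M, v, b):
    return [sum(m * x for m, x in zip(row, v)) + bi for row, bi in zip(M, b)]

class TinyICNN:
    """Illustrative two-hidden-layer ICNN (hypothetical sizes and initialization):
    h1 = softplus(U0 z + b0), h2 = softplus(W1 h1 + U1 z + b1), G = w^T h2 + c."""

    def __init__(self, m, width, rng):
        r = lambda: rng.uniform(-1.0, 1.0)
        self.U0 = [[r() for _ in range(m)] for _ in range(width)]
        self.b0 = [r() for _ in range(width)]
        self.W1_raw = [[r() for _ in range(width)] for _ in range(width)]
        self.U1 = [[r() for _ in range(m)] for _ in range(width)]
        self.b1 = [r() for _ in range(width)]
        self.w_raw = [r() for _ in range(width)]
        self.c = r()

    def __call__(self, z):
        h1 = [softplus(s) for s in affine(self.U0, z, self.b0)]
        W1 = [[softplus(x) for x in row] for row in self.W1_raw]  # entrywise >= 0
        zero = [0.0] * len(h1)
        pre = [p + q for p, q in zip(affine(W1, h1, zero), affine(self.U1, z, self.b1))]
        h2 = [softplus(s) for s in pre]
        w = [softplus(x) for x in self.w_raw]                      # output weights >= 0
        return sum(wi * hi for wi, hi in zip(w, h2)) + self.c

def worst_convexity_violation(G, rng, m, trials=200):
    """Largest value of G(lam z1 + (1-lam) z2) - [lam G(z1) + (1-lam) G(z2)]."""
    worst = -math.inf
    for _ in range(trials):
        z1 = [rng.uniform(-3.0, 3.0) for _ in range(m)]
        z2 = [rng.uniform(-3.0, 3.0) for _ in range(m)]
        lam = rng.random()
        zm = [lam * p + (1.0 - lam) * q for p, q in zip(z1, z2)]
        worst = max(worst, G(zm) - (lam * G(z1) + (1.0 - lam) * G(z2)))
    return worst

rng = random.Random(0)
G = TinyICNN(m=4, width=8, rng=rng)
print(worst_convexity_violation(G, rng, m=4))
```

The printed value is non-positive up to rounding for every parameter draw, which is the practical meaning of convexity holding independently of training.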

2. Positive-semidefinite Hessian parametrization

An alternative approach is to parametrize the Hessian of $G_{θ}$ directly.

Let

\nabla_{z}^{2} G_{θ} (z) = L_{θ} (z) L_{θ} {(z)}^{⊤},

where $L_{θ} (z)$ is a lower triangular matrix-valued neural map. Since $L_{θ} L_{θ}^{⊤}$ is positive semidefinite, this construction guarantees convexity.

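Integrating the Hessian twice (with suitable boundary conditions) produces a convex potential. For $m > 1$, however, a prescribed field $H = L_{θ} L_{θ}^{⊤}$ is the Hessian of a scalar function only if it satisfies the compatibility conditions $\partial_{z_{k}} H_{ij} = \partial_{z_{j}} H_{ik}$; this is automatic in the scalar case $m = 1$.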

Proposition 5. If $\nabla_{z}^{2} G_{θ} (z)$ is positive semidefinite for all z , then $G_{θ}$ is convex.

This construction is particularly useful when convexity is required only in selected arguments (e.g., $(F, cof F, \det F)$ ).

3. Convex anchor and convex residual decomposition

A structurally robust strategy is to decompose

G_{θ} (z) = G_{anchor} (z) + ψ_{θ} (z),

where

$G_{anchor}$ is a fixed quadratic coercive form,

$ψ_{θ}$ is a convex neural mapping.

For instance,

G_{anchor} (z) = c_{0} | F |^{2} + c_{1} | cof F |^{2} + c_{2} {(\det F)}^{2}, c_{i} > 0 .

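Proposition 6. If $ψ_{θ}$ is convex and $G_{anchor}$ is strictly convex and coercive, then $G_{θ}$ is coercive and convex. Indeed, a finite convex function $ψ_{θ}$ admits an affine minorant, and the sum of a coercive quadratic form and an affine function remains coercive.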

This decomposition ensures:

Existence of minimizers via coercivity,

Stability under parameter updates,

Independence of convexity from training data.

Relation to polyconvexity

In the finite-strain setting, polyconvexity requires convexity in the extended variable

z = (F, cof F, \det F) .

If $G_{θ}$ is convex in z, then

W_{θ} (F) = G_{θ} (F, cof F, \det F)

is polyconvex.

Thus, architectural convexity in the extended variable z provides a direct mechanism to enforce polyconvexity at the constitutive level.

Remark 8. The essential feature of these constructions is that convexity and coercivity are structural properties of the hypothesis class itself, not empirical outcomes of training. Consequently, neural energies remain within a mathematically admissible class throughout the optimization process.

8. Neural architecture constraints and structural enforcement

The previous sections derived existence and stability conditions in terms of derivatives of the invariant scalar map $Φ_{θ}$ . We now show how such conditions can be enforced at the level of neural architecture, independently of training data.

8.1. Invariant input encoding

The neural energy is represented as

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}) .

Because $(I_{1}, I_{2}, J_{1})$, with $J_{1} = ∥ \nabla ε ∥^{2}$, are scalar invariants, objectivity and isotropy are enforced structurally. No data-driven penalty is required. This eliminates a major failure mode of black-box stress regressors.

8.2. Ellipticity constraints in invariant space

From the Legendre–Hadamard analysis, strong ellipticity at the reference configuration requires:

Φ_{2} (0) > 0, Φ_{11} (0) + 2 Φ_{2} (0) > 0 .

These conditions depend only on derivatives of $Φ_{θ}$ . Hence, ellipticity can be enforced by architectural design.

For example:

Parametrize the reference value $Φ_{2} (0)$ as $Φ_{2} (0) = e^{ψ_{θ}}$ to guarantee positivity.

Parametrize $Φ_{11} (0)$ via $Φ_{11} (0) = e^{χ_{θ}} - 2 Φ_{2} (0)$, so that $Φ_{11} (0) + 2 Φ_{2} (0) = e^{χ_{θ}} > 0$.

Use positive-definite Hessian parametrization via Cholesky factors.

Thus, the ellipticity constraints become algebraic constraints on neural outputs.
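A minimal sketch of this reparametrization, assuming ψ_θ and χ_θ are unconstrained real outputs and that the reference values are realized by a hypothetical quadratic part $Φ_{ref} = \frac{1}{2} Φ_{11} (0) I_{1}^{2} + Φ_{2} (0) I_{2}$. Any further neural term R would have to satisfy $R_{2} (0) = R_{11} (0) = 0$ to leave these reference values unchanged; that restriction is an assumption of the sketch, not a statement from the text.

```python
import math

def reference_moduli(psi, chi):
    """Map unconstrained reals (psi, chi) to Phi_2(0) = exp(psi) and
    Phi_11(0) = exp(chi) - 2 Phi_2(0)."""
    phi2 = math.exp(psi)
    phi11 = math.exp(chi) - 2.0 * phi2
    return phi11, phi2

def reference_energy(I1, I2, psi, chi):
    """Hypothetical quadratic reference part Phi_ref = Phi_11(0)/2 I1^2 + Phi_2(0) I2."""
    phi11, phi2 = reference_moduli(psi, chi)
    return 0.5 * phi11 * I1 ** 2 + phi2 * I2

for psi, chi in [(-5.0, 3.0), (0.0, 0.0), (3.0, -5.0)]:
    phi11, phi2 = reference_moduli(psi, chi)
    print(f"Phi_11(0) = {phi11:+.4f}, lambda_L = {phi11 + 2 * phi2:.4f}, lambda_S = {2 * phi2:.4f}")
```

Whatever values the optimizer assigns to the raw parameters, the longitudinal eigenvalue equals $e^{χ_{θ}}$ and the shear eigenvalue equals $2 e^{ψ_{θ}}$, so both stay positive even when $Φ_{11} (0)$ is negative.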

8.3. Convexity enforcement

A central requirement for the existence of minimizers in the second-gradient setting is weak sequential lower semicontinuity of the functional in $H^{2} (Ω)$ . While quasiconvexity conditions govern the first-gradient part, convexity with respect to the higher-order variable $\nabla ε$ is sufficient to guarantee lower semicontinuity of the corresponding term. In particular, convexity in $\nabla ε$ prevents pathological loss of compactness associated with oscillations of the strain-gradient field.

To make this requirement explicit at the level of neural parametrization, we consider a separable invariant representation

Φ_{θ} (I_{1}, I_{2}, J_{1}) = {\tilde{Φ}}_{θ} (I_{1}, I_{2}) + Ψ_{θ} (J_{1}), J_{1} = ∥ \nabla ε ∥^{2},

where the higher-order contribution $Ψ_{θ}$ depends solely on the scalar invariant $J_{1}$ .

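Convexity of $Ψ_{θ}$, together with monotonicity ($Ψ_{θ}^{'} \geq 0$ on $[0, \infty)$), ensures convexity of the energy with respect to $\nabla ε$: $J_{1}$ is a convex quadratic form in $\nabla ε$, and the composition of a convex nondecreasing function with a convex function is convex. Convexity of $Ψ_{θ}$ alone is not sufficient. Consequently, under both conditions the higher-order term satisfies the structural condition required for weak lower semicontinuity of the functional.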
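The composition argument relies on $Ψ_{θ}$ being nondecreasing as well as convex. The minimal sketch below, with two hand-picked illustrative choices of $Ψ$ and ∇ε flattened to a 27-component vector, shows that the convex but non-monotone $Ψ (t) = (t - 1)^{2}$ produces a double-well, hence non-convex, function of ∇ε, whereas a convex nondecreasing choice does not.

```python
import random

def J1(x):
    """J_1 = |x|^2 for a flattened 3x3x3 array x standing in for grad eps."""
    return sum(xi * xi for xi in x)

def convexity_gap(f, x, y, lam=0.5):
    """f(lam x + (1-lam) y) - [lam f(x) + (1-lam) f(y)]; a positive value certifies non-convexity."""
    m = [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
    return f(m) - (lam * f(x) + (1.0 - lam) * f(y))

psi_convex_not_monotone = lambda t: (t - 1.0) ** 2   # convex, but decreasing on [0, 1)
psi_convex_monotone = lambda t: t + 0.5 * t ** 2      # convex and nondecreasing on [0, inf)

x = [1.0] + [0.0] * 26
y = [-1.0] + [0.0] * 26
print(convexity_gap(lambda v: psi_convex_not_monotone(J1(v)), x, y))  # 1.0: not convex
print(convexity_gap(lambda v: psi_convex_monotone(J1(v)), x, y))      # -1.5

rng = random.Random(0)
worst = max(convexity_gap(lambda v: psi_convex_monotone(J1(v)),
                          [rng.uniform(-2.0, 2.0) for _ in range(27)],
                          [rng.uniform(-2.0, 2.0) for _ in range(27)],
                          rng.random())
            for _ in range(200))
print(worst)
```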

Importantly, the convexity of $Ψ_{θ}$ need not be imposed as a penalty during training; it can be enforced directly at the architectural level. Several structurally controlled constructions are available:

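ICNNs. Designing $Ψ_{θ}$ as an ICNN with convex, nondecreasing activation functions and non-negative weight constraints guarantees convexity by construction; taking the input and passthrough weights non-negative as well makes $Ψ_{θ}$ nondecreasing, as required for convexity in $\nabla ε$.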

Non-negative weight parametrization. Restricting specific layers to have entrywise non-negative weights, together with convex activations, preserves convexity under affine composition.

Squared second-derivative representation. Representing the second derivative as $Ψ_{θ}^{″} (t) = (g_{θ} (t))^{2}$ for a neural map $g_{θ}$ ensures $Ψ_{θ}^{″} (t) \geq 0$ for all t, and therefore convexity of $Ψ_{θ}$; prescribing $Ψ_{θ}^{'} (0) \geq 0$ additionally makes $Ψ_{θ}$ nondecreasing on $[0, \infty)$.

These constructions convert an abstract functional-analytic condition (convexity in $\nabla ε$ ) into explicit algebraic constraints on the neural architecture. As a result, the admissibility of the higher-order term becomes independent of training data and is preserved throughout the optimization process.
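The squared second-derivative representation can be realized by integrating $Ψ_{θ}^{″} = g_{θ}^{2}$ twice, which is the scalar case of the Hessian parametrization of Section 7.5. The following minimal sketch uses a hypothetical stand-in `g_theta` for the neural map and the trapezoidal rule on $[0, t_{max}]$. Choosing the initial slope $Ψ^{'} (0) > 0$ also yields $Ψ^{'} \geq Ψ^{'} (0)$, a lower bound of the type $Ψ_{θ}^{'} \geq c_{α}$.

```python
import math

def g_theta(t):
    """Hypothetical stand-in for a trained neural map g_theta; any real function works."""
    return math.sin(3.0 * t) + 0.2 * t

def build_psi(g, t_max=5.0, n_steps=500, psi0=0.0, dpsi0=1.0):
    """Integrate Psi'' = g(t)^2 twice on [0, t_max] (trapezoidal rule) from
    Psi(0) = psi0, Psi'(0) = dpsi0. Returns grid points, Psi values, Psi' values."""
    h = t_max / n_steps
    ts = [i * h for i in range(n_steps + 1)]
    dd = [g(t) ** 2 for t in ts]  # Psi'' >= 0 by construction
    d = [dpsi0]
    for i in range(n_steps):
        d.append(d[-1] + 0.5 * h * (dd[i] + dd[i + 1]))
    p = [psi0]
    for i in range(n_steps):
        p.append(p[-1] + 0.5 * h * (d[i] + d[i + 1]))
    return ts, p, d

ts, p, d = build_psi(g_theta)
second_diffs = [p[i + 1] - 2.0 * p[i] + p[i - 1] for i in range(1, len(p) - 1)]
print(min(d), min(second_diffs))
```

The discrete slope sequence is nondecreasing and bounded below by the prescribed initial slope, and the discrete second differences of Ψ are non-negative, mirroring convexity and monotonicity of the continuous construction for any choice of $g_{θ}$.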

8.4. Coercive growth control

A central requirement for the existence of minimizers in $H^{2} (Ω)$ is coercive growth with respect to the higher-order variable $\nabla ε$. As established in Section 5, the direct method of the calculus of variations requires that the stored energy control the $H^{2}$-norm of admissible displacements. Since $\nabla ε (u)$ contains second derivatives of u, quadratic growth in $| \nabla ε |$ is the natural structural condition.

From a neural modeling perspective, such growth cannot be left to empirical training alone. Neural parametrizations identified from finite datasets may exhibit flat regions, subquadratic growth, or even local degeneracy outside the data manifold, thereby compromising coercivity of the total functional.

To enforce this property at the architectural level, we augment the invariant neural representation by a strictly positive quadratic anchor in the gradient invariant:

Φ_{θ} (I_{1}, I_{2}, J_{1}) = Φ_{θ}^{NN} (I_{1}, I_{2}, J_{1}) + α J_{1}, α > 0 .

Because $J_{1} = ∥ \nabla ε ∥^{2}$ , this augmentation guarantees

W_{θ} (ε, \nabla ε) \geq α ∥ \nabla ε ∥^{2} - C (1 + | ε |^{p}),

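for suitable $C, p > 0$, provided the network part obeys a lower bound $Φ_{θ}^{NN} \geq - C (1 + | ε |^{p})$ that does not deteriorate with $J_{1}$ (e.g., a non-negative output layer, or monotonicity of $Φ_{θ}^{NN}$ in $J_{1}$ combined with such a bound at $J_{1} = 0$). Under this architectural condition, the bound holds for all admissible parameter values, and coercivity in $H^{2} (Ω)$ follows via the second-order Korn inequality.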

Importantly, this coercive anchor is not a numerical regularization term but a structural component of the hypothesis class. It ensures that the admissible neural energy remains within a variationally well-posed class throughout training and inference. Moreover, it prevents degeneracy in regions of sparse or extrapolated data, where purely data-driven models may otherwise lose stability.

8.5. Comparison with unconstrained neural stress models

Many data-driven constitutive approaches approximate the stress response directly by a neural mapping

σ_{θ} = N_{θ} (ε, \nabla ε) .

While such formulations may reproduce observed stress–strain relations within a training regime, they generally lack structural guarantees.

In particular, stress-driven neural models:

do not guarantee the existence of an underlying potential and therefore may violate integrability conditions;

may fail to produce a symmetric consistent tangent operator, leading to loss of variational structure and difficulties in Newton-type solvers;

can destroy strong ellipticity or acoustic tensor positivity, since no invariant-based convexity constraints are enforced;

provide no existence guarantees for the associated boundary value problem, as weak lower semicontinuity and coercivity are not structurally controlled.

These deficiencies are not merely theoretical. In computational practice, they may manifest as mesh dependence, nonconvergent iterations, spurious localization, or lack of robustness under extrapolation.

In contrast, the present energy-based invariant framework begins with a stored energy density defined on scalar invariant space. Admissibility conditions—objectivity, thermodynamic consistency, coercive growth, and ellipticity—are enforced structurally through architectural constraints on the scalar mapping $Φ_{θ}$ . Because the invariant space is low-dimensional and symmetry-reduced, these constraints become algebraically tractable and verifiable.

Remark 9. The architectural constraints introduced above translate abstract functional-analytic requirements—coercivity, weak lower semicontinuity, and strong ellipticity—into explicit parametrization rules at the neural design level. This establishes a direct mathematical bridge between neural architecture and partial differential equation (PDE) well-posedness, thereby embedding machine-learned constitutive models into the classical variational framework of generalized continuum mechanics.

Remark 10. A central theme of the present framework is that analytical admissibility conditions arising in continuum mechanics and the calculus of variations can be translated into explicit constraints at the neural-architecture level. In particular, coercivity, convexity, polyconvexity, and Legendre–Hadamard ellipticity are enforced structurally through invariant parametrizations, convex neural architectures, and positivity-preserving representations. Consequently, admissibility is embedded directly into the hypothesis class rather than expected to emerge empirically through training alone.

9. Relation to convexity, rank-one convexity, and quasiconvexity

Convexity plays a central role in existence theory for variational problems. However, in nonlinear elasticity and generalized continua, full convexity is often too restrictive. We therefore clarify the hierarchy of convexity notions and their implications for the present neural framework.

9.1. Convexity and weak lower semicontinuity

A central question in the calculus of variations is whether the energy functional

Π (u) = \int_{Ω} W (ε (u), \nabla ε (u)) dx

is weakly lower semicontinuous in the natural function space. Weak lower semicontinuity is the key structural property required for the direct method to yield the existence of minimizers.

Definition 3. The function W is convex in $(ε, \nabla ε)$ if for all $(X_{1}, Y_{1}), (X_{2}, Y_{2})$ and all $λ \in [0, 1]$ ,

W (λ X_{1} + (1 - λ) X_{2}, λ Y_{1} + (1 - λ) Y_{2}) \leq λW (X_{1}, Y_{1}) + (1 - λ) W (X_{2}, Y_{2}) .

If W is convex and satisfies suitable growth conditions, then $Π$ is weakly lower semicontinuous in $H^{2} (Ω)$ . Consequently, coercivity and convexity together guarantee the existence of minimizers.

However, full convexity is extremely restrictive from a physical standpoint. Most nonlinear elastic energies—especially those capable of describing instabilities, phase transitions, or strain softening—fail to be convex. Thus, weaker notions of convexity are required.

9.2. Rank-one convexity and the Legendre–Hadamard condition

A first relaxation of convexity is rank-one convexity, which is closely tied to stability against simple laminates.

Definition 4. A function $W (F)$ is rank-one convex if, for every F and every rank-one tensor $a \otimes n$ , the function

t \mapsto W (F + t a \otimes n)

is convex on ℝ.

For twice differentiable energies, rank-one convexity is equivalent to the Legendre–Hadamard condition

ℂ_{ijkl} a_{i} n_{j} a_{k} n_{l} \geq 0,

where ℂ is the fourth-order tangent tensor. Its strict version, with $> 0$ for all nonzero a and n, is precisely the condition of strong ellipticity in elasticity theory.

Thus, strong ellipticity represents a local form of rank-one convexity. It ensures local stability of homogeneous states with respect to plane-wave perturbations, but it does not guarantee global variational well-posedness.

9.3. Quasiconvexity

The correct structural condition for weak lower semicontinuity in nonlinear elasticity is quasiconvexity.

Definition 5. $W (F)$ is quasiconvex at $F_{0}$ if for all $φ \in C_{c}^{\infty} (Ω; ℝ^{3})$ ,

\int_{Ω} W (F_{0} + \nabla φ) dx \geq | Ω | W (F_{0}) .

Under standard p-growth conditions, quasiconvexity characterizes weak lower semicontinuity of integral functionals in $W^{1, p}$ spaces. The classical hierarchy reads

Convexity \Rightarrow Quasiconvexity \Rightarrow Rank-one convexity \Rightarrow Legendre–Hadamard ellipticity,

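where the Legendre–Hadamard condition is understood in its non-strict form. The converses of the first two implications generally fail, whereas for twice differentiable energies the last implication is an equivalence.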

Hence, strong ellipticity alone is insufficient to guarantee the existence of minimizers. Additional structure—either convexity or higher-order regularization—is required.

9.4. Second-gradient regularization

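Consider now, with $ℓ > 0$ an intrinsic length scale (not to be confused with the load functional ℓ of Section 5), an energy of the form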

W (ε, \nabla ε) = W_{0} (ε) + \frac{ℓ^{2}}{2} | \nabla ε |^{2} .

Suppose that $W_{0}$ loses strong ellipticity at some strain $ε^{*}$ , meaning that its acoustic tensor becomes singular. In a purely first-gradient model, this loss permits arbitrarily fine oscillations, which can drive minimizing sequences toward microstructured states.

The second-gradient term alters the second variation:

δ^{2} Π (u) [v] = \int_{Ω} \big( ℂ_{0} (ε (u)) : (\nabla v \otimes \nabla v) + ℓ^{2} | \nabla ε (v) |^{2} \big) dx .

The additional term penalizes the curvature of the strain field and, through the pointwise bound $| \nabla \nabla v | \leq 3 | \nabla ε (v) |$ underlying the second-order Korn inequality, controls all second derivatives of v.

Proposition 7. Let $W_{0}$ be twice differentiable and assume that

ℂ_{0} (ε) a \otimes n : a \otimes n \geq - C | a |^{2} | n |^{2} .

Then, for sufficiently large $ℓ > 0$ , the total second variation is coercive in $H^{2}$ .

Proof. The gradient term contributes

ℓ^{2} ∥ \nabla ε (v) ∥_{L^{2}}^{2} \geq \frac{ℓ^{2}}{9} ∥ \nabla \nabla v ∥_{L^{2}}^{2} .

By interpolation and second-order Korn-type inequalities, this term dominates any bounded negative contribution arising from the first-gradient part, provided ℓ is sufficiently large. □

Thus, higher-order regularization does not restore classical strong ellipticity, but it restores variational coercivity and prevents unbounded oscillations.
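A short plane-wave computation makes the last statement concrete for the model energy $W_{0} (ε) + \frac{ℓ^{2}}{2} | \nabla ε |^{2}$. Take $v (x) = a \cos (k n \cdot x)$ with $| n | = 1$, and let $m (a, n)$ denote the second derivative of $W_{0}$ along the strain direction $sym (a \otimes n)$ at the current state, the quantity whose sign the Legendre–Hadamard condition tests; loss of ellipticity in the direction $(a, n)$ means $m (a, n) = - c | a |^{2}$ with $c > 0$. Averaging the second-variation density over one period gives:

```latex
ε(v) = -k \sin(k\, n \cdot x)\, \operatorname{sym}(a \otimes n), \qquad
\nabla ε(v) = -k^{2} \cos(k\, n \cdot x)\, \operatorname{sym}(a \otimes n) \otimes n,

q(k) = \tfrac{1}{2} k^{2}\, m(a, n) + \tfrac{1}{2} ℓ^{2} k^{4}\, |\operatorname{sym}(a \otimes n)|^{2},
\qquad
|\operatorname{sym}(a \otimes n)|^{2} = \tfrac{1}{2} \big( |a|^{2} + (a \cdot n)^{2} \big),

m(a, n) = -c\, |a|^{2}
\;\Longrightarrow\;
q(k) < 0 \iff k^{2} < \frac{2 c\, |a|^{2}}{ℓ^{2} \big( |a|^{2} + (a \cdot n)^{2} \big)} \leq \frac{2 c}{ℓ^{2}} .
```

Unstable plane waves therefore have wavelengths $2π / k > \sqrt{2} π ℓ / \sqrt{c}$: the gradient term does not remove the instability, but it bounds the unstable wavenumbers, which is the sense in which arbitrarily fine oscillations are excluded.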

9.5. Implications for neural energy representations

Consider neural strain-gradient energies of separable form

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}) + Ψ_{θ} (J_{1}), J_{1} = ∥ \nabla ε ∥^{2} .

Assume:

$Ψ_{θ} \in C^{1} ([0, \infty))$ with $Ψ_{θ}^{'} (t) \geq c_{α} > 0$ ,

$Φ_{θ}$ satisfies polynomial growth $| Φ_{θ} (I_{1}, I_{2}) | \leq C (1 + | ε |^{p})$ .

Higher-order coercivity

Proposition 8 (Gradient coercivity). If $Ψ_{θ}^{'} (t) \geq c_{α} > 0$ , then

W_{θ} (ε, \nabla ε) \geq c | \nabla ε |^{2} - C (1 + | ε |^{p}) .

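Proof. Since $Ψ_{θ}^{'} (t) \geq c_{α}$, integration from 0 gives $Ψ_{θ} (J_{1}) \geq Ψ_{θ} (0) + c_{α} J_{1}$, so $Ψ_{θ}$ grows at least linearly in $J_{1} = ∥ \nabla ε ∥^{2}$:

Ψ_{θ} (J_{1}) \geq c_{α} J_{1} - C .

The growth assumption on $Φ_{θ}$ gives $Φ_{θ} \geq - C (1 + | ε |^{p})$; adding both bounds yields the claim with $c = c_{α}$, after enlarging C.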

□

Combined with a second-order Korn inequality, this ensures coercivity of the total functional in $H^{2} (Ω)$ .

Regularization of localization

Let

ℂ_{θ} (ε) = \frac{\partial^{2}}{\partial ε^{2}} Φ_{θ} (I_{1}, I_{2})

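be the first-gradient tangent tensor. Loss of classical strong ellipticity corresponds to loss of positive definiteness of the acoustic tensor $Q_{ik} (n) = (ℂ_{θ})_{ijkl} n_{j} n_{l}$ for some direction n. Loss of positive definiteness of $ℂ_{θ}$ itself, i.e., of convexity in ε, is implied by, but does not imply, loss of strong ellipticity.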

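For a variation v, the higher-order contribution enters the second variation through

\int_{Ω} \big( 2 Ψ_{θ}^{'} (J_{1}) | \nabla ε (v) |^{2} + 4 Ψ_{θ}^{''} (J_{1}) (\nabla ε (u) : \nabla ε (v))^{2} \big) dx,

which, for $Ψ_{θ}^{'} \geq c_{α}$ and convex $Ψ_{θ}$, is bounded below by $2 c_{α} ∥ \nabla ε (v) ∥_{L^{2}}^{2}$ and therefore penalizes high-frequency strain fluctuations. Even if $ℂ_{θ}$ becomes indefinite, the total functional may remain bounded below.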
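The form of this contribution follows from differentiating $t \mapsto Ψ_{θ} (| A + t B |^{2})$ twice at $t = 0$, with A standing for $\nabla ε (u)$ and B for $\nabla ε (v)$. The minimal sketch below compares the closed form $2 Ψ^{'} (J) | B |^{2} + 4 Ψ^{''} (J) (A : B)^{2}$ with a finite-difference second derivative, for a hypothetical $Ψ (t) = c_{α} t + \log (1 + e^{t - 1})$ whose slope is bounded below by `C_ALPHA` and which is convex.

```python
import math
import random

C_ALPHA = 0.5  # hypothetical lower bound c_alpha on Psi'

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def psi(t):
    """Illustrative Psi(t) = C_ALPHA t + log(1 + e^(t - 1)): Psi' >= C_ALPHA, Psi'' >= 0."""
    return C_ALPHA * t + math.log1p(math.exp(t - 1.0))

def dpsi(t):
    return C_ALPHA + sigmoid(t - 1.0)

def ddpsi(t):
    s = sigmoid(t - 1.0)
    return s * (1.0 - s)

def dot(A, B):
    return sum(a * b for a, b in zip(A, B))

def second_variation_formula(A, B):
    """2 Psi'(J) |B|^2 + 4 Psi''(J) (A : B)^2 with J = |A|^2."""
    J = dot(A, A)
    return 2.0 * dpsi(J) * dot(B, B) + 4.0 * ddpsi(J) * dot(A, B) ** 2

def second_variation_fd(A, B, h=1e-3):
    """Central second difference of t -> Psi(|A + t B|^2) at t = 0."""
    def f(t):
        v = [a + t * b for a, b in zip(A, B)]
        return psi(dot(v, v))
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2

rng = random.Random(0)
A = [rng.uniform(-0.3, 0.3) for _ in range(27)]  # stands in for grad eps(u)
B = [rng.uniform(-0.3, 0.3) for _ in range(27)]  # stands in for grad eps(v)
print(second_variation_formula(A, B), second_variation_fd(A, B), 2.0 * C_ALPHA * dot(B, B))
```

The third printed number is the lower bound $2 c_{α} | B |^{2}$, which the exact value always exceeds because $Ψ^{'} \geq c_{α}$ and $Ψ^{''} \geq 0$.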

Suppression of fine-scale microstructure

Let $(u_{k})$ be a minimizing sequence. If $Ψ_{θ}^{'} (t) \geq c_{α} > 0$ , then

∥ \nabla ε (u_{k}) ∥_{L^{2}}

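is uniformly bounded, and by coercivity the sequence is bounded in $H^{2}$. By the Rellich–Kondrachov theorem, a subsequence converges strongly in $H^{1}$, so the strains $ε (u_{k})$ converge strongly in $L^{2}$; oscillations of the strain at vanishing length scales, which are compatible only with weak convergence, are thereby excluded. Thus, the neural gradient term acts as a curvature penalty that suppresses pathological microstructure formation.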

Ellipticity versus well-posedness

It is crucial to distinguish:

Local strong ellipticity of the first-gradient tensor.

Global coercivity and the existence of minimizers of the full functional.

Second-gradient regularization does not restore classical ellipticity of $ℂ_{θ}$ . However, it ensures variational well-posedness through higher-order coercivity.

9.6. Microstructure and relaxation

In classical first-gradient elasticity, loss of quasiconvexity of the strain energy density $W_{0}$ constitutes the fundamental mechanism underlying microstructure formation. When $W_{0}$ fails to be quasiconvex at a given strain state, minimizing sequences of the associated variational problem may develop increasingly fine oscillations in order to lower the energy. These oscillations correspond to the formation of fine-scale microstructures, and in the absence of additional regularization, the energy functional may fail to admit minimizers in the natural Sobolev space. This phenomenon is well documented in the calculus of variations and is closely related to the gap between rank-one convexity and quasiconvexity.

The introduction of a second-gradient contribution modifies this picture fundamentally. Consider an energy of the form

W (ε, \nabla ε) = W_{0} (ε) + Ψ_{θ} (J_{1}), J_{1} = ∥ \nabla ε ∥^{2} .

The higher-order term penalizes spatial variations of the strain field through a curvature-type energetic cost. In particular, if $Ψ_{θ}$ satisfies $Ψ_{θ}^{'} (t) \geq c_{α} > 0$ , then the functional acquires coercivity in $H^{2} (Ω)$ . As a consequence, arbitrarily fine oscillations become energetically unfavorable, since high-frequency strain fluctuations necessarily increase $∥ \nabla ε ∥$ . The second-gradient term therefore acts as a regularization mechanism, selecting microstructures with finite characteristic length and restoring variational well-posedness even when $W_{0}$ is not quasiconvex.

Within the present neural strain-gradient framework, this regularization mechanism is not introduced a posteriori, but is embedded directly at the level of the energy representation. The neural mapping $Ψ_{θ}$ controls the curvature penalty explicitly, and its structural properties (monotonicity, convexity, coercive growth) can be enforced at the architectural level. Consequently, the learned constitutive model inherits a built-in relaxation mechanism that suppresses pathological fine-scale oscillations while remaining compatible with the stability and existence theory developed in the preceding sections.

9.7. Summary of the stability hierarchy

The various notions of convexity that arise in nonlinear elasticity form a strict hierarchy, reflecting progressively weaker structural requirements while retaining different levels of stability and variational control. In the classical setting, one has the chain of implications

Convexity \Rightarrow Polyconvexity \Rightarrow Quasiconvexity \Rightarrow Rank-one convexity \Rightarrow Legendre–Hadamard ellipticity .

The first three implications are strict in general, while the last one is an equivalence for twice differentiable energies when the Legendre–Hadamard condition is taken in its non-strict form. Convexity is sufficient for weak lower semicontinuity and, together with coercivity, guarantees the existence of minimizers by the direct method, but it is often too restrictive for physically realistic elastic energies. Polyconvexity relaxes convexity by enlarging the argument space (e.g., to $(F, cof F, \det F)$ in finite strain) while still ensuring quasiconvexity and existence results under suitable growth conditions. Quasiconvexity is the correct condition for weak lower semicontinuity of integral functionals, whereas rank-one convexity and the Legendre–Hadamard condition express local stability with respect to rank-one perturbations and plane-wave modes.

Within the present invariant neural framework, these concepts acquire a clear structural interpretation:

Convex neural parametrizations (e.g., convex architectures in invariant space that are nondecreasing in the quadratic invariants $I_{2}$ and $J_{1}$) guarantee weak lower semicontinuity and, together with coercive growth, the existence of minimizers directly.

Polyconvex neural energies provide existence results under weaker structural assumptions, particularly in finite-strain settings, while remaining compatible with nonlinear kinematics.

Legendre–Hadamard conditions, expressed explicitly in terms of derivatives of the invariant scalar map $Φ_{θ}$ , ensure local strong ellipticity and therefore stability against infinitesimal localization.

Second-gradient regularization does not restore classical ellipticity of the first-gradient part, but it enforces global coercivity in $H^{2} (Ω)$ and penalizes high-frequency oscillations, thereby suppressing arbitrarily fine microstructure formation.

This layered structure embeds neural constitutive modeling into a mathematically controlled stability hierarchy. Rather than relying on empirical regularization alone, the framework makes explicit which architectural constraints correspond to which level of variational stability. In this way, expressive neural parametrizations can be reconciled with the fundamental analytical requirements of existence, ellipticity, and localization control in generalized continuum mechanics.

10. Discussion

The results established in the previous sections provide a coherent variational and stability framework for invariant neural energy representations in higher-order continua. We summarize the principal structural properties and discuss their mathematical implications.

10.1. Objectivity and representation

A foundational requirement in continuum mechanics is material frame indifference. In classical constitutive theory, this constraint is not optional but structural: the stored energy must remain invariant under superposed rigid body motions. In the present framework, this requirement is enforced at the level of representation by constructing the stored energy density exclusively from isotropic scalar invariants of the infinitesimal strain tensor ε and its gradient $\nabla ε$ .

More precisely, the energy is written as

W_{θ} (ε, \nabla ε) = Φ_{θ} (I_{1}, I_{2}, J_{1}),

where $(I_{1}, I_{2}, J_{1})$ are invariant under the action of $SO (3)$ . Objectivity and isotropy therefore follow automatically from representation theory, independently of the parametrization of the scalar map $Φ_{θ}$ . No penalty terms, data-driven symmetry augmentation, or post-processing symmetrization procedures are required.

This structural enforcement sharply contrasts with generic tensor-to-tensor neural mappings $σ_{θ} = N_{θ} (ε, \nabla ε)$ , which must typically rely on soft constraints during training to approximate frame indifference and may violate symmetry outside the training set.

From the viewpoint of invariant theory, the constitutive modeling problem is reduced to the construction of a scalar function defined on a low-dimensional invariant space. This reduction not only simplifies the mathematical analysis but also clarifies the connection between neural parametrization and classical isotropic elasticity. In particular, classical isotropic linear elasticity emerges as the special case $Φ_{θ} = \frac{λ}{2} I_{1}^{2} + μ I_{2}$ with Lamé constants λ and μ, while higher-order neural terms provide controlled nonlinear extensions within the same structural class.
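As a short worked check of this special case, the stress formula $σ_{ij} = Φ_{1} δ_{ij} + 2 Φ_{2} ε_{ij}$ and the reference ellipticity conditions of Section 6 reduce to their classical counterparts:

```latex
Φ(I_{1}, I_{2}) = \tfrac{λ}{2} I_{1}^{2} + μ I_{2}
\quad\Longrightarrow\quad
σ_{ij} = \frac{\partial Φ}{\partial I_{1}} δ_{ij} + 2 \frac{\partial Φ}{\partial I_{2}} ε_{ij}
       = λ (\operatorname{tr} ε)\, δ_{ij} + 2 μ\, ε_{ij},

Φ_{11}(0) = λ, \quad Φ_{2}(0) = μ
\quad\Longrightarrow\quad
\big( Φ_{2}(0) > 0,\; Φ_{11}(0) + 2 Φ_{2}(0) > 0 \big)
\iff \big( μ > 0,\; λ + 2μ > 0 \big) .
```

These are the familiar strong-ellipticity conditions of isotropic linear elasticity, so the invariant conditions generalize rather than replace the classical ones.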

10.2. Variational structure

A central feature of the proposed framework is the preservation of potential structure. Rather than learning stresses directly, the model learns a stored energy density from which stress and hyperstress fields are derived as partial derivatives:

σ = \frac{\partial W_{θ}}{∂ε}, M = \frac{\partial W_{θ}}{\partial (\nabla ε)} .

This energetic construction yields several decisive consequences:

The Clausius–Duhem inequality is satisfied identically in the purely elastic case, since dissipation vanishes by construction.

The equilibrium problem admits a weak formulation in $H^{2} (Ω)$ , consistent with second-gradient kinematics.

The existence of minimizers follows from coercivity and weak lower semicontinuity via the direct method of the calculus of variations.

In contrast, black-box neural constitutive models typically regress stress tensors directly. Such models need not admit an underlying potential and therefore may violate integrability conditions, destroy symmetry of the tangent operator, or produce ill-posed equilibrium problems. The absence of a variational structure can lead to nonconservative stress fields, loss of ellipticity, and severe computational instability in Newton-type solvers.
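The integrability issue can be made concrete by computing the work done over a closed strain cycle. For a stress derived from a potential, this work vanishes. For a stress map with a nonsymmetric tangent, it does not, so energy is created or destroyed over each cycle. In the NumPy sketch below, a linear map with a skew part stands in for a generic stress regressor, which is an assumption made for illustration. Strains are represented by six components in an orthonormal (Mandel-type) basis, so that $σ \cdot dε$ is the work increment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                   # independent strain components (orthonormal basis assumed)
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)             # symmetric positive definite stiffness
A = C + 0.5 * (B - B.T)                 # same symmetric part plus a skew part: no potential exists

def sigma_potential(e):
    """Gradient of W(e) = 0.5 e.C e + 0.25 (e.e)^2 (conservative by construction)."""
    return C @ e + (e @ e) * e

def sigma_regressor(e):
    """Generic linear 'stress regressor' with a nonsymmetric tangent A."""
    return A @ e

def loop_work(sigma, u, v, r=0.1, n_steps=2000):
    """Work of sigma along e(t) = r (cos t u + sin t v), t in [0, 2 pi), midpoint rule."""
    dt = 2.0 * np.pi / n_steps
    w = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * dt
        e = r * (np.cos(t) * u + np.sin(t) * v)
        de_dt = r * (-np.sin(t) * u + np.cos(t) * v)
        w += (sigma(e) @ de_dt) * dt
    return w

u, v = np.eye(n)[0], np.eye(n)[1]
w_pot = loop_work(sigma_potential, u, v)   # ~0: no energy created over a closed cycle
w_reg = loop_work(sigma_regressor, u, v)   # equals pi r^2 u.(A^T - A) v, nonzero for skew A
print(w_pot, w_reg)
```

For the linear map, the cycle work equals $π r^{2} \, u \cdot (A^{T} - A) v$. It therefore vanishes exactly when the tangent is symmetric, which is the integrability condition mentioned above.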

The present framework eliminates these failure modes by embedding neural parametrizations within the classical energetic structure of continuum mechanics.

10.3. Stability and ellipticity

Local stability of elastic materials is governed by the Legendre–Hadamard condition. A key result of this work is that strong ellipticity of the first-gradient part can be characterized directly in terms of derivatives of the invariant scalar map $Φ_{θ}$ .

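The chain-rule expansion of the tangent operator shows that the acoustic tensor depends explicitly on the first and second derivatives of $Φ_{θ}$ with respect to $(I_{1}, I_{2})$, which we denote $Φ_{i} = \partial Φ_{θ}/\partial I_{i}$ and $Φ_{ij} = \partial^{2} Φ_{θ}/\partial I_{i} \partial I_{j}$. At the reference state, the acoustic tensor reduces to an isotropic form controlled by $Φ_{2}$ and $Φ_{11}$. The conditions $Φ_{2} > 0$ and $Φ_{11} + 2 Φ_{2} > 0$ therefore imply positive definiteness of the acoustic tensor there and, by continuity, strong ellipticity in a neighborhood of small strains.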
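For reference, the following is a sketch of the standard calculation, not a reproduction of Section 6. Differentiating $σ = Φ_{1} \mathbf{1} + 2 Φ_{2} ε$ once more, with $\mathbb{I}^{sym}$ denoting the symmetric fourth-order identity, gives

ℂ_{θ} = Φ_{11} \mathbf{1} ⊗ \mathbf{1} + 2 Φ_{12} (\mathbf{1} ⊗ ε + ε ⊗ \mathbf{1}) + 4 Φ_{22} \, ε ⊗ ε + 2 Φ_{2} \mathbb{I}^{sym} .

At ε = 0, the terms in $Φ_{12}$ and $Φ_{22}$ vanish. For a unit vector n, the acoustic tensor then becomes

\mathbf{A}(n) = (Φ_{11} + Φ_{2}) \, n ⊗ n + Φ_{2} \mathbf{1},

with eigenvalue $Φ_{11} + 2 Φ_{2}$ along n and the double eigenvalue $Φ_{2}$ in the plane orthogonal to n. At the reference state, positive definiteness for all n is thus equivalent to $Φ_{2} > 0$ and $Φ_{11} + 2 Φ_{2} > 0$. Away from ε = 0, the $Φ_{12}$ and $Φ_{22}$ terms are the coupling contributions that any sharper analysis must control.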

This establishes a precise link between scalar convexity properties in invariant space and tensorial stability conditions in physical space. Consequently, ellipticity constraints can be enforced directly at the level of neural parametrization, without explicit manipulation of the fourth-order elasticity tensor.
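The reference-state statement can also be checked numerically. The NumPy sketch below uses hypothetical helper names. It assembles the isotropic tangent with $λ = Φ_{11}(0)$ and $μ = Φ_{2}(0)$, forms the acoustic tensor for random unit directions, and reports the smallest eigenvalue. The example also shows that a negative $Φ_{11}$ is admissible as long as $Φ_{11} + 2 Φ_{2} > 0$.

```python
import numpy as np

def tangent_at_reference(phi_11, phi_2):
    """Isotropic tangent C = phi_11 1(x)1 + 2 phi_2 I^sym (lambda = phi_11, mu = phi_2)."""
    d = np.eye(3)
    return (phi_11 * np.einsum("ij,kl->ijkl", d, d)
            + phi_2 * (np.einsum("ik,jl->ijkl", d, d) + np.einsum("il,jk->ijkl", d, d)))

def acoustic_tensor(C, n):
    """A_ik = C_ijkl n_j n_l for a unit direction n."""
    return np.einsum("ijkl,j,l->ik", C, n, n)

def min_acoustic_eigenvalue(phi_11, phi_2, n_dirs=200, seed=0):
    """Smallest acoustic-tensor eigenvalue over random unit directions."""
    rng = np.random.default_rng(seed)
    C = tangent_at_reference(phi_11, phi_2)
    smallest = np.inf
    for _ in range(n_dirs):
        n = rng.standard_normal(3)
        n /= np.linalg.norm(n)
        smallest = min(smallest, np.linalg.eigvalsh(acoustic_tensor(C, n))[0])
    return smallest

print(min_acoustic_eigenvalue(1.5, 0.8))    # min(phi_2, phi_11 + 2 phi_2) = 0.8
print(min_acoustic_eigenvalue(-1.0, 0.8))   # 0.6 > 0: elliptic despite phi_11 < 0
print(min_acoustic_eigenvalue(-2.0, 0.8))   # -0.4 < 0: ellipticity lost
```

Because the tangent is isotropic at the reference state, the spectrum does not depend on n. The random sampling becomes meaningful only once the coupling terms are present away from ε = 0.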

It is important to emphasize that the derived conditions are sufficient but not necessary. A complete invariant characterization of global ellipticity would require a refined analysis of coupling terms arising in the chain-rule expansion of $ℂ_{θ}$ . This remains an open analytical problem.

10.4. Polyconvex extensions

While convexity in $(ε, \nabla ε)$ guarantees the existence of minimizers, it is generally too restrictive for geometrically nonlinear elasticity. The introduction of polyconvex neural energies extends the framework naturally to finite-strain settings.

By representing the energy as

W_{θ} (F) = G_{θ} (F, \operatorname{cof} F, \det F),

with $G_{θ}$ convex in its arguments, polyconvexity is enforced structurally. Polyconvexity implies quasiconvexity. Combined with the weak continuity of minors and suitable coercivity and growth conditions, it yields the existence of minimizers in $W^{1, p}$.
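As a concrete instance of this structure, the NumPy sketch below uses a compressible Mooney–Rivlin-type energy, a classical polyconvex example, in place of a learned $G_{θ}$. The parameter values and the specific terms are assumptions. A neural $G_{θ}$ would replace the fixed terms with, for example, an input-convex network of $(F, \operatorname{cof} F, \det F)$. The script checks three properties: midpoint convexity of G in the lifted variables, a stress-free reference state, and frame indifference $W(QF) = W(F)$. Note that W itself is not convex in F.

```python
import numpy as np

MU, NU, LAM = 1.0, 0.5, 2.0   # illustrative material parameters (assumed)

def cof(F):
    """Cofactor matrix cof F = det(F) F^{-T} (assumes det F != 0)."""
    return np.linalg.det(F) * np.linalg.inv(F).T

def G(F, H, J):
    """Convex in the lifted variables (F, H, J) on J > 0: a sum of convex terms."""
    return (0.5 * MU * (np.sum(F * F) - 3.0)
            + 0.5 * NU * (np.sum(H * H) - 3.0)
            - (MU + 2.0 * NU) * np.log(J)     # coefficient makes F = I stress-free
            + 0.5 * LAM * (J - 1.0) ** 2)

def W(F):
    """Polyconvex stored energy W(F) = G(F, cof F, det F)."""
    return G(F, cof(F), np.linalg.det(F))

def fd_stress(F, h=1e-6):
    """First Piola-Kirchhoff stress dW/dF by central differences."""
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            E = np.zeros((3, 3))
            E[i, j] = h
            P[i, j] = (W(F + E) - W(F - E)) / (2.0 * h)
    return P

print(W(np.eye(3)), np.abs(fd_stress(np.eye(3))).max())   # both vanish at the reference state
```

Convexity is checked in the lifted space, where $F$, $H$, and $J$ are treated as independent, rather than in F alone. This is exactly the distinction that makes polyconvexity weaker than convexity.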

From a modeling perspective, this allows the incorporation of nonlinear geometric effects, volumetric changes, and large deformations while retaining analytical well-posedness. This extension is particularly relevant for applications involving microstructural instabilities, buckling, or softening phenomena at finite strain.

10.5. Functional-analytic perspective

The framework embeds neural constitutive modeling into the classical functional setting of Sobolev spaces. The second-gradient contribution ensures coercivity in $H^{2} (Ω)$ , while convexity or polyconvexity ensures weak lower semicontinuity.

As a result, the equilibrium problem is well-posed in the sense of the direct method of the calculus of variations. Minimizing sequences are bounded in $H^{2} (Ω)$, reflexivity provides weakly convergent subsequences, and weak lower semicontinuity yields a minimizer. Neural parametrizations therefore operate within the same analytical framework as classical second-gradient elasticity.

This functional-analytic embedding distinguishes the present approach from purely empirical data-driven models, which typically lack existence guarantees or PDE-level stability analysis.

10.6. Numerical implementation strategies and computational outlook

The analytical framework developed in this work naturally calls for a carefully structured computational realization. Because the governing equations are fourth-order and the constitutive response is parametrized through invariant neural mappings, numerical implementation must preserve both variational structure and architectural constraints at the discrete level. We outline below a research program for the robust computational deployment of invariant neural higher-order energies.

Invariant evaluation at quadrature level

At the element level, the primary kinematic quantities are $ε (u_{h})$ and $\nabla ε (u_{h})$ evaluated at quadrature points. The scalar invariants

I_{1} = \operatorname{tr} ε, \quad I_{2} = ε : ε, \quad J_{1} = ∥\nabla ε∥^{2}

are computed locally and passed to the neural scalar map $Φ_{θ}$ .
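A vectorized evaluation over all quadrature points of an element might look as follows. This is a sketch only: the array layout, the quadrature weights, and the placeholder `phi_theta` are hypothetical. They are chosen to show the data flow from $(ε, \nabla ε)$ at quadrature points, to the invariants, and then to the scalar map.

```python
import numpy as np

def quadrature_invariants(eps_q, geps_q):
    """eps_q: (nq, 3, 3) strains; geps_q: (nq, 3, 3, 3) gradients eps_ij,k.
    Returns an (nq, 3) array with columns (I1, I2, J1)."""
    I1 = np.einsum("qii->q", eps_q)
    I2 = np.einsum("qij,qij->q", eps_q, eps_q)
    J1 = np.einsum("qijk,qijk->q", geps_q, geps_q)
    return np.stack([I1, I2, J1], axis=1)

def phi_theta(inv):
    """Placeholder for the trained scalar map; any vectorized function of (I1, I2, J1) fits here."""
    I1, I2, J1 = inv.T
    return 0.5 * 1.2 * I1**2 + 0.8 * I2 + 0.025 * J1

nq = 8
rng = np.random.default_rng(0)
a = 1e-3 * rng.standard_normal((nq, 3, 3))
eps_q = 0.5 * (a + a.transpose(0, 2, 1))          # symmetric strain at each point
g = 1e-2 * rng.standard_normal((nq, 3, 3, 3))
geps_q = 0.5 * (g + g.transpose(0, 2, 1, 3))      # symmetric in (i, j) at each point
weights = np.full(nq, 1.0 / nq)   # hypothetical quadrature weights times Jacobian determinants

inv = quadrature_invariants(eps_q, geps_q)        # only three scalars per point reach the network
element_energy = np.sum(weights * phi_theta(inv))
print(inv.shape, element_energy)
```

The network never sees tensor components, only the (nq, 3) invariant array, which is why no rotational augmentation is needed downstream.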

This invariant preprocessing layer guarantees that objectivity and isotropy are preserved exactly, up to floating-point round-off, at the discrete level. Since the neural model receives only invariant inputs, no rotational augmentation or symmetrization is required in training or inference. The input dimension drops from the 24 independent components of $(ε, \nabla ε)$ (6 for ε and 18 for $\nabla ε$) to three scalars, which simplifies the learning problem and improves numerical conditioning.

Consistent linearization and automatic differentiation

Robust nonlinear finite element implementation requires the consistent tangent operator. Because stresses are defined as

σ = \frac{\partial W_{θ}}{\partial ε}, \quad M = \frac{\partial W_{θ}}{\partial (\nabla ε)},

the element residual follows from the first variation of the energy, and the consistent stiffness matrix follows from its full second variation, including the mixed terms coupling ε and $\nabla ε$.

Automatic differentiation (AD) provides a natural tool for this task. By implementing $Φ_{θ}$ in a differentiable programming framework, one obtains:

First derivatives for stress and hyperstress evaluation.

Second derivatives for construction of the fourth-order tangent tensor.

Guaranteed symmetry of the consistent stiffness matrix.

A consistent tangent is essential for the quadratic convergence of Newton-type solvers, and its symmetry permits the use of symmetric linear algebra routines. In contrast, stress-driven neural regressors may yield nonsymmetric or inconsistent tangents, leading to degraded convergence or solver breakdown.
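Hyper-dual numbers give a compact, dependency-free illustration of what AD delivers in this setting. The pure-Python sketch below is a stand-in for a differentiable programming framework; in JAX, for instance, `jax.grad` and `jax.hessian` play this role. The class `HyperDual`, the helper `grad_and_hessian`, and the form and parameters of `phi` are all hypothetical. Seeding two infinitesimal directions yields first derivatives $Φ_{i}$ and second derivatives $Φ_{ij}$ that are exact up to rounding. The resulting invariant Hessian is symmetric, a property that carries over to the tangent through the chain rule.

```python
import math

class HyperDual:
    """a + b*e1 + c*e2 + d*e1*e2 with e1^2 = e2^2 = 0 (forward-mode, second order)."""
    def __init__(self, a, b=0.0, c=0.0, d=0.0):
        self.a, self.b, self.c, self.d = a, b, c, d

    def __add__(self, o):
        o = o if isinstance(o, HyperDual) else HyperDual(o)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, HyperDual) else HyperDual(o)
        return HyperDual(self.a * o.a,
                         self.a * o.b + self.b * o.a,
                         self.a * o.c + self.c * o.a,
                         self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a)
    __rmul__ = __mul__

def apply_scalar(x, f, df, d2f):
    """Chain rule for a scalar primitive f evaluated on a hyper-dual argument."""
    return HyperDual(f(x.a), df(x.a) * x.b, df(x.a) * x.c,
                     df(x.a) * x.d + d2f(x.a) * x.b * x.c)

def softplus(x):
    s = lambda t: 1.0 / (1.0 + math.exp(-t))
    return apply_scalar(x, lambda t: math.log1p(math.exp(t)), s, lambda t: s(t) * (1.0 - s(t)))

# Hypothetical invariant energy; GAMMA couples I2 and J1 so the Hessian has off-diagonal terms.
LAM, MU, KAPPA, ALPHA, BETA, GAMMA = 1.2, 0.8, 0.05, 0.3, 4.0, 0.1

def phi(I1, I2, J1):
    return (0.5 * LAM * I1 * I1 + MU * I2 + 0.5 * KAPPA * J1
            + ALPHA * softplus(BETA * I2 + (-0.5)) + GAMMA * I2 * J1)

def grad_and_hessian(f, x):
    """Gradient and Hessian of f at x by seeding e1 along x_i and e2 along x_j."""
    n = len(x)
    g = [0.0] * n
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            args = [HyperDual(x[k], 1.0 if k == i else 0.0, 1.0 if k == j else 0.0)
                    for k in range(n)]
            out = f(*args)
            g[i] = out.b        # d phi / d x_i
            H[i][j] = out.d     # d^2 phi / d x_i d x_j
    return g, H

g, H = grad_and_hessian(phi, [0.01, 0.02, 0.5])   # (I1, I2, J1)
print(g)
print(H)
```

A production implementation would use a mature AD framework with reverse mode for the network parameters. The point here is only that the second derivatives come from a single scalar potential, so the symmetry of the tangent is inherited rather than imposed.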

Discretization of fourth-order problems

Second-gradient elasticity leads to fourth-order PDEs. Several discretization strategies are possible:

$C^{1}$-conforming finite elements. Hermite or spline-based elements provide direct $H^{2}$ conformity, ensuring variational consistency.

Mixed formulations. Introducing auxiliary variables for ε or $\nabla ε$ reduces the problem to a system of coupled second-order equations. This approach enables the use of standard $H^{1}$-conforming elements.

Discontinuous Galerkin methods. Interior penalty formulations allow $C^{0}$ elements while weakly enforcing continuity of gradients.

Isogeometric analysis (IGA). Spline-based discretizations provide high-order continuity and are particularly well suited for gradient elasticity.
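To make the $C^{1}$-conforming option concrete, the NumPy sketch below discretizes a one-dimensional analogue with Hermite cubic elements that carry u and u' at each node. Several assumptions apply. A linear 1D energy $\int (\tfrac{E}{2} u'^{2} + \tfrac{E ℓ^{2}}{2} u''^{2}) \, dx - F u(L)$ stands in for the neural energy, so no Newton iterations are needed. The end x = 0 is clamped in both u and u', and a point force F acts at x = L. For this problem, the exact tip displacement is $u(L) = (F/E)(L - ℓ \tanh(L/ℓ))$, which the discrete solution should reproduce closely.

```python
import numpy as np

def solve_gradient_bar(n_el, L=1.0, E=1.0, ell=0.1, F=1.0):
    """Tip displacement of a 1D gradient-elastic bar with C1 Hermite cubic elements.
    DOFs per node: (u, u'); u(0) = u'(0) = 0; point force F on u(L)."""
    h = L / n_el
    # int N'^T N' dx and int N''^T N'' dx for Hermite cubics on an element of length h
    K1 = np.array([[36, 3*h, -36, 3*h],
                   [3*h, 4*h**2, -3*h, -h**2],
                   [-36, -3*h, 36, -3*h],
                   [3*h, -h**2, -3*h, 4*h**2]]) / (30.0 * h)
    K2 = np.array([[12, 6*h, -12, 6*h],
                   [6*h, 4*h**2, -6*h, 2*h**2],
                   [-12, -6*h, 12, -6*h],
                   [6*h, 2*h**2, -6*h, 4*h**2]]) / h**3
    Ke = E * K1 + E * ell**2 * K2               # classical part plus second-gradient part
    ndof = 2 * (n_el + 1)
    K = np.zeros((ndof, ndof))
    for e in range(n_el):
        dofs = slice(2 * e, 2 * e + 4)
        K[dofs, dofs] += Ke                     # assemble element stiffness
    f = np.zeros(ndof)
    f[-2] = F                                   # force on the tip displacement DOF
    free = np.arange(2, ndof)                   # clamp u(0) and u'(0)
    u = np.zeros(ndof)
    u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
    return u[-2]

exact = 1.0 - 0.1 * np.tanh(10.0)               # (F/E)(L - ell tanh(L/ell)) for the defaults
print(solve_gradient_bar(40), exact)
```

With L = 1 and ℓ = 0.1, the tip displacement is about 0.9 F L/E. The bar is therefore stiffer than the classical value F L/E, and the relative stiffening ℓ tanh(L/ℓ)/L grows as L shrinks, which is the size effect in its simplest form. With a neural energy, the element matrices would become state-dependent tangents assembled inside a Newton loop.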

The choice of discretization influences the stability of the numerical solution and the computational cost. Systematic comparison of these approaches for neural higher-order energies constitutes a natural extension of the present work.

Architectural enforcement of stability at training time

The analytical results show that ellipticity and coercivity conditions can be expressed in terms of invariant derivatives of $Φ_{θ}$ . These constraints can be enforced numerically through:

Positive-definite Hessian parametrizations via Cholesky factors.

Input convex neural network (ICNN) architectures [26].

Convex anchor decompositions with guaranteed quadratic growth.

Barrier or projection methods enforcing $Φ_{2} > 0$ and $Φ_{11} + 2 Φ_{2} > 0$, where $Φ_{2} = \partial Φ_{θ}/\partial I_{2}$ and $Φ_{11} = \partial^{2} Φ_{θ}/\partial I_{1}^{2}$.

Training under such structural constraints ensures that the learned model remains within the admissible class throughout optimization, preventing the emergence of unstable parameter regimes.
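The NumPy sketch below combines two of the devices listed above: a convex anchor and an ICNN residual in the spirit of Amos and Kolter [26]. The anchor is an isotropic quadratic with bulk modulus K > 0, shear modulus μ > 0, and gradient modulus κ > 0. The architecture, the hidden size, and the use of a softplus reparametrization to keep weights nonnegative are all assumptions made for illustration.

The construction works as follows. The residual is convex in $(I_{1}, I_{2}, J_{1})$ and nondecreasing in $I_{2}$ and $J_{1}$. Since $I_{1}$ is affine in ε, while $I_{2}$ and $J_{1}$ are convex, the composite energy is convex in $(ε, \nabla ε)$. Subtracting an affine function of $I_{1}$ makes the reference state energy-free and stress-free without affecting convexity. The construction also gives $Φ_{2}(0) \ge μ > 0$ and $Φ_{11}(0) + 2 Φ_{2}(0) \ge K + 4μ/3 > 0$.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))   # derivative of softplus

HIDDEN = 8
rng = np.random.default_rng(2)
# Hypothetical raw (unconstrained) parameters; admissibility is imposed in constrain().
raw = {"A1": rng.standard_normal((HIDDEN, 3)), "b1": rng.standard_normal(HIDDEN),
       "Wz": rng.standard_normal((HIDDEN, HIDDEN)), "A2": rng.standard_normal((HIDDEN, 3)),
       "b2": rng.standard_normal(HIDDEN), "w": rng.standard_normal(HIDDEN),
       "K": 0.3, "mu": -0.2, "kappa": -1.0}

def constrain(raw):
    p = dict(raw)
    for name in ("A1", "A2"):
        M = raw[name].copy()
        M[:, 1:] = softplus(M[:, 1:])   # weights on I2, J1 >= 0; I1 weights keep any sign
        p[name] = M
    for name in ("Wz", "w", "K", "mu", "kappa"):
        p[name] = softplus(raw[name])   # nonnegative hidden/output weights, positive moduli
    return p

def icnn(x, p):
    """Residual r(I1, I2, J1): convex in x, nondecreasing in I2 and J1. Returns (r, dr/dx)."""
    a1 = p["A1"] @ x + p["b1"]
    a2 = p["Wz"] @ softplus(a1) + p["A2"] @ x + p["b2"]
    r = p["w"] @ softplus(a2)
    grad = (p["w"] * sigmoid(a2)) @ (p["Wz"] @ (sigmoid(a1)[:, None] * p["A1"]) + p["A2"])
    return r, grad

def phi(x, p):
    I1, I2, J1 = x
    lam = p["K"] - 2.0 * p["mu"] / 3.0          # Lame constant from bulk modulus K > 0
    anchor = 0.5 * lam * I1**2 + p["mu"] * I2 + 0.5 * p["kappa"] * J1
    r, _ = icnn(x, p)
    r0, g0 = icnn(np.zeros(3), p)
    # Subtracting an affine function of I1 keeps convexity and gives Phi(0) = 0, Phi_1(0) = 0.
    return anchor + r - r0 - g0[0] * I1

def W(eps, geps, p):
    x = np.array([np.trace(eps), np.sum(eps * eps), np.sum(geps * geps)])
    return phi(x, p)

def count_midpoint_violations(p, n_trials=200, seed=7):
    """Count random segments in (eps, grad eps) where midpoint convexity fails."""
    rng_c = np.random.default_rng(seed)
    bad = 0
    for _ in range(n_trials):
        e = [0.5 * (a + a.T) for a in rng_c.standard_normal((2, 3, 3))]
        g = rng_c.standard_normal((2, 3, 3, 3))   # symmetry of geps is irrelevant for this check
        mid = W(0.5 * (e[0] + e[1]), 0.5 * (g[0] + g[1]), p)
        bad += int(mid > 0.5 * (W(e[0], g[0], p) + W(e[1], g[1], p)) + 1e-9)
    return bad

p = constrain(raw)
print(count_midpoint_violations(p))
```

Because admissibility holds for every value of the raw parameters, any optimizer step leaves the model inside the admissible class. This is the property described in the preceding paragraph, obtained here by reparametrization rather than by barriers or projection.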

Data-driven calibration with variational consistency

Parameter identification may proceed via minimization of an objective functional measuring discrepancy between experimental and predicted responses. A variationally consistent training procedure can be constructed as:

\min_{θ} \sum_{k} ∥ σ_{θ} (ε_{k}) - σ_{k}^{\exp} ∥^{2}

subject to structural constraints on $Φ_{θ}$ .

Because the stresses are obtained by differentiating a differentiable potential, the loss is differentiable with respect to θ and can be minimized with standard gradient-based optimizers. Moreover, PDE-constrained training can be incorporated, in which equilibrium equations are enforced at the field level rather than only at the material point level.
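In the simplest case, where $Φ_{θ}$ is the quadratic anchor $\frac{λ}{2} I_{1}^{2} + μ I_{2}$, the stress $σ = λ (\operatorname{tr} ε) \mathbf{1} + 2 μ ε$ is linear in θ = (λ, μ). The objective above then reduces to a linear least-squares problem. The NumPy sketch below fits θ to synthetic, noise-free data generated inside the script; no experimental data are involved. It then checks the structural constraints a posteriori. For a neural $Φ_{θ}$, the same loss becomes nonlinear in θ, and the constraints would be enforced during training, for instance by reparametrization.

```python
import numpy as np

def stress_basis(eps):
    """d sigma / d(lam, mu) for Phi = lam/2 I1^2 + mu I2, i.e. sigma = lam (tr eps) 1 + 2 mu eps."""
    return np.stack([np.trace(eps) * np.eye(3), 2.0 * eps], axis=-1)   # shape (3, 3, 2)

rng = np.random.default_rng(3)
lam_true, mu_true = 1.5, 0.8          # synthetic "ground truth", used only to generate data
strains = []
for _ in range(20):
    a = 1e-3 * rng.standard_normal((3, 3))
    strains.append(0.5 * (a + a.T))
sig_data = [stress_basis(e) @ np.array([lam_true, mu_true]) for e in strains]

# Stack the system for  min_theta sum_k || sigma_theta(eps_k) - sigma_k ||^2
A = np.concatenate([stress_basis(e).reshape(9, 2) for e in strains])
b = np.concatenate([s.reshape(9) for s in sig_data])
theta, *_ = np.linalg.lstsq(A, b, rcond=None)
lam_fit, mu_fit = theta
admissible = (mu_fit > 0) and (lam_fit + 2.0 * mu_fit > 0)   # Phi_2 > 0, Phi_11 + 2 Phi_2 > 0
print(theta, admissible)
```

The stress basis is obtained by differentiating the potential rather than being postulated. This is what keeps the fitted model inside the variational class, however the parameters are estimated.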

Multiscale and homogenization extensions

Invariant neural energies provide a natural interface with computational homogenization. At the microscale, high-fidelity simulations can generate energy data as functions of invariant measures. The neural model then serves as a reduced-order surrogate for the homogenized energy density.

Future work will investigate:

FE$^{2}$-type multiscale implementation.

Learning effective strain-gradient moduli from microstructure.

Scale-bridging between discrete lattice models and continuum higher-order representations.

Solver robustness and conditioning

Second-gradient terms introduce additional stiffness and length-scale parameters. Numerical conditioning depends strongly on the relative magnitude of the gradient contribution. Preconditioning strategies, block factorization methods, and operator-splitting techniques will be investigated to ensure scalability in large-scale simulations.

Particular attention must be paid to:

Spectral properties of the discrete fourth-order operator.

Interaction between neural nonlinearities and Newton updates.

Sensitivity of solutions near ellipticity boundaries.

Verification and benchmark problems

To validate the computational framework, a hierarchy of benchmark problems will be considered:

Patch tests for objectivity verification.

Acoustic tensor tests for ellipticity preservation.

Strain localization under softening.

Size-effect simulations in bending or indentation.

Finite-strain large-deformation tests for polyconvex models.

Such verification is essential to demonstrate that architectural constraints indeed translate into discrete stability and robustness.

Analytical versus computational reproducibility

The present work primarily addresses reproducibility at the analytical level. All assumptions, constitutive structures, variational settings, and stability conditions are stated explicitly and can be independently verified within the framework of generalized continuum mechanics and the calculus of variations. In this sense, the mathematical results are reproducible in the classical analytical sense.

By contrast, computational reproducibility of trained neural constitutive models requires additional implementation-specific components, including discretization choices for fourth-order PDEs, AD strategies for stress and tangent operators, optimizer selection, training protocols, benchmark datasets, and finite-element implementations compatible with higher-order continua.

More broadly, the numerical realization of invariant neural higher-order energies requires the integration of differentiable programming, advanced finite-element technology, and structurally constrained optimization procedures. The analytical guarantees established in this work provide the mathematical and variational foundation for such developments. A systematic computational investigation, including reproducible training pipelines and large-scale numerical benchmarks, constitutes an important direction for future research and will complete the bridge between mathematical admissibility and data-driven computational mechanics.

Local versus global stability

The Legendre–Hadamard conditions derived in the present work provide local sufficient conditions for strong ellipticity near the reference configuration. Such conditions characterize infinitesimal stability with respect to rank-one perturbations, but they do not by themselves guarantee global variational well-posedness for arbitrary strain states. Global existence and stability generally require stronger structural assumptions such as quasiconvexity or polyconvexity together with suitable coercive growth conditions. The present framework therefore distinguishes carefully between local ellipticity, global coercivity, and stronger nonlinear stability notions arising in the calculus of variations.

10.7. Limitations and open problems

Several open questions remain:

The analysis was carried out primarily in the small-strain regime. Extension to geometrically nonlinear higher-order continua requires a systematic treatment of polyconvexity in extended variable spaces.

The sufficient ellipticity conditions may be conservative. Sharper criteria relating invariant convexity and rank-one convexity remain to be derived.

Quasiconvexity for second-gradient neural energies is not fully characterized and constitutes a challenging open problem.

Despite these limitations, the present framework demonstrates that neural parametrizations can be integrated into generalized continuum mechanics without sacrificing invariance, stability, or variational consistency. It provides a mathematically controlled alternative to purely empirical constitutive learning and establishes a rigorous bridge between machine learning and higher-order continuum theory.

11. Conclusion

This work has developed a rigorous variational framework for invariant neural representations of stored energy densities in higher-order continuum models. By expressing the energy as a neural mapping of isotropic invariants of the strain tensor and its gradient, objectivity and isotropy are enforced structurally rather than through penalization or data augmentation procedures. The associated stress and hyperstress fields are derived as variational derivatives of a stored-energy potential, thereby guaranteeing thermodynamic admissibility, energetic conjugacy, and compatibility with the principle of virtual power.

From a functional-analytic perspective, we established explicit growth, regularity, and convexity-type assumptions under which the total potential energy is coercive in $H^{2} (Ω)$ , weakly lower semicontinuous, and admits minimizers via the direct method of the calculus of variations. A detailed Legendre–Hadamard analysis further yielded explicit sufficient conditions for preservation of strong ellipticity, expressed directly in invariant space through derivatives of the scalar neural map. This provides a transparent connection between neural parametrization, convexity properties, and stability against localization and high-frequency oscillations. We additionally clarified the hierarchy between convexity, polyconvexity, quasiconvexity, rank-one convexity, and Legendre–Hadamard ellipticity in the context of second-gradient neural energies and outlined a polyconvex extension appropriate for finite-strain settings.

The present framework demonstrates that neural constitutive modeling need not be incompatible with the analytical structure of continuum mechanics. Rather than replacing classical constitutive theory, neural parametrizations can be embedded within admissible variational classes that preserve invariance, the existence of minimizers, thermodynamic consistency, and PDE-level stability. In this sense, the neural architecture becomes part of the constitutive structure itself, rather than a purely empirical regression mechanism. This establishes a mathematically controlled pathway toward data-driven generalized continua compatible with variational principles, ellipticity constraints, and modern computational mechanics.

An important aspect of the present contribution is that the analytical results are reproducible in the classical mathematical sense. All assumptions are stated explicitly, the variational setting is fully specified, and the proofs follow standard principles from higher-order continuum mechanics and the calculus of variations. The framework is therefore independently verifiable by readers familiar with second-gradient elasticity and nonlinear variational analysis.

At the same time, the present study is intentionally theoretical in scope and does not yet constitute a complete computational implementation framework. Although Sections 7, 8, and 10.6 provide explicit architectural prescriptions for enforcing convexity, coercivity, polyconvexity, and ellipticity at the neural-design level, the manuscript does not include training datasets, benchmark finite-element simulations, optimizer studies, or large-scale numerical experiments. Consequently, while the variational and thermodynamic structure is fully reproducible, practical reproducibility of trained neural constitutive models will require additional computational developments involving: (1) standardized datasets, (2) reproducible training protocols, (3) automatic-differentiation-based consistent tangent operators, (4) robust discretizations of fourth-order PDEs, and (5) open computational implementations integrated with finite-element solvers.

Several open mathematical and computational problems therefore remain. A complete characterization of global ellipticity in invariant coordinates, sharper quasiconvexity criteria for strain-gradient neural energies, and extensions to finite-strain gradient plasticity, damage, and dissipative generalized continua constitute natural analytical directions. From the computational viewpoint, systematic implementation studies, reproducible numerical benchmarks, and PDE-constrained training strategies for invariant neural higher-order energies remain to be developed.

The framework proposed here provides the analytical foundation necessary for such future investigations and establishes a rigorous bridge between machine learning, the calculus of variations, and the mathematical theory of generalized continuum mechanics.

Footnotes

ORCID iD

Koffi Enakoutsa

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Mindlin. Micro-structure in linear elasticity. Arch Ration Mech Anal 1964; 16: 51–78.

2. Germain. The method of virtual power in continuum mechanics. Part 2: microstructure. SIAM J Appl Math 1973; 25: 556–575.

3. Toupin. Elastic materials with couple-stresses. Arch Ration Mech Anal 1962; 11: 385–414.

4. Eringen. The linear theory of micropolar elasticity. J Math Mech 1966; 15: 909–923.

5. Altenbach, Ochsner, Eremeyev. Generalized continua: from the theory to engineering applications. Springer, 2010.

6. Forest. Micromorphic approach for gradient elasticity, viscoplasticity and damage. J Eng Mech 2009; 135: 117–131.

7. Fleck, Hutchinson. A phenomenological theory for strain gradient effects in plasticity. J Mech Phys Solids 1993; 41: 1825–1857.

8. Aifantis. On the microstructural origin of certain inelastic models. J Eng Mater Technol 1984; 106: 326–330.

9. Alibert, Seppecher, dell’Isola. Truss modular beams with deformation energy depending on higher displacement gradients. Math Mech Solids 2003; 8: 51–73.

10. dell’Isola, Giorgio, Pawlikowski, et al. Large deformations of planar extensible beams and pantographic lattices: heuristic homogenization, experimental and numerical examples of equilibrium. Proc R Soc A Math Phys Eng Sci 2016; 472: 20150790.

11. Ciallella, Giorgio, Eugster, et al. Generalized beam model for the analysis of wave propagation with a symmetric pattern of deformation in planar pantographic sheets. Wave Motion 2022; 113: 102986.

12. Giorgio, dell’Isola, Steigmann. Second-grade elasticity of three-dimensional pantographic lattices: theory and numerical experiments. Contin Mech Thermodyn 2024; 36: 1181–1193.

13. Moschini, Murcia Terranova, D’Annibale. Kinematics of zigzagged articulated parallelograms with articulated braces (ZAPAB) mechanisms. Math Mech Complex Syst 2025; 13: 471–500.

14. Dacorogna. Direct methods in the calculus of variations. 2nd ed. Springer, 2008.

15. Morrey. Quasiconvexity and the lower semicontinuity of multiple integrals. Pac J Math 1952; 2: 25–53.

16. Ghaboussi, Garrett Jr. Knowledge-based modeling of material behavior with neural networks. J Eng Mech 1991; 117: 132–153.

17. Kirchdoerfer, Ortiz. Data-driven computational mechanics. Comput Methods Appl Mech Eng 2016; 304: 81–101.

18. Raissi, Perdikaris, Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear PDEs. J Comput Phys 2019; 378: 686–707.

19. Truesdell, Noll. The non-linear field theories of mechanics. Handbuch der Physik III/3. Springer, 1965.

20. Spencer AJM. Theory of invariants. In: Eringen (ed.) Continuum physics, vol. 1. Academic Press, 1971, pp. 239–353.

21. Ogden. Non-linear elastic deformations. Dover, 1997.

22. Coleman, Noll. The thermodynamics of elastic materials with heat conduction and viscosity. Arch Ration Mech Anal 1963; 13: 167–178.

23. Neff, Pauly, Witsch. Poincaré meets Korn via Maxwell: extending Korn’s first inequality to incompatible tensor fields. J Differ Equations 2015; 258: 1267–1302.

24. Marsden, Hughes TJR. Mathematical foundations of elasticity. Prentice-Hall, 1983.

25. Ball. Convexity conditions and existence theorems in nonlinear elasticity. Arch Ration Mech Anal 1976; 63: 337–403.

26. Amos, Kolter. Input convex neural networks. In: Proceedings of the 34th international conference on machine learning, vol. 70, Sydney, NSW, Australia, 6–11 August, 2017, pp. 146–155. PMLR.

27. Chen, Guilleminot. Polyconvex neural networks for hyperelastic constitutive models: a rectification approach. Mech Res Commun 2022; 125: 103993.

28. Fuhg, Jadoon, Weeger, et al. Polyconvex neural network models of thermoelasticity. J Mech Phys Solids 2024; 192: 105837.


5. Existence of minimizers in $H^{2} (Ω)$