GaRLILEO: Gravity-aligned radar-leg-inertial enhanced odometry

Abstract

Deployment of legged robots for navigating challenging terrains (e.g., stairs, slopes, and unstructured environments) has gained increasing preference over wheel-based platforms. In such scenarios, accurate odometry estimation is a preliminary requirement for stable locomotion, localization, and mapping. Traditional proprioceptive approaches, which rely on leg kinematics sensor modalities and inertial sensing, suffer from irrepressible vertical drift caused by frequent contact impacts, foot slippage, and vibrations, particularly affected by inaccurate roll and pitch estimation. Existing methods incorporate exteroceptive sensors such as light detection and ranging (LiDAR) or cameras. Further enhancement has been introduced by leveraging gravity vector estimation to add additional observations on roll and pitch, thereby increasing the accuracy of vertical pose estimation. However, these approaches tend to degrade in feature-sparse or repetitive scenes and are prone to errors from double-integrated IMU acceleration. To address these challenges, we propose GaRLILEO, a novel gravity-aligned continuous-time radar-leg-inertial odometry framework. GaRLILEO decouples velocity from the IMU by building a continuous-time ego-velocity spline from SoC radar Doppler and leg kinematics information, enabling seamless sensor fusion which mitigates odometry distortion. In addition, GaRLILEO can reliably capture accurate gravity vectors leveraging a novel soft $S^{2}$ -constrained gravity factor, improving vertical pose accuracy without relying on LiDAR or cameras. Evaluated on a self-collected real-world dataset with diverse indoor-outdoor trajectories, GaRLILEO demonstrates state-of-the-art accuracy, particularly in vertical odometry estimation on stairs and slopes. We open-source both our dataset and algorithm to foster further research in legged robot odometry and SLAM. https://garlileo.github.io/GaRLILEO/.

Keywords

radar, legged robot, odometry, gravity, SLAM

1. Introduction

Legged robots have increasingly attracted attention for their robust mobility and adaptability in harsh environments, where traditional wheeled unmanned ground vehicle (UGV) systems often struggle. Their ability to traverse stairs, steep slopes, uneven surfaces, and unstructured terrain makes them well suited for real-world deployment in search-and-rescue, inspection, and exploration tasks (Tranzatto et al., 2022a). To fully leverage these capabilities in practice, it is essential to ensure accurate odometry estimation, which underpins stable locomotion, localization, and mapping in such challenging scenarios (Figure 1).

Figure 1.

Overall preview of GaRLILEO. The four subfigures in the upper row present the problematic situations that quadrupedal robots may encounter while performing real-world tasks, while the yellow letters specify the situations and the red words explain the substantial issues generated from them. Two boxes in the left part of the lower row summarize the major contribution and method of the GaRLILEO, which significantly reduces odometry error, especially in the vertical direction. Two graphs in the right part of the lower row present the short experimental results, showing the accuracy of GaRLILEO in multiple sequences that include loops, sharp turns, and staircases, where most baselines fail to maintain accuracy in odometry estimation.

A common practice for robust state estimation in legged robots is leveraging proprioceptive sensing, which directly captures internal kinematics of the robot through contact measurements, joint encoders, and inertial sensing (Hartley et al., 2018b, 2020; Kim et al., 2021; Lin et al., 2023; Yang et al., 2023a). These proprioceptive approaches capitalize on the direct sensing of robot dynamics as they do not depend on external observations, making them inherently robust to visual or geometrical degradation. Nonetheless, frequent contact impacts, foot slippage, and intense vibrations significantly impair the accuracy of proprioceptive odometry, particularly in the vertical direction.

A natural progression to address this vertical drift is to leverage exteroceptive sensors, such as cameras and LiDARs. Most LiDAR-based methods apply ground segmentation and ground constraints to suppress vertical errors (Seo et al., 2022; Shan and Englot, 2018; Wang et al., 2024; Wei et al., 2021). However, these strategies are mainly effective in wide, flat, and structured environments, which differ significantly from the cluttered and irregular terrains targeted by legged robots. Camera-based approaches also frequently rely on planar segmentation and Manhattan world assumptions to constrain vertical error (Li et al., 2020; Shu et al., 2021), yet these assumptions break down in complex natural environments. Moreover, recent studies further indicate that simply introducing planar landmarks may not guarantee direct improvement in managing odometry drift (Arndt et al., 2023), underscoring the limitations of exteroceptive registration in realistic legged robot scenarios.

Combining an inertial measurement unit (IMU) with cameras and LiDARs can significantly enhance state estimation. Among its many advantages, gravity estimation provides an additional constraint on roll and pitch, a concept first introduced in VINS-Mono (Qin et al., 2018). While some studies (Agha et al., 2021; Kubelka et al., 2022; Nemiroff et al., 2023; Ramezani et al., 2022; Wang et al., 2023) report notable improvements, others indicate only marginal performance gains (Burnett et al., 2025). This may be because most existing methods rely on fusing IMU acceleration with pose estimates. Such fusion requires double integration, and the gravity estimate inherently depends on the quality of the pose estimation. Consequently, feeding this pose-dependent gravity estimate back into the state may provide only limited improvement.

A promising alternative for overcoming these limitations is the use of radar. By providing direct velocity measurements, radar can fuse its ego-velocity with IMU acceleration via single integration for local gravity estimation (Noh et al., 2025). While effective, this integration can be limited in legged robot operations, where velocity changes sporadically due to frequent contacts and impacts. Another more intuitive way to integrate radar into the legged robot system is by utilizing the instantaneous ego-velocity derived from the leg kinematics as in Co-RaL (Jung et al., 2024). This approach takes advantage of radar’s ability to operate reliably under environmental degradation while also utilizing the high-frequency, proprioceptive information provided by leg kinematics sensors. However, despite these improvements, residual vertical drift persists due to inaccurate roll and pitch estimation. This limitation primarily stems from the exclusive reliance on IMU gyroscopes for orientation.

To bridge the critical gap between proprioceptive odometry and robust gravity estimation, we extend our previous works GaRLIO (Noh et al., 2025) and Co-RaL (Jung et al., 2024) to complete GaRLILEO, a Gravity-aligned Radar-Leg-InertiaL Enhanced Odometry framework. First, we decouple the IMU from velocity estimation, relying solely on radar and leg kinematics to compute the ego-velocity. This decoupled scheme is particularly advantageous for legged robot systems where noisy accelerations are often recorded during ground contact. Unlike previous radar-based approaches that represent velocity as a discrete state, our method formulates velocity as a continuous-time variable and enables seamless sensor fusion across modalities with different frame rates. We note that this is uniquely feasible in a leg–radar system, as radar directly measures velocity and leg kinematics can directly compute it, eliminating the need for pose-level measurements that would require double integration.

Beyond seamless fusion between radar and leg kinematics, the fundamental challenge of pose estimation for legged robots lies in the contact impacts and vibrations that persistently corrupt gravity estimation. To suppress these sudden, undesired acceleration measurements from the IMU, we employ splines as an inherent filter, bounding the gravity vector within a smoothed, continuous vector space. Furthermore, to naturally constrain the magnitude of the gravity vector during optimization, we introduce a soft $S^{2}$ -constrained gravity factor. Rather than conventionally restricting the vector to a unit sphere, this factor actively anchors its magnitude to 9.81 $m / s^{2}$ while optimizing its direction. This design enables precise gravity estimation, effectively constraining the roll and pitch of a legged robot, even under harsh conditions such as contact impacts, instantaneous slips, and intense vibrations. Our main contributions are as follows:

• The proposed continuous-time proprioceptive state estimation framework effectively overcomes the asynchronous, high-impact nature of radar-leg-IMU systems. By formulating ego-centric velocity through splines and decoupling it from noisy IMU accelerations, the abrupt motion prevalent in legged locomotion is better handled. This continuous formulation inherently filters out instantaneous leg slips and radar noise, providing a stable, distortion-free foundation for downstream gravity and pose estimation without relying on visual or LiDAR features.

• Our velocity-aware gravity estimation directly attacks the pervasive issue of vertical drift in legged odometry. By integrating a soft $S^{2}$ -constrained gravity factor within our continuous-time framework, we continuously anchor the gravity vector’s magnitude while optimizing its direction. This ensures that even under the intense vibrations caused by foot contacts, the system maintains a precise lock on the local gravity vector, reducing roll and pitch degradation.

• We present a comprehensive, real-world radar-Leg-Inertial dataset which features aggressive elevation changes across stairs, slopes, and slippery indoor/outdoor terrain. Validated against high-fidelity terrestrial laser scanning (TLS) and motion capture ground truth trajectories, GaRLILEO demonstrates state-of-the-art (SOTA) performance, proving exceptionally resilient to z-axis variations where traditional methods fail. To foster further research, both the dataset and framework source code are released to the community.

2. Related works

In this section, we review prior work most relevant to our approach. We begin with system on chip (SoC) radar odometry methods that leverage ego-velocity estimation. We then turn to recent developments in odometry for legged robots, covering both exteroceptive and proprioceptive paradigms. Building on this, we highlight advances in gravity estimation within state estimation frameworks. Lastly, we review continuous-time odometry methods that are particularly relevant to our work.

2.1. SoC radar odometry

Radars have emerged as a critical sensing modality in robotics, offering robust perception in adverse environments (Harlow et al., 2024). Robotics applications commonly utilize two types of radars: spinning radars and phased-array SoC radars (Kim et al., 2025b). This paper mainly focuses on SoC radars, which employ frequency-modulated continuous wave (FMCW) technology to generate 4D point clouds capturing range, azimuth, elevation, and Doppler radial velocity. While structurally similar to LiDAR data, SoC radar measurements are typically sparser and exhibit lower point precision, posing unique challenges for odometry estimation.

A primary approach to leveraging SoC radar for odometry involves directly estimating ego-velocity using point-wise radial velocity measurements. Early work by Kellner et al. (2013) introduced instantaneous ego-motion estimation using random sample consensus (RANSAC) for outlier rejection and least-squares optimization for a single SoC radar. Building on this foundation, subsequent studies can be categorized into two main directions: those integrating spatial information into ego-velocity estimation and those fusing it with inertial measurements from IMUs.

Several approaches have explored the integration of spatial information from SoC radar point clouds. For example, Michalczyk et al. (2022) utilized stochastic cloning to associate 3D points between consecutive point clouds, thereby enhancing odometry accuracy. Similarly, 4D-iRIOM (Zhuang et al., 2023) combined SoC radar velocity with scan-matching techniques, improving robustness by aligning scan-to-submap registration with Doppler-driven ego-velocity estimates. Recently, Huang et al. (2024) introduced radar cross section (RCS)-based filtering to refine point correspondences. Despite these advancements, the low precision and sparsity of SoC radar point clouds often make registration vulnerable, leading to unbounded drift or failure in odometry estimation under challenging conditions.

In contrast to registration-based methods, other studies have integrated ego-velocity estimation with inertial measurements from IMUs. For instance, Doer and Trommer (2020) employed an extended Kalman filter (EKF), while Park et al. (2021) utilized factor graph optimization to achieve 6-degree of freedom (DoF) odometry in visually degraded environments. Specifically, Park et al. (2021) leveraged two perpendicular SoC radars for 3D ego-velocity estimation and introduced a radar velocity factor for pose-graph simultaneous localization and mapping (SLAM) that incorporates IMU rotation data. More recently, DRIO (Chen et al., 2023) estimated ego-velocity and ground points, achieving robust 2D odometry. Also, Co-RaL (Jung et al., 2024) proposed a 4-DoF optimization strategy to mitigate vertical drift caused by the limited elevation resolution of SoC radars. Similarly, DeRO (Do et al., 2024) employed dead reckoning with SoC radar ego-velocity and gyroscope data, further refined through an iterative extended Kalman filter (IEKF) with tilt angle estimation based on accelerometers. Recently, River (Chen et al., 2024) tightly fused SoC radar ego-velocity and IMU measurements using a B-spline-based framework, presenting precise velocity estimation.

While prior works have significantly advanced SoC radar odometry by leveraging ego-velocity and spatial information, they face persistent challenges in mitigating vertical drift and ensuring robust performance under the dynamic conditions of legged robot systems. In such scenarios, low-precision vertical velocity measurements from SoC radars and contact-induced noise in IMU acceleration data often contribute to significant vertical drift in odometry estimation. Most existing methods either suffer from unbounded drift due to insufficient vertical constraints or rely on point cloud registration schemes that are highly sensitive to radar noise and sparsity. To overcome these limitations, our proposed method, GaRLILEO, seamlessly integrates radar-derived ego-velocity with high-rate proprioceptive measurements within a continuous-time framework, significantly reducing odometry distortion. Furthermore, our robust proprioceptive-based local gravity estimation scheme effectively mitigates vertical drift, enabling stable and accurate odometry even in dynamic and complex environments.

2.2. Leg kinematics odometry

Compared with traditional wheeled UGVs, a legged robot provides two unique sensor modalities: (1) contact sensors, which indicate the contact state of each foot, and (2) joint encoders, which measure the orientation of each joint. Using forward kinematics, the relative position of each foot with respect to the robot base can be computed (Roston and Krotkov, 1992), completing leg-based odometry. Leg odometry methods can be divided into two categories: those that leverage exteroceptive sensors such as LiDAR or cameras, and those that focus primarily on fusing proprioceptive sensors alone.

2.2.1. Exteroceptive approach

A wide range of odometry frameworks for legged robots have harnessed exteroceptive sensors, such as cameras and LiDAR, in combination with leg kinematics to capitalize on feature-rich geometric information. Fallon et al. (2014) introduced a pose estimator that fuses inertial and leg kinematics measurements with localization derived from LiDAR and pre-built maps. Similarly, Nobili et al. (2017) fused inertial, leg kinematics, camera, and LiDAR measurements based on the EKF, demonstrating robust state estimation across multiple walking gaits. This line of work was further extended by Pronto (Camurri et al., 2020), which focused on managing time-delayed signals from the vision sensors and fusing leg kinematics velocity from each leg by using a weighted average. VILENS (Wisth et al., 2022) proposed a tightly-coupled, graph-optimization-based approach by fusing camera, LiDAR, leg kinematics sensors, and IMU. This work enabled temporal proprioceptive odometry based on the preintegration factor when exteroceptive sensors failed under extreme conditions. STEP (Kim et al., 2022) improved the stability of stereo camera-based odometry by incorporating leg kinematics, achieving robust performance in dynamic scenes. Similarly, Leg-KILO (Ou et al., 2024) demonstrated stable LiDAR-based odometry by leveraging leg kinematics during dynamic movements of legged robots. Cerberus (Yang et al., 2023b) introduced online calibration of kinematics parameters and contact outlier rejection to reduce drift in camera-IMU-leg kinematics odometry. To address dynamic locomotion behaviors such as jumping and trotting at varying speeds, Dhédin et al. (2023) fused leg odometry from Pronto (Camurri et al., 2020) with IMU-rate camera-IMU odometry. Recently, MUSE (Nisticò et al., 2025) integrated foot-slip detection with camera-LiDAR odometry to further enhance robustness. Holistic Fusion (Nubert et al., 2025) provided a unified framework in which leg odometry, especially using the contact frame as a landmark feature, could be seamlessly fused with exteroceptive sensors.

While these exteroceptive sensor-based approaches have shown strong odometry performance—especially when combined with leg kinematics information—they still encounter challenges in environments where perceptual features are sparse, ambiguous, or repetitive. In particular, visual and LiDAR odometry can degrade rapidly in poorly illuminated areas, featureless corridors, reflective surfaces, or in the presence of dense smoke and dust, limiting their reliability in many real-world legged robot applications.

2.2.2. Proprioceptive approach

In contrast to exteroceptive approaches, proprioceptive-based odometry estimates the robot pose without dependence on external features. Bloesch et al. (2012) introduced an EKF-based framework that fused leg kinematics measurements and IMU data for state estimation, which was later extended to an unscented Kalman filter (UKF) backbone (Bloesch et al., 2013).

Based on the contact theorem that a contact frame is fixed while a contact sensor is on, Hartley et al. (2018b) proposed a forward kinematics factor and a preintegrated contact factor. This was later generalized to a hybrid contact factor that dynamically switches the contact frame based on the assumption that at least one foot is in contact with the ground (Hartley et al., 2018a). The contact kinematics theory was incorporated into an invariant EKF framework, yielding a Lie group-based estimator that achieved globally consistent state estimation using IMU, kinematics, and contact measurements (Hartley et al., 2020). Fink and Semini (2020) additionally fused force and torque sensor data to develop a low-level state estimator, calculating both kinematics and dynamics of the robot. Still, the possible foot slip, even with the contact sensor in place, remains a challenge.

To address foot slip on the contact frame, approaches from various directions have been introduced recently. TSIF (Bloesch et al., 2017) presented a recursive estimation framework minimizing the residual between two consecutive states and demonstrated improved resilience to measurement outliers. Kim et al. (2021) adopted a fixed-lag smoother state estimation on the SO(3) manifold, paired with slip rejection strategies to mitigate kinematics model failures. DRIFT (Lin et al., 2023) combined contact estimation and gyro filtering within an invariant EKF, enabling robust odometry on low-cost legged robots. Yang et al. (2023a) utilized multiple IMUs on each foot to explicitly detect contact and foot slip, overcoming the zero-velocity contact frame assumption prevalent in contact sensor-based proprioceptive odometry studies.

Although proprioceptive odometry is robust to environmental conditions and feature degradation, it suffers from drift over time, particularly in the vertical direction, due to the drastic vibration from contact impact and the lack of an absolute orientation reference. To address this, Co-RaL (Jung et al., 2024) introduced an integration of SoC radar-derived ego-velocity with leg kinematics velocity incorporating rolling contact awareness. However, despite these integrated measurements, vertical drift remains due to inaccurate roll and pitch estimation. This stems from not accounting for IMU acceleration, which contains critical information, such as the gravity vector, essential for accurate odometry.

GaRLILEO improves upon Co-RaL by integrating SoC radar, leg kinematics, and IMU data using a continuous-time B-spline approach and introducing robust continuous velocity-aware gravity estimation. This enables substantially improved robustness of odometry, especially in the vertical direction, compared to Co-RaL and other purely proprioceptive methods.

2.3. Local gravity estimation

Gravity, with its constant magnitude and direction, provides a physically grounded reference for reliable roll and pitch estimation in odometry and SLAM. Accurate roll and pitch estimation is crucial for mitigating vertical drift, which is caused by erroneous leakage of horizontal movement onto the vertical axis. Accordingly, accurate local gravity estimation has emerged as a critical constraint for suppressing accumulated vertical drift errors in state estimation frameworks.

Early gravity estimation in LiDAR odometry typically inferred the local gravity vector from IMU acceleration or employed probabilistic filtering over time. Nebula (Agha et al., 2021) introduced a gravity factor based on IMU acceleration during stationary intervals, constraining roll and pitch in the state estimation. D-LIOM (Wang et al., 2023) and Wildcat (Ramezani et al., 2022) formulated gravity alignment as an optimization constraint using IMU data and exteroceptive-based odometry. Nemiroff et al. (2023) further extended this by jointly optimizing accelerometer intrinsics and the gravity vector. While these velocity-ignorant models presented the potential for local gravity estimation to mitigate odometry vertical drift, they fundamentally relied on correlating pose changes from exteroceptive odometry with IMU acceleration, which amplified errors due to IMU bias and noise through the process of double integration. This limitation hinders robustness in dynamic environments or those prone to slippage. Furthermore, Burnett et al. (2025) report that these methods offer only marginal performance gains; this is likely due to the inherently pose-dependent nature of velocity-ignorant gravity estimation, which limits the benefit of feeding estimation back into the state.

To overcome these issues, GaRLIO (Noh et al., 2025) explicitly incorporated direct velocity measurements from radar Doppler data for local gravity estimation. By fusing radar-derived ego-velocity with LiDAR odometry, GaRLIO constructed a velocity-aware gravity constraint that significantly enhanced the accuracy of gravity estimation. However, GaRLIO still depended on an initial gravity guess from LiDAR scan registration, making it vulnerable in feature-degraded environments and inapplicable to systems without LiDAR. Additionally, mismatches in time intervals between the IMU preintegration and radar ego-velocity updates may introduce estimation inconsistencies due to the lack of continuity in the discrete-time sensor-fusion approach.

In contrast, our GaRLILEO framework integrates a continuous velocity-aware local gravity estimation approach, enabling precise and robust estimation of the gravity vector even in visually degraded or featureless environments. Furthermore, by continuously fusing SoC radar velocity and leg kinematics measurements via a time-continuous B-spline optimization framework, GaRLILEO prevents the inconsistency that may arise from discrete sensor fusion across modalities with differing frame rates.

2.4. Continuous-time state estimation

Sensor fusion is essential in robotics to leverage the complementary strengths of heterogeneous sensors. Traditional discrete-time frameworks synchronize each sensor by associating with the nearest available timestamp; however, this introduces motion distortion and overlooks latent state information between sampled states (Talbot et al., 2024).

To address this limitation, recent works have adopted B-spline-based continuous-time trajectory representations, enabling smooth and differentiable state estimation that supports querying poses, velocities, and accelerations at arbitrary time instances. Furgale et al. (2012, 2015) formulated batch estimation of robot trajectories in SE(3) using temporal B-splines, establishing the groundwork for continuous-time sensor fusion.

Ovrén and Forssén (2019), Hug and Chli (2020), Lang et al. (2022), and Hug et al. (2022) extended these ideas to visual-inertial fusion, achieving improved robustness to motion distortion and asynchronous measurements. Similarly, Droeschel and Behnke (2018) leveraged B-splines for continuous-time SLAM with a LiDAR-inertial system. More recently, Lang et al. (2023, 2024) and Lv et al. (2023) addressed multi-modal SLAM and odometry across LiDAR, camera, and IMU. Jung et al. (2023) further addressed asynchronous fusion for multi-LiDAR-IMU odometry using a B-spline approach for continuous-time formulations. For SoC radar-IMU systems, River (Chen et al., 2024) introduced a B-spline-based radar-inertial velocity estimator that could operate robustly under perception-degraded conditions.

Inspired by these recent advances, GaRLILEO presents the first integration of leg kinematics with SoC radar and IMU in a continuous-time B-spline framework. By directly incorporating leg kinematics velocities and radar measurements at their respective timestamps without preintegration, GaRLILEO provides more accurate and robust state estimation. Additionally, this continuous-time framework even enhances the gravity estimation accuracy, reducing the vertical drift, especially in challenging legged robot scenarios where slip and dynamic contact events frequently occur.

3. Preliminary

In this section, we summarize the background for the following sections.

3.1. Radar radial velocity and ego velocity

Leveraging the FMCW technique, phased-array radars provide not only the 3D position of each detected point but also its radial velocity in the sensor coordinate frame. If a radar point originates from a stationary object, its radial velocity is determined by the ego-centric velocity of the sensor. Let v denote the ego-centric velocity of the sensor, v_j the radial velocity of the j-th point, and p_j its 3D position vector; their relationship is as follows:

v_{j} = - \frac{p_{j}^{⊤}}{‖p_{j}‖} v .

(1)
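As a quick numerical check of equation (1), consider a hypothetical stationary point (values chosen purely for illustration) at $p_{j} = [3 \; 4 \; 0]^{⊤}$ m, observed while the sensor moves at $v = [1 \; 0 \; 0]^{⊤}$ m/s:

```latex
\begin{aligned}
\|p_{j}\| &= \sqrt{3^{2} + 4^{2} + 0^{2}} = 5, \\
v_{j} &= -\frac{p_{j}^{\top}}{\|p_{j}\|} v
       = -\frac{1}{5} (3 \cdot 1 + 4 \cdot 0 + 0 \cdot 0)
       = -0.6 \ \mathrm{m/s}.
\end{aligned}
```

The negative sign reflects that a static point ahead of a forward-moving sensor appears to approach it.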

Using this radial velocity, we can calculate the ego-velocity through RANSAC and least-squares optimization (Kellner et al., 2013). For a simple illustration, assume that the radar acquires only 2D data. Then, equation (1) can be expressed as follows:

\begin{aligned} θ_{j} & = \tan^{- 1} \frac{p_{j, y}}{p_{j, x}} \\ v_{j} & = - [\begin{matrix} \cos (θ_{j}) & \sin (θ_{j}) \end{matrix}] v, \end{aligned}

(2)

where $v = {[\begin{matrix} v_{x} & v_{y} \end{matrix}]}^{⊤}$ . Expanding equation (2) to all N radar points yields:

\begin{aligned} [\begin{matrix} v_{1} \\ ⋮ \\ v_{N} \end{matrix}] = - [\begin{matrix} \cos (θ_{1}) & \sin (θ_{1}) \\ ⋮ & ⋮ \\ \cos (θ_{N}) & \sin (θ_{N}) \end{matrix}] v, \end{aligned}

(3)

This equation extends to 3D points as follows:

\begin{aligned} [\begin{matrix} v_{1} \\ ⋮ \\ v_{N} \end{matrix}] = - [\begin{matrix} \frac{p_{1, x}}{p_{1, r}} & \frac{p_{1, y}}{p_{1, r}} & \frac{p_{1, z}}{p_{1, r}} \\ ⋮ & ⋮ & ⋮ \\ \frac{p_{N, x}}{p_{N, r}} & \frac{p_{N, y}}{p_{N, r}} & \frac{p_{N, z}}{p_{N, r}} \end{matrix}] v, \end{aligned}

(4)

where $p_{j, r} = \sqrt{p_{j, x}^{2} + p_{j, y}^{2} + p_{j, z}^{2}}$ .

Based on equation (3) or equation (4), depending on whether the data is 2D or 3D, let M denote the coefficient matrix that multiplies v (built from the angular information of the points, together with its sign), and let v_R denote the vector of radial velocities. The ego-velocity v can then be computed by least-squares optimization using the pseudo-inverse, $v = {(M^{⊤} M)}^{- 1} M^{⊤} v_{R}$ , or via the singular value decomposition (SVD) of M.
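The estimation step above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function names, the RANSAC parameters (`iters`, `tol`), and the synthetic data are all assumptions made for the example, and M is built with the sign convention of equation (1).

```python
import numpy as np


def estimate_ego_velocity(points, radial_velocities):
    """Least-squares ego-velocity from Doppler radial velocities.

    points: (N, 3) positions p_j of radar returns in the sensor frame.
    radial_velocities: (N,) measured radial velocities v_j.
    Assumes every return comes from a stationary object.
    """
    # Each row of M is -p_j^T / ||p_j||, so that v_R = M v (equation (1)).
    M = -points / np.linalg.norm(points, axis=1, keepdims=True)
    # lstsq solves the same problem as the pseudo-inverse, internally via SVD.
    v, *_ = np.linalg.lstsq(M, radial_velocities, rcond=None)
    return v


def ransac_ego_velocity(points, radial_velocities, iters=100, tol=0.1, seed=0):
    """Hypothetical minimal RANSAC: fit on 3 random points, keep the model
    with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    M = -points / np.linalg.norm(points, axis=1, keepdims=True)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(points), size=3, replace=False)
        v = estimate_ego_velocity(points[idx], radial_velocities[idx])
        inliers = np.abs(M @ v - radial_velocities) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return estimate_ego_velocity(points[best], radial_velocities[best]), best


# Synthetic data: 50 static points plus 10 returns from moving objects.
rng = np.random.default_rng(42)
pts = rng.uniform(-10.0, 10.0, size=(60, 3))
v_true = np.array([1.0, -0.5, 0.2])
v_r = -(pts / np.linalg.norm(pts, axis=1, keepdims=True)) @ v_true
v_r[50:] += rng.uniform(2.0, 5.0, size=10)  # corrupt the last 10 returns
v_est, inlier_mask = ransac_ego_velocity(pts, v_r)
```

The outlier rejection matters because returns from moving objects violate the stationarity assumption behind equation (1); a plain least-squares fit over all 60 points would be biased by them.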

3.2. Leg kinematics and ego velocity

Two key sensors for state estimation of the legged robot are the joint encoders and contact sensors. Joint encoders measure the absolute angle of each joint, being a key proprioceptive sensor for estimating the robot’s pose. Contact sensors are positioned at every foot, measuring whether the foot is currently in contact or not.

Using the joint encoder measurement and the physical modeling of each node, the relative coordinate transformation between frames $F^{i}$ and $F^{i + 1}$ at each joint can be calculated as follows:

\begin{aligned} T_{i + 1}^{i} & = [\begin{matrix} R_{i} & t_{i} \\ 0^{⊤} & 1 \end{matrix}] \\ R_{i} & = \mathrm{Exp} ({α_{i} u_{i}}^{⊤}), \end{aligned}

(5)

where α_i denotes the joint angle measured by the i-th joint encoder, and u_i is the basis vector (rotation axis) of the i-th joint expressed in its local frame. Chaining the transformation T of every joint, the relative transformation between the robot base frame and the end-effector (foot) frame can be calculated as follows:

T_{foot}^{base} = T_{hip}^{base} T_{knee}^{hip} T_{foot}^{knee},

(6)

assuming the hip and knee joints between the base coordinate and the foot coordinate. This chain-style frame calculation (i.e., forward kinematics) allows us to calculate the relative pose of each end-effector based on the base frame of the robot.
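The chain of equations (5) and (6) can be sketched as follows. The leg geometry here (two pitch joints, a hip offset of (0.2, 0.1, 0) m, and 0.25 m thigh and shank lengths) is a hypothetical example, not the robot used in this work, and the helper names are chosen for illustration only.

```python
import numpy as np


def so3_exp(phi):
    """Rodrigues' formula: rotation vector (3,) -> rotation matrix (3, 3)."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3)
    a = phi / angle
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def joint_transform(alpha, axis, offset):
    """One transform of equation (5): rotation Exp(alpha * axis), translation offset."""
    T = np.eye(4)
    T[:3, :3] = so3_exp(alpha * np.asarray(axis, dtype=float))
    T[:3, 3] = offset
    return T


def foot_in_base(q, axes, offsets, foot_offset):
    """Chain the joint transforms as in equation (6) and return the foot pose in the base frame."""
    T = np.eye(4)
    for alpha, axis, offset in zip(q, axes, offsets):
        T = T @ joint_transform(alpha, axis, offset)
    T_foot = np.eye(4)
    T_foot[:3, 3] = foot_offset  # fixed knee-to-foot translation
    return T @ T_foot


# Hypothetical leg: hip and knee both rotate about the y (pitch) axis.
axes = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.2, 0.1, 0.0), (0.0, 0.0, -0.25)]  # base->hip, hip->knee
foot_offset = (0.0, 0.0, -0.25)                 # knee->foot
T_straight = foot_in_base([0.0, 0.0], axes, offsets, foot_offset)
T_swung = foot_in_base([np.pi / 2, 0.0], axes, offsets, foot_offset)
```

With both joints at zero the foot hangs 0.5 m below the hip; rotating the hip by 90 degrees swings the straight leg onto the negative x axis.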

Under the no-slip assumption, once a foot makes contact, its contact frame should remain static until the contact is released. Accordingly, while a foot remains in contact, the relative velocity of the base frame can be calculated by inverting the forward kinematics function to estimate the pose difference of the base frame over the contact period.

From this forward kinematics, we derive the ego-centric velocity and use it for the velocity factor, which is detailed in Section 4.3.
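To make the no-slip argument concrete, the sketch below uses the textbook relation for a stationary contact foot: if p is the foot position in the base frame and ω the body-frame angular velocity, then the body-frame base velocity is $-(\dot{p} + ω \times p)$. This is an illustrative assumption and not necessarily the exact form of the velocity factor; the planar two-joint leg, its dimensions, and the function names are hypothetical.

```python
import numpy as np

L1, L2 = 0.25, 0.25              # hypothetical thigh / shank lengths [m]
HIP = np.array([0.2, 0.1, 0.0])  # hypothetical hip position in the base frame [m]


def foot_position(q):
    """Foot position in the base frame for a two-joint leg with pitch axes."""
    q0, q1 = q
    return HIP + np.array([
        -L1 * np.sin(q0) - L2 * np.sin(q0 + q1),
        0.0,
        -L1 * np.cos(q0) - L2 * np.cos(q0 + q1),
    ])


def numerical_jacobian(f, q, eps=1e-6):
    """Central-difference Jacobian of f with respect to the joint angles."""
    q = np.asarray(q, dtype=float)
    J = np.zeros((3, q.size))
    for k in range(q.size):
        dq = np.zeros_like(q)
        dq[k] = eps
        J[:, k] = (f(q + dq) - f(q - dq)) / (2.0 * eps)
    return J


def base_velocity_from_contact(q, q_dot, omega):
    """Body-frame base velocity implied by a stationary (no-slip) contact foot."""
    p = foot_position(q)
    p_dot = numerical_jacobian(foot_position, q) @ np.asarray(q_dot, dtype=float)
    return -(p_dot + np.cross(omega, p))


# Hip swinging at 1 rad/s with a straight leg and no body rotation.
v_swing = base_velocity_from_contact([0.0, 0.0], [1.0, 0.0], np.zeros(3))
# Pure yaw rotation of the body about a foot that stays planted.
v_yaw = base_velocity_from_contact([0.0, 0.0], [0.0, 0.0], np.array([0.0, 0.0, 1.0]))
```

In the first case the foot moves backward relative to the base at 0.5 m/s, so the base moves forward at the same speed. A real system would use the analytic Jacobian of its own kinematic model instead of finite differences.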

3.3. B-spline interpolation

A k-order B-spline consists of several polynomial segments of degree k − 1 with at most $C^{k - 2}$ continuity (Patrikalakis and Maekawa, 2002). This continuous-time spline representation allows evaluating the state at arbitrary timestamps via smooth interpolation, which is particularly useful for fusing asynchronous sensors and reduces the need for explicit sensor-to-sensor time synchronization by enabling direct evaluation at each measurement time. Moreover, for legged robots where gait patterns introduce high-frequency noise, the smoothness of the spline acts as a low-pass filter, alleviating spike artifacts often observed in discrete-time alternatives.

Leveraging these advantages, we parameterize continuous-time trajectories of ego-centric velocity $v (t) \in R^{3}$ and rotation R(t) ∈ SO(3) using third-order (k = 3) uniform cumulative B-splines, each consisting of second-degree polynomial segments. Due to the continuity property, discontinuities in velocity and acceleration are effectively eliminated. This cumulative B-spline formulation offers O(k) computational complexity in temporal derivatives, making it well suited for real-time odometry (Sommer et al., 2020).

At a given time $t \in [t_{i}, t_{i + 1})$ , the velocity v(t) depends exclusively on k control points due to the local support property of B-splines, and its representation in matrix form is as follows:

v (u) = [\begin{matrix} v_{i} & d_{1}^{i} & \dots & d_{k - 1}^{i} \end{matrix}] \cdot {\tilde{M}}^{(k)} \cdot u^{⊤},

(7)

where v_i denotes the i-th control point of v(t), $d_{j}^{i} = v_{i + j} - v_{i + j - 1}$ is the difference between consecutive control points, $u = [\begin{matrix} 1 & u & \dots & u^{k - 1} \end{matrix}]$ is the vector of powers of the normalized time u = (t − t_i)/(t_{i+1} − t_i) within the knot interval, and ${\tilde{M}}^{(k)}$ is the cumulative spline matrix that depends solely on the B-spline order. Since u is the only term that depends on time, differentiating equation (7) with respect to t, with Δt = t_{i+1} − t_i, yields the following expression:

\dot{v} (u) = \frac{1}{Δ t} [\begin{matrix} v_{i} & d_{1}^{i} & \dots & d_{k - 1}^{i} \end{matrix}] \cdot {\tilde{M}}^{(k)} \cdot {\dot{u}}^{⊤} .

(8)

Note that, in this paper, the velocity spline is constructed from ego-centric velocity; therefore, $\dot{v}$ does not directly represent ego-centric acceleration. Computing ego-centric acceleration additionally requires accounting for the angular velocity (see the ω × v term in the accelerometer residual of equation (13)).
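To make equations (7) and (8) concrete, the following minimal sketch evaluates a uniform cumulative B-spline of order k = 3 in $R^{3}$. The cumulative matrix is written out for k = 3 following the convention of Sommer et al. (2020); the function name `eval_velocity_spline` and the sample control points are illustrative assumptions, not part of the released implementation.

```python
import numpy as np

# Cumulative blending matrix for a uniform B-spline of order k = 3
# (quadratic segments). Rows index [v_i, d_1, d_2]; columns index the
# powers [1, u, u^2]. Convention assumed from Sommer et al. (2020).
M_TILDE_3 = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 1.0, -0.5],
    [0.0, 0.0, 0.5],
])


def eval_velocity_spline(ctrl, i, u, dt):
    """Evaluate v(u) (eq. 7) and its time derivative (eq. 8) on segment i.

    ctrl: (N, 3) array of control points v_0 ... v_{N-1}
    i:    segment index; uses control points i, i+1, i+2
    u:    normalized time (t - t_i) / dt
    dt:   uniform knot spacing t_{i+1} - t_i
    """
    ctrl = np.asarray(ctrl, dtype=float)
    # Columns: v_i, d_1 = v_{i+1} - v_i, d_2 = v_{i+2} - v_{i+1}
    basis = np.column_stack(
        [ctrl[i], ctrl[i + 1] - ctrl[i], ctrl[i + 2] - ctrl[i + 1]]
    )
    u_vec = np.array([1.0, u, u * u])
    du_vec = np.array([0.0, 1.0, 2.0 * u])  # derivative of [1, u, u^2] w.r.t. u
    v = basis @ M_TILDE_3 @ u_vec
    v_dot = basis @ M_TILDE_3 @ du_vec / dt
    return v, v_dot
```

Only three control points enter each evaluation, which reflects the local support property mentioned above.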

Analogous to the $R^{3}$ case, a cumulative B-spline of order k defined on a Lie group $L$ with control points $R_{0}, \dots, R_{N} \in L$ is expressed as

\begin{aligned} R (u) & = R_{i} \cdot \prod_{j = 1}^{k - 1} \mathrm{Exp} (λ_{j} (u) \cdot \mathrm{Log} (R_{i + j - 1}^{⊤} R_{i + j})) \\ \dot{R} (u) & = R_{i} \cdot \sum_{j = 1}^{k - 1} \{\prod_{l = 1}^{j - 1} A_{l} (u)\} {\dot{A}}_{j} (u) \{\prod_{l = j + 1}^{k - 1} A_{l} (u)\}, \end{aligned}

(9)

where $A_{j} (u) = \mathrm{Exp} (λ_{j} (u) \cdot \mathrm{Log} (R_{i + j - 1}^{⊤} R_{i + j}))$, $λ (u) = {\tilde{M}}^{(k)} \cdot u^{⊤}$, and λ_j is the j-th element of λ. More details can be found in Sommer et al. (2020).
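As an illustration of the rotation part of equation (9), the sketch below evaluates R(u) for order k = 3 with NumPy only. The helpers `exp_so3` and `log_so3` are plain Rodrigues-formula implementations written for this example (the logarithm is not robust near a rotation angle of π), and the blending functions λ_1(u) = 1/2 + u − u²/2 and λ_2(u) = u²/2 are the k = 3 case of the cumulative matrix under the convention of Sommer et al. (2020). All function names are assumptions made for illustration.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Rotation vector -> rotation matrix (Rodrigues formula)."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta < 1e-10:
        return np.eye(3) + hat(w)  # first-order approximation
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def log_so3(R):
    """Rotation matrix -> rotation vector (valid away from angle pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-10:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee


def eval_rotation_spline(ctrl_R, i, u):
    """R(u) = R_i * Exp(lambda_1 d_1) * Exp(lambda_2 d_2) for order k = 3."""
    lam = (0.5 + u - 0.5 * u * u, 0.5 * u * u)  # lambda_1(u), lambda_2(u)
    R = np.array(ctrl_R[i], dtype=float)
    for j in (1, 2):
        d_j = log_so3(ctrl_R[i + j - 1].T @ ctrl_R[i + j])
        R = R @ exp_so3(lam[j - 1] * d_j)
    return R
```

For control points that rotate about a single axis by 0, 0.2, and 0.4 rad, the spline passes through 0.1 rad at u = 0 and 0.3 rad at u = 1, mirroring the midpoint behavior of the quadratic B-spline in $R^{3}$.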

4. Factor graph formulation

4.1. State definition and notation

In this work, we adopt a robot-centric state to embed the local gravity within the system state, which is defined as follows:

\begin{aligned} x & ≜ [\begin{matrix} x_{R} & x_{v} & x_{g} & x_{b_{v}} & x_{b_{a}} & | & g^{G} & {\bar{R}}_{G}^{I_{0}} \end{matrix}], \quad \text{where} \\ x_{R} & ≜ [\begin{matrix} R_{I_{0}}^{G} & \dots & R_{I_{l - 1}}^{G} \end{matrix}] \\ x_{v} & ≜ [\begin{matrix} v_{I_{0}}^{I_{0}} & \dots & v_{I_{m - 1}}^{I_{m - 1}} \end{matrix}] \\ x_{g} & ≜ [\begin{matrix} g^{I_{0}} & \dots & g^{I_{n - 1}} \end{matrix}] \\ x_{b_{v}} & ≜ [\begin{matrix} b_{v_{0}} & \dots & b_{v_{i - 1}} \end{matrix}] \in R^{2 \times i} \\ x_{b_{a}} & ≜ [\begin{matrix} b_{a_{0}} & \dots & b_{a_{i - 1}} \end{matrix}] \in R^{3 \times i} \\ g^{G} & ≜ [\begin{matrix} 0 & 0 & 9.81 \end{matrix}] . \end{aligned}

(10)

The system state x consists of B-spline control points and bias terms. x_R, x_v, and x_g denote the control points of the B-splines for the SO(3) orientation R(t), ego-centric velocity v(t), and local gravity vector g(t), respectively. For g(t), we adopt an $R^{3}$ spline parameterization for practical implementation and debugging simplicity compared to an $S^{2}$ spline. $R_{I}^{G}$ represents the rotation of the IMU frame expressed in the global frame, while $v_{I}^{I}$ and $g^{I}$ denote the ego-centric velocity and the local gravity vector, both expressed in the IMU frame. The bias terms b_v and b_a correspond to the velocity state bias and the IMU accelerometer bias, respectively. The global coordinate frame $F^{G}$ is defined such that its z-axis is aligned with the global gravity vector g^G. The rotation matrix ${\bar{R}}_{G}^{I_{0}}$ is used to align the estimated global gravity vector to g^G. Further details on g^G and ${\bar{R}}_{G}^{I_{0}}$ are included in Section §5.1.4.

To achieve sensor fusion with multiple sensor modalities, our algorithm employs an incremental factor graph optimization framework that leverages B-splines for optimizing each control point. The factor graph used in our approach comprises an IMU factor, a velocity factor, and a gravity factor, collectively enabling robust and accurate state estimation.

4.2. IMU factor

We employ two types of IMU factors: a gyroscope factor r_ω, and an accelerometer factor r_a. Let ${\tilde{a}}_{m_{i}}$ and ${\tilde{ω}}_{m_{i}}$ denote the acceleration and angular velocity measurements obtained from the IMU accelerometer and gyroscope at time t_i, respectively. The IMU measurements are modeled as

\begin{align} ω_{m_{i}} & = {\tilde{ω}}_{m_{i}} - b_{ω_{i}} - n_{ω_{i}} \end{align}

(11)

\begin{align} a_{m_{i}} & = {\tilde{a}}_{m_{i}} - b_{a_{i}} - n_{a_{i}}, \end{align}

(12)

where $ω_{m_{i}}$ and $a_{m_{i}}$ are the true angular velocity and acceleration at time t_i; $b_{ω_{i}}$ , $b_{a_{i}}$ are the gyroscope and accelerometer biases; and $n_{ω_{i}}$ , $n_{a_{i}}$ are zero-mean Gaussian noises. In our setup, the gyroscope bias is weakly observable, so we set $b_{ω_{i}} \equiv 0$ , while the accelerometer bias $b_{a_{i}}$ is modeled as a Gaussian-driven random walk. Building on these equations, the residual functions for both IMU factors are formulated as follows, enabling direct optimization of control points for the continuous-time trajectory on rotation and velocity:

\begin{aligned} r_{ω} (t_{i}) & = ω (t_{i}) - {\tilde{ω}}_{m_{i}} \\ r_{a} (t_{i}) & = (ω (t_{i}) \times v (t_{i}) + \dot{v} (t_{i}) - R {(t_{i})}^{⊤} g^{G}) + b_{a_{i}} - {\tilde{a}}_{m_{i}} . \end{aligned}

(13)
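The residuals in equation (13) translate almost directly into code. The sketch below is a plain transcription under the assumption that the spline values ω, v, $\dot{v}$, and R have already been evaluated at the measurement time t_i; the function names are illustrative and the value of g^G is taken from equation (10).

```python
import numpy as np

G_GLOBAL = np.array([0.0, 0.0, 9.81])  # g^G as defined in equation (10)


def gyro_residual(omega_spline, omega_meas):
    """r_omega: spline angular velocity minus the gyroscope measurement."""
    return np.asarray(omega_spline, dtype=float) - np.asarray(omega_meas, dtype=float)


def accel_residual(R, v, v_dot, omega, b_a, a_meas, g_global=G_GLOBAL):
    """r_a: predicted accelerometer reading minus the measurement.

    R:      rotation of the IMU frame in the global frame, evaluated at t_i
    v:      ego-centric velocity from the velocity spline
    v_dot:  time derivative of the velocity spline
    omega:  angular velocity from the rotation spline
    b_a:    accelerometer bias
    a_meas: raw accelerometer measurement
    """
    predicted = np.cross(omega, v) + v_dot - R.T @ g_global + b_a
    return predicted - np.asarray(a_meas, dtype=float)
```

The ω × v term is what distinguishes the ego-centric formulation from a global-frame velocity model: even at constant ego-centric speed, a turning robot produces a nonzero predicted acceleration.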

4.3. Velocity factor

Our system leverages two sources of ego-centric velocity: a radar-derived velocity and a leg kinematics-based velocity. For the radar sensor, assuming that the j-th target point of radar measurements ${\tilde{p}}_{j}^{r}$ is stationary, the Doppler measurement ${\tilde{v}}_{m, j}^{r}$ equals the magnitude of the ego-velocity projected onto the unit line-of-sight vector to the target point and its conversion into the IMU coordinate frame is expressed as

\begin{aligned} {\tilde{v}}_{m_{i}, j}^{r} = - \frac{{({\tilde{p}}_{j}^{r})}^{⊤}}{‖{\tilde{p}}_{j}^{r}‖} {(R_{r}^{I})}^{⊤} (v_{i} + {⌊ ω_{i}^{I} ⌋}_{\times} t_{r}^{I}), \end{aligned}

(14)

where $[R_{r}^{I}, t_{r}^{I}]$ is the extrinsic matrix from the radar frame $F^{r}$ to the IMU frame $F^{I}$, and v_i and $ω_{i}^{I}$ are the true ego-centric velocity and angular velocity at time t_i. Because static points predominate in most environments, this formulation provides highly robust velocity estimates; however, the limited point resolution of radar introduces noticeable inaccuracies, particularly along the elevation axis, as mentioned in Co-RaL (Jung et al., 2024).
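A minimal sketch of the Doppler model in equation (14) is given below. It predicts the radial velocity of each radar target from an IMU-frame velocity, an angular velocity, and the radar-to-IMU extrinsics, under the stated assumption that every target is stationary. The function name and argument layout are assumptions made for illustration.

```python
import numpy as np


def predict_doppler(points_r, v_imu, omega_imu, R_r_I, t_r_I):
    """Predicted Doppler velocity per target, following equation (14).

    points_r:  (q, 3) target positions expressed in the radar frame
    v_imu:     ego-centric velocity of the IMU frame
    omega_imu: angular velocity in the IMU frame
    R_r_I:     rotation from the radar frame to the IMU frame
    t_r_I:     position of the radar in the IMU frame (lever arm)
    """
    points_r = np.asarray(points_r, dtype=float)
    # Unit line-of-sight vector to each target
    los = points_r / np.linalg.norm(points_r, axis=1, keepdims=True)
    # Velocity of the radar origin, expressed in the radar frame
    v_radar = R_r_I.T @ (np.asarray(v_imu, dtype=float) + np.cross(omega_imu, t_r_I))
    # Static targets appear to approach when the sensor moves toward them
    return -los @ v_radar
```

A target straight ahead of a sensor moving forward at 1 m/s yields a predicted Doppler of −1 m/s, whereas a target perpendicular to the motion yields zero.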

The translation between the robot’s body frame $F^{B}$ and the contact frame $F^{C}$ at time t_i can be obtained using the following forward kinematics function f_p:

t_{C_{i}}^{B_{i}} = f_{p} ({\tilde{α}}_{i}),

(15)

where ${\tilde{α}}_{i}$ is the vector of joint angle measurements from the encoders on every joint at time t_i. Joint position and velocity are obtained from the leg joint encoders, and the driver calculates which foot is in contact with the ground.¹ These measurements enable computation of ego-centric velocity in the robot’s body frame,

v_{B_{i}}^{B_{i}} = - J_{p} ({\tilde{α}}_{i}) {\tilde{\dot{α}}}_{i} - ω^{B} \times f_{p} ({\tilde{α}}_{i}),

(16)

where $J_{p} (\cdot) \in R^{3 \times 3}$ is the Jacobian of f_p (⋅) (Wisth et al., 2022). Empirically, the joint velocity reported by the encoders equals the finite difference approximation of the joint positions; however, under highly nonlinear joint trajectories, the velocity obtained in this manner diverges from the actual velocity. Therefore, we compute the ego-centric velocity by differentiating the leg translation function $f_{p} (\tilde{α})$, derived from joint positions only, as follows:

v_{B_{i}}^{B_{i}} = - \frac{f_{p} ({\tilde{α}}_{i + 1}) - f_{p} ({\tilde{α}}_{i})}{t_{i + 1} - t_{i}},

(17)

which is intuitively explained in Figure 2. Using the contact-centric leg kinematic locomotion, the ego-centric velocity of the system can be estimated. While this approach provides accurate estimates when the contact frame remains fixed to the ground, practical deployments must account for leg slip. Accordingly, we augment the leg kinematics factor with a velocity bias term to account for slip-induced drift of the leg kinematics-based velocity, particularly on slippery or deformable surfaces. Because slippage occurs primarily along the horizontal x- and y- directions, the velocity bias is modeled exclusively in these two dimensions. Taking these considerations into account, the corresponding leg kinematics factor r_L and the radar factor r_R are defined as

\begin{aligned} r_{L} (t_{i}) & = {(R_{B}^{I})}^{⊤} (v (t_{i}) + {⌊ ω (t_{i}) ⌋}_{\times} t_{B}^{I}) - v_{B_{i}}^{B_{i}} - [\begin{matrix} b_{v} & 0 \end{matrix}] \\ r_{R} (t_{i}) & = \sum_{j \in [1, q]} [- \frac{{({\tilde{p}}_{j}^{r})}^{⊤}}{‖{\tilde{p}}_{j}^{r}‖} {(R_{r}^{I})}^{⊤} (v (t_{i}) + {⌊ ω (t_{i}) ⌋}_{\times} t_{r}^{I}) - {\tilde{v}}_{m_{i}, j}^{r}], \end{aligned}

(18)

where $[R_{B}^{I}, t_{B}^{I}]$ is the extrinsic matrix from the robot base frame to the IMU frame, and q is the number of radar targets.
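The contact-centric velocity of equation (17) can be sketched with a toy leg model. The planar hip-knee forward kinematics below (`toy_fk`) and its link lengths are hypothetical stand-ins for the actual f_p of the robot; only the finite-difference step mirrors equation (17).

```python
import numpy as np

L_THIGH, L_SHANK = 0.35, 0.35  # hypothetical link lengths in metres


def toy_fk(alpha):
    """Foot position in the body frame for a planar hip-knee leg (toy model)."""
    hip, knee = alpha
    x = L_THIGH * np.sin(hip) + L_SHANK * np.sin(hip + knee)
    z = -L_THIGH * np.cos(hip) - L_SHANK * np.cos(hip + knee)
    return np.array([x, 0.0, z])


def leg_velocity(alpha_i, alpha_next, t_i, t_next, fk=toy_fk):
    """Equation (17): body velocity from the change in stance-foot position.

    While the foot is fixed to the ground, any motion of the foot in the
    body frame is the negative of the body motion.
    """
    return -(fk(alpha_next) - fk(alpha_i)) / (t_next - t_i)
```

When the stance foot sweeps backward in the body frame (the hip angle decreasing from 0.1 rad to 0 over 0.1 s in this toy model), the estimated body velocity points forward, as expected for a walking gait.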

Figure 2.

Comparison between body-centric and contact-centric leg locomotion of a single leg during the contact state. (a) In the body-centric calculation, the end-effector position changes over time. Based on the contact information from the contact sensor positioned at every foot, every contact frame should remain static while the contact sensor is on. Therefore, using the forward kinematics, the ego-centric velocity of the robot base can be calculated as in (b).

4.4. Gravity factor

To mitigate the LiDAR-dependent and discrete-time sensor-fusion limitations of GaRLIO (Noh et al., 2025) and enable robust gravity estimation for legged robot platforms, we adopt a B-spline-based, velocity-aware gravity factor r_G to parameterize continuous-time state functions as follows:

\begin{aligned} r_{G} (t_{i}) = \sum_{i, j \in W} {‖g (t_{i}) - \frac{{\tilde{R}}_{m_{i}}^{⊤} {\tilde{R}}_{m_{j}} v (t_{j}) - v (t_{i}) - β_{j}^{i}}{t_{j} - t_{i}}‖}^{2}, \end{aligned}

(19)

where t_i and t_j denote the timestamps of the i-th and j-th IMU measurements included in window W, respectively. ${\tilde{R}}_{m}$ denotes the onboard AHRS orientation measurement, obtained from the internal filter of the IMU by fusing the magnetometer with standard inertial measurements, while $β_{j}^{i} = \int_{t \in [t_{i}, t_{j})} {\tilde{R}}_{m} (t) ({\tilde{a}}_{t} - b_{a_{t}}) d t$ is the integral of the rotated, bias-corrected accelerometer measurements. Furthermore, to mitigate erroneous gravity estimation caused by IMU noise, we also introduce a gravity spline equipped with a smoothing constraint $r_{S^{2}}$ (Figure 3). The gravity spline control points are subject to a fixed-norm constraint. Still, since we parameterize local gravity using splines on an $R^{3}$ manifold rather than an $S^{2}$ manifold, the interpolated B-spline segments between control points are not explicitly norm-constrained. To resolve this, a novel first-order derivative factor is defined and applied to the spline as follows:

\begin{aligned} r_{S^{2}} (t) = \dot{g} (t) + {⌊ ω (t) ⌋}_{\times} g (t), \end{aligned}

(20)

which enforces a soft norm constraint throughout the spline trajectory.
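One way to see why equation (20) acts as a norm constraint: if the local gravity is g(t) = R(t)^⊤ g^G, then $\dot{g} = - ω \times g$, so the residual vanishes and d‖g‖²/dt = 2 g^⊤ $\dot{g}$ = 0. The sketch below checks this numerically for a constant-rate rotation about the x-axis; the helper names and the finite-difference derivative are assumptions made for illustration only.

```python
import numpy as np


def rot_x(theta):
    """Rotation matrix about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])


def local_gravity(t, rate, g_global):
    """g(t) = R(t)^T g^G for a rotation about x at a constant rate (rad/s)."""
    return rot_x(rate * t).T @ g_global


def s2_residual(g, g_dot, omega):
    """Equation (20): g_dot + omega x g."""
    return g_dot + np.cross(omega, g)


def numeric_derivative(f, t, h=1e-6):
    """Central finite difference, used here in place of the spline derivative."""
    return (f(t + h) - f(t - h)) / (2.0 * h)
```

For a gravity trajectory that is a pure rotation of g^G, the residual is zero up to finite-difference error, while any change in the norm of g(t) would show up as a component of $\dot{g}$ along g that the cross product cannot cancel.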

Figure 3.

Soft $S^{2}$ -constrained gravity factor $r_{S^{2}}$ . Prior to optimization, the orange vectors are initialized through gravity spline extrapolation. During optimization, the factor (blue) constrains vectors that would otherwise drift in the $R^{3}$ manifold (red), steering them to lie on the $S^{2}$ surface (green).

5. Continuous radar-leg-IMU fusion

Figure 4 illustrates the overall pipeline of GaRLILEO. The algorithm performs state estimation using three distinct splines representing SO(3) rotation, ego-velocity, and local gravity. In the initialization stage, the three splines and global gravity vector are recovered from the input data, similar to River (Chen et al., 2024). After transforming the data into a gravity-aligned coordinate frame, a tightly coupled incremental factor graph optimization is executed. Subsequently, a rotation refinement procedure using the estimated local gravity is applied to the marginalized older states, followed by odometry estimation through a dead reckoning approach.

Figure 4.

Overview pipeline of GaRLILEO.

5.1. Initialization

5.1.1. SO(3) spline initialization

The initial SO(3) spline is constructed by applying the factor r_ω, using IMU gyroscope measurements as constraints. The initial global frame is set to the initial IMU frame. The initialization is formulated as

\min_{x_{R}} \sum_{k \in W_{init}} {‖r_{ω} (t_{k})‖}^{2} .

(21)

5.1.2. Global gravity initialization

Our method refines rotation by estimating local gravity and comparing it with global gravity. Therefore, accurate estimation of the global gravity reference is critical. Two methods are commonly used for global gravity estimation: dynamic initialization and stationary initialization.

The first type is dynamic initialization introduced in River (Chen et al., 2024), which estimates global gravity using an initial SO(3) spline and radar measurements. Here, ego-velocities are computed from radar scans collected during the initialization phase, and then are converted into the global frame using the initial SO(3) spline. The global gravity is estimated using the following equation:

\begin{aligned} \min_{g^{I_{0}}} & \sum_{k \in W_{init}} {‖r_{dg} (t_{k})‖}^{2}, \\ \text{s.t.} \quad & ‖g^{I_{0}}‖ = 9.81, \\ \text{where} \quad & r_{dg} (t_{k}) = \frac{α_{k + 1}^{k} - R (t_{k + 1}) v_{I_{k + 1}}^{I_{k + 1}} + R (t_{k}) v_{I_{k}}^{I_{k}}}{t_{k + 1} - t_{k}} - g^{I_{0}}, \\ & α_{k + 1}^{k} = \int_{t \in [t_{k}, t_{k + 1})} R (t) ({\tilde{a}}_{t} - {\hat{b}}_{a_{t}}) d t . \end{aligned}

(22)

However, due to the highly nonlinear characteristics of IMU acceleration measurements from contact impact during legged robot locomotion, estimating global gravity dynamically at the radar frame rate using only the initial SO(3) spline results in inaccurate estimation. Therefore, a more stable method for estimating the global gravity is required on the legged robot UGV system.

Instead, we adopted a stationary initialization method for precise global gravity estimation. By computing the mean and variance of ego-velocities derived from accumulated radar scans during the initialization phase, the stationary condition can be determined using the following equation:

\begin{aligned} \mathrm{mean} (v_{I_{k}}^{I_{k}}) < τ_{1}, \quad \mathrm{tr} (\mathrm{Var} (v_{I_{k}}^{I_{k}})) < τ_{2} . \end{aligned}

(23)
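The stationary test of equation (23) can be sketched as follows. Because equation (23) compares a vector-valued mean with a scalar threshold, this example assumes that the Euclidean norm of the mean velocity is what is thresholded; that choice, the function name, and the threshold values used in any call are assumptions rather than details taken from the released implementation.

```python
import numpy as np


def is_stationary(velocities, tau_mean, tau_var):
    """Check the two conditions of equation (23) on radar ego-velocities.

    velocities: (n, 3) ego-velocity estimates from the initialization scans
    tau_mean:   threshold on the norm of the mean velocity (assumed norm)
    tau_var:    threshold on the trace of the velocity covariance
    """
    v = np.asarray(velocities, dtype=float)
    mean_speed = np.linalg.norm(v.mean(axis=0))
    var_trace = np.trace(np.cov(v, rowvar=False))  # needs at least 2 samples
    return bool(mean_speed < tau_mean and var_trace < tau_var)
```

Using both conditions guards against two failure cases: a slow but steady drift (small variance, nonzero mean) and in-place shaking (zero mean, large variance).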

Under stationary conditions, the accelerometer measurements directly correspond to gravity. The mean of these measurements $\bar{\tilde{a}}$ serves as a prior factor, which is combined with the factor from the dynamic initialization method to calculate global gravity:

\begin{aligned} r_{sg} (t_{k}) = & w_{1} \cdot (\bar{\tilde{a}} - g^{I_{0}}) \\ + & w_{2} \cdot (\frac{α_{k + 1}^{k} - R (t_{k + 1}) v_{I_{k + 1}}^{I_{k + 1}} + R (t_{k}) v_{I_{k}}^{I_{k}}}{t_{k + 1} - t_{k}} - g^{I_{0}}) . \end{aligned}

(24)

Since this approach yields a more stable and accurate global gravity estimation compared to dynamic initialization on a legged robot, every sequence included in the dataset of this paper begins stationary, thereby enabling the use of stationary initialization.

5.1.3. Velocity spline initialization

The radar, IMU, and leg kinematics information are leveraged to construct the ego-centric velocity spline. During this process, the velocity and accelerometer biases are simultaneously initialized by solving the following least-squares (LSQ) problem:

\begin{aligned} \min_{X} & \{\sum_{k \in I} w_{ω} {‖r_{ω} (t_{k})‖}^{2} + \sum_{k \in I} w_{a} {‖r_{a} (t_{k})‖}^{2} \\ + \sum_{k \in L} w_{L} {‖r_{L} (t_{k})‖}^{2} + \sum_{k \in R} w_{R} {‖r_{R} (t_{k})‖}^{2}\} . \end{aligned}

(25)

To maintain consistency, the SO(3) rotation spline is fixed during the initialization process of the velocity spline.

5.1.4. Z-axis alignment with global gravity

The gravity vector provides only 2-DoF observations, namely the roll and pitch directions (Kubelka et al., 2022). Thus, to ensure optimization updates are confined to observable axes without explicit constraints, the coordinate frame is transformed by aligning its z-axis with the global gravity vector. Consequently, the SO(3) splines undergo the same coordinate transformation:

\begin{aligned} x_{R} & = {({\bar{R}}_{G}^{I_{0}})}^{⊤} \cdot x_{R} \cdot {\bar{R}}_{G}^{I_{0}}, \\ \text{where} \quad g^{G} & = {({\bar{R}}_{G}^{I_{0}})}^{⊤} \cdot g^{I_{0}} . \end{aligned}

(26)
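The alignment rotation in equation (26) can be obtained from the estimated gravity $g^{I_{0}}$ alone. The sketch below builds the minimal rotation that maps one direction onto another and uses it to satisfy $g^{G} = {({\bar{R}}_{G}^{I_{0}})}^{⊤} g^{I_{0}}$. Choosing the minimal (shortest-arc) rotation is an assumption of this example, since rotation about the gravity axis is left free by the constraint, and the function names are illustrative.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_between(a, b):
    """Shortest-arc rotation A such that A @ a is parallel to b."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    c = float(a @ b)
    if c < -1.0 + 1e-9:
        # Antiparallel case: rotate by pi about any axis orthogonal to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = hat(np.cross(a, b))
    return np.eye(3) + K + (K @ K) / (1.0 + c)


def gravity_alignment(g_I0):
    """Return R_bar such that R_bar.T @ g_I0 lies along the global z-axis."""
    A = rotation_between(g_I0, [0.0, 0.0, 1.0])
    return A.T
```

Since the rotation preserves the norm, a gravity estimate satisfying the 9.81 norm constraint is mapped exactly onto g^G = [0, 0, 9.81].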

5.1.5. Gravity spline initialization

The final phase of initialization concerns the gravity spline. We initialize the gravity spline by recovering it through control points computed as

g^{I_{k}} = {(R_{I_{k}}^{G})}^{⊤} g^{G},

(27)

using the previously initialized SO(3) splines.

5.2. Factor graph optimization

5.2.1. Incremental optimization

Following initialization and spline recovery, incremental factor graph optimization is performed on the B-spline control points, tightly coupling IMU, leg kinematics, and radar measurements. When new control points are introduced, an equal number of the oldest points in the window are marginalized, and the information of the remaining non-marginalized control points is condensed into a prior factor and propagated to the next optimization window (Figure 5). The end time of the window is determined by the timestamp of the incoming radar topic, and optimization begins after the three B-splines are linearly extended and their control points appended. The factor graph comprises both gravity factors $r_{G}, r_{S^{2}}$ ; an IMU gyroscope factor r_ω; radar and leg kinematics velocity factors r_R, r_L; a bias-prior factor r_b; a marginalization-prior factor r_p; and an end-tail factor r_e. The optimization problem associated with the proposed system is defined as follows:

\begin{aligned} \min_{X} \sum_{k \in W_{i}} & \{w_{g} r_{G} (t_{k}) + w_{S^{2}} {‖r_{S^{2}} (t_{k})‖}^{2} + w_{ω} {‖r_{ω} (t_{k})‖}^{2} \\ & + w_{L} {‖r_{L} (t_{k})‖}^{2} + w_{R} \cdot ρ_{r} ({‖r_{R} (t_{k})‖}^{2})\} \\ & + w_{b} {‖r_{b_{i}}‖}^{2} + w_{p} {‖r_{p_{i}}‖}^{2} + w_{e} {‖r_{e_{i}}‖}^{2}, \\ \text{s.t.} \quad & ‖g^{I_{i}}‖ = 9.81, \end{aligned}

(28)

where ρ_r is the Cauchy loss function, employed to mitigate the influence of radar points originating from non-static objects. The bias-prior factor and the end-tail factor, which are not addressed in Section §4, are defined as follows:

\begin{aligned} r_{b_{i}} & = [\begin{matrix} b_{a_{i}} - b_{a_{i - 1}} \\ b_{v_{i}} - b_{v_{i - 1}} \end{matrix}] \\ r_{e_{i}} & = [\begin{matrix} v_{I_{m - 1}}^{I_{m - 1}} - 2 \cdot v_{I_{m - 2}}^{I_{m - 2}} + v_{I_{m - 3}}^{I_{m - 3}} \\ \mathrm{Log} ({(R_{I_{l - 2}}^{G})}^{⊤} R_{I_{l - 3}}^{G} {(R_{I_{l - 2}}^{G})}^{⊤} R_{I_{l - 1}}^{G}) \end{matrix}] . \end{aligned}

(29)
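For reference, a common form of the Cauchy loss ρ_r applied to a squared residual s is ρ(s) = c² log(1 + s/c²). The sketch below uses this form; the scale parameter c and its default value are hypothetical, as the text does not state the value used in equation (28).

```python
import numpy as np


def cauchy_loss(sq_residual, scale=1.0):
    """Cauchy robust loss applied to a squared residual.

    sq_residual: squared residual norm s = ||r||^2
    scale:       scale parameter c (hypothetical; not given in the text)
    """
    c2 = scale * scale
    return c2 * np.log1p(np.asarray(sq_residual, dtype=float) / c2)
```

The loss behaves like the plain squared residual for small s and grows only logarithmically for large s, so a radar return from a moving object contributes far less to the cost than it would under a purely quadratic penalty.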

Figure 5.

Factor Graph Overview. At each iteration, the number of control points marginalized (gray) equals the number newly added in the preceding window (red); the remaining control points are carried forward as a prior factor for the next solve (blue). The marginalized control points (gray) are then subjected to a post-optimization stage that refines the SO(3) spline.

Dependencies between sensor measurements and each spline during the optimization of (28) are illustrated in Figure 6. To prevent contact-induced noise in the IMU acceleration from degrading the velocity spline, we decouple the velocity spline from the IMU. Instead, we leverage a continuous-time ego-centric velocity spline generated from SoC radar and leg kinematics measurements, improving the robustness of local gravity estimation compared with prior methods. Within each window W_i, the biases $b_{v_{i}}, b_{a_{i}}$ are assumed to remain constant, and an end-tail factor minimizes endpoint instability of the splines (Chen et al., 2024).

Figure 6.

Relationship between sensor data and splines during incremental optimization. Gray-shaded boxes indicate sensor measurements active within the sliding optimization window; colored lines denote inter-spline and measurement-spline dependencies. In GaRLILEO, velocity is decoupled from the IMU, constructing a continuous-time ego-velocity spline from radar and leg kinematics measurements. This decoupling is particularly advantageous for legged robots, where ground contact induces noisy accelerations on IMU.

5.2.2. Marginalization

At each optimization step, we marginalize out the states and measurements that moved outside of the sliding window, summarizing them into a single prior factor. By reusing historical constraints in this condensed form, the estimator retains observability of active variables while bounding the graph size, achieving computational efficiency for real-time performance. By linearizing all factors about the current best estimate of the i-th window, we obtain an equation as follows:

\begin{aligned} [\begin{matrix} H_{α α} & H_{α β} \\ H_{β α} & H_{β β} \end{matrix}] [\begin{matrix} x_{α} \\ x_{β} \end{matrix}] = [\begin{matrix} b_{α} \\ b_{β} \end{matrix}], \end{aligned}

(30)

where x_β represents the states which are marginalized out, while x_α indicates the states retained in the optimization. By applying the Schur complement (Sibley et al., 2010), we reformulate the equation as follows:

(H_{α α} - H_{α β} H_{β β}^{- 1} H_{β α}) \cdot x_{α} = b_{α} - H_{α β} H_{β β}^{- 1} b_{β},

(31)

leading to the marginalization-prior factor r_p as

\begin{aligned} r_{p_{i}} = (H_{α α} - H_{α β} H_{β β}^{- 1} H_{β α}) \cdot (x_{α, i} - x_{α, i - 1}) \\ - (b_{α} - H_{α β} H_{β β}^{- 1} b_{β}) . \end{aligned}

(32)
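The Schur complement step of equations (30) and (31) can be sketched in a few lines of NumPy. The function name `marginalize` and the index-list interface are assumptions made for this example; a production implementation would typically exploit sparsity and guard against an ill-conditioned $H_{β β}$.

```python
import numpy as np


def marginalize(H, b, keep, drop):
    """Reduce the linear system H x = b onto the retained states.

    H:    (n, n) symmetric information matrix
    b:    (n,) right-hand side
    keep: indices of the retained states x_alpha
    drop: indices of the marginalized states x_beta
    Returns the reduced pair (H_prior, b_prior) of equation (31).
    """
    H = np.asarray(H, dtype=float)
    b = np.asarray(b, dtype=float)
    H_aa = H[np.ix_(keep, keep)]
    H_ab = H[np.ix_(keep, drop)]
    H_ba = H[np.ix_(drop, keep)]
    H_bb = H[np.ix_(drop, drop)]
    # Solve with H_bb instead of forming its explicit inverse
    H_prior = H_aa - H_ab @ np.linalg.solve(H_bb, H_ba)
    b_prior = b[keep] - H_ab @ np.linalg.solve(H_bb, b[drop])
    return H_prior, b_prior
```

Solving the reduced system yields the same values for the retained states as solving the full system, which is why the condensed prior can replace the dropped measurements without changing the current estimate.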

5.2.3. Post optimization

After incorporating the optimized local gravity state from (28), we refine the rotation states of the relevant static control points. This procedure is applied to the control points marginalized in the previous optimization window, as shown in Figure 5. In this step, we update only the rotation states while keeping others fixed:

\begin{aligned} \min_{x_{R}} & \sum {‖r_{post} (k)‖}^{2}, \\ \text{where} \quad & r_{post} (k) = R_{I_{k}}^{G} g^{I_{k}} - g^{G} . \end{aligned}

(33)

Since the global z-axis is aligned with gravity, the above equation provides information only about roll and pitch but not yaw; rotation about the gravity axis remains unobservable. After the post-optimization step, the robot’s pose is estimated using dead reckoning based on the optimized states. The ego-centric velocity is first transformed into the global frame, and subsequently, the robot position is computed using the following equation:

p (t_{k}) = p (t_{k - 1}) + \int_{t \in [t_{k - 1}, t_{k})} R (t) \cdot v (t) d t .

(34)
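The dead-reckoning step of equation (34) can be approximated by numerical quadrature over sampled spline values. The sketch below uses the trapezoidal rule on the global-frame velocity R(t) v(t); the quadrature rule, the sampling interface, and the function name are assumptions of this example rather than details of the released implementation.

```python
import numpy as np


def dead_reckon(times, rotations, velocities, p0=None):
    """Integrate ego-centric velocity into a global-frame position track.

    times:      (n,) increasing sample times
    rotations:  sequence of n rotation matrices R(t_k), IMU to global
    velocities: (n, 3) ego-centric velocities v(t_k)
    p0:         initial position (defaults to the origin)
    Returns an (n, 3) array of positions.
    """
    p = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float).copy()
    track = [p.copy()]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        v_prev = rotations[k - 1] @ velocities[k - 1]  # global-frame velocity
        v_curr = rotations[k] @ velocities[k]
        p = p + 0.5 * dt * (v_prev + v_curr)           # trapezoidal rule
        track.append(p.copy())
    return np.array(track)
```

Because position is obtained purely by integration, any roll or pitch error in R(t) leaks the forward velocity into the vertical axis, which is the mechanism behind the vertical drift that the gravity factors aim to suppress.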

6. Radar-leg-IMU dataset

6.1. System configuration

The overall sensor configuration and the coordinate frames of each sensor are illustrated in Figure 7, while detailed specifications for each sensor are provided in Table 1. Two different data acquisition setups are employed in this work, both built on Spot, a quadrupedal robot from Boston Dynamics, which provides joint encoder and contact sensor data at 150 Hz. An IWR1843BOOST mmWave radar module captures 4D radar point clouds with a maximum range of 11 m and a range resolution of 4.8 cm, while a 3DM-GV7-AHRS IMU from Microstrain operates at 100 Hz to provide inertial measurements including onboard AHRS orientation measurement. Both systems share the same radar and IMU sensor models.

Figure 7.

SNU and RAI sensor system deployment. Both systems include the same TI-mmWave radar, MicroStrain IMU, and Boston Dynamics Spot quadrupedal robot, but are attached with different extrinsics.

Table 1.

The sensor specifications.

Sensor	Manufacturer	Model	Topic name	Frequency	Description
Legged robot	Boston Dynamics	Spot	/joint_states, /spot/status/feet	150 Hz, 150 Hz	Sensor measurements
Radar	Texas Instruments	IWR1843BOOST	/ti_mmwave/radar_scan_pcl_0	20 Hz	Sensor measurements
IMU	MicroStrain	3DM-GV7-AHRS	/imu	100 Hz	Sensor measurements
LiDAR	Ouster	OS1-32	/ouster/points	10 Hz	Ground-truth reference
Laser scanner	Leica	RTC360	—	—	Ground-truth reference

6.1.1. SNU system

The SNU system, used for experiments at Seoul National University, collects data in diverse conditions, including both indoor and outdoor environments. To obtain a baseline trajectory, an OS1-32 LiDAR (maximum range 150 m) is mounted, and a TLS-generated map is used as ground-truth reference (see Section §6.4). Data acquisition and logging are performed onboard using the Spot CORE, equipped with an Intel 8th Gen i5 processor and 16 GB of DDR4 RAM.

6.1.2. RAI system

The RAI system, used for experiments at the Robotics and AI Institute, operates entirely within an indoor motion capture environment, providing a controlled experimental setup. While sharing the radar and IMU configurations of the SNU system, the LiDAR is omitted, and ground truth is obtained from the motion capture system. Data is acquired and logged onboard with an NVIDIA Jetson AGX Orin, equipped with a 12-core Arm Cortex-A78AE v8.2 CPU and 64 GB of LPDDR5 RAM.

6.2. Details of each sequence

An overview of the 12 sequences is given in Table 2, and their environments are illustrated in Figure 8; details of each sequence are as follows:

• Atrium: A large flat floor indoor atrium with floor-to-ceiling glass and a high ceiling, where the expansive open floor and long glass facades induce strong specular reflections, multipath, and low parallax on radar sensor.

• BridgeLoop: An indoor sequence with three repetitions that traverses pedestrian bridges and short stair segments around an open atrium, stressing robustness in wide, low-parallax spaces.

• CorriLoop: A narrow, rectangular indoor corridor traversed twice; long straight segments with repeating doors/walls and a glossy floor create perceptual aliasing, while four sharp 90° turns stress turn handling and loop consistency.

• BiCorridor: A two-level corridor running in the same building as CorriLoop. One loop on the first floor, then a stair ascent, and a second loop on the adjacent floor in the reverse direction. The narrow rectangular hallways contain long, low-parallax segments with repeating doors/walls that induce geometric degeneracy and perceptual aliasing, while the stairs introduce vertical motion and contact disturbances. This sequence stresses robustness to aliasing, direction reversal, and floor-to-floor consistency in constrained indoor spaces.

• Downstair: A multi-floor indoor sequence that starts on the second level with a long rectangular loop, descends one flight to traverse an extended straight corridor, then descends again to finish with a smaller rectangular loop. Wide, glossy hallways and two prolonged downward staircases stress perception during extended descent, floor-to-floor transitions, and low-parallax segments.

• Upstair: An indoor ascent sequence in the same building as BridgeLoop, climbing three floors by alternately traversing pedestrian bridges and short stair flights. The sequence includes upward motion—yaw changes occur between the bridge and stair segments—stressing vertical translation, stair negotiation, and transitions across landings in a wide, low-parallax space.

• SlopeStair: A mixed indoor–outdoor traverse with a long downhill ramp, multiple upstair flights, and doorway/corridor transitions, stressing robustness to large elevation changes, lighting shifts, and abrupt structural/surface variations.

• Overpass: This sequence takes place on outdoor stairs and a pedestrian overpass, with lamps and reflective paving. It stresses robustness to open-air transitions and repeated elevation changes across steps, ramps, and long sidewalk segments.

• Tunnel: This sequence traverses a long, semi-open tunnel lined with glass façades and repetitive concrete pillars. The path includes gentle ramps and a tight U-turn, stressing robustness under feature-degenerate geometry and illumination changes.

• Quad: This sequence traverses an outdoor campus quad with broad paved plazas, tiled walkways, curbs, and stairs. It stresses robustness to feature-sparse open areas and repetitive textures while handling stair climbing, ramps, and outdoor slopes.

• MoCap-E: This sequence has two loops in the indoor Motion Capture (MoCap) room, each including a short stair–slope vertical motion zone and a cushion zone with two soft cushions that disturb leg odometry. In the second loop, a folded box at the slippery zone is deliberately dragged, breaking the assumption of a stationary floor and causing a marked discrepancy between leg kinematics and other sensors.

• MoCap-H: This sequence shares the same layout as MoCap-E but is run at a higher speed. In the slippery zone, the robot even moves forward while the floor shifts backward due to dragging, further violating the stationary floor assumption and amplifying the leg kinematics disagreement.

Table 2.

The description for each sequence.

Sequence	Path length (m)	Elevation change (m)	Duration (s)	Outdoor	Indoor	Stair	Slope	# of loops
Atrium	109.93	—	124.50	✗	✓	✗	✗	1
BridgeLoop	161.17	1.72	187.20	✗	✓	✓	✓	3
CorriLoop	208.68	—	229.40	✗	✓	✗	✗	2
BiCorridor	240.82	4.72	277.29	✗	✓	✓	✗	1
Downstair	233.75	8.81	270.90	✗	✓	✓	✗	2
Upstair	197.22	9.37	227.89	✗	✓	✓	✓	1
SlopeStair	273.37	10.06	307.49	✓	✓	✓	✓	1
Overpass	169.17	7.23	213.49	✓	✗	✓	✗	1
Tunnel	247.94	—	277.00	✓	✓	✗	✗	1
Quad	447.83	10.72	503.69	✓	✗	✓	✓	1
MoCap-E	44.91	0.57	139.10	✗	✓	✓	✓	2
MoCap-H	42.48	0.60	79.46	✗	✓	✓	✓	2

Figure 8.

Environmental examples of the acquired sequences. Diverse environments are included in each sequence to consider various situations that the quadrupedal robot may encounter in a real-world mission. (a) Upstair; (b) Quad; (c) Overpass; (d) Tunnel; (e) MoCap.

6.3. Extrinsic calibration of sensor systems

Extrinsic calibration between Spot, radar, and IMU is required for accurate odometry estimation. On both sensor systems, we leveraged the CAD model of each system to acquire the exact extrinsic parameters between the sensors. As shown in Figure 7, the radar on the RAI system is offset slightly toward the right side of the robot, and its IMU is mounted perpendicular to the orientation used on the SNU system. For more detailed information about the extrinsic calibration parameters, please refer to the project homepage.

6.4. Ground truth trajectory generation

6.4.1. SNU sequences

Accurate 6-DoF ground truth poses are essential for evaluating various robotic tasks, including state estimation and SLAM. Unlike previous studies, our deployments span both indoor and outdoor environments over several hundred meters, requiring millimeter-level precision. These stringent requirements render traditional ground truth sources, such as LiDAR-IMU-based references (Jung et al., 2024) and RTK-GNSS systems (Barnes et al., 2020; Geiger et al., 2013; Kim et al., 2025a), unsuitable. While MoCap systems can provide high-frequency and high-precision pose estimates, their limited workspace makes them impractical for large-scale deployments (Doer and Trommer, 2021). Some prior works (Tranzatto et al., 2022b) utilize survey-grade maps as ground truth references by performing scan-to-map matching of deskewed LiDAR points using synchronized inertial and ranging sensors. However, for legged robots, continuous dynamic motion during data acquisition significantly degrades LiDAR deskewing and motion estimation accuracy.

To address this, we adopt the approach proposed in Hu et al. (2024), which combines FAST-LIO2 (Xu et al., 2022) odometry and loop closure factors with a degeneration-aware map factor derived from dense prior maps. As illustrated in Figure 9, prior maps are collected using a Leica RTC360. This graph-based formulation enables accurate pose estimation even in degenerate and stationary conditions, thereby providing reliable ground truth for our evaluation.

Figure 9.

Examples of ground truth TLS map on SNU sequences. Leveraged for generating ground truth trajectory. (a) Indoor Sequences. Left: BiCorridor, Right: Upstair; (b) Outdoor Sequences. Left: Quad, Right: Overpass.

Because this ground truth framework leverages FAST-LIO2 (Xu et al., 2022), a tightly coupled LiDAR-IMU SLAM as an odometry front-end, precise extrinsic calibration between those two sensors $T_{I}^{L}$ is essential. We perform this extrinsic calibration using the robust method of Zhu et al. (2022), which applies diverse rotational motions across all three axes to accumulate sufficient motion data, allowing rapid and accurate estimation without initial parameter guesses. The method directly computes spatial and temporal offsets from unsynchronized data, yielding high-precision LiDAR-IMU extrinsics required for our ground-truth estimation.

6.4.2. RAI sequences

For MoCap sequences acquired using the RAI sensor system, we utilized the MoCap system to obtain a highly precise ground truth trajectory. As these sequences were collected exclusively in a controlled indoor environment, the MoCap odometry could be reliably used as reference. The experiment was conducted in a 13.5 m × 5 m × 3 m motion capture room equipped with 20 Vicon Valkyrie VK 16 cameras, ensuring complete coverage of the space. The system streamed motion capture data at 120 Hz using Vicon Tracker 4.3 software.

7. Experiment results

In this section, we evaluate the performance of GaRLILEO against SOTA odometry algorithms that utilize SoC radar, IMU, and leg kinematics. The experiments are conducted on a self-collected real-world dataset.

Odometry accuracy is assessed using the root mean square error (RMSE) of absolute pose error (APE) and relative pose error (RPE), each decomposed into translational and rotational components. The units are as follows: APE_t (m), APE_r (°), RPE_t (m/m), and RPE_r (°/m). To specifically evaluate vertical drift in odometry, we report the z-axis APE (APE_z). All evaluations are performed using the Evo Trajectory Evaluator (Grupp, 2017), a widely adopted open-source toolkit for odometry benchmarking in robotics.
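The translational metrics above can be illustrated with a minimal NumPy sketch. It is not the Evo implementation: it assumes the estimated and ground truth positions are already time-associated and aligned, and it assumes APE_z is the RMSE of the z-component error alone. The function names and the toy trajectory are hypothetical.

```python
import numpy as np


def ape_translation_rmse(gt_xyz: np.ndarray, est_xyz: np.ndarray) -> float:
    """RMSE of the per-pose Euclidean position error (APE_t, in meters)."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))


def ape_z_rmse(gt_xyz: np.ndarray, est_xyz: np.ndarray) -> float:
    """RMSE of the z-axis position error only (assumed definition of APE_z)."""
    err_z = est_xyz[:, 2] - gt_xyz[:, 2]
    return float(np.sqrt(np.mean(err_z ** 2)))


# Toy example: a straight 3 m walk whose estimate drifts upward by 0.1 m per pose.
gt = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
drift = np.array([[0, 0, 0.0], [0, 0, 0.1], [0, 0, 0.2], [0, 0, 0.3]])
est = gt + drift

ape_t = ape_translation_rmse(gt, est)
ape_z = ape_z_rmse(gt, est)
print(f"APE_t = {ape_t:.4f} m, APE_z = {ape_z:.4f} m")
```

Because the toy error is purely vertical, the two values coincide; on real sequences APE_z isolates the vertical share of APE_t.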

The comparative analysis is organized by the baselines’ sensor configurations, while GaRLILEO is evaluated in its full configuration to report system-level performance, that is, the benefit of fusing leg kinematics, radar, and IMU. First, we compare GaRLILEO against recent SoC radar-IMU odometry methods, including Co-RaL (Jung et al., 2024), which additionally integrates the leg kinematics velocity factor. Next, we also evaluate GaRLILEO against open-source leg kinematics-IMU fusion odometry methods. Every parameter is adopted from the official implementation, except that the robot-specific parameters of legged robots are modified based on the official Unified Robot Description Format (URDF) file of Boston Dynamics Spot. Detailed descriptions of the baseline algorithms and comprehensive evaluation results are provided in the following subsections.

After comparing GaRLILEO directly with the baselines, we conducted detailed ablation studies on the modules of GaRLILEO. Specifically, we analyzed the complementary effect between radar and leg kinematics, the contribution of gravity factors to both local gravity vector estimation and odometry, and the impact of the velocity bias term.

7.1. Radar-IMU odometry comparison

In this subsection, we compare the performance of GaRLILEO with that of four recent SoC radar-inertial odometry (RIO) methods. The baseline methods are listed as follows:

• River (Chen et al., 2024): A B-spline-based continuous velocity estimator that fuses SoC radar and IMU, employing dead reckoning to compute full odometry.

• Co-RaL (Jung et al., 2024): A cooperative odometry algorithm integrating SoC radar, IMU, and leg kinematics velocity, designed to operate robustly across diverse environments.

• EKF-RIO (Doer and Trommer, 2020): An EKF-based odometry method that fuses SoC radar and IMU data.

• DeRO (Do et al., 2024): A dead reckoning based RIO method that combines SoC radar ego-velocity and gyroscope data using an IEKF, with accelerometer-based tilt angle estimation.

We evaluate the performance of GaRLILEO and baseline methods, with a focus on odometry accuracy in diverse environments. Table 3 presents quantitative results for APE and RPE metrics. Below, we elaborate on the results for each sequence. Some sample trajectories and qualitative evaluation are illustrated in Figure 10. For both quantitative and qualitative analysis, GaRLILEO achieves superior odometry accuracy across most metrics, particularly excelling in vertical accuracy (APE_z).

Table 3.

Evaluation on radar-based methods. Bold numbers indicate the smallest error, and underlined numbers denote the second smallest error within each metric.

Figure 10.

Radar–Inertial baseline odometry results on the (a) BridgeLoop and (b) SlopeStair sequences. Red dotted lines in (a, bottom) indicate the starting points of each loop in the BridgeLoop sequence, while the faint yellow region in (b, top) denotes the outdoor segment of the SlopeStair sequence. The red zoomed-in views highlight that GaRLILEO and Co-RaL, both integrating SoC radar and leg kinematics, converge more closely to the ground-truth final position. The colored numbers on the right-hand side of the bottom plots show relative start-to-end vertical drift with respect to the full path length, where GaRLILEO achieves substantially lower drift. The cyan zoomed view in (a) illustrates that GaRLILEO (blue) follows the most consistent turning trajectory across repeated loops. In (b), the red circles indicate the failure points of baseline odometries during outdoor-to-indoor transitions, whereas the radar–leg fused odometries, GaRLILEO and Co-RaL, maintain robust estimation. Both subfigures qualitatively illustrate the odometry accuracy of GaRLILEO, particularly in suppressing vertical drift.

7.1.1. Atrium

We first evaluate methods on the Atrium sequence, a flat environment with a single loop and minimal vertical motion, where contact-induced drift is negligible. River and DeRO, which rely on dead reckoning, show large APE_t and APE_z because they cannot sufficiently mitigate the drift induced by the strong vibration of the legged robot. EKF-RIO achieves improved accuracy on most metrics by tightly fusing SoC radar–derived ego-velocity with IMU measurements via an EKF-based approach. However, it exhibits a higher APE_r than DeRO. Co-RaL, which integrates the leg kinematics velocity preintegration factor, achieves the second-lowest APE_z among the methods. GaRLILEO, leveraging accurate local gravity estimation, achieves an APE_z only about 5% of Co-RaL’s while also notably reducing errors in every other metric, opening a wide margin over the baselines.

7.1.2. BridgeLoop

We next assess the BridgeLoop sequence, where the robot completes three loops in a consistent rotational direction while going over gently sloped bridges and short stairs multiple times. In BridgeLoop, noise in roll and pitch estimation from contact uncertainty leads to substantial vertical drift, as reflected in the elevated APE_z of River and DeRO. EKF-RIO suffers from significantly high APE_r due to the limited orientation observability of IMU-only fusion, which becomes more severe in multi-loop trajectories. In contrast, Co-RaL achieves the second-lowest APE_r via the 4-DoF radar factor, enhancing both translational and rotational accuracy. GaRLILEO provides the most robust pose estimates, presenting the lowest errors in every metric except RPE_r, where it is at a similar level to the second-best baseline, Co-RaL. Notably, its vertical drift APE_z is almost identical to that on the flat Atrium sequence, demonstrating the vertical robustness of GaRLILEO over multiple turns.

7.1.3. CorriLoop

The CorriLoop sequence, consisting of two loops on flat indoor terrain, shows similar trends. While River and EKF-RIO suffer from significant rotational errors, GaRLILEO effectively suppresses them through accurate roll and pitch estimation. Although Co-RaL achieves comparable APE_r due to the radar factor, its APE_z remains about 35× higher than GaRLILEO’s, reflecting limited roll and pitch observability.

7.1.4. BiCorridor

The BiCorridor sequence combines a loop with stair ascents, traversing two floors with opposite rotational directions. Despite environmental similarities to CorriLoop, the sharp rotations and stair climbs substantially degrade odometry accuracy, particularly along the vertical axis. This results in high APE_t and APE_z for simple dead reckoning methods, such as River and DeRO. Interestingly, EKF-RIO achieves lower APE_t than Co-RaL, but suffers from much higher APE_r. In contrast, GaRLILEO effectively addresses these challenges by continuously fusing leg kinematics, IMU, and SoC radar, maintaining sub-meter APE_z.

7.1.5. Downstair

In the Downstair sequence, only GaRLILEO and Co-RaL converge successfully, while River, EKF-RIO, and DeRO return erroneous odometry results due to noisy IMU measurements caused by rapid and repetitive impacts during stair descent. Among the convergent methods, GaRLILEO and Co-RaL achieve the lowest APE_t, demonstrating stable and precise estimation. With its local gravity factor, GaRLILEO achieves sub-meter APE_z, which is lower than 10% of the second-best result from Co-RaL, significantly outperforming all prior radar-IMU methods.

7.1.6. Upstair

The Upstair sequence, which involves multiple stair ascents and a sloped bridge in an open space, exhibits similar trends. Methods without leg kinematics input, such as River and EKF-RIO, suffer from pronounced vertical drift due to the lack of leg kinematics constraints. Because this sequence consists of same-direction looped trajectories with stair climbing in each loop, EKF-RIO behaves as it does in the BridgeLoop and CorriLoop sequences. The second-best result, Co-RaL, shows vertical errors nearly 20× larger than GaRLILEO, underscoring the effectiveness of our continuous-time local gravity estimation for precise roll and pitch correction.

7.1.7. SlopeStair

As the system moves from indoors to outdoors and back in during this sequence, severe drift at the corner before re-entry causes EKF-RIO and DeRO to diverge. This highlights the limitations of SoC radar-IMU-only systems during indoor/outdoor transitions. Although River converges, its vertical error (APE_z) remains significantly higher than those of methods incorporating leg kinematics, highlighting the importance of kinematics sensing for stable and accurate odometry. Co-RaL, fusing leg kinematics with radar-IMU data, achieves the second-best performance across most metrics, demonstrating the benefits of multi-modal integration. Finally, GaRLILEO delivers the most accurate odometry overall, achieving sub-meter-level vertical accuracy.

7.1.8. Overpass

In the Overpass sequence, staircases appear before and after the overpass; when the robot is on the stairs, fewer radar returns are observed because the sensor is angled skyward. Similar to the SlopeStair sequence, EKF-RIO and DeRO fail to converge, underscoring their limitations in handling sharp turns in open environments. River achieves reduced vertical drift (APE_z) compared to the SlopeStair sequence. This improvement, however, is largely attributed to the shorter trajectory and smaller elevation change of the Overpass sequence itself. Co-RaL demonstrates slightly better accuracy than River due to the additional fusion of leg kinematics, though the improvement is marginal. In contrast, GaRLILEO robustly manages outdoor contact-induced drift, reducing vertical error to less than 15% of that of the baselines and substantially improving overall APE.

7.1.9. Tunnel

In the Tunnel sequence, which features a long passage with a sharp U-turn, GaRLILEO maintains sub-meter APE_z and achieves the lowest error on every metric. Co-RaL, benefiting from radar–leg kinematics fusion, ranks second. River and DeRO exhibit severe vertical drift due to contact impacts and limited observability. EKF-RIO shows less drift than these methods, but still larger vertical errors than leg kinematics-fused methods.

7.1.10. Quad

The Quad sequence, the longest and most challenging dataset, includes stairs, slopes, and substantial elevation changes. Frequent contact drift, multiple dynamic objects, outdoor stairs and slopes, and a reduced number of radar points make accurate pose estimation particularly difficult. River, EKF-RIO, and DeRO fail to converge due to sparse radar returns. Due to its factor graph framework that adaptively relies on the more reliable sensor modality, Co-RaL succeeds in converging and even outperforms GaRLILEO on RPE_r. Nevertheless, GaRLILEO achieves a notably lower error on all other metrics through precise local gravity estimation, while exhibiting slightly higher but competitive RPE_r compared to Co-RaL. These results highlight that GaRLILEO consistently provides robust and accurate pose estimation across diverse environments, owing to its local gravity model.

7.1.11. MoCap-E

Both MoCap sequences include two challenging test scenarios: (i) a slippery zone with a backward-moving floor in the second loop and (ii) a cushion zone included in both loops. These environments are designed to degrade leg odometry. In the slippery zone, as the contact frame continuously moves, leg kinematics produce erroneous horizontal velocity estimates. In the cushion zone, where the contact frame moves downward while the contact sensor remains active, leg kinematics yield incorrect vertical velocity estimates.

In MoCap-E, Co-RaL fails to converge due to discrepancies between leg kinematics and radar. EKF-RIO and DeRO achieve similar APE_t and APE_z, though DeRO exhibits higher APE_r and RPE_r since it relies solely on IMU-based orientation estimation. River achieves the second-best performance, showing its potential in controlled indoor environments. Still, GaRLILEO delivers the most accurate odometry, effectively handling leg kinematics failures in specific zones and enhancing overall odometry estimation, especially in the vertical direction, by precisely estimating the gravity vector. This robustness stems from its B-spline-based continuous odometry scheme, which ensures stable trajectory estimation, even in the presence of discrepancies between sensor modalities.

7.1.12. MoCap-H

In MoCap-H, the results follow a similar trend. Because the slippery zone that caused Co-RaL to diverge in MoCap-E is shorter, Co-RaL successfully converges over the whole trajectory. However, due to persistent velocity discrepancies between radar and leg kinematics, Co-RaL, which integrates both modalities, performs slightly worse than EKF-RIO and River in terms of APE_t and APE_z. Despite such discrepancies, GaRLILEO again produces the most accurate odometry by leveraging both sensor modalities while robustly handling modality failures that impair cooperative estimation.

7.2. Leg kinematics odometry comparison

In this subsection, we compare GaRLILEO against four proprioceptive odometry methods that fuse IMU and leg kinematics sensors, including leg joint encoders and contact sensors. The baseline methods are described as follows:

• Pronto (Camurri et al., 2020): A proprioceptive-only version of Pronto, enabling real-time odometry estimation at high frequency, compatible with the control loop.

• MUSE (Nisticò et al., 2025): A proprioceptive-only version of MUSE, a recent leg kinematics-fused odometry that leverages a foot-slip detection algorithm for enhanced robustness.

• Drift (Lin et al., 2023): An invariant EKF-based leg odometry algorithm that fuses contact estimation and gyroscope filtering, designed for low-cost legged robots.

• Holistic (Nubert et al., 2025): A proprioceptive-only version of Holistic Fusion, leveraging foot contact points as landmark measurements. Our leg kinematics velocity estimation result is used as its front-end.

This subsection evaluates the odometry accuracy of proprioceptive methods, with a focus on their performance in diverse environments. Table 4 presents quantitative results for APE and RPE metrics, while Figure 11 shows qualitative results for two sequences. GaRLILEO consistently achieves the most accurate odometry estimates across most sequences, primarily due to its integration of SoC radar-derived ego-velocity, which effectively mitigates the challenges posed by leg contact drift.

7.2.1. Atrium

Pronto, Drift, and Holistic achieve similar APE_t, while Pronto, through weighted averaging of leg kinematics velocities, attains the most accurate APE_z, demonstrating the reliability of leg kinematics in this environment. MUSE, which incorporates a slip detection algorithm, achieves a comparable APE_z to Pronto and even lower APE_t, indicating the presence of slip or drift in the contact frame even in low-dynamic indoor settings. GaRLILEO achieves the most accurate overall odometry, presenting the lowest error on every metric.

7.2.2. BridgeLoop and CorriLoop

The BridgeLoop and CorriLoop sequences emphasize the importance of accurate roll and pitch estimation, particularly in mitigating vertical drift. As shown in Table 4, MUSE is more robust than Pronto in terms of APE_t. This highlights the advantage of a physical contact sensor that remains reliable indoors, including stairways and slopes. Drift and Holistic yield less accurate odometry, as they either assume a constant contact frame to correct IMU drift or treat contact frames as landmark features. In contrast, GaRLILEO robustly mitigates contact slip effects through radar velocity estimation and a velocity bias term, achieving the most accurate odometry overall, particularly in the vertical direction, owing to precise roll and pitch observation and an accurate gravity estimation scheme.

7.2.3. BiCorridor

In the BiCorridor sequence, although the environment is similar to CorriLoop, the results differ due to the inclusion of an upstairs section with a narrow turn. Pronto and MUSE produce relatively inaccurate odometry, as indicated by large APE_r from diverging rotational estimates. A similar trend is observed in Holistic, although its errors are more biased toward the vertical direction. Interestingly, Drift achieves lower APE_r than the other baselines due to its gyro filter scheme, but still suffers from substantial vertical drift. In contrast, GaRLILEO delivers the lowest error on every metric except RPE_r, owing to precise gravity estimation that effectively suppresses divergence, particularly in the vertical axis.

7.2.4. Downstair and Upstair

The Downstair and Upstair sequences enable a comparison between heavy stair descent and ascent. Among the leg kinematics-only baselines, Drift exhibits the largest APE_t and APE_z, but achieves relatively low APE_r. Its gyro filter effectively mitigates horizontal errors from contact impacts on stairs; however, its vulnerability in roll and pitch estimation leads to substantial vertical divergence. Due to the repetitive rotation during stair ascent, Pronto and MUSE show much higher APE_r in Upstair compared with Downstair, as they rely on IMU-based rotational velocity for orientation estimation. All comparison methods record higher APE_t in the Downstair sequence, indicating that stair descent induces stronger contact drift. Despite these challenges, GaRLILEO achieves accurate vertical pose estimation with APE_z around 50 cm, delivering the most accurate and robust results through radar-based velocity estimation, particularly in stair environments.

7.2.5. SlopeStair

In the SlopeStair sequence, leg odometry performance degrades severely on staircases and outdoor-slope segments, especially just before re-entry indoors, at which point Pronto fails to converge. Drift and Holistic succeed in converging by relying on the contact sensor, but their trajectories diverge significantly. Interestingly, MUSE yields even worse APE_t, yet maintains sub-meter APE_z. This reflects the effectiveness of MUSE’s slip detection module in handling vertical drift caused by contact slips, while horizontal drift, especially in yaw, remains difficult to suppress. In contrast, consistent with the radar-IMU experiments, GaRLILEO achieves the lowest error on every metric in this sequence.

7.2.6. Overpass

In the Overpass sequence, the most severe contact slips in leg odometry occur on the stair sections positioned before and after the overpass. Drift achieves competitive APE_r, but suffers from large vertical drift, resulting in the highest APE_t. MUSE and Holistic provide similar accuracy, particularly in APE_t and APE_z. Pronto, although excluding its contact estimator, produces comparably robust odometry, especially in APE_r and APE_z. In contrast, GaRLILEO achieves the lowest APE_t and APE_z, demonstrating robustness to contact impacts during stair ascent and descent through its local gravity–based roll and pitch estimation.

7.2.7. Tunnel

In the Tunnel sequence, a single sharp U-turn largely determines the overall APE level, as it is the only rotational motion in the entire sequence. Drift achieves competitive rotation accuracy in the horizontal plane, recording the second-lowest APE_r. However, it fails to keep its roll and pitch estimates stable, resulting in the highest APE_z. Holistic yields the highest APE_r, and this single divergence in orientation estimation also leads to the largest APE_t, despite a moderate APE_z. Pronto and MUSE achieve similar APE_z, incorporating joint-encoder-based contact estimation and slip detection, respectively. In contrast, GaRLILEO delivers the most robust odometry across all metrics, owing to precise roll and pitch estimation before and after the sharp U-turn. This is evident in its lowest APE_z, which is approximately 13% of that of the second-best method.

7.2.8. Quad

The Quad sequence involves substantial contact drift, as it covers the longest path among all datasets and includes outdoor stairs and a long slope. Holistic shows low vertical APE (APE_z); however, orientation divergence causes overall odometry failure, as indicated by the highest APE_t and APE_r. Drift produces more precise orientation estimates thanks to its gyro filtering scheme, yet still suffers from large APE_t due to significant vertical drift. MUSE fails to converge over the full trajectory, unable to handle divergence in orientation estimation. Interestingly, Pronto achieves the best performance in both APE_t and APE_r, demonstrating the potential of joint-encoder-based contact estimation, particularly in long outdoor environments. Even on this long outdoor sequence, GaRLILEO delivers the most accurate vertical estimation with its local gravity model.

7.2.9. MoCap-E and MoCap-H

In the MoCap sequences, discrepancies in leg odometry arise in both horizontal and vertical directions, as also observed in the radar-based experiments. Drift records the highest APE_t and APE_z, caused by vertical divergence in the cushion zone and horizontal errors in the slippery zone. Notably, all leg kinematics–IMU baselines exhibit limited accuracy in the slippery zone, since they lack a sensor modality capable of capturing the robot’s state on a dynamic floor. Compared with Holistic, both Pronto and MUSE achieve lower APE_t, showing greater robustness to contact failure through joint encoder–based contact estimation or slip detection. Among these, MUSE achieves the lowest APE_t and APE_z, as its slip detection module effectively mitigates contact failure, leading to more robust odometry estimation. Across the two MoCap sequences, GaRLILEO is most accurate, thanks to radar-derived ego-velocity fed into the velocity spline, especially in slippery sections where leg-kinematics-based estimation fails.

Table 4.

Evaluation on leg kinematics-based methods.

Figure 11.

Leg-kinematics baseline odometry results on the (a) CorriLoop and (b) Upstair sequences. Red dotted lines in (a, bottom) indicate the starting points of each loop in the CorriLoop sequence. Both red zoomed-in views in (a) and (b) highlight that GaRLILEO, integrating SoC radar and leg kinematics, converges more closely to the ground-truth final position. The colored numbers on the right-hand side of the bottom plots show relative end-to-end vertical drift with respect to the full path length, where GaRLILEO achieves substantially lower drift compared with other baselines. The cyan zoomed view in (a) illustrates that GaRLILEO (blue) follows the most consistent turning trajectory across repeated loops. Similarly, in (b), the cyan zoomed view indicates that GaRLILEO (blue) follows the corners most accurately during stair ascent with repeated turns. Both subfigures qualitatively show the odometry accuracy of GaRLILEO, particularly in terms of vertical drift suppression.

7.3. Complementary effects of radar and leg kinematics

To analyze the complementary roles of radar and leg kinematics, we compare three GaRLILEO configurations: the full system R+L+I, a radar-IMU-only variant R+I in which the leg kinematics factor r_L is disabled, and a leg-IMU-only variant L+I in which the radar factor r_R is disabled. We additionally report EKF-RIO and MUSE for the SNU sequences, and River and MUSE for the RAI sequences, as they are the best-performing representative radar-based and leg-based comparisons of each sensor configuration.

7.3.1. SNU Sequences

Table 5 summarizes the results on the SNU sequences. Overall, the sensor-subset variants R+I and L+I exhibit complementary strengths. When radar returns are sufficiently informative, as is often the case in indoor environments, R+I can provide competitive horizontal constraints in terms of APE_xy on several sequences, whereas L+I substantially improves vertical robustness in terms of APE_z by leveraging high-rate proprioceptive velocity information. By fusing both modalities, R+L+I attains the best accuracy on most sequences, reflecting cooperative fusion that emphasizes the more reliable modality when the other becomes less informative.

Regarding the modality-specific comparisons, L+I is consistently competitive with, and often outperforms, MUSE on SNU sequences, suggesting that our continuous-time formulation and factor design provide strong proprioceptive odometry. In practice, the radar input is available at 20 Hz, and using radar alone is not sufficient to fully realize the benefits of our proposed modules; as a result, R+I tends to perform at a similar level to the dedicated radar-IMU pipeline EKF-RIO on several sequences.

7.3.2. RAI Sequences

To further evaluate complementarity under conditions where leg-kinematics-based odometry degrades, we conduct additional analysis on the MoCap-E and MoCap-H sequences, which include intentional perturbations in leg odometry, as visualized in Figure 12. Quantitative results are reported in Table 6.

Table 5.

Complementary effects of radar and leg kinematics on SNU sequences. L+I refers to the leg kinematics-only and R+I refers to the radar-only version of the full GaRLILEO (R+L+I).

On the MoCap-E sequence, where the robot stands still on the slippery zone while the floor moves backward, the radar-aided methods (i.e., R+L+I, R+I, and River) achieve notably more accurate APE_xy than the leg-kinematics-only methods (i.e., L+I and MUSE), showing that the radar works cooperatively with the leg kinematics sensors when the stationary floor assumption breaks. In APE_z, River presents lower error than MUSE, as the cushion zone corrupts the leg kinematics estimation. Similarly, as can be seen in Figure 13(a), only R+I keeps stable odometry through the cushion zone, while the leg-kinematics-aided variants drift slightly upward. However, because the local gravity spline benefits more from the high-frequency ego-velocity measurements of leg kinematics, the APE_z of R+I is higher than that of the other GaRLILEO variants. Still, the combination of both sensors, full GaRLILEO, presents the most accurate odometry estimation. We conclude that radar and leg kinematics work in complementary ways to accurately estimate the robot’s odometry in a slippery environment.

Figure 12.

v_x-time and v_z-time Velocity Error (difference from ground truth) in the MoCap-E and MoCap-H sequences. A dotted square on (a) and (b) includes a slippery zone where the floor moves backwards, while two dotted squares on (c) and (d) include cushion zones where the leg kinematics velocity presents a high impact on the vertical direction.

Figure 13.

xy plot and z-time plot of MoCap sequences. (a) and (b) are the xy and z-time odometry plots of the MoCap-E and MoCap-H sequences, respectively. Blue dotted box: zoomed view of the slippery zone where leg odometry fails temporarily, especially on the xy plane. The yellow-highlighted area is the exact slippery zone. Black dotted box: zoomed side view of the cushion zone where leg odometry fails temporarily, especially in the vertical direction. The radar-only, leg kinematics-only, and full versions of GaRLILEO are compared to examine the cooperative effect of radar and leg kinematics in the slippery zone.

Table 6.

Complementary effects of radar and leg kinematics on RAI sequences. L+I refers to the leg kinematics-only and R+I refers to the radar-only version of the full GaRLILEO (R+L+I).

The MoCap-H sequence contains the same cushion and slippery zones, but differs in that the robot keeps walking through the slippery zone and moves at a higher overall speed. As can be seen in Figure 13(b), L+I fails at a similar point due to the broken stationary floor assumption in the slippery zone. Because of this, L+I results in higher APE_xy compared with the other GaRLILEO variants. Regarding vertical drift, the tendency of APE_z is similar to MoCap-E, but the upward bias of leg kinematics in the cushion zone is notably milder because of the difference in locomotion speed. At the higher robot speed, the effect of the cushion is less pronounced, which leads to a highly accurate APE_z for L+I. In conclusion, exploiting the respective strengths of R+I in APE_xy and L+I in APE_z, R+L+I attains the most accurate APE_t overall, while falling between R+I and L+I on the other APE metrics. This tendency to rely on the more accurate sensor modality reflects the cooperative effect of GaRLILEO.

7.4. Effect of gravity factors on state estimation accuracy

In this subsection, we evaluate the effect of our gravity factors $r_{S^{2}}$ and $r_{post}$ in two sub-experiments: (1) effect of the soft $S^{2}$-constrained gravity factor, and (2) effect of the post-optimization.

7.4.1. Effect of the Soft $S^{2}$-Constrained Gravity Factor

In the first sub-experiment, we aimed to highlight the effectiveness of the soft $S^{2}$-constrained gravity factor by comparing the accuracy of the estimated local gravity vector between the full GaRLILEO model (Ours) and the model without the factor (w/o $r_{S^{2}}$). Ground-truth local gravity $g_{t}^{*}$ was obtained by (i) rigidly aligning the ground truth pose to the estimated trajectory with Evo Trajectory Evaluator (Grupp, 2017), (ii) B-spline interpolation of the aligned poses at the estimating timestamps, and (iii) rotating the global gravity vector into the local frame using the aligned ground truth orientation as follows:

$$g_{t}^{*} = R_{t}^{\top} g_{0}, \quad \text{where} \quad g_{0} = \begin{bmatrix} 0 \\ 0 \\ -9.81 \end{bmatrix}. \tag{35}$$

To evaluate roll and pitch observation accuracy, we use the mean angular error of the estimated local gravity vectors over each sequence.
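As a rough illustration of step (iii) and of the angular error metric, the following NumPy sketch rotates the global gravity vector into the local frame and averages the angle between estimated and reference gravity. It assumes $R_{t}$ is a body-to-world rotation matrix and that alignment and interpolation have already been done; the helper names and the pitch-only toy poses are hypothetical.

```python
import numpy as np

G0 = np.array([0.0, 0.0, -9.81])  # global gravity vector g_0 from equation (35)


def local_gravity(R_t: np.ndarray) -> np.ndarray:
    """g_t^* = R_t^T g_0, assuming R_t maps body-frame vectors to the world frame."""
    return R_t.T @ G0


def angular_error_deg(g_est: np.ndarray, g_ref: np.ndarray) -> float:
    """Angle in degrees between an estimated and a reference gravity vector."""
    cos = np.dot(g_est, g_ref) / (np.linalg.norm(g_est) * np.linalg.norm(g_ref))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def mean_angular_error_deg(R_gt_seq, g_est_seq) -> float:
    """Mean angular error of estimated local gravity over a sequence."""
    errs = [angular_error_deg(g, local_gravity(R)) for R, g in zip(R_gt_seq, g_est_seq)]
    return float(np.mean(errs))


def rot_y(angle_rad: float) -> np.ndarray:
    """Rotation about the y-axis (pure pitch), used only to build toy data."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Toy data: ground truth pitch of 5 and 10 degrees, estimates off by 2 and 1 degrees.
R_gt = [rot_y(np.radians(5.0)), rot_y(np.radians(10.0))]
g_est = [local_gravity(rot_y(np.radians(7.0))), local_gravity(rot_y(np.radians(11.0)))]

mean_err = mean_angular_error_deg(R_gt, g_est)
print(f"mean angular error: {mean_err:.3f} deg")
```

Since yaw leaves the gravity direction unchanged, this error reflects roll and pitch only, which is why it serves as a proxy for roll and pitch observation accuracy.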

As detailed in Table 7, the ablation without the soft $S^{2}$ gravity factor (w/o $r_{S^{2}}$) estimates local gravity with errors exceeding twice those of Ours. This behavior is closely related to gravity-norm preservation during continuous-time estimation. Although we clamp the gravity control points to have magnitude 9.81 m/s², gravity is parameterized with an $\mathbb{R}^{3}$ B-spline rather than an $S^{2}$ spline; hence, the interpolated B-spline segment between adjacent control points is not inherently norm-preserving. As a result, without the soft $S^{2}$ factor, ‖g(t)‖ may exhibit local norm drops between control points, which degrade the reliability of local gravity direction estimation, especially under strong contact-induced vibrations.
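The norm drop can be reproduced with a small NumPy sketch. It assumes a uniform cubic B-spline in $\mathbb{R}^{3}$ (the spline order used by GaRLILEO is not restated here) and four hypothetical control points that all have magnitude 9.81 m/s² but point in different directions.

```python
import numpy as np

G = 9.81  # gravity magnitude in m/s^2, as used for the clamped control points


def cubic_bspline_weights(u: float) -> np.ndarray:
    """Uniform cubic B-spline blending weights for u in [0, 1] (assumed spline order)."""
    return np.array([
        (1.0 - u) ** 3,
        3.0 * u ** 3 - 6.0 * u ** 2 + 4.0,
        -3.0 * u ** 3 + 3.0 * u ** 2 + 3.0 * u + 1.0,
        u ** 3,
    ]) / 6.0


# Hypothetical control points: gravity tilted by 0, 10, 20, and 30 degrees of pitch,
# each with magnitude exactly G.
angles = np.radians([0.0, 10.0, 20.0, 30.0])
ctrl = G * np.stack([np.sin(angles), np.zeros(4), -np.cos(angles)], axis=1)

# Evaluate ||g(u)|| along the spline segment governed by these four control points.
us = np.linspace(0.0, 1.0, 101)
norms = np.array([np.linalg.norm(cubic_bspline_weights(u) @ ctrl) for u in us])

norm_stats = {
    "min": float(norms.min()),
    "max": float(norms.max()),
    "mean": float(norms.mean()),
    "std": float(norms.std()),
}
print(norm_stats)
```

Every interpolated norm in this toy segment stays below 9.81 m/s² even though each control point has exactly that magnitude. This is the kind of dip that a soft norm constraint on the interpolated value is meant to suppress.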

Table 7.

Effect of $r_{S^{2}}$ factor on estimating local gravity.

To make this explicit, we conducted additional experiments to measure and compare the gravity-norm behavior with and without the soft $S^{2}$ factor $r_{S^{2}}$ . Specifically, we compute the minimum, maximum, mean, and standard deviation of the gravity vector’s magnitude ‖g(t)‖ over each sequence, which are summarized in Table 8. The minimum captures the worst-case local norm dip, while the standard deviation reflects the overall fluctuation. Additionally, we plot the gravity-norm behavior over runtime on the SlopeStair sequence in Figure 14.

Table 8.

Gravity magnitude statistics (m/s²) without and with the $r_{S^{2}}$ factor.

Sequence	w/o $r_{S^{2}}$				w/ $r_{S^{2}}$
Sequence	Min	Max	Mean	Std.	Min	Max	Mean	Std.
Atrium	9.7250	9.8100	9.8046	6.43 × 10⁻³	9.8088	9.8100	9.8099	7.53 × 10⁻⁵
BridgeLoop	9.6489	9.8100	9.8004	1.11 × 10⁻²	9.8069	9.8100	9.8099	1.90 × 10⁻⁴
CorriLoop	9.6888	9.8100	9.7960	1.43 × 10⁻²	9.8081	9.8100	9.8099	8.87 × 10⁻⁵
BiCorridor	9.7335	9.8100	9.7974	1.08 × 10⁻²	9.8070	9.8100	9.8099	1.41 × 10⁻⁴
Downstair	9.7123	9.8100	9.8044	9.14 × 10⁻³	9.8047	9.8100	9.8099	2.75 × 10⁻⁴
Upstair	9.6334	9.8100	9.7866	2.27 × 10⁻²	9.8026	9.8100	9.8099	2.18 × 10⁻⁴
SlopeStair	9.6722	9.8100	9.8013	1.03 × 10⁻²	9.8071	9.8100	9.8099	2.08 × 10⁻⁴
Overpass	9.7189	9.8100	9.8036	8.90 × 10⁻³	9.8060	9.8100	9.8099	2.57 × 10⁻⁴
Tunnel	9.6534	9.8100	9.8015	1.04 × 10⁻²	9.8073	9.8100	9.8099	1.10 × 10⁻⁴
Quad	9.7016	9.8100	9.8053	8.17 × 10⁻³	9.8076	9.8100	9.8099	1.34 × 10⁻⁴
MoCap-E	9.7940	9.8100	9.8087	1.74 × 10⁻³	9.8085	9.8100	9.8099	1.30 × 10⁻⁴
MoCap-H	9.6579	9.8100	9.7988	1.44 × 10⁻²	9.7959	9.8100	9.8098	4.44 × 10⁻⁴

Figure 14.

Gravity-norm behavior over time in the SlopeStair sequence (with vs. without $r_{S^{2}}$ ). Enabling equation (20) suppresses local norm dips of ‖g(t)‖ between adjacent control points, keeping ‖g(t)‖ much closer to 9.81 m/s².

As shown in Figure 14, the soft $S^{2}$ factor $r_{S^{2}}$ suppresses local norm dips between adjacent control points. Table 8 further quantifies this effect across every sequence: without $r_{S^{2}}$ , the minimum ‖g(t)‖ can drop to 9.633 m/s² and the standard deviation is on the order of 10⁻² m/s², whereas with $r_{S^{2}}$ the worst-case minimum rises to 9.796 m/s² and the standard deviation decreases to the order of 10⁻⁴ m/s². The maximum stays at 9.81 m/s² because the B-spline evaluates g(t) as a convex combination of control points, with nonnegative weights that sum to one, and all control points are clamped to have magnitude 9.81 m/s².
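The convexity argument can be checked numerically with a small sketch. It assumes a uniform cubic B-spline in $\mathbb{R}^{3}$; the spline order used by GaRLILEO is not restated in this section, and the control-point tilt angles below are made up for illustration. The helper names (`cubic_bspline_weights`, `clamped_control_points`, `gravity_norms`) are hypothetical.

```python
import numpy as np

G = 9.81  # clamped magnitude of every gravity control point (m/s^2)


def cubic_bspline_weights(u):
    """Uniform cubic B-spline basis weights for u in [0, 1).

    The four weights are nonnegative and sum to one, so the spline value
    is a convex combination of four consecutive control points.
    """
    return np.array([
        (1.0 - u) ** 3,
        3.0 * u ** 3 - 6.0 * u ** 2 + 4.0,
        -3.0 * u ** 3 + 3.0 * u ** 2 + 3.0 * u + 1.0,
        u ** 3,
    ]) / 6.0


def clamped_control_points(tilts_deg):
    """Control points of magnitude G, tilted about the x-axis (illustrative only)."""
    a = np.radians(np.asarray(tilts_deg, dtype=float))
    directions = np.stack([np.zeros_like(a), np.sin(a), -np.cos(a)], axis=1)
    return G * directions


def gravity_norms(ctrl, samples_per_segment=50):
    """Sample ||g(t)|| along every spline segment."""
    norms = []
    for i in range(len(ctrl) - 3):
        for u in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            g = cubic_bspline_weights(u) @ ctrl[i:i + 4]
            norms.append(np.linalg.norm(g))
    return np.array(norms)


# Hypothetical tilt angles imitating vibration-induced direction changes.
ctrl = clamped_control_points([0.0, 8.0, -6.0, 10.0, -4.0, 5.0])
norms = gravity_norms(ctrl)
print("min :", norms.min())
print("max :", norms.max())
print("mean:", norms.mean())
print("std :", norms.std())
```

In this sketch the sampled norm never exceeds 9.81 m/s² and falls below it wherever neighboring control points point in different directions, which is the local dip that the soft $S^{2}$ factor is designed to suppress.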

7.4.2. Effect of the Post-Optimization

In the second sub-experiment, we compared the vertical drift APE_z with and without the rotation refinement via post-optimization. As established in the previous sub-experiment, the local gravity is estimated accurately, and this accuracy translates directly into more precise roll and pitch observations. As shown in Figure 15, adding the r_post factor notably improves odometry accuracy in the vertical direction. Specifically, APE_z is consistently reduced on every sequence; for example, on Quad, the vertical drift is reduced by about 5 m through the post-optimization process. Example plots for the BiCorridor and Tunnel sequences are presented in Figure 16. Notably, even without loop closure or point cloud-based registration schemes, GaRLILEO achieves an APE_z lower than 1 m on most sequences.

Figure 15.

Effect of r_post on vertical pose estimation. Vertical odometry accuracy is notably enhanced when r_post is added to refine the rotation using the estimated local gravity in a post-optimization step. This result supports the idea that precise local gravity estimation can provide roll and pitch observations, thereby mitigating vertical drift in odometry.

Figure 16.

Z-time odometry plots with and without the post-optimization. The r_post factor prevents odometry from diverging in the vertical direction.

Taken together, these two sub-experiments demonstrate that GaRLILEO maintains a small, single-digit level of accuracy in local gravity estimation, even under aggressive legged robot locomotion, and that this precision is sufficient to deliver robust, low-drift vertical-state estimation on stairs, slopes, and even on slippery, potentially deformable surfaces.

7.5. Effect of velocity bias

In this subsection, we evaluate the odometry performance of GaRLILEO with and without the velocity bias. The results are summarized in Table 9, where the error metric is APE_t projected onto the xy plane. This 2D metric is used because the velocity bias operates only in the horizontal directions, where foot slip occurs parallel to the ground plane, resulting in horizontal velocity discrepancies.
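As a rough illustration of this metric, the sketch below computes the translational error after dropping the z component. It assumes the estimated and ground-truth positions are already aligned and time-associated, and it uses the RMSE as the summary statistic; the statistic actually reported in Table 9 is not restated here, so this choice is an assumption. The function name `ape_xy_rmse` is hypothetical.

```python
import numpy as np


def ape_xy_rmse(p_est, p_gt):
    """RMSE of the translational error projected onto the xy plane.

    p_est, p_gt: (N, 3) arrays of aligned, time-associated positions in meters.
    """
    diff_xy = (np.asarray(p_est) - np.asarray(p_gt))[:, :2]  # drop the z component
    return float(np.sqrt(np.mean(np.sum(diff_xy ** 2, axis=1))))


# Toy example: a constant 3 m / 4 m horizontal offset and a large vertical
# offset that the projection ignores.
p_gt = np.zeros((3, 3))
p_est = np.tile([3.0, 4.0, 100.0], (3, 1))
print(ape_xy_rmse(p_est, p_gt))
```

Projecting before computing the error keeps vertical drift, which is governed by the gravity factors rather than the velocity bias, from masking the horizontal effect under study.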

Table 9.

Effect of velocity bias on APE_t projected onto the xy plane.

As shown in Table 9, more than half of the sequences exhibit improved accuracy with the bias, while the others perform better without it. Specifically, in the Downstair, Upstair, MoCap-E, and MoCap-H sequences, excluding the velocity bias yields more accurate xy plane odometry. A common feature of these sequences is the presence of long indoor stairs or challenging environments that frequently degrade leg odometry.

By contrast, in most indoor sequences, adding the bias improves accuracy, as radar sensors acquire denser, more reliable measurements from nearby static objects. Qualitative odometry is shown in Figure 17, which demonstrates enhanced odometry accuracy for both indoor and outdoor sequences.

Figure 17.

xy plane plot of odometry estimation with and without velocity bias. On Atrium, SlopeStair, and CorriLoop sequences, which are presented in (a), (b), and (c), velocity bias makes odometry more robust as radial velocity information of each radar target point mitigates the error of leg velocity occurring from inaccurate contact and joint encoder measurement. A more detailed zoomed view of each subfigure, showing the effect of the velocity bias term, is included in (d).

On indoor flat sequences, such as Atrium and CorriLoop, the velocity bias term improves odometry accuracy by accounting for potential contact slip using radar target radial velocity information. Due to the relatively rich radar targets in indoor environments, the velocity bias may provide more reliable velocity information to the system’s velocity factors. Similarly, on the Tunnel and Overpass sequences, where the surrounding library building or overpass (including the ceiling) enables the radar to collect sufficient reflections, odometry is enhanced even though part of the trajectory traverses open outdoor space.

The BridgeLoop and BiCorridor sequences, whose trajectories include short stairs, also show partially enhanced odometry estimation, but the improvement is somewhat smaller than on simple flat indoor sequences. This is due to the direction of the radar sensor while the robot traverses staircases: as the radar points upward, it may detect fewer targets, since fewer objects tend to exist in that direction. The same phenomenon occurs in the Downstair and Upstair sequences, which include the longest stairs in the downward and upward directions, respectively, and there adding the velocity bias even degrades the final odometry estimate. Still, as shown in Table 9, the performance degradation in those cases is less drastic than the performance enhancement on other sequences.

One interesting point about this experiment is that the SlopeStair sequence also includes a long upstairs region, similar to the downstairs area of Downstair, yet its final odometry estimate improved by almost 50%. This indirectly demonstrates the effect of the velocity bias factor in mitigating mild leg odometry failures, such as temporary contact slip or contact impact, using radar target velocity information. Although odometry may degrade in the stair region, the velocity bias factor meaningfully handles contact failures in the slope area, which makes up almost half of the SlopeStair sequence. This can be qualitatively verified from Figure 17(b) and (d): odometry accuracy degrades after the downstair region, particularly without the bias term, due to temporary contact slip or impact. The Quad sequence also includes upstair and down-slope regions, but the effect of the velocity bias there is comparatively mild, as its slope is much gentler than that of the SlopeStair sequence.

In the two MoCap sequences, leg odometry deviates from ground truth in the xy plane due to the slippery zone. Because the bias term is designed to change smoothly, it cannot compensate for abrupt or inconsistent slips, resulting in slight performance degradation.

Overall, the velocity bias can enhance odometry by reducing the discrepancy between radar and leg kinematics when leg kinematics temporarily fail due to unanticipated contact issues. Still, the velocity bias remains limited when this discrepancy varies drastically. This effect is particularly evident in places where the static floor assumption fails, such as the MoCap sequences. To address this, future work may consider alternative radar configurations (e.g., including ground reflections within the field of view) or an adaptive mechanism that temporarily enables or disables the bias term depending on the environment in which the system operates.

7.6. Real-time capability on edge device

To demonstrate the real-time capability and computational efficiency of our framework, we deployed GaRLILEO on an onboard NVIDIA Jetson AGX Orin and ran the Atrium sequence. A key factor in achieving this lightweight performance is the sliding window marginalization scheme, detailed in Section 5.2.2, which ensures that the factor graph does not grow unbounded over time. As shown in Figure 18(b), an average of 54 residual factors are processed per optimization step, while the graph size is internally bounded to a maximum of 21 active control points. This consistency is maintained throughout the entire estimation process.

Figure 18.

End-to-end edge device test of GaRLILEO on NVIDIA Jetson AGX Orin with the Atrium sequence. (a) Optimization time per window during factor graph optimization; the red line denotes 50 ms, the input period of the SoC radar sensor; (b) number of factors in the factor graph per single optimization; (c) CPU utilization of GaRLILEO; (d) memory consumption of GaRLILEO; (e) power consumption of the NVIDIA Jetson AGX Orin while running GaRLILEO.

This bounded graph size directly translates to efficient computation. During the experiment, the end-to-end computation time per sliding window optimization averaged 23.277 ms, with a maximum recorded value of 41.097 ms. Because the incoming radar scans arrive at 20 Hz, the system is required to complete each optimization within a 50 ms window. As illustrated by the red threshold line in Figure 18(a), GaRLILEO’s computation time consistently stays below this threshold, supporting real-time deployment on the edge device without dropping sensor frames.
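The timing budget follows from simple arithmetic, sketched below. The helper `realtime_margin` is hypothetical, and because the per-window timings are not listed in the text, only the reported mean and maximum are used as stand-in samples.

```python
def realtime_margin(sensor_rate_hz, solve_times_ms):
    """Return (budget_ms, worst_case_margin_ms, meets_deadline).

    budget_ms is the time between consecutive sensor frames; the margin is
    the slack left by the slowest optimization step.
    """
    budget_ms = 1000.0 / sensor_rate_hz
    worst_ms = max(solve_times_ms)
    meets_deadline = all(t < budget_ms for t in solve_times_ms)
    return budget_ms, budget_ms - worst_ms, meets_deadline


# Reported statistics used as stand-ins for the full timing log:
# mean 23.277 ms and maximum 41.097 ms at a 20 Hz radar rate.
budget, margin, ok = realtime_margin(20.0, [23.277, 41.097])
print(f"budget = {budget:.1f} ms, worst-case margin = {margin:.3f} ms, real-time = {ok}")
```

With these numbers the slowest window still leaves roughly 9 ms of slack before the next radar scan arrives.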

Furthermore, the overall system throughput and resource consumption demonstrate the stability of GaRLILEO during operation. On the NVIDIA Jetson AGX Orin, GaRLILEO’s CPU utilization averaged 55.856%, while memory consumption remained at an average of 51.013 MB. The average system power usage during execution was 11.521 W, with a maximum of 17.198 W. As depicted in Figures 18(c)–(e), the CPU, memory, and power metrics do not exhibit drastic growth over the algorithm runtime. Instead, they maintain a stable and predictable level, confirming that GaRLILEO is suitable for deployment on power-constrained systems.

8. Conclusion

In this work, we presented GaRLILEO, a continuous-time radar-leg-IMU odometry framework that leverages B-spline modeling, local gravity estimation, and horizontal velocity bias correction to achieve robust and accurate state estimation in challenging environments. By tightly coupling radar-derived ego-velocity with leg kinematics and inertial measurements, our method addresses the limitations of existing odometry pipelines that suffer from vertical drift, contact slip, and sensor modality degradation.

Extensive experiments across diverse sequences, including long indoor loops, multi-story staircases, outdoor slopes, and environments with severe contact disturbances, demonstrated that GaRLILEO consistently outperforms other SOTA baselines in odometry accuracy. Notably, its continuous local gravity estimation significantly reduces vertical drift. At the same time, the B-spline-based fusion scheme ensures robustness against modality-specific failures such as radar sparsity or leg kinematics slips.

To evaluate the complementary effect of radar and leg kinematics sensors, we tested the radar-only and leg-only versions of GaRLILEO on the MoCap sequences and compared their odometry accuracy. Through this experiment, we verified that each sensor modality has its own strength, in horizontal and vertical accuracy, respectively. Consequently, the B-spline-based sensor fusion of GaRLILEO effectively blends the different modalities to enhance the odometry estimation result.

Through two sub-experiments, we validated the effect of the proposed local gravity factors on both the accuracy of local gravity estimation itself and odometry in the vertical direction. These experiments demonstrate how GaRLILEO can achieve sub-meter vertical odometry accuracy without relying on loop closure or point cloud registration algorithms, highlighting that reliable local gravity estimation is crucial for robust, vertically low-drift odometry estimation on stairs, slopes, and challenging terrains.

We further analyzed the effect of the velocity bias term and verified that it provides clear benefits in structured indoor environments, while in sparse or highly dynamic outdoor scenes, it may lead to potential performance degradation. This suggests that environment-aware adaptation or improved radar hardware configurations could further enhance robustness.

Overall, our results highlight the potential of continuous-time multi-modal fusion to enable legged robots to navigate reliably in complex real-world environments.

8.1. Limitations and future extensions

Though GaRLILEO demonstrates robust odometry estimation across diverse indoor and outdoor environments, several limitations remain. First, our framework prefers denser radar points for stable velocity spline estimation. In open outdoor areas such as the Quad sequence, sparse radar returns degrade velocity bias adaptation and limit accuracy. Furthermore, due to the limited observation of yaw orientation, which relies on the IMU measurements, an inevitable odometry drift occurs in the horizontal direction as the trajectory becomes longer.

To mitigate the above limitations, future work will explore adaptive gravity post-optimization, which dynamically enables or disables the term depending on the reliability of the optimized gravity vector. For classification, fusion with exteroceptive sensors, such as cameras and LiDAR, or friction estimation based on recent contact tactile sensors can be exploited. Furthermore, to expand the field of view and improve odometry robustness, a multi-SoC radar system can be leveraged, which may provide additional yaw orientation observations, as mentioned in Yoon et al. (2023), or enable point cloud registration.

Additionally, the current framework relies on static covariances during optimization. To address the unpredictability of legged robot environments, future work will explore adaptive covariance estimation that dynamically adjusts factor weights based on real-world environmental context and sensor degradation, thereby further enhancing robustness.

ORCID iDs

Chiyun Noh

Sangwoo Jung

Ayoung Kim

Yafei Hu

Laura Herlant

Ayoung Kim

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Technology Innovation Program (1415187329, 20024355, Development of autonomous driving connectivity technology based on sensor-infrastructure cooperation) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and in part by the Robotics and AI (RAI) Institute.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

References

Agha, Otsu, Morrell, et al. (2021) Nebula: Quest for Robotic Autonomy in Challenging Environments; Team Costar at the Darpa Subterranean Challenge. arXiv preprint arXiv:2103.11470.

Arndt, Sabzevari, Civera (2023) Do planar constraints improve camera pose estimation in monocular slam? In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2221–2230. IEEE.

Barnes, Gadd, Murcutt, et al. (2020) The Oxford radar robotcar dataset: a radar extension to the oxford robotcar dataset. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 6433–6438. IEEE.

Bloesch, Hutter, Hoepflinger, et al. (2012) State estimation for legged robots-consistent fusion of leg kinematics and imu. In: Proceedings of the Robotics: Science & Systems Conference, pp. 17–24. IEEE.

Bloesch, Gehring, Fankhauser, et al. (2013) State estimation for legged robots on unstable and slippery terrain. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 6058–6064. IEEE.

Bloesch, Burri, Sommer, et al. (2017) The two-state implicit filter recursive estimation for mobile robots. IEEE Robotics and Automation Letters 3(1): 573–580. https://doi.org/10.1109/lra.2017.2776340

Burnett, Schoellig, Barfoot (2025) Continuous-time radar-inertial and lidar-inertial odometry using a gaussian process motion prior. IEEE Transactions on Robotics 41: 1059–1076. https://doi.org/10.1109/tro.2024.3521856

Camurri, Ramezani, Nobili, et al. (2020) Pronto: a multi-sensor state estimator for legged robots in real-world scenarios. Frontiers in Robotics and AI 7: 68. https://doi.org/10.3389/frobt.2020.00068

Chen, Liu, Cheng (2023) Drio: robust radar-inertial odometry in dynamic environments. IEEE Robotics and Automation Letters 8(9): 5918–5925. https://doi.org/10.1109/lra.2023.3301290

Chen, et al. (2024) River: a tightly-coupled radar-inertial velocity estimator based on continuous-time optimization. IEEE Robotics and Automation Letters 9(7): 6107–6114. https://doi.org/10.1109/lra.2024.3400154

Dhédin, Khorshidi, et al. (2023) Visual-inertial and leg odometry fusion for dynamic locomotion. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 9966–9972. IEEE.

Kim, Lee, et al. (2024) Dero: dead reckoning based on radar odometry with accelerometers aided for robot localization. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 8547–8554. IEEE.

Doer, Trommer (2020) An ekf based approach to radar inertial odometry. In: Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligence System, pp. 152–159. IEEE.

Doer, Trommer (2021) Yaw aided radar inertial odometry using manhattan world assumptions. In: Proceedings of the IEEE Saint Petersburg International Conference on Integrated Navigation System, pp. 1–9. IEEE.

Droeschel, Behnke (2018) Efficient continuous-time slam for 3d lidar-based online mapping. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 5000–5007. IEEE.

Fallon, Antone, Roy, et al. (2014) Drift-free humanoid state estimation fusing kinematic, inertial and lidar sensing. In: Proceedings of the IEEE International Conference on Humanoid Robots, pp. 112–119. IEEE.

Fink, Semini (2020) Proprioceptive sensor fusion for quadruped robot state estimation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 10914–10920. IEEE.

Furgale, Barfoot, Sibley (2012) Continuous-time batch estimation using temporal basis functions. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2088–2095. IEEE.

Furgale, Tong, Barfoot, et al. (2015) Continuous-time batch trajectory estimation using temporal basis functions. International Journal of Robotics Research 34(14): 1688–1710. https://doi.org/10.1177/0278364915585860

Geiger, Lenz, Stiller, et al. (2013) Vision meets robotics: the kitti dataset. International Journal of Robotics Research 32(11): 1231–1237. https://doi.org/10.1177/0278364913491297

Grupp (2017) evo: Python package for the evaluation of odometry and slam. https://github.com/MichaelGrupp/evo

Harlow, Jang, Barfoot, et al. (2024) A new wave in robotics: survey on recent mmwave radar applications in robotics. IEEE Transactions on Robotics.

Hartley, Jadidi, Gan, et al. (2018a) Hybrid contact preintegration for visual-inertial-contact state estimation using factor graphs. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3783–3790. IEEE.

Hartley, Mangelson, Gan, et al. (2018b) Legged robot state-estimation through combined forward kinematic and preintegrated contact factors. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4422–4429. IEEE.

Hartley, Ghaffari, Eustice, et al. (2020) Contact-aided invariant extended kalman filtering for robot state estimation. International Journal of Robotics Research 39(4): 402–430. https://doi.org/10.1177/0278364919894385

Zheng, et al. (2024) Paloc: advancing slam benchmarking with prior-assisted 6-dof trajectory generation and uncertainty estimation. IEEE/ASME Transactions on Mechatronics 29(6): 4297–4308. https://doi.org/10.1109/TMECH.2024.3362902

Huang, Liang, Qiao, et al. (2024) Less is more: physical-enhanced radar-inertial odometry. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 15966–15972. IEEE.

Hug, Chli (2020) Hyperslam: a generic and modular approach to sensor fusion and simultaneous localization and mapping in continuous-time. In: Proceedings of the IEEE International Conference on 3D Vision, pp. 978–986. IEEE.

Hug, Bänninger, Alzugaray, et al. (2022) Continuous-time stereo-inertial odometry. IEEE Robotics and Automation Letters 7(3): 6455–6462. https://doi.org/10.1109/lra.2022.3173705

Jung, Kim (2023) Asynchronous multiple lidar-inertial odometry using point-wise inter-lidar uncertainty propagation. IEEE Robotics and Automation Letters 8(7): 4211–4218. https://doi.org/10.1109/lra.2023.3281264

Jung, Yang, Kim (2024) Co-ral: complementary radar-leg odometry with 4-dof optimization and rolling contact. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 13289–13296. IEEE.

Kellner, Barjenbruch, Klappstein, et al. (2013) Instantaneous ego-motion estimation using doppler radar. In: Proceedings of the IEEE Intelligent Transportation Systems Conference, pp. 869–874. IEEE.

Kim, Hong, et al. (2021) Legged robot state estimation with dynamic contact event information. IEEE Robotics and Automation Letters 6(4): 6733–6740. https://doi.org/10.1109/lra.2021.3093876

Kim, Lee, et al. (2022) Step: state estimator for legged robots using a preintegrated foot velocity factor. IEEE Robotics and Automation Letters 7(2): 4456–4463. https://doi.org/10.1109/lra.2022.3150844

Kim, Jung, Noh, et al. (2025a) Hercules: heterogeneous radar dataset in complex urban environment for multi-session radar slam. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4649–4656. IEEE.

Kim, Jung, Yang, et al. (2025b) Sherloc: Synchronized Heterogeneous Radar Place Recognition for Cross-Modal Localization. arXiv preprint arXiv:2506.15175.

Kubelka, Vaidis, Pomerleau (2022) Gravity-constrained point cloud registration. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4873–4879. IEEE.

Lang, Huang, et al. (2022) Ctrl-vio: continuous-time visual-inertial odometry for rolling shutter cameras. IEEE Robotics and Automation Letters 7(4): 11537–11544. https://doi.org/10.1109/lra.2022.3202349

Lang, Chen, Tang, et al. (2023) Coco-lic: continuous-time tightly-coupled lidar-inertial-camera odometry using non-uniform b-spline. IEEE Robotics and Automation Letters 8(11): 7074–7081. https://doi.org/10.1109/lra.2023.3315542

Lang, et al. (2024) Gaussian-lic: real-Time photo-realistic slam with gaussian splatting and lidar-inertial-camera fusion. arXiv preprint arXiv:2404.06926.

Brasch, Wang, et al. (2020) Structure-slam: low-Drift monocular slam in indoor environments. IEEE Robotics and Automation Letters 5(4): 6583–6590. https://doi.org/10.1109/lra.2020.3015456

Lin, Tong, et al. (2023) Proprioceptive Invariant Robot State Estimation. arXiv preprint arXiv:2311.04320.

Lang, et al. (2023) Continuous-time fixed-lag smoothing for lidar-inertial-camera slam. IEEE/ASME Transactions on Mechatronics 28(4): 2259–2270. https://doi.org/10.1109/tmech.2023.3241398

Michalczyk, Jung, Weiss (2022) Tightly-coupled ekf-based radar-inertial odometry. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 12336–12343. IEEE.

Nemiroff, Chen, Lopez (2023) Joint on-manifold gravity and accelerometer intrinsics estimation for inertially aligned mapping. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1388–1394. IEEE.

Nisticò, Soares JCV, Amatucci, et al. (2025) Muse: a real-time multi-sensor state estimator for quadruped robots. In: IEEE Robotics and Automation Letters. IEEE.

Nobili, Camurri, Barasuol, et al. (2017) Heterogeneous sensor fusion for accurate state estimation of dynamic legged robots. In: Proceedings of the Robotics: Science & Systems Conference. IEEE.

Noh, Yang, Jung, et al. (2025) Garlio: gravity enhanced radar-lidar-inertial odometry. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 9869–9875. IEEE.

Nubert, Tuna, Frey, et al. (2025) Holistic fusion: task-and setup-agnostic robot localization and state estimation with factor graphs. arXiv preprint arXiv:2504.06479.

(2024) Leg-kilo: robust kinematic-inertial-lidar odometry for dynamic legged robots. In: IEEE Robotics and Automation Letters. IEEE.

Ovrén, Forssén (2019) Trajectory representation and landmark projection for continuous-time structure from motion. International Journal of Robotics Research 38(6): 686–701. https://doi.org/10.1177/0278364919839765

Park, Shin, Kim, et al. (2021) 3d ego-motion estimation using low-cost mmwave radars via radar velocity factor for pose-graph slam. IEEE Robotics and Automation Letters 6(4): 7691–7698. https://doi.org/10.1109/lra.2021.3099365

Patrikalakis, Maekawa (2002) Shape Interrogation for Computer Aided Design and Manufacturing. Springer, Vol. 15.

Qin, Shen (2018) Vins-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics 34(4): 1004–1020. https://doi.org/10.1109/tro.2018.2853729

Ramezani, Khosoussi, Catt, et al. (2022) Wildcat: Online continuous-time 3d lidar-inertial slam. arXiv preprint arXiv:2205.12595.

Roston, Krotkov (1992) Dead reckoning navigation for walking robots. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 1, pp. 607–612. IEEE.

Seo, Lim, Lee, et al. (2022) Pago-loam: robust ground-optimized lidar odometry. In: Proceedings of the International Conference on Ubiquitous Robots, pp. 1–7. IEEE.

Shan, Englot (2018) Lego-loam: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4758–4765. IEEE.

Shu, Xie, Rambach, et al. (2021) Visual slam with graph-cut optimized multi-plane reconstruction. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, pp. 165–170. IEEE.

Sibley, Matthies, Sukhatme (2010) Sliding window filter with application to planetary landing. Journal of Field Robotics 27(5): 587–608. https://doi.org/10.1002/rob.20360

Sommer, Usenko, Schubert, et al. (2020) Efficient derivative computation for cumulative b-splines on lie groups. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11148–11156. IEEE.

Talbot, Nubert, Tuna, et al. (2024) Continuous-Time State Estimation Methods in Robotics: A Survey. arXiv preprint arXiv:2411.03951.

Tranzatto, Dharmadhikari, Bernreiter, et al. (2022a) Team cerberus wins the darpa subterranean challenge: technical overview and lessons learned. https://arxiv.org/abs/2207.04914

Tranzatto, Miki, Dharmadhikari, et al. (2022b) Cerberus in the darpa subterranean challenge. Science Robotics 7(66): eabp9742. https://doi.org/10.1126/scirobotics.abp9742

Wang, Zhang, Shen, et al. (2023) D-LIOM: tightly-coupled direct lidar-inertial odometry and mapping. IEEE Transactions on Multimedia 25: 3905–3920. https://doi.org/10.1109/TMM.2022.3168423

Wang, Cao, Wang, et al. (2024) Robust ground constrained slam for mobile robot with sparse-channel lidar. In: IEEE Transactions on Intelligent Vehicles. IEEE.

Wei, Sun, et al. (2021) Ground-slam: ground constrained lidar slam for structured multi-floor environments. arXiv preprint arXiv:2103.03713.

Wisth, Camurri, Fallon (2022) Vilens: visual, inertial, lidar, and leg odometry for all-terrain legged robots. IEEE Transactions on Robotics 39(1): 309–326. https://doi.org/10.1109/tro.2022.3193788

Cai, et al. (2022) Fast-lio2: fast direct lidar-inertial odometry. IEEE Transactions on Robotics 38(4): 2053–2073. https://doi.org/10.1109/tro.2022.3141876

Yang, Zhang, Bokser, et al. (2023a) Multi-imu proprioceptive odometry for legged robots. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 774–779. IEEE.

Yang, Zhang, et al. (2023b) Cerberus: low-drift visual-inertial-leg odometry for agile locomotion. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4193–4199. IEEE.

Yoon, Burnett, Laconte, et al. (2023) Need for speed: fast correspondence-free lidar-inertial odometry using doppler velocity. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5304–5310. IEEE.

Zhu, Ren, Zhang (2022) Robust real-time lidar-inertial initialization. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3948–3955. IEEE.

Zhuang, Wang, Huai, et al. (2023) 4d iriom: 4d imaging radar inertial odometry and mapping. IEEE Robotics and Automation Letters 8(6): 3246–3253. https://doi.org/10.1109/lra.2023.3266669


