Abstract
Articulated tracked robots are well-suited for post-disaster rescue missions, but their navigation in large-scale, complex environments presents significant challenges. Current methodologies often face limitations, including high computational demands, inadequate real-time capabilities, and compromised safety during cross-story maneuvers. This paper introduces a novel navigation framework that addresses these challenges through a hierarchical global planner, a manifold-optimization-based local planner, and an on-manifold model predictive controller. The framework leverages various forms of terrain information extracted from the environment in all planning and control stages to facilitate global path generation, local path optimization, and robust whole-body motion control. With coherent interaction among different modules, the framework ensures reliable navigation performance. Extensive ablation simulations and comparative real-world experiments demonstrate that the proposed framework significantly improves the success rate, safety, and efficiency over state-of-the-art methods, while replacing any key method within our framework leads to a noticeable degradation in performance. Specifically, in large-scale scenarios with complex terrain spanning multiple acres, the hierarchical global planner can generate feasible paths within seconds, reducing the time cost by 78.81%. The manifold-optimization-based local planner effectively ensures obstacle avoidance while fully meeting the maneuvering safety requirements in various typical challenging terrains. The holistic controller enabled the robot to stably and reliably track paths on steep
Introduction
In recent years, various robots have been developed to mitigate potential harm to rescuers while improving the efficiency and success rate of disaster rescue missions (Delmerico et al., 2019). Based on their mechanical structure and operational modes, rescue robots can be broadly categorized into aerial and ground robots. Aerial robots, predominantly quadrotor drones, are characterized by their lightweight design and high maneuverability, enabling efficient environmental exploration and perception supported by relatively mature navigation strategies (Bircher et al., 2018; Faessler et al., 2016). Ground robots, in contrast, can carry larger payloads, enabling direct interaction with disaster environments and undertaking more complex rescue tasks.
Ground mobile robots with various configurations have been developed to navigate unstructured environments, including wheeled (Chiu et al., 2005), tracked (Choi et al., 2019; Schwarz et al., 2017), and bio-inspired legged robots (Hutter et al., 2017). In disaster-stricken areas, the terrain is often highly complex and varied, featuring rubble, steep slopes, and staircases, which pose significant challenges to the mobility of ground robots. Consequently, different locomotion paradigms exhibit distinct strengths and limitations. Wheeled robots exhibit both high stability and strong controllability, but struggle to maneuver over uneven terrains such as staircases and rubble. Legged and other bio-inspired robots exhibit superior mobility but are challenging to control and suffer from lower stability. In contrast, tracked robots can achieve a favorable balance between high maneuverability and outstanding controllability in complex terrains. This balance is particularly evident in articulated tracked robots incorporating bio-inspired auxiliary flipper designs, which simultaneously enhance mobility and stability. Therefore, articulated tracked robots have become the mainstream choice for small-scale ground rescue robots (Delmerico et al., 2019). This paper addresses the critical issues and challenges in the autonomous navigation of such robots within complex, three-dimensional rescue environments (see Figure 1). The articulated tracked robot navigating on the spiral staircase (a), the rubble (b), and the outdoor grassland (c).
Articulated tracked robots (ATRs) are well-suited for rescue tasks but face substantial navigation challenges in large-scale 3D rescue scenarios with multi-floor structures, significant elevation changes, and rugged terrains. The sparse and complex topological connectivity among traversable areas (e.g., staircases between collapsed floors) greatly reduces navigation efficiency. Compounding this, given the ATR’s diverse motion DOFs and its SE(3) centroid state in 3D space, it is insufficient to address such a high-dimensional and complex navigation problem using traditional 2D planar navigation methods. Furthermore, the traversability depends heavily on both the configuration of the ATR and the terrain, significantly complicating the planning process. Moreover, certain terrain features, such as steep slopes and staircases, demand careful assessment of the robot’s capabilities and constraints. The robot has to leverage terrain information to navigate safely. Existing research heavily depended on vanilla feedback control methods to maneuver ATRs in complex terrains (Chen et al., 2023; Yuan et al., 2019). However, these methods face substantial limitations when applied to complex multi-input, multi-output systems like ATR. And they also struggled to achieve whole-body maneuvers and safe path tracking while considering the robot’s kinematic constraints.
To address the aforementioned challenges, we propose a tightly integrated navigation framework for ATRs that leverages terrain information across different planning and control stages. A hierarchical global planner rapidly generates feasible robot centroid paths using discretized terrain maps, significantly improving planning efficiency in large-scale 3D environments while ensuring reliable reachability. This initialized path is then refined by a manifold-optimization-based local planner, which formulates a multi-objective problem to improve path quality under kinematic and terrain-specific constraints, enhancing safety and adaptability in complex terrains. Finally, the on-manifold model predictive controller utilizes only the local centroid path and local terrain perception to achieve real-time, whole-body coordination of the chassis and flippers. This control strategy avoids high-dimensional planning and enables robust execution under rugged terrain conditions. By decoupling the navigation process while ensuring close inter-module synergy, the framework balances completeness and optimality, thereby ensuring both computational efficiency and system robustness. The main contributions of this paper can be summarized as follows: • We introduced a hierarchical global planner ( • We designed a manifold-optimization-based local planner ( • We developed an on-manifold model predictive controller ( • We conducted systematic ablation simulations and real-world experiments to validate each module’s contribution and the framework’s overall performance, providing valuable insights and benchmarks for future research on ATR navigation in rescue scenarios.
Related works
This section reviews relevant works on autonomous ground navigation in 3D unstructured environments, which have inspired our research.
Navigation of normal ground robots in unstructured environments
For ground robots navigating unstructured environments, assessing the ground’s traversability is crucial. Krüsi et al. (2017) pioneered the use of planar fitting near surface points in point cloud maps to assess traversability, which was further used as a basis for state validity checks in sampling-based planners to generate initial 3D paths. Similarly, PUTN (Jian et al., 2022) employed a dimensional augmentation strategy during sampling, where poses are first sampled in 2D space and then projected onto the ground to recover 3D poses. This approach reduced invalid sampling and enhanced sampling efficiency. Xu et al. (2023) extended this idea by iteratively optimizing the robot’s pose in SE(2) space to obtain the corresponding pose in SE(3) space. Their work provided a more detailed and specific assessment of ground traversability, and the polynomial trajectory optimization improved planning efficiency in complex terrains. To address navigation in multi-floor environments, Wang et al. (2023) identified traversable areas directly in large-scale point cloud maps, eliminating the need for the 2D-to-3D recovery process.
Extracting more detailed maps with terrain information from point clouds is also a common approach. Wang et al. (2017) converted point clouds into multi-layer maps by abstracting spatial features and using slope information to create traversability maps for practical navigation. In contrast, STEP’s abstracted map incorporated more realistic environmental elements such as different terrains and obstacles, allowing the robot to find feasible paths in complex mining environments (Fan et al., 2021).
Moreover, learning-based methods can also assist robots in obtaining traversability maps in complex environments. Weerakoon et al. (2022) and Hu et al. (2021) employed reinforcement learning to identify traversable areas for robots and then used sampling-based planning methods to find feasible paths within these areas. Cai et al. (2023) proposed a more interpretable learning-based method that takes semantic and elevation maps as inputs to a neural network, producing a traction distribution map to guide robot navigation in outdoor environments. However, learning-based path planning methods often exhibit limited robustness and require multiple training rounds for specific scenarios, making them challenging to deploy in practical rescue missions.
Navigation of ATRs in unstructured environments
Tracked robots equipped with auxiliary flippers have more adaptability in unstructured environments. However, algorithms designed for wheeled and traditional tracked robots cannot be directly adapted to solve the ATR navigation problem.
Yuan et al. (2019) proposed a stability evaluation criterion, suggesting that tracked robots with auxiliary flippers should maintain maximum contact with the ground during maneuvers in complex environments to ensure stability. In their work, feasible ATR chassis poses were generated from discrete terrain point clouds, and the optimal path waypoints (including auxiliary flipper joint states) were selected from the corresponding possible configurations. A feedback controller was then used to execute the robot’s path. Similarly, Chen et al. (2023) introduced an evaluation system incorporating sequential considerations and pose prediction to ensure the robot’s motion stability and continuity. Xu et al. (2024) further classified ATR configuration states into several forms and achieved segmented whole-body motion planning and control through mixed optimization.
Gianni et al. (2016) proposed an ATR tracking control framework specifically designed for rescue scenarios. The framework utilized the robot’s centroid path and abstracted terrain planes to design the configuration of the auxiliary flippers. A complex kinematic solver and a feedback controller were then employed to achieve whole-body control of the robot.
Except for traditional control methods, reinforcement-learning-based control for ATRs has been validated in scenarios where sufficient prior knowledge is available (Mitriakov et al., 2021; Pan et al., 2023). These approaches abandon traditional methods’ manually defined, often intricate evaluation criteria and instead rely on the explore-exploit mechanism of reinforcement learning. Through extensive offline simulations, exclusive control strategies were trained for ATRs to perform tasks such as stair climbing and many others. To reduce the learning cost for robots, Azayev and Zimmermann (2022) utilized human supervision to guide the motion planning and control of the robot. Human-in-the-loop methods enabled the robot to achieve adaptive whole-body control through learning-based methods. However, these techniques require substantial training and complex environmental abstractions, making them challenging to deploy in complex rescue scenarios.
Previous research has commonly employed terrain point clouds to guide ground robots in unstructured environments. However, they often lacked sufficient abstraction and management of terrain information, leading to longer path generation times and more limited applicability. The safety and adaptability of robot paths across various terrains have also not been adequately addressed. Additionally, existing ATR control methods remain relatively simple and do not fully consider the safety of motion control in complex terrain situations. Although learning-based control methods avoided sophisticated logical inference and theoretical derivation, they entailed significantly higher deployment costs and time consumption than traditional methods. These methods also struggled to cover some common corner cases in real-world scenarios and suffer from serious generalization issues. Thus, this paper proposes a terrain-informed navigation framework that effectively addresses the limitations of existing approaches and offers practical solutions for deploying autonomous ATRs in complex 3D disaster environments.
Problem statement
Figure 2 illustrates the typical mechanical design of an ATR. The ATR’s tracked chassis has a centroid frame denoted as F
b
, with the front flipper frame labeled F
f
and the rear flipper frame labeled F
r
. The angles between these flippers and the chassis are [ϕ
f
, ϕ
r
]. The kinematic characteristics of the tracked chassis allow movements along the robot’s longitudinal direction and in-place rotation [v
x
, ω
z
]. The rotational speeds of the auxiliary flipper joint drives are denoted as [ψ
f
, ψ
r
]. The typical mechanical design of an ATR, with the corresponding frame definitions and the motion space.
Let • Traversable areas in unstructured cross-story environments are sparse and poorly connected, and assessing robot traversability over large complex terrains can be very difficult. • In large-scale 3D environments, classical planning methods struggle to rapidly explore and evaluate terrain, resulting in extended path generation times. • Conventional path optimization methods fail to realize obstacle avoidance while ensuring path-terrain adherence, and they cannot adjust the path according to specific terrain features. • The complex mechanism of ATRs makes decoupled or loosely coupled control methods inadequate for achieving whole-body control, which is necessary to ensure safe maneuverability on complex terrains.
Framework
In this paper, we propose a tightly integrated framework that leverages terrain-aware representations across all planning and control modules (see Figure 3) to address the challenges of the ATR navigation problem. The proposed framework comprises several tightly integrated modules for terrain information abstraction, global path generation, local path optimization, and whole-body control of the ATR in rescue scenarios.
First, the terrain information abstraction module efficiently extracts and manages ground terrain information in the form of a 3D terrain-info grid map, based on either online point cloud maps (obtained through popular SLAM methods like FAST-LIO2 proposed by Xu et al. (2022)) or pre-acquired prior point cloud maps. This well-organized terrain information is subsequently utilized by both the global path generation and local path optimization. Next, we present a hierarchical global planner to ensure both the completeness and efficiency of path generation. We further introduce a parallelization strategy that significantly accelerates the path generation process, while improving computational efficiency. Subsequently, we propose a manifold-optimization-based local planner that refines the global path into a local path suitable for the robot’s centroid tracking control. This optimized path respects the robot’s kinematic constraints, performs obstacle avoidance, and satisfies a range of scenario-specific objectives. Finally, we introduce an on-manifold model predictive controller (M2PC) specifically designed for ATRs. Leveraging the smooth and collision-free local path, M2PC integrates the extracted terrain features with the robot’s kinematic constraints to enable safe, smooth, and terrain-adaptive whole-body control during navigation execution.
The framework integrates different terrain representations at the planning and control stages to balance efficiency and optimality. Specifically, the global planner operates on the 3D terrain-info grid map enriched with discrete planar terrain approximations. The local planner also performs path optimization on the manifold spaces constructed from terrain planes in the grid map. These planar terrain models allow rapid and effective path planning even in large-scale environments. The whole-body controller uses a higher-fidelity terrain model, applying quadratic surface fitting to better capture terrain curvature, enabling more adaptive and stable motion over complex surfaces.
Discrete terrain information abstraction
In complex 3D rescue environments, ground robot path planning has to fully use the obtained terrain geometry information. However, for large-scale terrain information abstraction, general point cloud fitting methods often homogenize a lot of discontinuous terrain. For example, the floors of different disconnected rooms on the same level might have similar geometric information. Similarly, many traversable surfaces are at the same height but belong to different buildings. Moreover, many general terrain analysis methods fail to capture the topological relationships and distance information between traversable regions (Krüsi et al., 2017; Lee et al., 2022), which limits their utility for ground robot navigation in complex environments.
We adopted a grid-based terrain information abstraction method to address the abovementioned issue. The robot’s workspace Grid map with terrain information. White points represent the point cloud data in the map. Cyan planes denote the traversable terrain planes extracted within the grids, while red planes denote the non-traversable obstacle planes. The geometric information of these planes is used to further assess the passability of each grid cell. Green indicates traversable grids, and red indicates non-traversable grids.
Terrain-informed hierarchical global planner
Although ground robots typically operate near the seemingly low-dimensional ground surfaces in 3D rescue environments, their centroid poses in the global coordinate frame reside in SE(3) space. In addition to the extra DOFs introduced by the ATR’s flippers, global path generation for the rescue ATR is inherently a high-dimensional problem.
The combined effects of the high dimensionality of the robot’s state, the complexity of 3D environments, and the workspace constraints of the robot collectively make it extremely challenging to generate feasible paths for ATR using a direct, monolithic global planning approach. To address these issues, we first decouple the ATR’s flipper motion from the planning stage, instead relying on the robustness of subsequent controllers within the framework to achieve reliable flipper regulation and control. Meanwhile, we introduce a hierarchical global planner that employs a bi-level strategy to alleviate these inherent challenges effectively. Hierarchical strategies are commonly used to address large-scale planning problems. For example, TARE (Cao et al., 2021) generated a crude path based on the topological connectivity of the graph map and then refined this path within specific regions to meet the navigation requirements. However, such methods have to handle the connection between paths in different regions carefully; otherwise, gaps between path waypoints may occur, leading to fluctuations in path resolution that can negatively impact navigation.
In this paper, unlike traditional methods, the lower-level planner does not correct or optimize the crude path generated by the upper-level planner. Instead, the terrain information near the crude path was utilized to guide the lower-level planner’s operation space.
As illustrated in Figure 3, the upper-level planner quickly generates a crude path, CΓ base , using a graph-search method on the previously established terrain information grid map. At this level, the robot’s orientation is temporarily disregarded, reducing the search space dimensionality and enabling fast, approximate path generation while preserving path-finding completeness. This path, derived from grid topology and connectivity, is neither smooth nor capable of passing feasibility checks in complex environments. Additionally, the inaccessible areas in large-scale rescue scenarios negatively impact the robustness of the graph search, leading to redundant global paths. However, the region CΓ base traversed contains much feasible and valuable terrain information, which serves as a good prior for subsequent processing.
The lower-level planner utilizes a sampling-based planning (SBP) method to ensure path traversability while incorporating agnostic randomness. For ground robots engaged in rescue missions, most 3D space is non-traversable (e.g., regions far from the ground), with the robot’s pose constrained near the terrain. The algorithm utilizes terrain information P
terr
from the grid map M
grid
near the crude path CΓ
base
as priors for sampling, thereby limiting the sampled states to regions close to the ground while enhancing the completeness of global path generation with stochasticity. For a traversable terrain surface
The dependent random position in the z direction is calculated as:
The dependent random orientation qr is derived based on the projection of the direction vector
The time consumption of sampling-based planning algorithms is mainly concentrated on the feasibility checks of the sampled states. In a 3D environment, a feasible state has to ensure that the ATR maintains stable contact with the ground, avoids collisions within the environment, and stays away from dangerous edges (e.g., cliffs). Our algorithm simplifies the robot model into a convex rectangular cuboid to reduce the time spent on collision detection. By combining this convex primitive with a kDTree that stores map information (Șucan et al., 2012), the efficiency of collision detection is significantly enhanced.
Additionally, the stability can be determined by the projection of the robot’s footprint on the ground. As shown in Figure 5, since the simplified robot model is a rectangular cuboid, when its rectangular chassis footprint is vertically projected onto the ground, a 3D convex polygon is formed. Affected by the terrain, the polygon’s vertices set can be expressed as The typical stability evaluation cases for ground robots: (a) valid, (b) missing support, (c) uneven, and (d) unstable.
Furthermore, a divide-and-conquer approach can be employed to parallelize and accelerate the planning process in large-scale environments, as illustrated in Figure 6. For long-range paths, the causal relationships between distant waypoints are typically weak. Therefore, the grids within the sampling region are grouped according to the graph-searched crude path. Then, perform sampling for feasible segment endpoints Hierarchy global planner with divide-and-conquer strategy.
Thus, based on a terrain information grid map, this framework introduces a novel and fast hierarchical global path generation method, further enhancing planning efficiency by adopting a divide-and-conquer strategy, ensuring that the robot can quickly generate a feasible path in complex 3D environments.
Manifold-optimization-based local planner
For ATR navigation in large-scale environments, global paths often lack the precision required to accommodate the robot’s motion characteristics and adaptability to complex, fluctuating terrains. To ensure smooth, feasible, and terrain-adaptive local paths, this paper adopts a manifold-optimization-based local planner that exploits the terrain manifold to efficiently refine coarse global paths under various constraints.
Mathematical representation of SE(3)-path optimization
For optimization-based 3D ground robot path planning algorithms, the basic form of the optimization problem can be expressed as:
Kinematic cost
The robot’s mobility is constrained by its inherent kinematic limitations. The optimization has to incorporate a cost function that accounts for kinematics to produce a smooth optimized path. The kinematic model of a tracked chassis can be simplified to a differential drive model with non-holonomic motion constraints. The kinematic cost function, denoted as
Obstacle avoidance cost
The robot has to maintain a safe clearance from obstacles
Ground adherence cost
In a 3D environment, the waypoints
Pitch stability cost
In rugged terrain, maneuvering requires constraining the robot’s pitch angle to ensure safety and stability. Moving along a straight path on excessively steep slopes may result in insufficient traction or cause the robot to slip. The robot should follow a winding path, prioritizing safety over path length to mitigate these risks. Thus, the method constrains the robot’s pitch angle within a safe range of
Anisotropic traversability cost
In most cases, a robot’s maneuverability on normal terrain is consistent and isotropic. However, stair-like terrain, commonly found in buildings, introduces anisotropy in traversability. Empirical evidence indicates that tracked robots exhibit optimal performance when traversing perpendicularly to the edges of stairs. Because the chassis’s sinuous motion would make it difficult for the tracks to overcome the structural pressure at stair edges, and the chassis may be lifted by the edges, resulting in a temporary loss of maneuverability. Thus, we define an anisotropic traversability cost
The staircases’ pose and the steps’ geometric attributes can be precisely identified using LiDAR point clouds or RGB-D images (Perez-Yus et al., 2017; Sriganesh et al., 2023; Westfechtel et al., 2018). In this framework, terrain information in the grid map is updated and semantically annotated in real-time. If the traversability of the terrain P
terr
in the grid is anisotropic, the direction The anisotropic traversability on the staircases.
Mathematical representation of path optimization on the terrain manifold
Multi-objective SE(3)-path optimization may fail due to disturbances in certain optimization directions. For instance, the optimized path might deviate from the ground to avoid obstacles, failing to meet ground adherence. This issue arises because the ground adherence cost is a soft constraint in the optimization problem (3), and significantly increasing its cost weight does not guarantee path feasibility and may lead to a weight imbalance in the optimization problem. To address this, we propose a path optimization method based on terrain manifolds to integrate the ground adherence cost as an implicit hard constraint within the manifold.
Considering the ground adherence, the waypoints can be bijectively transformed between the terrain manifold space and SE(3) space within a small range. The bijective transformation allows the path optimization problem in SE(3) to be mapped to a path optimization problem on the terrain manifold.
On manifold
The optimization process on a manifold has to consider the gradient of the cost function on that manifold. If
Optimizing a waypoint The retraction map R

Optimization on manifold extensively utilizes the tangent space to perform iterations (Boumal, 2023). The manifold used here is derived from the terrain planes P
terr
stored in the grid map M
grid
. This approach ensures that tangent spaces are identical across all points on a manifold, thereby facilitating the calculation of the Riemannian gradients grad
For the local planning, the information of the waypoint
For path optimization on the terrain manifold, the problem (3) is transformed into:
The manifold optimization (ManiOpt) resolves the issue of the path deviation from the traversable ground while significantly receding the dimension of the optimization variables and further reduces the cost type
The efficiency of ManiOpt helps prevent overshooting in path-tracking control that may occur due to delayed reference updates. Moreover, the optimized paths generated by ManiOpt reduce the risk of environmental misinterpretation during execution. For instance, ManiOpt mitigates the likelihood of holistic control failures induced by obstacle points in terrain analysis. In addition, ManiOpt’s capability to incorporate terrain-specific optimization objectives in particular scenarios further decreases the complexity of ATR control. Finally, as the optimized paths remain close to the terrain surface, these paths provide high-quality information for terrain-aware motion control, thereby minimizing discrepancies between the reference terrain point cloud and the actual terrain.
On-manifold model predictive controller for ATR
The ATR should be able to adjust the flipper states adaptively, ensuring that the body adheres to the terrain and lowers its centroid to achieve stable maneuvering and better stress distribution in complex environments, thereby reducing the risk of malfunctions such as overturning or becoming trapped (Chen et al., 2023; Okada et al., 2009). Existing research often treats the flippers as a special mechanism of the chassis. After determining the centroid-joint space path for the next period, the robot utilizes a decoupled feedback control method to manage the tracks’ and flippers’ motion. However, suppose the flipper actuators do not perform as expected. In that case, the planned joint space path may not be accurately followed, potentially leading to terrain-configuration mismatches that could place the robot in an unsafe operational condition.
The primary concern for ATR navigation in disaster rescue missions is whether the robot can reach its assigned target locations. Inspired by some well-established navigation frameworks (Sleiman et al., 2023), the method proposed in this paper emphasizes mission-oriented whole-body control. To address the challenge of coordinated and stable ATR motion on rugged terrain, we propose an on-manifold model predictive controller (M2PC) that integrates terrain information with the robot’s whole-body kinematics. This controller enables the ATR’s chassis to follow the optimized local centroid path holistically and stably, without requiring an explicitly organized flipper joint space path.
Therefore, M2PC not only enhances whole-body control safety but also significantly reduces the dimensional burden on the planning phases within the framework. On the one hand, M2PC eliminates the need for the global planner to sample flipper’s joint states or perform complex feasibility checks during initial path generation. On the other hand, M2PC allows the local planner to optimize paths without considering optimization terms related to the flippers, thus reducing computational costs.
As illustrated in Figure 9, the core conception of M2PC involves mathematically abstracting the terrain into a continuously differentiable manifold and then leveraging the manifold’s additive properties to couple the terrain manifold with the robot’s motion manifold. The coupled manifold acts like a system model that guides the model predictive control of ATRs navigating rugged terrain. Within this manifold, the proposed controller adaptively coordinates the robot’s whole-body motion under terrain and robot constraints, significantly enhancing safe maneuverability in extreme environments. The holistic motion of the ATR on the manifold.
The manifold
For the optimization on the manifold, the operation
The optimization problem (13) is constrained by both variable bounds and the system model defined on the manifold, where the current state depends on the previous state and control inputs’ projection on the homeomorphic tangent space of the manifold (Kalabić et al., 2017; Lu et al., 2023). As the robot’s configuration space definition shapes the system model (12), we propose a system model incorporating terrain manifold curvature and robot mechanism to keep the ATR adhered to the ground.
The ground point clouds in the local submap are utilized for terrain manifold abstraction, and the surfaces are fitted near each path waypoint, with a quadratic form of
We emphasize that the safe path tracking of the ATR’s centroid is the primary focus for maneuvering over rugged terrain, with the flipper state adjusted through adaptive coordination control to regulate the ATR’s configuration. In the proposed method, the desired flipper state depends not only on the characteristics of the terrain manifold but also on the ATR’s motion. Conversely, the alignment between the ATR’s configuration and the terrain manifold should influence the ATR’s motion to prevent reckless maneuvers in unstable configurations.
As illustrated in Figure 10, the geometric relationship shows that the arc fitted to the ATR’s configuration when maneuvering over rugged terrain should continuously align with the ground’s normal curvature to ensure the ATR adapts effectively to terrain variations. To ensure the alignment, the joint angles of the front and rear flippers in the chassis frame should be ϕ
i
= 1/2α ≈ 1/2KS. The robot’s total span length S is approximately equal to the arc enveloped by the ATR’s configuration, and K is the normal curvature of the manifold at the waypoint. According to equations (14) and (15) and the invariance theorem of surfaces, the normal curvature of the surface near the ATR depends solely on its centroid’s pose. The state variables for the flippers are defined as the differences between the actual and desired flipper joint angles in the chassis frame ϵ
i
, i ∈ {f, r}, establishing the relationship between the ATR’s centroid pose and the desired configuration for coordinated control. The differences are intended to be maintained at zero in the form of: The geometrical relationship between the ATR configuration and the terrain manifold normal curvature.
Since the robot is constrained to the terrain surface, not only does its centroid position adhere to the terrain manifold, but the rotational axis of its body frame also aligns with the normal vector at that position of the terrain manifold. Thus, the robot’s chassis position and orientation on the terrain manifold can be represented by
In summary, the state variables and control variables of the M2PC are set as follows:
The robot’s chassis motion model can be represented by (Lu et al., 2023):
And the motion relationship between
The state transition function for the M2PC problem (13) is formulated as follows:
The QP problem (13) is addressed using the OSQP (Stellato et al., 2020) open-source solver. In this problem, the linear constraint matrix is derived from the differential composition of the state transition function (21). Specifically, the expressions for the partial derivatives ∂
Simulation and experiment
Preparation
We validated and quantified the framework’s effectiveness by deploying the algorithms on our self-developed articulated tracked robot, Slugcat: a swift articulated tracked robot for rescue missions. The orange sections represent the coaxial front flippers, the green sections represent the coaxial rear flippers, and the blue sections indicate the LiDARs.
Although the proposed framework is compatible with online point cloud maps constructed by the real-time SLAM method, for rigorous comparative evaluations, we employed prior point cloud maps in all simulations and in several real-world experiments. In addition, we validated the framework’s performance using online point cloud maps generated by the SLAM module in an unknown environment. In the simulations, the robot’s poses were directly provided by Gazebo. In all real-world experiments, the robot’s odometry was provided by a SLAM module equipped with a re-localization function. Specifically, for the discrete terrain information abstraction module deployed throughout the work, the resolution of the 3D terrain-info grid map was uniformly set to 0.5 m (considering the robot’s physical size). The threshold for planar traversability determination, ɛ vP , was set to 45°. Related experimental videos are available in the supplemental materials.
Validation of the hierarchical global planner
As shown in Figures 12(a)–(c), we conducted efficiency evaluations in large-scale 3D simulation environments to validate the effectiveness of the hierarchical global planner, with consistent start and goal positions for every scenario, respectively. In this work, the maximum angular deviation ɛ
vs
between the normal vectors of triangulated planes within a feasible ATR footprint was set to 15° (see Figure 5). The typical path generation results of the hierarchical global planner across various Gazebo-simulated scenarios. In the restricted building scenario shown in (a), there is only one possible path to the rooftop. In the wild scenario shown in (b), the optimal route requires crossing a narrow bridge. In the DARPA cave scenario (c), the start and goal points are separated by a considerable distance.
The six-story building depicted in Figure 12(a) had a floor height of 2.5 m, a floor area of 900 m2 per level, and a total volume of 1.35 × 104 m3. There were continuous staircases at the corners of the building. The start point was located at one corner outside the building, with global coordinates (−16.0 m, 14.0 m, 0.1 m), while the goal point was at the center of the rooftop, with global coordinates (0.0 m, 0.0 m, 15.1 m). The shortest feasible path length was approximately 60 m. Additionally, the possible path solution was restricted by setting only two staircases for each floor in the opposite corners of the building, significantly increasing the length of the traversable path (with the shortest feasible path length of approximately 250 m). The restricted six-story building scenario was also used to evaluate the adaptability of the proposed framework to various sampling-based planning algorithms in long-range planning scenarios.
The simulated wild scenario included ponds, mountainous areas, a narrow bridge, and mine caves. The scene depicted in Figure 12(b) covered an area of approximately 1.0 × 104 m2. The start point was located behind the solar panel, with global coordinates (−19.0 m, 58.0 m, 3.0 m), and the goal point was located in the cave at global coordinates (30.0 m, − 10.0 m, 4.0 m). The shortest feasible path length was approximately 120 m.
The scenario in Figure 12(c) was adapted from DARPA’s publicly available simulation scene, DARPA_cave_02. This scenario consisted of a massive underground cave, with the size of approximately 300 m × 300 m × 70 m, which occupied a volume of 6.3 × 106 m3. The start point was located in the open area outside the cave, with global coordinates (0.0 m, 0.0 m, 0.0 m), and the goal point was located in the mine at global coordinates (200.2 m, − 36.5 m, − 19.8 m), with the shortest feasible path length being approximately 450 m. The large-scale and complex connectivity of the simulation environments presented significant challenges for the planning task. During the experiments, the upper-level planner employed the A* algorithm with a heuristic function based on Euclidean distance. The lower-level planner employed the RRT algorithm.
Quantifications of different global path generation algorithms in the simulations: L p denotes the average generated path length, L p .σ2 denotes the variance of the generated path length, and T p denotes the average planning times (In the DARPA_cave_02 scenario, our method reduced 78.81% of the time cost compared to the SOTA methods).

The typical path generation performance in the wild scenario with the proposed and the existing SOTA methods.
Colas et al. (2013) constructed a topological map for graph-search planning by identifying feasible points and connectivity in raw point clouds. Thus, its performance is highly sensitive to the scale of the environment, and the resulting paths are often rugged due to limited terrain details. Nevertheless, the deterministic nature of graph-search methods results in lower variance in path lengths, providing more consistent planning outcomes. Krüsi et al. (2017)’s work achieved smooth paths through post-processing and optimization; however, the initial solution had a significant impact on path length and trends, leading to greater variance and extended planning times, which would negatively affect execution efficiency. PUTN (Jian et al., 2022) increased the sampling step size in the SBP and used a 2D space sampler to enhance planning efficiency. However, due to the non-uniqueness of restoring 2D poses into 3D space, it cannot effectively handle path planning in multi-floor building structures. Our method (CT-DC) relies on well-organized terrain information, ensuring consistency and completeness from the upper-level planner as well as efficiency and randomness from the lower-level planner. The generated paths exhibit strong homotopy across multiple tests while maintaining superior performance in large-scale and complex environments.
As illustrated in Figure 14, when the lower-level planner sampled all terrain information (AT) within the map, the planning process suffered from inefficiency (planning time up to 29.11 s) and path redundancy. In contrast, the proposed method, which sampled only the terrain information near the crude plan (CT), significantly reduced average planning time to 0.51 s by limiting the randomness of SBP. Moreover, the divide-and-conquer strategy (DC) further accelerated path generation without altering the sampling range, reducing the average planning time to 0.27 s (see Table 2). The typical path generation results in the six-story building with different strategies (from left to right: AT, CT, CT-DC). The average planning time (s) for different deployed sampling-based planning (SBP) methods in the six-story building scenarios (RRT-C refers to RRT-connect).
The choice of the sampling-based planner plays a crucial role in the final results. For instance, the RRT-connect method can significantly speed up path generation, while the RRT* method reduces path redundancy and improves overall path quality (see Figure 15). Although the time required for path generation varies depending on the sampling-based algorithm used, as shown in Table 2, the proposed CT-DC strategy notably accelerates path generation. The contribution of agnostic randomness to path generation: In complex 3D environments where numerous impassable regions are present, a graph-search-based planner may yield suboptimal paths (depicted in green). In contrast, the sampling-based planners leverage randomness, allowing the generated paths (depicted in red) to escape the limitations of inadequate heuristic functions and thus produce more optimal solutions.
A series of ablation studies was conducted across different planning strategies in four real-world environments to evaluate their impacts on the quality of generated paths. We performed 3D reconstructions of an exterior landscape of a library, a fire-fighting rescue stairway, and a litchi forest using a handheld MID-360 LiDAR scanner (see Figures 16(a)–(c)). Additionally, we reconstructed a typical campus scenario at SYSU (see Figure 16(d)) based on the airborne LiDAR data from The typical path generation performance under ablation experiments across four real-world scenarios with distinct challenges. These challenges include: (a) confined passageways near the library, (b) staircases, (c) densely obstructed tree regions, and (d) sparse near-terrain point clouds caused by canopy and building occlusions, all of which significantly hinder ground robot path generation.
In addition to the challenges posed by large-scale environments and complex topological connectivity, each scenario was designed with a specific test focus. The library exterior evaluated planning ability in confined passageways. The rescue stairway scenario assessed planning capability in multi-floor environments with staircases. The litchi forest tested planning robustness in cluttered, unstructured terrain. Finally, the campus scenario examined planning performance in sparse near-terrain point clouds with minimal prior traversability information.
The path quality of different strategies was evaluated across all scenarios. To quantify discrete path smoothness, a specific metric (Smot.) was adapted from the work of Dolgov and Thrun (2009) as:
Quantifications of the global path generation ablation experiments in real-world scenarios: L p denotes the average generated path length, Smot. denotes the average generated path smoothness (a smaller number indicates a smoother path), and T p denotes the average planning time.
Validation of the manifold-optimization-based local planner
The path optimization problems introduced in Section Manifold-optimization-based local planner involved complex nonlinear cost functions. Thus, we employed a graph-based optimization method (Kümmerle et al., 2011) to address problem (3) and problem (11). Apart from the inherent differences in the ground adherence cost, the weights in these two optimization problems were consistent (the weight
The capability comparison between different optimization-based local planners: The marker √ indicates that a safe path was successfully optimized, while the marker ◦ signifies that a perilous path was produced. The marker × denotes that the local planner failed to provide a feasible path.

The typical path or trajectory optimization results in the simulation scenarios. The red trajectories represent the pre-assigned initial paths, while the optimized trajectories of 3D2M are indicated in a gradient rainbow color. The optimized paths of SE(3)Opt and ManiOpt are indicated in blue. The obstacles in the scenarios are colored purple.
In the first two simulated scenarios shown in Figures 17(a) and (b), 3D2M-L’s reliance on highly accurate terrain analysis became a significant bottleneck during optimization, as steep slopes and sharp stair edges were often misclassified as non-traversable regions. Thus, the parameters of 3D2M-L required careful fine-tuning to ensure a successful process. However, when the terrain analysis thresholds were relaxed, 3D2M-L lacked a proper cost formulation to reasonably handle steep inclines and anisotropic traversability, resulting in little practical improvement in optimized trajectories over the initial paths.
In contrast, the 3D terrain-info grid map effectively addressed challenges in complex terrain representation, mitigating the roughness of the point cloud and ensuring better optimization performance. The results of SE(3)Opt in the first two simulations were less pronounced, primarily due to weight imbalances, which hindered the planner’s ability to optimize other objectives while maintaining ground adherence. As depicted in the lower subfigure of Figure 17(a), by introducing slight perturbations to the initial path on a steep slope (45°), the path optimized by ManiOpt is shown to be meandering and safe. In the scenario illustrated in Figure 17(b), the initial path diagonally crossed a wide staircase with anisotropic traversability. To avoid dangerous maneuvers on the stairs, ManiOpt adaptively adjusted the path along the direction of the stairs.
As illustrated in Figure 17(c), the presence of initial global paths that occasionally penetrate obstacles severely disrupted the optimization process. In this scenario, both SE(3)Opt and 3D2M-L failed to produce feasible paths. Although 3D2M demonstrated effective obstacle detection, its soft-constraint design and high sensitivity to initial paths led to solutions that traversed obstacles. SE(3)Opt avoided obstacles to some extent, but due to its soft-constraint formulation of ground adherence, the optimized paths deviated significantly from the actual traversable terrain. In contrast, ManiOpt, with its terrain-manifold-based hard constraints, successfully generated feasible paths that closely adhered to the terrain while avoiding obstacles.
The scene in Figure 17(d) illustrates a typical spiral staircase scenario. The inner region of this structure features steeper slopes and pronounced changes in anisotropic traversability. In this test, the trajectory optimized by 3D2M-L deviated substantially from the spiral staircase, resulting in an entirely infeasible solution. The path optimized by SE(3)Opt is more zigzagged, closer to the central axis of the stairway, and involves perilous pitch angles. In contrast, the path optimized by ManiOpt is smoother, more gradual, and better adheres to the ground. The benefits of ManiOpt also include fewer optimization variables and cost types, leading to shorter computational times compared to SE(3)Opt. In the planning task depicted in Figure 17(d), the optimized path spanned approximately 10 m with around 30 waypoints. The average optimization time of ManiOpt was about 31.83 ms, while SE(3)Opt averaged 41.22 ms, saving 22.76% of the time cost.
Validation of M2PC for ATR
This subsection evaluates the proposed M2PC for path tracking of an ATR in both a Gazebo-simulated scenario and a real-world staircase setup with a slope of 33.7° (see Figure 19). For comparison, we reproduced the control algorithm presented by Gianni et al. (2016), which utilizes feedback-based control to track the robot’s centroid and flippers, with controller gains set to K
p
= 10.0, K
i
= 0.005, and K
d
= 0.1. Moreover, we implemented an integrated planning-control method based on configuration geometry and terrain analysis from Chen et al. (2023)’s work for further comparison (denoted as
In the optimization problem of equation (13) in M2PC for ATR, the state penalty matrices
According to the chassis and flipper size of Slugcat, the span S of the robot was set to 1.3 m. Considering the robot’s physical dimensions, the sensing range of its onboard sensors, and the accuracy of surface fitting, terrain manifolds in M2PC are locally extracted from point clouds around each waypoint. Specifically, only the down-sampled ground points within a 1.3 m radius around each waypoint were used to efficiently extract spatial uniformity and reliable local terrain manifolds for M2PC.
An excessively long horizon for M2PC is neither necessary nor beneficial. On the one hand, too many waypoints dilute the optimization cost near significant terrain transitions, potentially compromising robot stability. On the other hand, the limited update range of the local submap hinders the robot from accurately acquiring terrain information around distant waypoints, which can lead to inaccurate terrain estimation that may interfere with the controller’s system model. Therefore, the prediction horizon of M2PC was limited to 10. The influence factors c v and c ω in the system model (21) were set to 10.0 and 0.0, respectively, to ensure stable maneuvering on stair-like terrain, which has anisotropic traversability.
The same reference path was utilized for consistent simulations. The robot would first climb up the staircase and then descend from the other side. Considering the actuator capabilities and the transmission design of the flipper, the maximum flipper rotation velocities were set as 0.1 rad/s. To evaluate how well the robot aligned with the terrain under different control methods, 100 points on the robot’s base footprint were uniformly sampled, and the average distance between these points and the terrain surface was computed. Furthermore, the robot’s centroid position along the z direction and the variations in pitch angle throughout the maneuver were monitored and recorded.
The quantitative performances of the control algorithms in the simulated scenario featuring severe terrain variations are presented in Figure 18. Gianni’s method adjusts the flipper angles in real-time based on the terrain plane fitted from the point cloud around the robot, aiming to maintain close contact with the ground. As a result, under conditions where the flipper motion speed is limited, Gianni’s method tends to suffer from delayed configuration adaptation, leading to issues such as free-spinning or robot impact against the terrain across various phases. Meanwhile, GEO’s core assumption posits that ATR’s stability is not solely dependent on robot-ground adherence. Considering the reference robot centroid path, GEO would sample flipper configurations near the waypoints and evaluate them based on the robot’s constraints and the dense prior point cloud map. An elaborately defined cost function was then employed to select the optimal flipper configuration, and an integrated controller was utilized for general tracking. Consequently, during the transition from flat ground to stair climbing, although the robot body does not closely adhere to the terrain, the ascent in the z direction and pitch angle variation remains relatively smooth (see Figure 18(a)). However, in the transition from stairs to elevated platforms and throughout all descent phases, GEO fails to adapt the configuration in response to rapid terrain changes, resulting in abrupt changes in the robot’s posture. Moreover, during the descent from stairs to flat ground, GEO is prone to misinterpret terrain configurations, leading to flipper misoperations (see Figure 18(d)). The typical performance of ATR in crossing stairs within the simulation scenarios: In the orange box, the proposed M2PC method significantly smoothed the robot’s maneuvers through severe terrain changes compared to baseline methods. The malfunction effects of GEO are shown in the red boxes. The ATR maneuvering in the Gazebo-simulated scenario (a) and the lab-built staircase (b).

Compared to the baseline methods, M2PC enabled smoother maneuvering of the robot, more continuous and safer adjustments of the body pitch angle, and better contact performance between the robot and the ground. By incorporating information about surface curvature changes, M2PC allowed the robot to achieve better whole-body control. Especially when the flipper angle has not yet reached the desired position, the chassis actively slows down, achieving coordinated stair climbing. This whole-body movement mitigates impacts in the z direction and suppresses rapid changes in pitch angle. Additionally, when descending, the robot adjusted the joint configuration and reduced body oscillations.
Figure 20 illustrates the robot’s performances on a lab-built staircase using various control strategies. During the ascent phase, Gianni’s feedback control method lacked coordination, resulting in the robot’s inability to adjust its configuration in response to terrain changes and leading to impacts with the stairs. When the robot moved onto the landing, this strategy also failed to coordinate the flippers’ posture, causing the robot to experience a significant impact of up to 8G. This issue also occurred during the descent phase, where shaking in the z direction further confirms the problem. GEO exhibited similar performance, with the chassis experiencing significant vibrations throughout the stair-crossing. The typical acceleration and angular velocity profiles of the ATR on the lab-built staircase.
In contrast, M2PC effectively satisfied both terrain manifold constraints and robot kinematic limitations, thereby mitigating the aforementioned issues. During the stair-crossing, M2PC adjusted the robot’s configuration to ensure a smoother transition between terrains. Compared to the baseline methods, M2PC’s maneuvering strategy was more conservative, safer, and prevented severe impacts.
Validation of the framework in simulated scenarios
We comprehensively validated the proposed framework in two challenging simulation environments. The wild simulation scenario (shown in Figure 12(b)) features a lake, a bridge, caves, and significant ground undulation (an elevation difference of approximately 10 m). The robot has to cross the lake to reach the goal point in the inner cave. The building simulation scenario (the simple version of the six-story building, as shown in Figure 12(a)) features extensive and complex connectivity, requiring the robot to traverse multiple stairways to reach the goal point that is elevationally distant from the start point. Throughout the simulations, the robot’s pose, flipper joint states, global point cloud maps of the environment, and the local submap surrounding the robot were directly obtained from Gazebo.
Quantifications of the exhaustive ablation simulations in the Gazebo-simulated scenarios: Succ. denotes the success rate, dmin denotes the minimal Euclidean distance from the robot to the navigation goal, and Asc. denotes the maximum ascent in the building, L e denotes the average executed trajectory length, and T e denotes the average execution time.

The best-performing executed trajectories of different combinations in the Gazebo-simulated scenarios.
Analysis of simulation results in the wild
In the wild scenario, the tested combinations exhibited varied performances, illustrating the impact of module-level design on navigation quality. The simulation results (see Table 5 and Figure 21(a)) indicate that five combinations completed the navigation task at least once. For combinations A and B, the global paths generated by the graph-search-based planner during the narrow bridge were almost entirely located within infeasible, obstacle-occupied regions, directly leading to inevitable collisions for combination B at the bridge. Although with the support of a relatively robust ManiOpt local planner, combination A may successfully traverse the narrow bridge and complete the navigation task, its overall success rate was relatively low, mainly restricted by the extreme proximity of the initial global paths to obstacles.
Combination C completed two navigation trials. However, due to significant randomness in its global path generation process, the executed trajectories were highly redundant, substantially prolonging task completion time and indirectly increasing the failure risk, thus lowering the overall success rate. Combination D consistently failed to complete navigation before the bridge due to the absence of local obstacle avoidance capabilities. Combination E employed the 3D2M framework (excluding its original controller). However, 3D2M’s A* global planner consistently failed due to the limitations of its terrain analysis module, rendering the navigation task impossible. Combination F, which used the 3D2M local planner (3D2M-L), showed moderate performance in rugged terrain with its soft-constraint trajectory optimization method. However, it failed to handle narrow terrains, causing the robot to stall before the bridge due to unsuccessful path optimization. Combination G completely disabled local path optimization; since the CT-DC global planner still had some randomness, the robot failed to avoid the guardrails on the bridge, ultimately leading to navigation failure.
In the wild scenario, terrain variations were relatively mild, and the rigid simulated ground provided generous friction allowances. Although Gianni’s feedback controller exhibited some slippage during navigation, its maneuvering capability was comparable to that of M2PC. Due to the conservative nature of M2PC, the proposed framework J exhibited slightly longer execution times but shorter execution trajectories compared to combination H (which used Gianni’s controller). Combination I further evaluated the performance of GEO in standard online simulations. GEO required path interpolation to smooth the flipper motions, and it took seconds to analyze the robot’s configurations, leading to poor real-time control performance. It often failed to respond promptly to the current reference path, frequently exhibiting significant delays and overshoots during navigation. Overall, the proposed framework J demonstrated the best comprehensive performance, achieving the highest success rate and the shortest actual execution trajectories.
Analysis of simulation results in the building
Significant height differences, consecutive stairway turns, and complex connectivity in the six-story building posed even greater challenges for ATR navigation. The simulation results (see Table 5 and Figure 21(b)) indicate that four combinations completed the navigation task at least once. The graph-search-based global planners used in combinations A, B, and E often generated paths dangerously close to stair edges or walls, increasing the risk of optimization or execution failure. Intuitively, combination C should outperform combination D; however, simulation results showed that combination C consistently failed in all 10 trials. Combination D succeeded once, but the randomness in its global path generation led to highly redundant paths with frequent detours, significantly increasing task duration.
In combination F, the 3D2M terrain analysis module failed to correctly identify obstacles at stairway corners, causing the robot to fall or get stuck. Combination G failed all 10 trials due to unrecoverable collisions. Although the CT-DC global planner considered vehicle stability on the ground and the controller generally tracked the path well, the absence of local path optimization led to occasional collisions at the stairway turns.
In Gazebo, the relaxed friction and collision criteria allowed Gianni’s method to complete the navigation task despite experiencing slippage and impacts on the stairs. Compared to combination H, which deployed Gianni’s controller, the proposed framework J achieved slightly longer execution times but shorter execution paths due to M2PC’s more conservative and safer control strategies. Combination I (with GEO) also failed to achieve smooth online navigation, frequently experiencing stop-and-go behavior during stair climbing, significantly prolonging task duration. Among all combinations that completed navigation in the building scenario, the proposed framework J demonstrated superior performance in both success rate and execution efficiency.
Discussion on simulation
As illustrated in Figure 21, a high-quality global path generated by CT-DC provides a favorable initialization for local optimization, reduces the robot’s execution distance and time, and avoids unnecessary detours. Meanwhile, the comparison of success rates among different combinations further highlights the importance of local path refinement. In both scenarios, ManiOpt serves multiple purposes, such as enabling obstacle avoidance on the narrow bridge, smoothing the path, and satisfying the anisotropic traversability constraints on staircases. Disabling local optimization or initializing it with a poor path often leads to task failure. Moreover, locally optimized paths that satisfy multiple objectives can help prevent the controller from being misled by obstacles and reduce the difficulty of control over terrain with complex traversability. For instance, if the local point cloud map for quadratic terrain analysis includes structures such as railings or walls, terrain-aware controllers may misinterpret these as drivable surfaces, resulting in control failures. Similarly, directly following overly coarse initial paths (such as winding paths over stairs) may trigger excessive motion responses from the controller. Finally, safe maneuvering of the ATR over complex and rugged terrain requires not only a controller (like M2PC) capable of generating coordinated whole-body motions under both terrain and robot-specific constraints but also strong real-time performance at both planning and control levels to avoid overshoot caused by delayed responses and redundant motions arising from stochastic variations of the planned path.
Validation of the framework in real-world scenarios
We selected several real-world scenarios characterized by considerable challenges, to systematically evaluate the proposed framework alongside baseline methods. Each experiment presented in Architecture complex, Challenging uneven terrains, Debris and Interlaced stairways was conducted using prior point cloud maps with an identical size and a uniform resolution of 0.1 m. The experiment in Unknown construction site, however, utilized online point cloud maps generated by the SLAM module. Throughout all real-world experiments, the robot’s localization was provided by a modified version of FAST-LIO2 (Xu et al., 2022), which includes a re-localization function. The joint states of the flippers and the local submaps were continuously updated based on data collected from the onboard sensors.
Architecture complex
We conducted a series of experiments within an architecture complex (see Figure 22). The primary objective of this experimental setup was to evaluate the practical effectiveness of various global path planning algorithms (including CT-DC, Colas’ method, Krüsi’s method, and parts of the 3D2M planning pipeline) in large-scale, complex-connected environments with varying heights of traversable ground. In these experiments, M2PC is consistently used as the tracking controller. Except for the comparative combination involving the 3D2M framework, all experimental combinations employ the ManiOpt local path optimizer. The start point for all navigation tasks are located near (12.0 m, 3.0 m, 0.2 m) in the global coordinate system, with goal point near (96.3 m, − 45.1 m, 5.9 m). After conducting five trials for each combination, the best-performing executed trajectories were visualized in Figure 23. The corresponding key quantitative metrics are summarized in Table 6. A typical architecture complex on a campus: (a) the reconstructed point cloud map (including the first three floors), (b) the general CG rendering, and (c) a corner of the scenario. The best-performing executed trajectories of different combinations in the architecture complex. The deterministic nature of CT-DC and Colas’ methods allowed the robot to efficiently navigate through stairways within the building and reach the goal via significantly shorter trajectories. In contrast, the stochastic Krüsi’s method resulted in detoured trajectories. Quantifications of the experiments in the architecture complex: Succ. denotes the success rate, L
e
denotes the average executed trajectory length, T
p
denotes the average global planning time at the start point, and T
e
denotes the average execution time.

Except for one failure caused by crowd interference, the proposed framework consistently outperformed the comparative combinations in terms of success rate, average executed trajectory length, global planning time at the start point, and overall navigation duration. Colas’ method also achieved relatively efficient execution trajectories, but its planning time at the start point was longer due to the large-scale environment. Moreover, local path optimization occasionally failed in certain areas (like long corridors) due to poor initial paths from Colas’ method, resulting in collisions with obstacles and ultimately compromising the overall navigation success rate.
Krüsi’s method, affected by the superfluous traversable areas in the environment, produced highly stochastic global paths. This randomness resulted in frequent changes in heading direction (failed four times on the staircases and one success with an offline-generated path), substantially increasing the executed trajectory length, planning time, and overall execution duration. The terrain analysis module in 3D2M imposed heavy computational demands. At practical resolutions (below 0.5 m), it failed to complete terrain processing on onboard computers. At coarser resolutions (around 1.0 m), the algorithm produced inaccurate traversability estimates, resulting in global planning failures.
Challenging uneven terrains
We conducted a series of experiments on a steep grassy slope and a set of multi-segment staircases to further evaluate the effectiveness of different local path or trajectory optimization algorithms in real-world environments. The grassy slope, shown in Figure 24(a), featured soft, wet terrain with a high risk of slippage, posing significant challenges to ATR navigation. This experimental setup aimed to assess how well different local optimization algorithms could refine aggressive initial paths in steep environments. The multi-segment staircases, shown in Figure 24(b), were used to test each algorithm’s ability to optimize initially direct, non-adaptive paths in terrains with pronounced anisotropic traversability. Two types of terrains with distinct challenges: (a) grassland with steep slope and (b) multi-segment wide staircases.
In these experiments, M2PC was consistently employed as the tracking controller. The initial paths were unified, adhered to the ground, and designated manually to ensure experimental consistency. Each algorithm was tested five times on both environments, and the best-performing executed trajectory for each was selected for visual presentation (see Figure 25). The corresponding key quantitative metrics for each method are summarized in Table 7. The best-performing executed trajectories of different combinations on the two challenging uneven terrains. Quantifications of the experiments on the challenging uneven terrains: Succ. denotes the success rate, dmin denotes the minimal Euclidean distance from the robot to the navigation goal, amax denotes the maximum linear acceleration norm during the best-performing navigation, and Mal. denotes the average robot’s malfunction index (a smaller number indicates better control performance).
To evaluate the execution performance of the robot, we defined a malfunction metric (Mal.) based on the deviation between the desired commands and actual velocity over a certain time horizon. Let
This metric serves as an indicator of how accurately the robot follows the intended control commands. A lower Mal. value reflects better control consistency.
On the grassland, the initial path descended rapidly, which posed stability challenges. The paths optimized by ManiOpt on the grassy slope effectively circumvented steep slopes and significantly reduced the pitch stability cost
On the staircases, ManiOpt employed an adaptive optimization objective tailored to the specific terrain, ensuring safe traversal on the anisotropically traversable stairs. It also enabled coordinated adjustment on flat, isotropically traversable grounds, helping to shorten the overall path length. Throughout all test runs, ManiOpt consistently avoided path-terrain detachment, demonstrating advantages in mitigating potential controller malfunctions and failures. These results strongly validate the effectiveness of the manifold optimization approach. In contrast, the 3D2M-L optimizer failed to produce a trajectory that maintains good ground adherence or accounts for anisotropic traversability, resulting in lateral slips and partial loss of control on the second-segment staircase. In experiments conducted without optimization, the robot was unable to perform adequate obstacle avoidance, resulting in perilous collisions.
Debris
We conducted a series of experiments at a post-demolition site to evaluate the performance of different ATR whole-body control methods in real-world scenarios, as shown in Figure 26. The experiment area covered approximately 400 m2 and presented significant challenges, including unstable rubble piles, soft soil slopes, and numerous hard protrusions, realistically simulating a post-disaster environment. Additionally, we compared the performance of the 3D2M framework when integrated with M2PC. Each algorithm combination was tested five times, with a fixed start and goal point located near (0.0 m, 0.0 m, 0.1 m) and (16.5 m, 0.0 m, 3.4 m), respectively. The best-performing executed trajectories were selected for visual presentation (see Figure 27). The corresponding key quantitative metrics for each combination are summarized in Table 8. The post-demolition debris: (a) the reconstructed point cloud map and (b) the front side of the debris. The best-performing executed trajectories of different combinations on the debris. Quantifications of the experiments on the debris: Succ. denotes the success rate, dmin denotes the minimal Euclidean distance from the robot to the navigation goal, amax denotes the maximum linear acceleration norm during the best-performing navigation, T
e
denotes the average execution time, L
e
denotes the average executed trajectory length, Smot. denotes the average executed path smoothness, and Mal. denotes the average robot’s malfunction index.

In the experiments, the global path generated by 3D2M was relatively straight and consistent. However, its local optimization was heavily influenced by terrain roughness analysis, resulting in a highly tortuous path. This path caused the robot to perform excessive twists over the rubble, leading to entrapment on small, rigid protrusions (see Figure 27). The combination using Gianni’s controller frequently experienced poor contact between the robot’s body and the soft soil slopes, which often resulted in track subsidence. Moreover, when encountering minor rigid obstacles on the ground, Gianni’s controller also tends to get stuck. Furthermore, GEO failed in all five trials due to the significant computation failure of flipper configurations in rough terrain.
In contrast, the proposed framework is the only solution capable of completing the rubble terrain navigation task. Although two failures occurred due to gravel slides and rebar obstructions, the overall performance remained robust. With the integration of the extracted quadratic-form terrain information and the robot’s inherent constraints, M2PC demonstrated safer control behaviors during execution, effectively reducing the risks of the robot becoming trapped in soft soil or immobilized by rubble.
Interlaced stairways
We conducted a series of experiments on a fire rescue stairway and an irregular spiral stairway (see Figure 28), to evaluate the capabilities of different ablation combinations in multi-floor scenarios. Both scenarios consisted of multiple staircases and platforms, with a maximum traversable elevation of approximately 6 to 7 m. The horizontal overlap of the stair structures and the complex traversability conditions provided an excellent setting for assessing the performance of each algorithm combination. Two stairways with multi-floor structure: (a) rescue stairway and (b) spiral stairway.
Due to the constrained connectivity of the experiment environments, Krüsi’s method and the CT exhibited similar path generation patterns; thus, further comparison with Krüsi’s method was not conducted. In each experiment, the robot started from a low-elevation start point, climbed the stairs to reach goal point 1 on a high platform, and then descended to return to goal point 2 near the original start point. For the rescue stairway scenario, the start point and goal point 2 were located near (0.0 m, 0.0 m, 0.1 m), while goal point 1 was placed at (5.6 m, 1.9 m, 6.4 m), as the colored circle illustrated in Figure 29. For the spiral stairway scenario, the start point and goal point 2 were located near (0.0 m, 0.0 m, 0.1 m), with goal point 1 set at (11.0 m, 4.5 m, 7.1 m), as the colored circle illustrated in Figure 30. Each algorithm combination was tested five times, and the best-performing executed trajectories were selected for visual presentation (see Figures 29 and 30). The corresponding key quantitative metrics for each combination are summarized in Table 9. The best-performing executed trajectories of different combinations on the rescue stairway. The best-performing executed trajectories of different combinations on the spiral stairway. Quantifications of the ablation experiments on the stairways: Succ. denotes the success rate, Asc. denotes the maximum ascent on the staircases, amax denotes the maximum linear acceleration norm during the best-performing navigation, and Smot. denotes the average executed path smoothness.

During the rescue stairway experiments, only combinations A, G, H, and J completed the navigation task at least once. Our proposed framework J achieved the highest success rate, exhibiting the smoothest executed trajectory and robustness during rapid terrain shifts. For combinations A and B, the graph-search-based Colas’ method generated extreme paths that were close to the guardrails. Thus, in combination B, collisions were inevitable. However, in combination A, ManiOpt was able to partially mitigate this issue (with a 20% success rate).
For combination E and F, 3D2M’s terrain analysis struggled in environments containing irregular under-surfaces, such as beams on the ceiling, mistakenly classifying them as obstacles. This misinterpretation led to failures in both global path planning and local path optimization. Even initialized with a reasonable global path, the optimized trajectory in combination F tended to deviate from the ground. In combination G, without local optimization, CT-DC alone produced coarse and stochastic global paths, resulting in unsafe slippage on anisotropically traversable staircases.
Combination H, employing Gianni’s controller, exhibited inadequate response, causing robot-ground impacts at every first and every final stage of staircase climbing. The two failures of combination H were attributed to the onboard computer crash due to these impacts. Meanwhile, the combination I with GEO required excessively high resolutions in both local path and map representations. In practical online navigation, GEO consistently failed to process valid flipper configuration analysis, resulting in flipper misbehavior and subsequent entrapment of the robot on stair edges, accompanied by high-frequency oscillations of the body.
The spiral stairway provides slightly more navigable space compared to the tightly confined scissor-style rescue stairway. However, it introduces significant challenges due to steep tangential inclinations on the inner arc. In the experiments, four combinations completed at least one navigation test. The proposed framework J again demonstrated the best overall performance. Although the global paths generated by Colas’ method in combinations A and B were feasible, they remained risky. Successful navigation was only possible when coupled with ManiOpt to refine the path and avoid collisions.
Due to the refractive properties of glass, parts of the spiral stairway’s glass guardrails were missing from the point cloud map. 3D2M failed to detect the hazardous obstructions along both sides of the ground-adjacent staircases, leading its global planner to erroneously produce paths that passed through the missing glass walls, resulting in navigation failure. This failure in obstacle detection also impaired its local trajectory optimization. Even when using global paths generated by CT-DC, 3D2M-L’s local optimizer frequently produced trajectories too close to the handrails, which again led to several unsuccessful navigation attempts.
Combination H, which utilized Gianni’s controller, also completed the task. However, as the robot tended to follow paths along the outer arc of the stairway, frequent critical impacts (even causing SLAM drift, see Figure 30(b)) between the robot chassis and the staircase were observed on the larger inter-step distance staircases. That is because this type of control strategy is incapable of accommodating long-term processing and lacks the responsiveness required to adapt to rapidly changing terrains. Combination I with GEO exhibited similar issues to those in the rescue stairway scenario, struggling with the configuration analysis of flippers and failing to maintain smooth and stable motion.
Unknown construction site
We conducted a real-world demonstration at an abandoned construction site (see Figure 31), to validate the framework’s ability to navigate using real-time online point clouds generated by the SLAM module in an unknown environment. The site, approximately 1000 m2 in size, featured various obstacles, construction debris, and large traversable areas connected by staircases. The abandoned construction site: (a) the online point cloud map from the SLAM module and (b) a corner of the abandoned site.
During the test, the robot explored the entire area by following goal points manually assigned in real time. The terrains encountered by the robot were representative of typical rescue environments. The total duration of the demonstration was approximately 10 minutes, and the robot’s executed trajectory is illustrated in Figure 32. The executed trajectory of the proposed framework in an unknown complex scenario.
The results show that the proposed framework demonstrated strong stability during online planning tasks in unknown environments and adapted well to real-time point cloud maps provided by the SLAM module. Additionally, the framework exhibited high efficiency in both short-range and long-range planning and maintained robust whole-body control throughout the navigation process, successfully traversing various challenging terrains to complete the mission.
Discussion on real-world experiments
Through extensive multi-scenario validations, we systematically demonstrate the synergistic effectiveness and interdependence of the modules within the proposed framework. The global path generation (CT-DC), the local path optimization (ManiOpt), and the holistic control (M2PC) cooperate efficiently across diverse and challenging environments, where the absence of any single module leads to a significant degradation in overall framework performance. Furthermore, through independent evaluations under unified conditions, we quantitatively verified the rationality of global path generation, the stability of local path optimization, and the robustness of the controller, clearly illustrating the critical contributions of each subsystem to the overall navigation performance. The adaptability of the framework to auxiliary modules (such as the SLAM module) and varying map sources further confirmed the effectiveness of the modular decoupling design and the necessity of systematic integration.
Conclusion
This paper presents a novel navigation framework for articulated tracked robots (ATRs) operating in rescue scenarios, leveraging terrain information throughout the planning and control processes. The framework consists of three tightly integrated components: a hierarchical global planner for efficient path generation in large-scale, unstructured environments; a manifold-optimization-based local planner that refines the global path according to kinematic and scenario-specific objectives; and an on-manifold model predictive controller that enables robust centroid path tracking through coordinated whole-body motion on the combined manifold spaces. Together, these components form a cohesive system that addresses the core challenges of ATR navigation across diverse and complex environments.
Extensive simulations and real-world experiments across complex scenarios demonstrate that the proposed methods outperform state-of-the-art methods. In general, the proposed framework consciously separates the notions of completeness and optimality across different levels of planning and control. The hierarchical global planner (CT-DC) emphasizes completeness, ensuring the ability to generate feasible paths in large-scale and complex environments in time, even if the resulting path is suboptimal. This guarantees robust reachability and serves as a reliable initialization for downstream modules, while the downstream local planner and controller can effectively correct suboptimality. In contrast, the manifold-optimization-based local planner (ManiOpt) focuses on improving the quality of the initial path by formulating an optimization problem that integrates multiple objectives. The refined reference path provided to the controller satisfies obstacle avoidance requirements, the robot’s kinematic constraints, and various terrain-specific demands. Furthermore, by relying solely on the robot’s centroid reference path and its perception of the terrain within the horizon, the proposed M2PC can achieve whole-body coordinated motion in complex terrains. This design allows the framework to avoid high-dimensional state searches and feasibility checks in the global planning stage and significantly reduces the need for flipper-specific objectives in the local planning stage. In this way, the framework effectively balances completeness and optimality across different levels while carefully designed modularization reduces computational load and system fragility.
Although effective, the proposed framework could benefit from learning-based enhancements of terrain models. In future work, we plan to explore more efficient map representations, such as implicit neural networks, and further extend the framework toward data-driven end-to-end navigation across diverse robot platforms, potentially enabling multi-robot collaborative navigation.
Supplemental Material
Supplemental Material - TiFA: A terrain-informed navigation framework for articulated tracked robots in rescue missions
Supplemental Material for Trust in TiFA: A terrain-informed navigation framework for articulated tracked robots in rescue missions by Yifei Wang, Weifan Zhang, Yuxiang Li, Jiancheng Wang, and Haoyao Chen in The International Journal of Robotics Research.
Footnotes
Acknowledgments
Our sincere appreciation to Mr. Lingxu Chen, Mr. Jidong Huang, Dr. Zhihao Wang, Dr. Yu Wang, and any other personnel who assisted in the experiments carried out in this paper.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under Grant No. U21A20119 and 62573156, the Shenzhen Science and Technology Program under Grant No. RCJC20231211090050082, JCYJ20241202123714019, and SZXJP20230703093206015, and Guangdong S&T programme under Grant No. 2023B1212010005.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
The videos that demonstrate the results of our simulations and real-world experiments can be found in the supplementary materials.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
