An efficient beam search algorithm for active perception in mobile robotics

Abstract

Active perception is a fundamental problem in autonomous robotics in which the robot must decide where to move and what to sense in order to obtain the most informative observations for accomplishing its mission. Existing approaches either solve a computationally expensive traveling salesman problem over heuristically selected informative nodes, or adopt a more efficient but overly constrained shortest path tree formulation. To address these limitations, we explore beam search algorithms as scalable alternatives. While the standard beam search provides scalability by preserving the top-B paths at each depth level, it is prone to local optima and exhibits parameter sensitivity. Our first contribution is a node-wise beam search (NBS) algorithm, which maintains top-B candidates per node to enable more effective exploration of the solution space. Systematic benchmarking on graphs shows that NBS consistently outperforms other baselines and maintains strong performance even at low beam widths. As a second contribution, we integrate the concept of frontiers into the path selection criterion, introducing the expected gain metric, which better balances exploration and exploitation compared to existing alternatives. Our third contribution proposes the rapidly-exploring random annulus graph (RRAG), a novel graph construction method that preserves full orientation sampling and ensures connectivity in cluttered environments through a fallback local sampling-based planner. Extensive experiments demonstrate that NBS combined with RRAG achieves the highest performance across all three representative active perception tasks, outperforming state-of-the-art algorithms by at least 20% in one or more tasks. We further validate the approach on real robotic platforms in different scenarios. Project page: https://efficient-beam-search.github.io/

Keywords

active perception beam search autonomous mobile robots autonomous exploration path planning view planning

1. Introduction

Autonomous robots are increasingly deployed for tasks requiring intelligent information gathering, such as environment monitoring (Cao et al., 2013; Ghaffari Jadidi et al., 2019; Popović et al., 2020; Singh et al., 2009; Yu et al., 2016), autonomous exploration (Bircher et al., 2016b; Cao et al., 2021; Meng et al., 2017; Zhou et al., 2021), and surface reconstruction (Bircher et al., 2015; Kompis et al., 2021; Schmid et al., 2020; Yoder and Scherer, 2016). Unlike conventional shortest path planning, which aims to minimize traversal cost to reach a goal, these applications pose a more complex challenge: how to enable robots to autonomously plan sensor trajectories that maximize accumulated information under resource constraints such as time or energy.

Furthermore, robots often operate under partial observability, with neither the environment nor the information distribution known a priori. This motivates active perception scenarios, where the robot must simultaneously explore the environment and plan informative trajectories in the free space. Sampling-based planners such as rapidly-exploring random tree (RRT) and RRT* (Karaman and Frazzoli, 2010; Kuffner and LaValle, 2000) have been adapted to this setting (Bircher et al., 2016b; Moon et al., 2022). However, tree-based structures have inherent limitations: each node stores only one incoming path, and extensive rewiring or pruning is required as the robot moves, often leading to degraded solution quality or high computational overhead.

Graph-based representations offer greater flexibility through multi-query planning capabilities. Since planning the most informative path on a graph is NP-hard (Ahuja et al., 1983), prior work has employed simplifying heuristics, such as restricting the search to a shortest path tree (SPT) (Dang et al., 2020; Xu et al., 2021). While computationally efficient, this severely limits the search space. Another common approach decouples node selection from routing: informative nodes are selected or sampled heuristically and then connected via shortest paths using a traveling salesman problem (TSP) solver (Arora and Scherer, 2017; Meng et al., 2017; Zhou et al., 2021). However, this separation often leads to suboptimal paths and high computational cost.

In this work, we adopt the term informative path planning (IPP) to refer broadly to graph-based path planning problems that seek a path maximizing a given gain metric subject to a cost budget. Importantly, we do not address information-theoretic objectives, such as those in Bai et al. (2016), Chen et al. (2024a), Lim et al. (2016), and Singh et al. (2009), but instead treat IPP as a path optimization problem defined over a graph.

Given the combinatorial nature of IPP, we turn to beam search—a well-established algorithm in sequential decision-making (Meister et al., 2020; Rush et al., 2013; Vijayakumar et al., 2018). It strikes a balance between the efficiency of greedy search and the optimality of exact inference by iteratively pruning the search space to focus only on the most promising partial solutions. We first explore the standard variant, termed depth-wise beam search (DBS), which maintains the top B paths at each search depth. Although more flexible than SPT- or TSP-based methods, this approach remains sensitive to parameter choices and prone to local optima. To address these limitations, we propose node-wise beam search (NBS), which maintains the top B paths per node. This design enhances robustness to parameter variation and enables more effective exploration of the search space.

We first benchmark path planning algorithms in two abstract graph settings: planning informative paths on a fully known graph and on a graph incrementally discovered from sensor data. Most algorithms require a path selection criterion, that is, a metric that generalizes across both settings for ranking candidate paths. The two most commonly used metrics are path gain and path ratio (gain-to-cost ratio). To better balance exploration of new information against exploitation of known gains, we introduce the expected gain metric, which incorporates frontier (Yamauchi, 1997) information to estimate potential future gains. Through extensive evaluations, we show that NBS consistently outperforms TSP, SPT, and DBS on graphs. Our proposed expected gain criterion achieves a superior balance between exploration and exploitation compared to existing metrics. We also observe that frequent replanning further improves performance, especially in online perception settings.

Transitioning from graph environments to active perception requires online and incremental graph construction. To this end, we propose rapidly-exploring random annulus graph (RRAG), an incrementally built graph structure inspired by rapidly-exploring random graph (RRG) (Karaman and Frazzoli, 2011). The name annulus graph reflects its spacing rules: nodes are accepted only if sufficiently distant from existing ones, and edges are formed only when nodes are within a suitable proximity, preventing excessive node density and edge connectivity. Unlike previous incremental graph construction methods that retain a single orientation per position (Schmid et al., 2020; Xu et al., 2021), RRAG preserves multiple orientations, enhancing directional coverage. Moreover, while most approaches assume straight-line edges—which often fail in narrow or cluttered environments—RRAG introduces a fallback local sampling-based planner to ensure connectivity even in highly constrained spaces.

We validate our approach through comprehensive simulations on three representative active perception tasks: point collection, surface reconstruction, and volumetric exploration, across eight scenarios from the Habitat Synthetic Scenes Dataset (HSSD) (Khanna et al., 2024). NBS combined with RRAG achieves the highest performance in all three tasks, outperforming state-of-the-art methods by at least 20% in one or more tasks. These results, complemented by real-world robotic experiments (see Figure 1 for an example), demonstrate the effectiveness and generalizability of our approach. Alongside this work, we release OpenApero (Open Active Perception Library for Robotics)¹, an open-source library implementing the algorithms presented in this paper and providing a modular framework for active perception research.

Figure 1.

Surface reconstruction in hardware experiments. The figure illustrates the robot’s navigation sequence alongside the corresponding reconstructed scene. The completed trajectory is visualized with a color gradient from dark blue to bright yellow, while the currently planned path is shown as a sequence of light green arrows. The camera’s view frustum is depicted, and yellow voxels represent surface frontiers visible within it. Each node is represented by a color-coded arrow: low-gain nodes are blue, and high-gain nodes are red. The bottom left panel illustrates the planning pipeline: the planner identifies the optimal path using NBS and, upon reaching the next node, updates and expands the graph with new information before computing the next-best path.

We summarize our main contributions as follows:

(1) We propose NBS, an efficient node-wise beam search algorithm that maintains a bounded set of top candidate paths per node. Extensive experiments demonstrate that NBS is robust to parameter choices and consistently outperforms state-of-the-art approaches in both graph environments and active perception tasks.

(2) We integrate frontier information into the path selection criterion by introducing the expected gain metric, which balances exploration of unknown space and exploitation of gains in free space. Empirical results show that this formulation surpasses other alternatives, such as path ratio, across a variety of graph environments.

(3) We introduce RRAG, a novel incremental graph construction method for active perception. RRAG preserves multiple orientations at each position, enabling richer path optimization. It also incorporates a fallback local sampling-based planner to ensure graph connectivity in cluttered environments.

This paper is structured to progressively build both intuition and formal understanding of the proposed approach. Section 2 reviews relevant prior work in active perception, autonomous exploration, informative path planning, and sampling-based planning. Section 3 formulates the active perception problem across three increasingly realistic settings: planning on a fully known graph, planning on an incrementally discovered graph, and active perception with incremental environment exploration. Section 4 introduces key components such as path selection criteria and replanning strategies, and discusses properties of optimal paths. Section 5 presents different IPP algorithms, including the proposed NBS algorithm, along with theoretical analysis of their computational complexity. Section 6 evaluates the proposed methods against established baselines in different graph environments. Section 7 details RRAG, our incremental graph construction method that enables the application of IPP algorithms to active perception. Section 8 presents empirical results from extensive simulations and real-world robotic experiments, demonstrating the effectiveness of our approach in active perception tasks. Finally, Section 9 discusses limitations and future work, and Section 10 concludes the manuscript. An overview of the paper’s structure is provided in Figure 2.

Figure 2.

Paper organization overview. Circled numbers indicate the corresponding sections. In the graph a priori and online perception settings, the environment is represented as a graph, either fully known or partially known. In contrast, active perception necessitates graph construction from the collision-free state space. The path selection strategy encompasses both selection criteria and replanning frequencies, which are benchmarked in the graph-based settings. The best-performing combination is subsequently applied to the active perception tasks.

2. Related work

One of the most fundamental active perception tasks is autonomous exploration, in which a robot aims to explore as much of the environment as possible. A seminal approach is frontier-based exploration (Yamauchi, 1997), where the robot repeatedly navigates to the boundaries between known free space and unknown regions to incrementally complete a map. Subsequent work has addressed the challenge of high-speed exploration on multi-rotor platforms (Cieslewski et al., 2017). Building on the frontier concept, recent methods have integrated TSP-style formulations to achieve more globally efficient exploration policies. FUEL (Zhou et al., 2021) incrementally maintains a frontier information structure (FIS) by computing axis-aligned bounding boxes for frontier clusters and then solving an asymmetric TSP to generate a global tour that visits representative viewpoints. RACER (Zhou et al., 2023) extends FUEL to multi-UAV settings and uses coverage paths to guide exploration while avoiding repeated visits to the same regions. Other works, however, argue against casting exploration directly as a TSP problem. FAEL (Huang et al., 2023) introduces heuristic strategies that solve a path optimization problem jointly considering movement distance, information gain, and coverage. Yu et al. (2023) proposed ECHO, an efficient heuristic for next viewpoint determination that selects the next viewpoint via a lightweight cost function accounting for boundary awareness, environmental structure, and motion consistency, achieving both computational efficiency and improved performance.

Recently, many researchers have explored applying IPP formulations to a wide range of active perception problems. The objective of informative path planning is to compute a path that maximizes information gain while satisfying a given cost budget. Classical IPP formulations typically assume a fully known environment represented as a weighted graph, where vertices encode information gains and edges represent traversal costs (Lim et al., 2016; Singh et al., 2009). A major application is environment monitoring, where Gaussian processes are commonly used to model uncertainty and guide planning to reduce terrain uncertainty (Chen et al., 2024a; Ma et al., 2018; Popović et al., 2020; Singh et al., 2009). Yu et al. (2016) employ a mixed-integer quadratic program to solve the planning problem, while Ott et al. (2024) introduce an approximate sequential path optimization method that constructs trajectories segment by segment using dynamic programming. More recently, Jakkala and Akella (2025) propose a fully differentiable IPP formulation that offers significant gains in speed and scalability. Reinforcement learning-based approaches have also attracted increasing attention (Cao et al., 2023; Rückin et al., 2022). Beyond known environments, several works extend IPP to unknown settings for autonomous exploration and active mapping (Asgharivaskasi et al., 2022; Bai et al., 2016; Chen et al., 2024b; Ghaffari Jadidi et al., 2019).

To optimize information gain under a cost budget, one major line of work adopts a TSP-based approach: sampling a set of candidate nodes and then using established TSP solvers to determine an efficient visitation sequence. Arora and Scherer (2017) first applied TSP to generate a route through all sampled nodes and then iteratively evaluated the effect of adding or removing nodes to further improve the solution. This method leverages mature TSP solvers but assumes a predetermined endpoint, limiting its applicability in realistic scenarios. Meng et al. (2017) improved upon this by eliminating the dependency on a fixed endpoint and incorporating a heuristic cost function to accelerate the optimization. Cao et al. (2021) adopted a hierarchical framework, applying TSP both within a local planning horizon to sequence dense viewpoints and at the global level to visit sparse sub-spaces. Similar ideas appear in structural inspection and surface coverage tasks. Bircher et al. (2015, 2016a) proposed generating a minimal set of viewpoints that collectively cover a structure (analogous to the Art Gallery Problem (Lee and Lin, 1986)), then solved the TSP to connect them. Rather than solving the Art Gallery Problem directly, they used multiple iterations of random sampling to refine the solution. The TSP-based paradigm has two key limitations. First, it employs a two-stage optimization: candidate nodes are selected (or sampled) first, and only afterward is a near-optimal tour computed—sequencing and node selection are not jointly optimized under the cost constraint. Second, TSP quickly becomes computationally prohibitive as the number of candidate nodes increases, limiting its scalability.

Another mainstream approach to applying the IPP formulation in active perception is via sampling-based planning. In this paradigm, a tree or graph is incrementally grown in the free space, with the information gain of each sampled node evaluated to ultimately extract the most informative path. Sampling-based planning has become the de facto standard for path planning in high-dimensional configuration spaces. These methods avoid the explicit construction of obstacles, instead relying on collision-checking modules to build a connectivity graph or tree. Kavraki et al. (1996) proposed the probabilistic roadmap (PRM), which constructs a persistent graph of the free space, enabling efficient multi-query planning. For single-query problems, the RRT was developed (LaValle, 1998; LaValle and Kuffner Jr., 2001), which utilizes a Voronoi bias to rapidly expand a tree from the start state toward the goal. Several extensions improve planning efficiency, such as RRT-Connect (Kuffner and LaValle, 2000), which grows two trees simultaneously (from the start and the goal), and expansive space trees (EST) (Hsu et al., 1997), which bias expansion toward sparsely explored regions. To address the suboptimality of these early approaches, Karaman and Frazzoli (2011) introduced the RRG and its tree-based counterpart, RRT*, both of which are asymptotically optimal. RRT* employs a rewiring mechanism to dynamically maintain the lowest-cost parent for each node while preserving a tree structure. RRT^X (Otte and Frazzoli, 2016), another asymptotically optimal algorithm that grows from the goal toward the start, was proposed to enable real-time replanning in dynamic environments. However, these seminal sampling-based planners focus on finding the shortest path to a goal in a fully known environment and do not inherently address mapping or information-gathering objectives.

Recently, RRTs have been adapted to extend traditional motion planning algorithms to active perception tasks. Bircher et al. (2016b) adapted RRTs for exploration by sampling viewpoints to construct a tree and selecting the branch with the highest gain, with a weight affected by the cost, for execution. After each planning step, only the selected immediate branch is retained, and the tree is re-expanded. This approach has been further extended to surface inspection in unknown environments (Bircher et al., 2018). Witting et al. (2018) proposed sampling from pruned nodes to rapidly recover node density. Schmid et al. (2020) enhanced this method by rewiring the non-executed branch to the new root, enabling direct reuse of previously sampled nodes, following the connection strategy in RRT* (Karaman and Frazzoli, 2011). To address large and high-dimensional search spaces, Kompis et al. (2021) and Moon et al. (2022) introduced informed sampling strategies to sample nodes with high expected information gain. Tree-based planners inherently sacrifice optimality due to the limited number of candidate paths and the need for frequent tree updates. Additionally, shortest-path formulations typically exclude repeated nodes, whereas revisiting nodes can be beneficial.

Graph-based algorithms relax the constraint of maintaining a single path to each node by constructing a graph representation of the environment and searching for informative paths within it. Moreover, such approaches are inherently better suited for online replanning, as they do not rely on a rooted structure and thus eliminate the need for tree rewiring. Xu et al. (2021) and Duberg and Jensfelt (2022) propose incrementally constructing a graph during exploration. GBPlanner (Dang et al., 2020) introduced a dual-graph structure for large-scale exploration, where local graphs are built within bounded regions and incrementally merged into a global graph. All these methods reduce the search space by applying shortest-path algorithms (e.g., Dijkstra) over selected nodes to generate shortest-path trees, effectively transforming the problem into a tree search embedded in an underlying graph. Hollinger and Sukhatme (2014) proposed RIG-roadmap and RIG-graph, in which path planning is integrated with graph construction under the assumption of prior knowledge of cost and gain. Although they incorporate a pruning strategy, they do not impose explicit bounds on graph size, which can result in poor scalability in large environments. Additionally, these methods lack fine-grained control over path selection, as graph growth and path planning are tightly coupled—that is, path selection is performed only on the currently constructed subgraph rather than the complete graph. In this work, we focus on algorithms designed for fast online replanning, capable of handling online exploration with partial information while maintaining scalability in large environments.

3. Problem formulation

We consider three progressively more realistic and complex formulations of this problem: (i) planning on a fixed, known graph; (ii) planning with online perception over a partially observed graph; and (iii) active perception scenarios in which the robot autonomously constructs a graph within the discovered free space to achieve its objective. We are interested in finding an algorithm that works well across all three setups. Before delving into these formulations, we introduce the terminology used throughout the discussion.

Let $G = (V, E)$ denote a directed, simple graph, where $V$ is the set of vertices and $E \subseteq V \times V$ is the set of directed edges, with each edge $e_{i, j} = (v_{i}, v_{j}) \in E$ representing a feasible connection from vertex v_i to vertex v_j. Throughout this manuscript, we use the terms vertex and node interchangeably. A path p is defined as a sequence of vertices $(v_{0}, v_{1}, \dots, v_{n})$ such that $(v_{i}, v_{i + 1}) \in E$ for all i. A path is called simple if it visits each vertex at most once, that is, all vertices in the sequence are distinct. Let $P$ denote the set of all paths in $G$ .

Each edge $e_{i, j} = (v_{i}, v_{j}) \in E$ is assigned a positive traversal cost $c (e_{i, j}) > 0$ , representing the cost of moving from vertex v_i to vertex v_j. The cost of a path $p = (v_{0}, v_{1}, \dots, v_{n})$ is defined as the sum of the costs of its constituent edges:

c (p) = \sum_{i = 0}^{n - 1} c (e_{i, i + 1}) .

(1)

Each vertex $v \in V$ is associated with a non-negative information gain $g (v) \geq 0$ . The total gain along a path $p = (v_{0}, v_{1}, \dots, v_{n})$ is defined by a set-based function g_set, which aggregates gains over the set of unique vertices visited:

g (p) = g_{set} ({v_{i} ∣ v_{i} \in p}) .

(2)

By definition, g_set is invariant under repeated visits to the same vertex. With $g : P \to ℝ_{\geq 0}$ assigning path gains and $c : P \to ℝ_{> 0}$ assigning path costs, we extend the graph notation as:

G = (V, E, g, c) .

(3)

An illustrative example is provided in Figure 3(a).

Figure 3.

Examples of IPP graphs. Each node has an associated gain, each edge a cost, and v_s denotes the start node. Revisiting nodes can be advantageous. For example, the optimal solution in (b) requires multiple traversals of v₃.

3.1. Graph a priori formulation

We first consider the case in which the environment is fully known in advance and represented as a directed simple graph as in equation (3). In this formulation, the gain accumulated along a path is computed as the sum of gains from the unique vertices visited:

g (p) = \sum_{v \in {v_{i} ∣ v_{i} \in p}} g (v) .

(4)

Note that this complies with equation (2).

Let $p_{k} = (v_{0}, v_{1}, \dots, v_{k})$ denote the sequence of vertices visited after k steps. Given the fully known graph $G = (V, E, g, c)$ , a starting vertex $v_{s} \in V$ , and a cost budget C > 0, the objective is to determine a policy

π^{*} : p_{k} ⟼ e_{k, k + 1} = (v_{k}, v_{k + 1}) \in E,

(5)

such that the resulting path

p^{*} = (v_{0} = v_{s}, v_{1}, \dots)

maximizes the total accumulated gain under the budget constraint:

p^{*} = \underset{p \in P}{argmax} g (p) subject to c (p) \leq C .

(6)

The initial vertex v₀ is fixed at v_s, while the final vertex is unconstrained. Paths may revisit vertices and edges, which can be advantageous (see Figure 3(b), where the robot must traverse v₃ multiple times to collect all gains). This formulation assumes complete a priori knowledge of the graph’s topology and associated cost and gain functions. Note that the policy π* may correspond to a one-shot plan computed at the beginning or a receding-horizon strategy that replans at every node. The notation here is general and accommodates all cases.

3.2. Graph online perception formulation

We next consider the case in which the graph is not known a priori and must be discovered incrementally through local perception. This online perception setting introduces significant challenges, particularly in balancing exploration and exploitation with partial information.

Let the environment be modeled as a directed simple graph $G = (V, E, g, c)$ , where the path gain is defined as in equation (4). The robot perceives its surroundings within a fixed radius ρ > 0, assuming no occlusions. For any vertex $v \in V$ , define its observable neighborhood as

N_{ρ} (v) = \{v_{i} \in V ∣ ‖ v_{i} - v ‖ \leq ρ\},

(7)

where

‖ v_{i} - v ‖

denotes the distance between v and v_i. Similarly, we say an edge

(v_{i}, v_{j}) \in E

is observable from v if

v_{i}, v_{j} \in N_{ρ} (v)

Let $p_{k} = (v_{0}, v_{1}, \dots, v_{k})$ denote the sequence of vertices visited after k steps, with accumulated cost satisfying the budget constraint $c (p_{k}) \leq C$ for all k. The cumulative discovered subgraph at step k is denoted by

G_{k} = (V_{k}, E_{k}, g_{k}, c_{k}),

(8)

where

V_{k} = ⋃_{i = 0}^{k} N_{ρ} (v_{i}), E_{k} = \{(v_{i}, v_{j}) \in E ∣ v_{i}, v_{j} \in V_{k}\},

(9)

and g_k, c_k are the restrictions of the global functions g and c to the current subgraph.

Given a perception radius ρ > 0, cost budget C > 0, and initial vertex $v_{s} \in V$ , the objective is to determine a policy

π^{*} : (G_{k}, p_{k}) ⟼ e_{k, k + 1} = (v_{k}, v_{k + 1}) \in E_{k},

(10)

that selects the next edge based on the currently discovered subgraph and executed path. The resulting path p* is chosen to maximize the total accumulated gain under the budget constraint, as in equation (6).

This formulation reflects a setting in which the policy π* is inherently adaptive, continually updated with new local information. Note that when the perception radius ρ exceeds the graph’s diameter, the entire graph $G$ becomes observable at the initial step (k = 0), effectively reducing this problem to the fully known a priori case described in Section 3.1.

3.3. Active perception formulation

We generalize the online perception framework to a more realistic active perception setting, where a robot operates autonomously in a continuous, bounded 3D workspace using a depth sensor with limited range, completing tasks without prior knowledge of the environment. In contrast to prior formulations that assume a directed graph as the environment, in this formulation, the robot incrementally constructs a graph as an internal abstraction solely to support planning and navigation.

Let $X \subseteq ℝ^{n}$ denote the bounded state space and $X_{free} \subseteq X$ its collision-free subset. The environment or world is represented as $W$ , and the robot is equipped with a sensor model $S$ (with depth up to ρ). In this work, we consider an RGB-D camera, which typically has a limited field of view. For any configuration $x \in X_{free}$ , the observable portion of the environment is given by $S (x; W) \subseteq W$ .

The robot begins at an initial configuration $x_{0} \in X_{free}$ and executes a sequence of velocity commands under a velocity-controlled motion model. At discrete timestep k, the control input $u_{k} \in U$ specifies linear and angular velocities applied over a fixed duration Δt. The resulting state transition is

x_{k + 1} = x_{k} \oplus (u_{k} \cdot Δ t),

(11)

where ⊕ adds the integrated translation to the position and compounds the integrated rotation with the current orientation. At time step k, the trajectory τ_k is denoted by

τ_{k} = (x_{0}, u_{0}, x_{1}, u_{1}, \dots, x_{k - 1}, u_{k - 1}, x_{k}) .

(12)

In this work, we use the term trajectory τ to denote a sequence of states and control actions generated under the robot’s velocity model with fixed timestep Δt, whereas a path p refers to an abstract sequence of vertices on a graph. Each action incurs a traversal cost $c (x_{i}, u_{i})$ , leading to a cumulative trajectory cost:

c (τ_{k}) = \sum_{i = 0}^{k - 1} c (x_{i}, u_{i}) .

(13)

If the cost is time, then the cost is simply $c (x_{i}, u_{i}) = Δ t$ and the trajectory cost is kΔt. At each step k, the robot integrates all previous observations for an estimated environment:

W_{k} = ⋃_{i = 0}^{k} S (x_{i}; W),

(14)

from which an estimated collision-free space

X_{free}^{k} \subseteq X

is inferred. The true task-dependent objective

f (τ; W)

is not available a priori; instead, the robot approximates it using an information-gain function g (⋅) conditioned on

W_{k}

g (τ; W_{k}) = g_{set} ({x_{i} ∣ x_{i} \in τ}; W_{k}) .

(15)

This function is sometimes instantiated as a sum over state-specific gains:

g (τ; W_{k}) = \sum_{x \in {x_{i} ∣ x_{i} \in τ}} g (x; W_{k}),

(16)

where

g (x; W_{k}) \geq 0

denotes the estimated information gain at x given the current map.

Given the sensor model $S$ , a cost budget C > 0, and initial configuration x₀, the objective is to determine a policy

π^{*} : (W_{k}, X_{free}^{k}, τ_{k}) ⟼ u_{k} \in U,

(17)

that selects the next velocity informed by the current map and trajectory history. The resulting trajectory

τ^{*} = (x_{0}, u_{0}, x_{1}, u_{1}, \dots)

is chosen to maximize the true task-dependent objective under the cost constraint:

τ^{*} = \underset{τ}{argmax} f (τ; W) subject to c (τ) \leq C .

(18)

In this paper, we focus exclusively on graph construction and navigation within the graph. This means that we grow the graph (both nodes and edges) within

X_{free}^{k}

and apply a velocity that helps the robot navigate on the graph. This setup is analogous to that in Section 3.2, wherein the trajectory cost

c (τ)

is computed as the corresponding path cost

c (p)

, and the trajectory gain

g (τ)

(equation (15)) is approximated by the path gain

g (p)

of the traversed vertices. Planning proceeds in discrete time, while control actions u_k are selected from a continuous control space.

4. Preliminaries

This section discusses the fundamental design choices of selection criteria and replanning strategies that underlie all subsequent algorithms for informative path planning. The choice of selection criterion is pivotal. In the graph a priori formulation (Section 3.1), the planner focuses solely on exploitation—maximizing gain within the cost constraint. In contrast, the graph online perception formulation (Section 3.2) and active perception formulation (Section 3.3) require balancing exploration (discovering new informative areas) and exploitation (maximizing collected gain). This work proposes a unified selection criterion designed to be effective across these settings. In addition, we investigate how the frequency of replanning influences performance, ranging from replanning at arbitrary frequency to single-shot planning (without replanning). Finally, we analyze the structure of optimal paths under symmetric graphs, offering insights for algorithmic design.

4.1. Selection criteria

The selection criterion defines a quality function of a path $q (p)$ , and the search algorithm returns the path with the highest quality. We consider three distinct path selection criteria, as illustrated in Figure 4: (i) path gain, (ii) gain-to-cost ratio (path ratio), and (iii) expected gain. In this work, we explicitly differentiate these selection criteria from the principal objective of informative path planning, which is formalized in equation (2). The introduction of these criteria is motivated by the exploration–exploitation trade-off, particularly in settings where the graph structure or environmental model is partially or entirely unknown. In such cases, selecting paths based solely on the principal objective (path gain) may lead to suboptimal performance.

Figure 4.

Illustration of path selection criteria. Each path is represented by a circle: open circles denote paths leading to frontier nodes $P_{F}$ , while solid circles represent non-frontier paths. The x-axis represents cost c, and the y-axis represents gain g. The optimal path under each criterion is highlighted with a green check mark. For the path gain criterion, the best path corresponds to the one with the highest g-value. Under the path ratio criterion, the optimal path is the one with the highest slope. For the expected gain criterion, the best path is the one whose extrapolated g-value is maximal.

4.1.1. Path gain

The most straightforward quality function is the path gain $g (p)$ as in equation (2), aligning directly with the problem formulation.

4.1.2. Path ratio/gain-to-cost ratio

Since the full environment may not be known a priori, a commonly used heuristic is to optimize for path efficiency, defined as the gain collected per unit cost:

r (p) = \frac{g (p)}{c (p)} .

(19)

Note that this quality function implicitly assumes that the current gain-to-cost ratio provides a reliable estimate of the future gain over the entire budget, which can be formalized as:

\underset{p \in P}{argmax} r (p) = \underset{p \in P}{argmax} r (p) \cdot C,

(20)

where

r (p) \cdot C

estimates the total gain achievable if the gain-to-cost efficiency remains constant.

This ratio has been proposed as the principal objective in several informative path planning frameworks (Popović et al., 2017; Schmid et al., 2020; Xu et al., 2021). However, it does not constitute a well-defined optimization problem. Once the path with the highest ratio is identified, the robot has no incentive to continue beyond that path, as doing so would reduce the overall ratio. In contrast, practical scenarios typically impose a maximum cost budget, which should ideally be fully utilized for collecting maximum gains.

4.1.3. Expected gain

To explicitly account for exploration, we propose the expected gain, which incorporates frontier information:

g_{e} (p) = \{\begin{cases} r (p) \cdot C, & if p \in P_{F} \\ g (p), & otherwise \end{cases}

(21)

Here, $F$ represents the set of frontier nodes, that is, nodes that lead to graph expansion, and $P_{F}$ is the set of paths ending at frontier nodes. For frontier paths, we approximate the potential future gain by assuming that after the current path is finished, the robot can continue from the frontier node into the newly discovered region, collecting gains with the current ratio $r (p)$ over the remaining cost budget. For all other paths, the expected gain defaults to the path gain $g (p)$ . This quality function balances exploitation and exploration: it favors exploitation in fully explored graphs while incentivizing exploration when frontier regions remain. Note that maximizing equation (21) is equivalent to

\max_{p \in P} g_{e} (p) = \max \{\max_{p \in P_{F}} r (p) \cdot C, \max_{p \notin P_{F}} g (p)\} .

(22)

A natural question that arises is whether the extrapolation in equation (21) should use the maximum gain-to-cost ratio across all frontier paths instead of individual path-specific ratios. The following theorem establishes that both approaches yield the same path selection. Consequently, the optimal path can be determined without precomputing the maximum ratio, reducing computational overhead while preserving optimality.

Theorem 1

Let $\tilde{P}$ be a set of candidate paths, and let C > 0 denote th e available budget. Define r * to be the maximal gain- to- cost ratio among the candidate paths. Then

\underset{p \in \tilde{P}}{argmax} r (p) \cdot C = \underset{p \in \tilde{P}}{argmax} g (p) + r^{*} (C - c (p)) .

(23)

Proof. We prove this theorem by showing the path that achieves the highest ratio r* maximizes both path quality functions. Define, for any $p \in \tilde{P}$ ,

q_{1} (p) = r (p) \cdot C, q_{2} (p) = g (p) + r^{*} (C - c (p)) .

(24)

Let p* maximize r, meaning $r (p^{*}) = r^{*}$ . Since C > 0 is constant,

q_{1} (p) \leq r^{*} C,

(25)

with equality if and only if

r (p) = r^{*}

. Rewriting

q_{2} (p)

q_{2} (p) = r^{*} C + (g (p) - r^{*} c (p)) .

(26)

The term r*C is constant, so maximizing $q_{2} (p)$ is equivalent to maximizing $g (p) - r^{*} c (p)$ . Observe that

g (p) - r^{*} c (p) = c (p) (r (p) - r^{*}) \leq 0,

(27)

with equality if and only if

r (p) = r^{*}

. Thus,

q_{2} (p) \leq r^{*} C

, with equality if and only if

r (p) = r^{*}

. Therefore, both objectives are maximized by the same set of paths. □

Corollary 1

Let $P_{F}$ denote the set of frontier paths, and let $r_{F}^{*} = \max_{p \in P_{F}} r (p)$ be the highest gain- to- cost ratio among these paths. Then,

\underset{p \in P_{F}}{argmax} r (p) \cdot C = \underset{p \in P_{F}}{argmax} g (p) + r_{F}^{*} (C - c (p)) .

(28)

This shows that extrapolating each frontier path using its own gain-to-cost ratio yields the same optimal path as using the maximum frontier ratio $r_{F}^{*}$ .

Additionally, the maximum expected gain $\max_{p \in P} g_{e} (p)$ can be expressed as:

\max \{\max_{p \in P_{F}} g (p) + r_{F}^{*} (C - c (p)), \max_{p \notin P_{F}} g (p)\},

(29)

or equivalently,

\max_{p \in P} g (p) + 1_{P_{F}} (p) \cdot r_{F}^{*} (C - c (p)),

(30)

where

1_{P_{F}} (p)

is the indicator function that equals 1 if

p \in P_{F}

and 0 otherwise.

Notably, our formulation equation (21) allows the optimal path to be determined without explicitly computing $r_{F}^{*}$ in advance, thus reducing computational overhead.

4.1.4. Complexity class of selection criteria

Consider the case where the path gain is defined as the sum of node gains along a path. Under this formulation, maximizing the path gain reduces to the classical longest path problem in a weighted graph, which is known to be NP-hard (Hardgrave and Nemhauser, 1962). In contrast, maximizing the gain-to-cost ratio corresponds to the maximum ratio path problem (Radzik, 2013). Ahuja et al. (1983) showed that the longest path problem can be reduced to a special case of the maximum ratio path problem. Consequently, any algorithm that solves the maximum ratio path problem can also solve the longest path problem, implying that the former is NP-hard. As the expected gain integrates both path gain and path ratio, its complexity subsumes theirs. Therefore, maximizing the expected gain is also NP-hard.

It is important to note that while the maximum ratio cycle problem is solvable in polynomial time (Dantzig et al., 1966; Lawler, 1972), this variant does not accommodate start-vertex constraints and is therefore not directly applicable to most informative path planning scenarios. That problem can be cast as detecting negative cycles (Cherkassky and Goldberg, 1999; Lawler, 1972) or solved using linear programming (Dantzig et al., 1966), but it does not generalize to the path-based formulations considered in this work.

4.2. Replanning strategies

When an underlying graph structure is available, the robot navigates by planning and executing paths over this graph. It may (i) follow a precomputed path without any replanning, (ii) replan only upon reaching the terminal node of the current path, or (iii) replan at each visited node along the trajectory, as illustrated in equations (5) and (10). In active perception scenarios, where planning and navigation are grounded in an internally constructed graph, the same three replanning strategies are applicable. Moreover, since the robot operates under velocity control, replanning can, in principle, occur at any time step. For example, the robot may execute a partial trajectory (e.g., five steps along the current path) before reevaluating its policy at an intermediate configuration. The frequency of replanning introduces a trade-off between computational cost and responsiveness to new sensory inputs.

4.2.1. No replanning

At the start node, the planner computes an optimal path

p^{*} = (v_{0}, v_{1}, \dots, v_{n - 1}, v_{n}) .

(31)

The robot then executes this plan to completion. This approach is computationally efficient but does not incorporate newly acquired information during execution and may underutilize the available cost budget.

4.2.2. Replanning at the goal

Once the robot completes the current optimal path p*, the remaining cost budget is updated, and the gain from previously visited nodes is zeroed:

C \leftarrow C - c (p^{*}), g (v_{i}) \leftarrow 0, i = 1, 2, \dots, n .

(32)

The planner then computes a new optimal path based on the updated budget and updated graph. This strategy is typical in frontier-based exploration, where the robot selects the most promising frontier and replans upon reaching it (Gervet et al., 2023; Xu et al., 2021; Yamauchi, 1997).

4.2.3. Replanning at every node

Instead of executing the entire path, the robot moves to the next node v₁ along edge e_0,1 and then replans. The cost budget is updated, and the gain from the visited node is zeroed:

C \leftarrow C - c (e_{0, 1}), g (v_{1}) \leftarrow 0 .

(33)

The planner then computes a new optimal path from v₁. This strategy enhances adaptability to new information but incurs a higher computational cost.

4.2.4. Replanning at arbitrary frequency

In active perception settings, the robot constructs and updates its internal graph representation online, based on sensory observations. At a given configuration, the planner computes an optimal path $p^{*} = (v_{0}, v_{1}, \dots, v_{n})$ , or

p^{*} = (x_{0} = v_{0}, x_{1}, x_{2}, \dots, x_{N}),

(34)

if we consider all the configurations along the edge. The time difference between consecutive configurations x_i and x_i+1 is Δt. Suppose the robot replans every kΔt; it would then replan upon reaching configuration x_k. The cost budget is updated, and the gain from the traversed configurations is zeroed:

C \leftarrow C - \sum_{i = 0}^{k - 1} c (x_{i}, u_{i}), g (x_{i}) \leftarrow 0, i = 1, 2, \dots, k .

(35)

This strategy enables replanning at arbitrary time steps, unconstrained by discrete node transitions. Such flexibility allows for the highest replanning frequency and, consequently, the most rapid adaptation to newly acquired information, albeit at the expense of maximal computational cost.

4.3. On the structure of optimal paths

A key challenge in informative path planning is that the optimal path may revisit vertices (see Figure 3(b)), leading to an infinite number of candidate paths. However, we show that it suffices to search only for paths that do not revisit edges in a symmetric directed graph, significantly reducing the search space.

Theorem 2

Let $G = (V, E, g, c)$ be a symmetric directed graph, meaning that for every $e_{i, j} = (v_{i}, v_{j}) \in E$ , the reverse edge $e_{j, i} = (v_{j}, v_{i}) \in E$ also exists and satisfies $c (e_{i, j}) = c (e_{j, i}) > 0$ . Suppose the gain of a path is determined solely by the set of nodes it visits (equation ( 2 )), and its cost is defined as the sum of edge costs along the path (equation ( 1 )). Then, for any path that traverses an edge more than once, there exists an alternative path that achieves the same gain with strictly lower cost and does not revisit any edge.

Proof. Let p be a path that traverses the edge $e_{m, n} = (v_{m}, v_{n})$ more than once. Then p can be decomposed as

p = p_{a} + e_{m, n} + p_{b} + e_{m, n} + p_{c},

(36)

where p_a and p_c are (possibly empty) subpaths, and p_b is a non-empty subpath from v_n back to v_m. The + operator here denotes path concatenation, defined only when the terminal node of the first segment matches the initial node of the second. For example,

(v_{0}, v_{1}, v_{2}) + (v_{2}, v_{3}) + (v_{3}, v_{4}, v_{5}) = (v_{0}, v_{1}, v_{2}, v_{3}, v_{4}, v_{5})

(37)

illustrates valid concatenations: the edge

(v_{2}, v_{3})

extends the path ending at v₂, and the subsequent path

(v_{3}, \dots)

continues from v₃.

We construct a modified path $\tilde{p}$ by removing both occurrences of the edge e_m,n and replacing the subpath p_b with its reversal $\tilde{p_{b}}$ . The resulting path is:

\tilde{p} = p_{a} + \tilde{p_{b}} + p_{c},

(38)

which remains a valid path since

\tilde{p_{b}}

starts at v_m and ends at v_n, and all reversed edges are present due to symmetry. This construction is illustrated in Figure 5.

Figure 5.

Repeated edges can be removed to construct a lower-cost path with identical gain.

Because $\tilde{p}$ visits the same set of nodes as p, its total gain remains unchanged. The cost, however, is strictly reduced due to the removal of both traversals of e_m,n:

g (\tilde{p}) = g (p), c (\tilde{p}) = c (p) - 2 c (e_{m, n}) < c (p) .

(39)

By applying this transformation iteratively to each repeated edge, we progressively eliminate all edge repetitions. Although reversing a subpath may introduce new repetitions, each application strictly reduces the total number of edge traversals by two. Since the initial path is finite, the process terminates after a finite number of steps, yielding a path with no repeated edges.

Thus, any path that revisits an edge can be transformed into a trail (i.e., a path with no repeated edges) with identical gain and strictly lower cost, completing the proof. □

This theorem directly leads to the following corollary.

Corollary 2

In informative path planning over symmetric directed graphs, it suffices to restrict the search space to trails—that is, paths that do not traverse any edge more than once.

5. Informative path planning algorithms

For informative path planning, we consider two widely used baseline algorithms: one based on the TSP and the other on the SPT. To overcome their limitations and improve computational efficiency, we propose beam search algorithms for solving the IPP problem. We first implement the standard beam search, referred to as depth-wise beam search (DBS), which retains the top B partial solutions at each search depth for further expansion. We then introduce a modified variant, node-wise beam search (NBS), which retains up to B candidate paths per visited node, rather than per depth level. This modification enhances the planner’s ability to explore the graph more globally and avoid becoming trapped in local optima.

To compare the potential of different paths, we define a path-preference relation that first prioritizes the gain-to-cost ratio, then the path gain, and finally penalizes higher cost when all other factors are equal.

Definition 1

Path Preference Relation. Let p₁ and p₂ be two candidate paths. We say that p₁ is preferred over p₂, denoted $p_{1} > p_{2}$ , if one of the following conditions holds:

(1) $r (p_{1}) > r (p_{2})$ ;

(2) $r (p_{1}) = r (p_{2}), g (p_{1}) > g (p_{2})$ ;

(3) $r (p_{1}) = r (p_{2}), g (p_{1}) = g (p_{2}), c (p_{1}) < c (p_{2})$ .

5.1. Existing algorithms

5.1.1. Traveling salesman problem approach

A common foundation underlying various TSP-based approaches (Arora and Scherer, 2017; Cao et al., 2021) is a two-stage pipeline: first, a subset of nodes is sampled or selected from the full set for traversal; second, the resulting TSP instance is solved using an off-the-shelf solver. In the context of informative path planning, where the goal is to maximize information gain, the most natural strategy is to select nodes whose gain exceeds a threshold g_thr. To explicitly encourage exploration, the selected set should further include frontier nodes:

V_{s} = F \cup {v \in V ∣ g (v) > g_{thr}} .

(40)

The gain threshold is determined using a mid-range criterion:

g_{thr} = g_{\max} - α \cdot (g_{\max} - g_{\min}),

(41)

where α specifies the fraction of top-gain nodes to retain, and g_min and g_max denote the minimum and maximum node gains.

Given the selected nodes from the previous stage, we first construct a cost matrix representing the shortest distances between every pair of nodes. We compute this matrix using the Floyd–Warshall algorithm (Cormen et al., 2022), which finds all-pairs shortest paths with a time complexity of $Θ (| V |^{3})$ and a space complexity of $Θ (| V |^{2})$ . Afterward, we solve the TSP to obtain the optimal tour p* using Google OR-Tools (Google, 2025). As TSP is NP-hard, exact dynamic-programming approaches such as the Bellman–Held–Karp algorithm (Bellman, 1962; Held and Karp, 1962) become computationally intractable for large instances, requiring $Θ (| V |^{2} 2^{| V |})$ time and $Θ (| V | 2^{| V |})$ space. Consequently, heuristic methods (e.g., Tabu Search (Glover and Laguna, 1998), Simulated Annealing (Kirkpatrick et al., 1983), or Guided Local Search (Voudouris, 1997)) are commonly employed to provide near-optimal solutions in practice. In our implementation, we allow OR-Tools to automatically select the solver. Since the complexity of the selected heuristic solver is not explicitly controlled, we report the lower bound given by the Floyd–Warshall preprocessing step and the upper bound given by the exact Bellman–Held–Karp algorithm.

5.1.2. Shortest path tree approach

The shortest path tree approach (Xu et al., 2021) comprises two computational stages: (1) constructing the SPT, and (2) selecting the optimal path within the SPT. We employ the Bellman–Ford algorithm (Cormen et al., 2022) to compute the SPT that encodes the minimum-cost path from the source to every node. This step dominates the computational complexity, requiring $O (| V ‖ E |)$ time, while the space complexity is $O (| V |)$ for storing distances and predecessors. Since the SPT is a tree, the number of paths to be evaluated equals $| V |$ . This algorithm generalizes prior tree-based IPP approaches using RRT (Bircher et al., 2016b; Witting et al., 2018) and RRT* (Schmid et al., 2020).

5.2. Depth-wise beam search

Beam search is a variant of breadth-first search that selectively retains the most promising partial solutions at each depth level. In the standard formulation, a fixed parameter $B \in N$ , known as the beam width, determines the number of candidate paths preserved at each iteration. We refer to this method as DBS, as it limits the number of paths expanded per depth. Unlike classical beam search, which typically terminates upon reaching a goal state, informative path planning lacks a well-defined endpoint since an optimal path can end at any node. To regulate the search, we introduce a search depth $D \in N$ to constrain the search space, which also mirrors the look-ahead horizon in practical applications.

Algorithm 1 outlines the DBS procedure for identifying the most informative path in a given graph. The search begins with a beam open , which maintains the top B paths at each depth level. During expansion, a min heap heap —a priority queue structure—is used to store candidate paths, where the least preferred path (worst-ranked) is accessible via heap .top(). Each path p in the current beam is expanded using all outgoing edges $δ^{+} (v)$ of its last vertex v. Expansion is performed through path concatenation, denoted by +. To prevent revisiting edges already traversed (as justified in Corollary 2), we check whether the edge $(v, v')$ is present in p. If the newly extended path p′ remains within the cost budget C, we evaluate its quality $q (p')$ under different selection criteria (Section 4.1) and update the current best path p* if p′ is superior. The min heap is updated based on the path preference operator $≻$ , ensuring that at most B paths are retained. If the heap has available capacity, the new path is directly inserted; otherwise, it replaces the worst-ranked path if it is preferred. This replacement operation maintains the min-heap property, guaranteeing that heap .top() always provides the least promising path for efficient pruning. Once all paths at the current depth are processed, the contents of heap are transferred to open to initialize the next beam. The search proceeds for D iterations, ultimately returning the best path found.

DBS exhibits a time complexity of $O (| V | D B (D + \log B))$ , which stems from its iterative deepening structure and the operations executed at each search level. The algorithm performs D iterations, and at each depth, it explores up to B paths by extending them with a neighboring vertex (with at most $| V |$ possibilities), resulting in $O (| V | D B)$ path expansions. Each path extension involves revisitation checks and path construction, both incurring a cost of $O (D)$ . The cost, gain, and ratio computations are performed in constant time, leveraging cached values for all paths in the beam open . Heap operations (push, pop) for maintaining the beam require $O (\log B)$ time. Accounting for all these operations leads to the stated time complexity. The space complexity of DBS is $O (D B)$ , as it maintains a beam open of up to B paths, each of length at most D.

5.3. Node-wise beam search

Depth-wise beam search prioritizes expanding the most promising paths at each depth level, but this makes it easily fall prey to local optima. Once the entire beam converges on regions with high immediate gains, it will neglect other promising candidates that initially appear less favorable but could ultimately lead to better solutions. In the worst case, all paths in the beam may even terminate at the same node. To address this, we propose an alternative algorithm, NBS. Instead of maintaining the top B path candidates per depth, NBS stores up to B paths for each visited node. This modification ensures that at least one promising path remains available at every visited node, encouraging exploration across the graph rather than focusing on immediate high-gain regions.

Algorithm 2 outlines the NBS procedure. The search maintains a per-node beam, open [v], storing up to B paths that lead to node v. At each iteration, all nodes with stored paths are expanded: for each path p ending at node v, we extend it to all successors v′, provided the edge $(v, v')$ has not been traversed in p. Throughout this process, we update the best path found according to the given selection criterion q. The structure heap serves as a temporary map to collect these extended paths; specifically, each heap [v′] is a min-heap that tracks the least promising candidate (accessible via heap [v′].top()) to enforce the beam limit. Other than this key difference, NBS follows a similar expansion strategy to DBS, progressing in increasing depth from 1 to D.

To better analyze the computational complexity, we can adjust the implementation by iterating the depth in the outer loop, then iterating over the edges $(v, v^{'}) \in E$ , and finally iterating over the paths inside open [v] in the inner loop. In this way, we can see that there are in total $| E | D B$ iterations. Inside each iteration, the complexity is the same as that of DBS; therefore, the total time complexity is $O (| E | D B (D + \log B))$ . For NBS, open contains a list of up to B paths for each vertex $v \in V$ . Each path can have at most D nodes, resulting in a space complexity of $O (| V | D B)$ .

5.4. Computational complexity

Table 1 presents a comparative summary of the computational characteristics of TSP, SPT, DBS, and NBS. For each algorithm, we report the time and space complexity, the maximum number of candidate paths considered during the search, and the average time and space complexity per path.

Table 1.

Asymptotic time and space complexity of the evaluated algorithms.

Alg.	Time comp.	Space comp.	Num. of paths	Time comp. per path	Space comp. per path
TSP	$Ω (\| V \|^{3}), O (\| V \|^{2} 2^{\| V \|})$	$Ω (\| V \|^{2})$ , $O (\| V \| 2^{\| V \|})$	-	-	-
SPT	$O (\| V ‖ E \|)$	$O (\| V \|)$	$\| V \|$	$O (\| E \|)$	$O (1)$
DBS	$O (\| V \| D B (D + \log B))$	$O (D B)$	$\| V \| D B$	$O (D + \log B)$	$O (1 / \| V \|)$
NBS	$O (\| E \| D B (D + \log B))$	$O (\| V \| D B)$	$\| E \| D B$	$O (D + \log B)$	$O (\| V \| / \| E \|)$

For the SPT approach, since the tree contains exactly one path to each vertex, the number of evaluated paths is $| V |$ . This yields an average time complexity of $O (| E |)$ and a space complexity of $O (1)$ per path. In the case of TSP, as the number of enumerated candidate paths is not explicitly bounded, per-path statistics are not applicable. DBS exhibits a per-path time complexity of $O (D + \log B)$ and a per-path space complexity of $O (1 / | V |)$ . On the other hand, NBS exhibits the same per-path time complexity but incurs a different space complexity of $O (| V | / | E |)$ .

Among all methods, TSP exhibits the highest asymptotic time and space complexity with the upper bound. While direct comparison between SPT and beam search methods in terms of overall computation time is challenging due to their dependence on beam width and search depth, their per-path complexities offer clearer insight. In most scenarios, D and log B are expected to be significantly smaller than $| E |$ , and thus both DBS and NBS exhibit lower per-path time and space complexity than SPT. Although NBS incurs higher total asymptotic complexity than DBS for a fixed beam width, it often demonstrates superior empirical performance under equivalent computational budgets. This advantage stems from its node-wise beam design, which promotes broader exploration and more effective graph coverage, yielding high-quality solutions with fewer path expansions (i.e., smaller B). We provide empirical support for this claim in Section 6.

6. Benchmarking on graphs

We evaluate all four algorithms, each with a range of key parameters. For beam search, the primary parameter is the beam width B. The maximum beam width is chosen so that DBS and NBS have comparable computation times. For DBS, the beam widths are 1, 100, and 10000; for NBS, they are 1, 10, and 100. For TSP, the relevant parameter is α, as defined in equations (40) and (41). While the shortest path tree typically does not involve tunable parameters, in Xu et al. (2021) the same parameter α is used to reduce the search space and computation time. We set α to 0.5, 0.75, and 1.0. All algorithms are implemented in C++, run in a single-threaded environment, and use the same graph representation to ensure a fair comparison. Experiments are conducted on an Intel i9-12900 CPU.

We evaluate each planner on two graph structures: clustered gains and scattered gains (examples in Figure 6). Both structures are generated at two scales: a small graph spanning 25 m in the x- and y-directions, and a large graph spanning 50 m. In the scattered scenario, vertex gains are uniformly sampled from [0,100]. In the clustered scenario, these gains are assigned only to eight randomly placed clusters (all other vertices receive zero gain). Edge costs equal Euclidean distances, with the start vertex fixed at (0, 0). For each graph type, we generate five instances and average the results for gain and computation time.

Figure 6.

Illustrations of four types of graphs used for benchmarking.

We consider both sensing scenarios: the a priori case (Section 3.1), where the robot has full knowledge of the graph from the beginning, and online perception (Section 3.2), where the robot observes only nodes within a given perception radius ρ, gradually discovering the graph as it navigates. We evaluate all applicable pairings of selection criteria and replanning strategies. In the a priori scenario, the expected gain equals the path gain ( $g = g_{e}$ ) since no frontiers exist, and the ratio objective is inapplicable in the absence of replanning. Thus, five path selection combinations are feasible in this case. In the online perception scenario, replanning is necessary, making the no-replanning condition invalid; this results in six combinations per graph type.

In the plots, the y-axis represents different path selection pairings, while the x-axis shows either the range of computation time or the final accumulated gain across parameter settings. Figure 7 presents the average computation time of various algorithms under different parameters. For this metric, we consider only the fully known graph case, as planning under exploration introduces significant uncertainties that can affect planning time. Figures 8 and 9 present the results for the a priori and online perception scenarios, respectively. We evaluate algorithmic performance along two principal dimensions: (i) the maximum achievable gain, represented by the rightmost extent of each horizontal bar, and (ii) sensitivity to parameter settings, indicated by the bar’s length. An algorithm is considered best if it consistently yields high gain with low sensitivity, corresponding to a short bar positioned to the far right. We prioritize the algorithm that achieves higher gain over those with merely lower sensitivity. Beyond identifying the top-performing algorithms, our analysis also aims to determine the most effective path selection combination across both formulations.

Figure 7.

Average computation time in the graph a priori experiments.

Figure 8.

The range of final collected gain in the graph a priori scenario. NBS achieves the highest and most consistent gains, as shown by short horizontal bars positioned to the far right.

Figure 9.

The range of final collected gain in the graph online perception scenario. The expected gain selection criterion delivers the strongest results overall, and replanning at every node further improves performance compared to other replanning strategies. With this setup, NBS offers the most reliable performance with little sensitivity to parameter choices.

6.1. Graph a priori benchmarking

Figure 7 shows the average computation time per algorithm, in seconds, plotted on a logarithmic scale. We consider two replanning strategies: no replanning and replanning at every node; the “replan-at-goal” case is excluded due to its inconsistent replanning frequency. Results are shown only for scattered gain graphs, as clustered graphs exhibit qualitatively similar trends. We observe that computation time increases with both graph size and replanning frequency. TSP scales the worst, exceeding 300 seconds per episode on large graphs, whereas SPT remains consistently efficient. DBS achieves very low planning times at small beam widths. NBS is slightly slower than DBS at a beam width of 1, but its absolute computation time remains small: approximately 0.1 seconds for small graphs and about 1.0 second for large graphs under the every-node replanning strategy.

We visualize algorithm performance in the a priori formulation in Figure 8. NBS consistently achieves the best performance across all conditions. It exhibits low sensitivity to both parameter settings and path selection, achieving high gains—as evidenced by short horizontal bars positioned to the far right. While DBS can potentially achieve comparable gains, its performance is more sensitive to both parameter and path selection setups. For example, it performs much worse in the clustered-gain graph without replanning. Across most settings, the long horizontal bars indicate that DBS’s performance is highly dependent on careful parameter tuning. SPT is insensitive to its parameters but shows considerable sensitivity to path selection. Without replanning, SPT fails to function effectively and, overall, achieves much lower gains than DBS and NBS. TSP lacks an explicit selection criterion, and its performance varies only with the replanning strategy. It is the most sensitive algorithm: while it can occasionally match SPT’s performance, it often achieves the lowest gains due to repeatedly traversing the same edge back and forth.

Regarding path selection, the top-performing algorithm, NBS, exhibits minimal sensitivity to this choice. This suggests that the specific path selection strategy is less critical in the fully known graph scenario when a robust algorithm is employed. Consequently, we omit further discussion of path selection for this scenario.

6.2. Graph online perception benchmarking

In the online perception scenario, the robot can only perceive nodes within a fixed radius (5 m in this case). Figure 9 illustrates the algorithms’ performance with this setup. The analysis parallels that of the a priori case, focusing on maximum gain, parameter sensitivity, and the path selection combination that yields the best performance.

We observe that the expected gain objective consistently yields superior results compared to the path ratio or path gain. This advantage arises because the expected gain explicitly encourages exploration by incorporating frontier information. In contrast, the other selection criteria lack such exploratory incentives, leading planners to over-exploit known areas and become trapped in local optima. Under the expected gain criterion, replanning at every step generally outperforms replanning only at the goal. Replanning exclusively at the goal ignores new information acquired en route, resulting in suboptimal behavior in three of the four evaluated graph types.

With this path selection setup, NBS consistently achieves the best performance and exhibits low sensitivity to parameter choices. DBS can attain similar performance but is more sensitive to parameter tuning. SPT performs worse overall, limited by its tree-based structure. TSP shows the weakest performance and exhibits high sensitivity to parameter settings.

6.3. Performance analysis on graphs

The effectiveness of selection criteria depends on the underlying graph formulation. When the graph is fully known a priori, the top-performing algorithms achieve strong results across all criteria. In contrast, in the online perception setting, the expected gain consistently outperforms other criteria across all planners. Regarding replanning frequency, in the a priori case, DBS and SPT benefit significantly from frequent replanning to improve performance. While TSP can also benefit from higher replanning frequency, it often exhibits oscillatory behavior between competing paths. Notably, NBS maintains strong performance across all replanning frequencies (even without any replanning), highlighting its robustness and computational efficiency. In the online perception setting, when paired with the expected gain criterion, high-frequency replanning mostly improves gain accumulation. This combination enables planners to effectively balance exploration and exploitation, adapting in real time to newly acquired information.

NBS demonstrates exceptional robustness to parameter variations, enabling reliable deployment with minimal tuning effort. Even at the smallest beam width (B = 1), it delivers strong performance. In contrast, DBS is more sensitive to beam width selection—poor parameter choices can significantly degrade its performance. SPT remains the most computationally efficient but sacrifices solution quality due to its restricted search space. TSP performs poorly overall: it is highly sensitive to its parameters, prone to oscillations between suboptimal paths, and exhibits poor scalability, with prohibitively high computation times on larger graphs.

Overall, our results show that NBS is the top-performing IPP algorithm. The combination of the expected gain metric with every-node replanning demonstrates the most effective selection strategy. Together, they form the best-performing planning approach, consistently achieving superior results in both a priori and online perception scenarios. In the following section, we shift our focus to active perception. Based on our findings, we adopt the expected gain criterion and apply a high replanning frequency. NBS is selected with a beam width of 1, as it is the most robust and reliable algorithm across all settings. DBS is configured with a larger beam width (B = 100). For SPT, given its computational efficiency, we evaluate the entire tree search space by setting the threshold α to 1.0. The TSP algorithm is configured with a threshold of 0.5.

7. Graph construction for active perception

As introduced in Section 3.3 and illustrated in Figure 2, active perception does not assume a predefined underlying graph environment, unlike prior IPP problems defined on graphs. Therefore, to apply IPP algorithms, we incrementally build a graph as the robot explores the environment.

Following prior work (Schmid et al., 2020; Xu et al., 2021), we enforce a minimum edge length constraint l_min: a newly sampled, collision-free configuration is accepted only if its distance to all existing graph nodes exceeds l_min. This helps avoid oversampling that would yield highly redundant information and improves computational efficiency during information-gain evaluation and path planning. Once accepted, bidirectional edges are attempted between the new node and its neighbors, subject to a maximum edge length l_max and the existence of a collision-free path. The maximum edge length prevents excessive edge density, such as $| V | (| V | - 1)$ edges in a complete graph. When a sampled configuration lies beyond l_max from any node, it is relocated to a position at a distance l_min along the direction toward the nearest neighbor, increasing the likelihood of establishing connections with nearby nodes. This sampling procedure is detailed in Algorithm 3, where n_sample denotes the maximum number of sampling attempts and v_null represents an invalid sampled node.

The resulting structure, in which edges exist only between nodes separated by distances within the interval [l_min, l_max], is known as an annulus graph (Lichev and Mihaylov, 2024). In such graphs, the number of outgoing edges per node—referred to as the out-degree—is inherently bounded. This bound follows from a geometric sphere-packing argument: specifically, the maximum number of non-overlapping spheres of radius $l_{\min} / 2$ that can be packed into a larger sphere of radius l_max². This yields the following conservative upper bound on the out-degree:

| δ^{+} (v) | \leq 8 {(\frac{l_{\max}}{l_{\min}})}^{3}, \forall v \in V,

(42)

where

δ^{+} (v)

is the set of outgoing edges of v. While tighter bounds exist (see Conway and Sloane (2013)), this estimate provides a sufficiently conservative, closed-form upper bound.

We introduce the rapidly-exploring random annulus graph (RRAG), a variant of the RRG that incorporates both minimum and maximum edge length constraints. The graph construction procedure is described in Algorithm 4. The auxiliary function Near $(V, v, l)$ returns the subset of vertices in $V$ within distance l of vertex v, computed efficiently using the nanoflann implementation of the k-d tree (Bentley, 1975; Blanco and Rai, 2014). The sampling routine is detailed in Algorithm 3 and the collision-checking procedure between two nodes is discussed in Section 7.1. For completeness, we also implement two related planners: RRAT (Algorithm 5), analogous to RRT in (Bircher et al., 2016b), and RRAT* (Algorithm 6), which parallels RRT* in (Schmid et al., 2020). We denote Cost(v) as the lowest cost from the start node to node v.

7.1. Collision checking between nodes

To determine whether a trajectory between two nodes is feasible, we must verify that it lies entirely within the free space $X_{free}$ , that is, the robot avoids collisions with obstacles and (if applicable) self-collisions along the path. Let $ξ : X \to R$ denote the clearance function, which assigns to each configuration $x \in X$ the signed Euclidean distance between the robot at x and the nearest obstacle or self-collision, whichever is closer. A configuration x is collision-free if and only if $ξ (x) > 0$ .

When connecting two configurations via straight-line interpolation in $X$ , the actual path traced by points on the robot depends on the robot’s geometry and kinematics. The following lemma bounds the length of such paths.

Lemma 1

Path length bound (Barraquand et al., 1997). For any bounded robot, there exists a constant η > 0, such that when the robot moves along the straight-line path between any two configurations x₁ and x₂, every point on the robot traces a curve of length at most η‖x₁ − x₂‖₂ in the workspace.

This bound allows us to derive a sufficient condition under which linear interpolation between two collision-free configurations remains entirely within $X_{free}$ . This concept is formalized as adjacency.

Definition 2

Adjacency (Barraquand et al., 1997). For a self-collision-free robot, two configurations $x_{1}, x_{2} \in X_{free}$ with $ξ (x_{1}), ξ (x_{2}) > 0$ are adjacent if

η ‖ x_{1} - x_{2} ‖_{2} < \max {ξ (x_{1}), ξ (x_{2})} .

(43)

If self- collisions are possible, the right- hand side should be scaled by 1/ 2.

This sufficient condition leads to the following guarantee for collision-free edges in annulus graphs.

Theorem 3

For robots without self-collision risks, any attempted trajectory verification in RRAG between $x_{1}, x_{2} \in X_{free}$ is collision-free if $ξ (x_{1}) > η l_{\max}$ or $ξ (x_{2}) > η l_{\max}$ . When the distance between x₁ and x₂ reaches the lower bound l_min, a collision-free path is then guaranteed if $ξ (x_{1}) > η l_{\min}$ or $ξ (x_{2}) > η l_{\min}$ .

Proof. In RRAG, edges are only formed between nodes satisfying ‖x₁ − x₂‖₂ ≤ l_max. To ensure that the linear path lies entirely in $X_{free}$ , it suffices to satisfy the adjacency condition, which holds whenever

η ‖ x_{1} - x_{2} ‖_{2} \leq η l_{\max} < \max {ξ (x_{1}), ξ (x_{2})} .

(44)

Requiring $ξ (x_{1}) > η l_{\max}$ or $ξ (x_{2}) > η l_{\max}$ ensures this inequality, yielding a conservative sufficient condition.

Similarly, when ‖x₁ − x₂‖₂ = l_min, we require

η ‖ x_{1} - x_{2} ‖_{2} = η l_{\min} < \max {ξ (x_{1}), ξ (x_{2})} .

(45)

Imposing $ξ (x_{1}) > η l_{\min}$ or $ξ (x_{2}) > η l_{\min}$ guarantees this condition.☐

Therefore, in narrow or heavily cluttered workspaces, where clearances are typically small, l_min and l_max must be reduced accordingly to guarantee sufficient graph coverage. However, this adjustment results in a denser graph, leading to highly overlapping information gains and significantly increased computational burden. If not properly tuned, the robot may fail to establish new edges, stalling graph expansion. To address this, we introduce a fallback local sampling-based planner (FLS) to recover valid transitions in cluttered or geometrically complex regions where straight-line interpolation would otherwise fail, as illustrated in Figure 10. The FLS is integrated into all three graph construction methods (RRAG, RRAT, RRAT*). Specifically, we employ the standard RRT* algorithm, implemented via the OMPL library (Sucan et al., 2012), to explore alternative collision-free paths locally. The FLS operates within an axis-aligned bounding box that tightly encloses the start and goal configurations, expanded by a margin of l_max in each dimension. It is invoked with a planning time budget to minimize path length. Thanks to the probabilistic completeness of RRT*, the planner will eventually find a valid path if one exists, given sufficient time.

Figure 10.

The necessity of the FLS in narrow passages. The FLS can find a path through a narrow “L”-shaped corridor when a straight-line connection fails.

7.2. Connectivity of annulus graph

While Galhotra et al. (2019) provided initial theoretical insights into the connectivity of random annulus graphs, their analysis primarily focuses on point sets sampled on a unit sphere (e.g., ${x \in X ∣ ‖ x ‖_{2} = 1}$ ) and applies only to specific connectivity regimes. More recent work by Lichev and Mihaylov (2024) highlights that the structural properties of annulus graphs are sensitive to the choice of l_min and l_max. To ensure connectivity among nodes, the annular region used for establishing edges must be sufficiently thick. The following theorem formalizes this requirement under natural geometric conditions: the maximum connection radius should be at least twice the minimum separation distance, that is, l_max ≥ 2l_min.

Theorem 4

Let $X_{free}$ be a connected subset of the configuration space, and let $V \subset X_{free}$ be a finite set of configurations satisfying:

(1) l_min-packing: Nodes in $V$ are mutually separated by at least l_min, that is, $\forall v_{i}, v_{j} \in V, v_{i} \neq v_{j} \Rightarrow ‖ v_{i} - v_{j} ‖ \geq l_{\min}$ ;

(2) l_min-covering: Every point in $X_{free}$ lies within distance l_min of some node in $V$ , that is, $\forall x \in X_{free}, \exists v \in V such that ‖ x - v ‖ \leq l_{\min}$ .

Let $G = (V, E)$ be the undirected annulus graph where two nodes with a distance between l _min and l _max are connected if there exists a collision- free path between them. Then, $G$ is guaranteed to be connected for any such $V$ if and only if

l_{\max} \geq 2 l_{\min} .

(46)

Proof. We first prove the sufficiency condition by contradiction. Suppose l_max ≥ 2l_min, but $G$ is disconnected. Then the vertex set $V$ can be partitioned into two non-empty, disjoint subsets $V_{a}$ and $V_{b}$ .

Since $X_{free}$ is connected, there exists a continuous path $τ : [0,1] \to X_{free}$ connecting any $v_{a} \in V_{a}$ to $v_{b} \in V_{b}$ . The absence of edges implies that ‖v_a − v_b‖ > l_max for all such pairs (the lower bound ‖v_a − v_b‖ ≥ l_min is satisfied due to the packing condition).

By the l_min-covering property, $X_{free}$ is contained in the union of closed balls of radius l_min centered at nodes in $V$ :

X_{free} \subseteq \underset{v \in V}{\cup} B (v, l_{\min}) .

(47)

Define the covered regions:

X_{a} = ⋃_{v_{a} \in V_{a}} B (v_{a}, l_{\min}), X_{b} = ⋃_{v_{b} \in V_{b}} B (v_{b}, l_{\min}) .

(48)

Then

X_{free} \subseteq X_{a} \cup X_{b}

, and since

τ (0) \in X_{a}

τ (1) \in X_{b}

, continuity ensures the existence of a point

\tilde{x} = τ (t)

for some

t \in (0, 1)

lying on the boundary between

X_{a}

and

X_{b}

At this boundary point, there exist nodes ${\tilde{v}}_{a} \in V_{a}$ and ${\tilde{v}}_{b} \in V_{b}$ such that $‖ \tilde{x} - {\tilde{v}}_{a} ‖ \leq l_{\min}$ and $‖ \tilde{x} - {\tilde{v}}_{b} ‖ \leq l_{\min}$ . By the triangle inequality,

‖ {\tilde{v}}_{a} - {\tilde{v}}_{b} ‖ \leq ‖ \tilde{x} - {\tilde{v}}_{a} ‖ + ‖ \tilde{x} - {\tilde{v}}_{b} ‖ \leq 2 l_{\min} .

(49)

Given l_max ≥ 2l_min, it follows that $‖ {\tilde{v}}_{a} - {\tilde{v}}_{b} ‖ \leq l_{\max}$ , which contradicts the assumption of disconnection. Hence, $G$ is connected.

For necessity, consider a counterexample in one dimension. Let $X_{free} = [0, d]$ with l_max < d < 2l_min, and define $V = {0, d}$ . The packing condition holds because ‖0 − d‖ = d > l_max > l_min. The covering condition also holds: every point in [0, d] is within l_min of either 0 or d, because d < 2l_min implies the balls $B (0, l_{\min})$ and $B (d, l_{\min})$ cover the interval.

However, ‖0 − d‖ = d > l_max, so no edge exists between the two nodes. Thus, $G$ is disconnected despite $X_{free}$ being connected. This completes the proof.

Theorem 4 requires two conditions. The packing condition is inherently enforced by our sampling procedure (Algorithm 3), which rejects any new sample within distance l_min of existing nodes. The covering condition means that the vertices must be sufficiently densely sampled to ensure effective coverage of the free space, which encourages using a sufficiently large n_new in Algorithm 4.

7.3. Preserving all orientations

In prior work, orientation is typically managed by discretizing the yaw space and selecting a single orientation per sampled position based on the highest gain, while discarding all other candidates (Schmid et al., 2020; Witting et al., 2018; Xu et al., 2021). Consequently, once a node is selected by the planner, its assigned orientation must be reached through additional turning—even if such rotation is less advantageous than proceeding directly to other nodes. Both RRAT and RRAT* adhere to this strategy.

In contrast, our graph representation provides greater flexibility. For each sampled position, all discrete orientations are retained and treated as distinct nodes. The set of nodes corresponding to different orientations at the same position is referred to as a node cluster. Within each cluster, adjacent orientation nodes are connected by bidirectional edges, provided the transitions are feasible. For connections between different clusters (i.e., inter-cluster edges), connectivity is simplified by retaining only two directed edges (one in each direction) rather than creating a fully connected graph. These edges are constructed by selecting, in each cluster, the node whose orientation is most aligned with the direction of traversal, thereby minimizing unnecessary reorientation during inter-cluster motion. An example of RRAG with all orientations preserved is shown in Figure 11.

Figure 11.

Example of the RRAG with all orientations preserved. Only the edges are visualized. An offset perpendicular to the edge direction is applied to inter-cluster edges to distinguish overlapping connections. For intra-cluster edges, an offset is applied along the node’s yaw direction, forming a polygonal pattern around the actual node. Since FLS can be used to connect nodes, the edges are not strictly straight lines.

7.4. Graph updating

Prior to each replanning step, the planner updates both node gains and the graph structure. Node gains are recomputed within a local neighborhood of radius l_gain centered at the robot’s current position. For RRAT and RRAT*, the best yaw is reselected at each update. Graph structure updates are handled differently depending on the planner. In all cases, edges are revalidated within a local region of radius l_edge; any edge that no longer satisfies feasibility constraints is removed. For RRAT, graph updating further includes pruning all child edges of the current root node except the one selected for traversal. For RRAT*, there is a rewiring mechanism that rewires all nodes in the tree relative to the new root. While effective, this step is computationally expensive due to the need for collision checks across numerous node pairs. In contrast, RRAG benefits from its graph structure: since edges are not constrained to form a tree, no additional rewiring is required during replanning. After edge revalidation, the graph expansion process continues as in Algorithms 4, 5, and 6.

Different planners handle replanning slightly differently. For RRAT and RRAT*, which maintain a strict tree structure, this is done by introducing an intermediate node at which replanning is triggered, and then splitting the current edge at this intermediate node. Once the robot reaches this intermediate node, replanning proceeds as discussed, including gain updates, edge revalidation, graph expansion, and path selection. In the RRAG setting, the original edge is retained, and two new edges are added—connecting the start of the current edge to the intermediate node, and the intermediate node to the end of the current edge. Additionally, RRAG attempts to connect the intermediate node to nearby nodes within distance l_max via outgoing edges, enabling the robot to consider alternative paths beyond the original trajectory. The intermediate node is discarded once the robot reaches the next node.

8. Experiments on active perception

To evaluate the proposed planner, we conduct experiments in a simulated environment using Habitat (Savva et al., 2019) with the HSSD dataset (Khanna et al., 2024), a synthetic collection of diverse 3D environments. We consider eight distinct scenarios, which are visualized along with their IDs in Figure 12.

Figure 12.

Eight HSSD scenarios used in the active perception simulation environment. Note that the scenarios are displayed at different scales for visualization purposes.

The robot parameters, including maximum linear and angular velocities, are listed in Table 2. It moves and receives image data at a fixed timestep. The robot follows a walk-forward-only policy, consistent with standard practices in the Habitat Simulator, where each straight-line motion segment is executed in three phases: (1) rotate to face the target position, (2) translate to the target, and (3) rotate to the desired final orientation³. This strategy enforces forward-facing perception, enhancing environmental awareness during navigation. The onboard camera is configured with a resolution of 640 × 480 pixels, a field of view of 90 ° × 74 °, and an effective range of 3 m. Upon receiving sensor input, we use Voxblox (Oleynikova et al., 2017) with a voxel resolution of 0.1 m to construct an ESDF-based map for gain evaluation and collision checking. The robot’s collision body is modeled as a sphere of radius 0.3 m, centered at the base frame.

Table 2.

Robot parameters.

Parameter	Value
Max linear velocity v_max	0.5 m/s
Max angular velocity ω_max	1.6 rad/s
Timestep Δt	0.2 s
Move mode	Walk forward only
Collision body	Ball of radius 0.3 m

We consider three representative application scenarios: (i) point collection in an observed area, (ii) surface mapping for 3D reconstruction, and (iii) volumetric exploration. These tasks differ primarily in their respective gain definitions. In the point collection scenario, gain is computed by identifying all observed points within a local neighborhood of each node. The total path gain is defined as the sum of all unique points near the nodes visited along the path. For surface reconstruction, we adopt the surface frontier criterion introduced by Yoder and Scherer (2016). In the volumetric exploration setting, gain corresponds to the number of unknown voxels within the sensor’s field of view. For both surface reconstruction and volumetric exploration, inter-node gain correlations are neglected; node-level gains are simply aggregated along the trajectory, following established practice (Schmid et al., 2020; Witting et al., 2018; Xu et al., 2021).

We further analyze the relationship between the modeled gain g and the realized gain f (the principal objective). In the point collection scenario, the gain estimate is exact, as it accounts for all visible points and captures spatial correlations. For surface reconstruction, the estimate is conservative and constitutes a lower bound, as the surface frontier only considers unknown voxels adjacent to occupied regions as potential surface continuations. In contrast, the volumetric exploration model provides an upper bound, assuming visibility through free space while ignoring occlusions. By evaluating different algorithms under these differing modeling assumptions, we aim to demonstrate the robustness and effectiveness of our proposed algorithm.

We compare our proposed method against several baselines. These include the approaches of Bircher et al. (2016b) and Schmid et al. (2020), which we adapt to use RRAT (Algorithm 5) and RRAT* (Algorithm 6) as their underlying tree representations, respectively. We also include SPT/DEP (Xu et al., 2021), which is used in conjunction with RRAG (Algorithm 4). For TSP-based planning, we consider both the TSP algorithm described in Section 5.1.1 and the FUEL planner (Zhou et al., 2021), which performs active mapping using only frontier information. For clarity, each planner is named according to its core conceptual contribution. Table 3 summarizes the graph structure, planning algorithm, and reference for each method.

Table 3.

Planner description.

Planner name	Graph structure	Planning algorithm	References and other names
NBS	RRAG	NBS	-
DBS	RRAG	DBS	-
SPT	RRAG	SPT	Xu et al. (2021): DEP
TSP	RRAG	TSP	Arora and Scherer (2017)
RRT*	RRAT*	-	Schmid et al. (2020)
RRT	RRAT	-	Bircher et al. (2016b): NBVP
FUEL	-	TSP	Zhou et al. (2021)

Parameter settings for RRT, RRT*, SPT, and TSP are selected based on the results in Section 6. The parameters of the FUEL planner are tuned to better align with our robot’s sensing capabilities and the characteristics of our experimental environments. In all experiments, we define frontier nodes

F

as those with a substantial number of unobserved voxels within their field of view. This indicates that navigating the robot to such a node is likely to enable further expansion of the graph. This behavior is governed by two parameters: the minimum ratio of unknown voxels within the field of view, and the minimum ratio of currently visible voxels relative to the historical maximum. Table 4 summarizes the planner parameters.

Table 4.

Planner parameters.

Parameter	Value
Num yaw candidates	8
Frontier min unknown ratio	0.8
Frontier min visible ratio	0.6
Selection criterion	g _e
Replan frequency	1.0 Hz
FLS planning time	2 ms
Max new nodes tries n_new	50
Update radius l_gain, l_edge	5.0 m
NBS beam width B	1
NBS search depth D	100
DBS beam width B	100
DBS search depth D	100
TSP top node percentage α	0.5
SPT top node percentage α	1.0

The remainder of this section details each application scenario and presents corresponding benchmarking results. We then discuss validation of the pipeline on a physical robot platform using onboard computation.

8.1. Point collection

In the point collection scenario, the robot observes a set of collectible points and must navigate to within a Euclidean distance l_col of each point (orientation-independent) to collect it. Let $Z_{obs}$ denote the set of observed points. For any node v, the collection neighborhood is defined as

Z_{col} (v) = \{z \in Z_{obs} ∣ ‖ v - z ‖_{2} \leq l_{col}\} .

(50)

When the robot visits node v, all points

z \in Z_{col} (v)

are considered collected, and their associated gains

g (z)

contribute to the cumulative reward.

Because the collection neighborhoods of different nodes may overlap, we impose a non-redundancy constraint: once a point z is collected (i.e., its gain has been accounted for), any future visit to nodes within distance l_col of z yields no additional reward. Formally, for a path $p \in P$ , the set of collected points is

Z_{col} (p) = ⋃_{v \in p} Z_{col} (v),

(51)

where each point is included only once. The total gain along path p is then given by

g (p) = \sum_{z \in Z_{col} (p)} g (z),

(52)

which complies with the general path gain definition in equation (2).

This setup parallels the object-goal navigation (Batra et al., 2020) task, albeit without relying on semantic mapping. The core mechanism is similar: after observing a target, the robot must navigate to its vicinity to realize the reward. Consequently, the robot must balance exploration of unobserved regions against exploitation of known collectibles. An example of an agent collecting points is visualized in Figure 13. For evaluation, points are uniformly sampled within the bounding volume and assigned integer gains g(z) ∈ [0, 10], with both positions and gains held constant for all algorithms to ensure fair comparison. The collection radius l_col is set to 1 m, as in the Habitat ObjectNav Challenge (Yadav et al., 2023). Accordingly, we set the minimum and maximum distance thresholds to l_min = 1 m and l_max = 2 m, respectively.

Figure 13.

Visualization of an agent navigating to collect points represented as golden rings. These points are initially unknown and become visible only when the corresponding regions are perceived by the agent. The trajectory is visualized using a temporal gradient, transitioning from dark blue (start) to bright yellow (end). Each node is visualized as an arrow, with clusters of nodes at identical positions forming node clusters. Node gains are color-coded, with blue indicating the lowest gain and red the highest. Frontier nodes are emphasized with larger markers, while transparent nodes indicate non-frontier nodes with zero gain.

8.2. Surface reconstruction

In surface reconstruction, the objective is to scan and recover the geometry of a scene as completely as possible, quantified by the total number of voxels ultimately classified as occupied. However, this true objective f, as used in equation (18), cannot be directly computed because it requires complete knowledge of the environment—information unavailable to the robot during exploration.

Therefore, we define our gain function g in equation (15) using the concept of surface frontiers (Yoder and Scherer, 2016). Surface frontiers are unknown voxels that are adjacent to currently known occupied voxels (i.e., the reconstructed surface). For each node v, the gain is calculated as the number of such frontier voxels visible from v:

g (v) = |\{o \in O (v) \cap U ∣ A (o) \cap S \neq \emptyset\}|,

(53)

where

O (v)

is the set of voxels observable from v,

U

denotes unknown voxels,

S

represents the current set of occupied (surface) voxels, and

A (o)

is the set of voxels adjacent to o. Since nodes do not need to be densely distributed for this task, we set l_min = 1.5 m and l_max = 3 m. The total gain of a trajectory is the sum of the gains of its constituent nodes, as in equation (16). A qualitative comparison of various approaches on this task is shown in Figure 14.

Figure 14.

Qualitative comparison of surface reconstruction for scene 102816852. Each planner is executed 10 times, and the visualization shows the worst-case result among the trials, representing a lower-bound performance. The total reconstructed surface area (in square meters) for each planner is as follows: NBS: 445.73, DBS: 419.10, SPT: 349.96, TSP: 432.21, FUEL: 341.03, RRT*: 301.75, and RRT: 348.43. The trajectory is visualized using a temporal gradient, from dark blue to bright yellow.

8.3. Volumetric exploration

In the task of volumetric exploration, the objective is to observe as much of the unknown environment as possible, which can be measured by the total number of voxels whose occupancy state is known after the exploration. This includes both free and occupied voxels that have been successfully classified. Unlike other tasks, volumetric exploration directly equates gain collection with exploration itself. Thus, there is no intrinsic need to trade-off between exploration and exploitation.

As with surface reconstruction, the true objective f is not accessible during execution. Therefore, the gain g at each node is defined as the number of unknown voxels visible from that node, a commonly used metric due to its simplicity and practical effectiveness. Formally, for any node v, the gain is given by:

g (v) = | O (v) \cap U | .

(54)

Following the setup for surface reconstruction, we set l_min = 1.5 m and l_max = 3 m, and compute the total gain along a trajectory as the sum of node gains.

8.4. Performance analysis in active perception

Each algorithm is evaluated across the three application scenarios over 10 independent trials, with execution terminated upon reaching a predefined maximum planning time. The resulting performance curves, showing average gain as a function of cost, are presented in Figure 15 for five large environments and Figure 16 for three small environments. For each curve, the mean and standard deviation of accumulated gains are calculated from linearly interpolated gain values over a cost range from 0.0 to the smallest final cost achieved by the corresponding algorithm.

Figure 15.

The mean and half the standard deviation of accumulated gain across large HSSD scenarios. Trajectory time serves as the cost function with a 10-min budget. Planning terminates at the 15-min time limit.

Figure 16.

The mean and half the standard deviation of accumulated gain across small HSSD scenarios. Trajectory time serves as the cost function with a 5-min budget. Planning terminates at the 7-min time limit.

Since gain varies across environments and tasks, we use normalized gain to enable comparisons across different scenarios (Figure 17). For each task and environment, we run both NBS and RRT twice for a sufficient duration until the gain plateaus. We take the maximum observed gain from these four runs as an estimate of the ground truth (see Figures 15 and 16) and normalize the gain with respect to this value.

Figure 17.

Comparison of normalized gain across methods and tasks.

The three tasks present different levels of difficulty. Exploration is the easiest, as all algorithms achieve more than 0.50 normalized gain. This is intuitive, since this task does not require the robot to exploit the known environment. The other two tasks are more challenging, as they require a balance between exploration and exploitation. Surface reconstruction has a medium level of difficulty, with the worst performance around 0.40. Point collection is the hardest, due to the smaller collection radius and consequently denser graph. The worst-performing algorithm achieves only 0.22 normalized gain.

NBS achieves the highest performance across all three tasks: 0.50 in point collection, 0.49 in surface reconstruction, and 0.77 in exploration. Moreover, NBS outperforms other algorithms by at least 20% in one or more tasks, highlighting its robustness and effectiveness. In point collection, it demonstrates a significant improvement, surpassing the second-best algorithm by 16% (increasing normalized gain from 0.43 to 0.50).

DBS performs best in surface reconstruction, primarily because this task often requires the robot to thoroughly map its immediate surroundings before exploring farther. In such cases, DBS’s preference for locally optimal paths becomes an advantage. However, it performs notably worse in point collection. This degradation arises from the sparser gain distribution in this setting, where DBS is prone to getting trapped in local regions with low or zero gains. This behavior reveals a fundamental limitation of DBS: its susceptibility to local optima.

SPT underperforms relative to NBS, primarily due to its constrained search space. The performance gap is particularly pronounced in point collection and surface reconstruction, where the limited candidate set often yields trajectories biased toward either exploration or exploitation, without effectively balancing the two.

RRT* exhibits similar limitations, further exacerbated by the need to rewire the tree when the root node changes, which incurs significant computational overhead. Moreover, it lacks full orientation coverage, often requiring inefficient back-and-forth maneuvers to execute simple turns. RRT inherits some of these drawbacks: although computationally efficient, its continuous node pruning and sparse graph structure can lead to suboptimal decisions.

The TSP algorithm should, in principle, be well-suited for surface reconstruction, as TSP tends to prioritize high-gain nodes in the vicinity before expanding to other regions. However, we find that it suffers from excessive computational demands. In practice, it often fails to complete within the allotted time budget, resulting in the lowest average performance among the evaluated methods. Additionally, its performance is even worse in the other two tasks, as the decoupling of node selection and path optimization can lead to poor node choices that the TSP is then forced to traverse.

In contrast, the FUEL planner is driven by frontier information and generates a relatively sparse set of viewpoints to traverse, contributing to its time efficiency. Its overall performance trend closely resembles that of the other TSP-based planner. However, FUEL performs poorly on the point collection task, as this requires specific point-level information rather than just frontier awareness. Because FUEL actively maps all frontiers, it tends to exploit nearby viewpoints before moving to distant regions, leading to comparatively better performance in surface reconstruction than in pure exploration. Since the robot assigns only one viewpoint per location, it would need to travel back and forth to perform simple turns, which could consume the cost budget compared to our approach.

8.5. Hardware experiments

We deploy our planner on the ANYmal quadruped robot (Hutter et al., 2016), equipped with a ZED X Mini camera mounted on the front-facing end-effector, offering a field of view of [74°, 45°]. On the physical platform, depth estimation from a ZED X camera is processed onboard by an NVIDIA Jetson Orin, while the mapping and planning modules execute on an Intel Core i7 CPU. Captured RGB-D images are converted into colorized, downsampled point clouds, which are subsequently used by Voxblox for volumetric map generation. Unlike the simulation setup, the real-world deployment constrains the planner to a two-dimensional search space at a fixed height. The collision model is adapted to better encapsulate the legged robot’s body, resulting in larger and additional collision spheres. Consequently, collision checking needs to be more optimistic: if at least one collision sphere is verified to be collision-free and none is known to be in collision, the pose is deemed traversable. Because yaw influences the collision geometry of legged robots, this checking is performed across all yaw angles at each node.

In the first hardware experiment, the robot is tasked with autonomously reconstructing the surface of a lab. We use surface frontiers to estimate gain, allowing the robot to actively navigate toward regions with holes to complete the surface reconstruction. Figure 1 illustrates the reconstructed maps as time evolves. The robot simultaneously explores the environment, expands its graph, and fills in missing holes to produce a more complete surface. While following the planned path, the robot replans at a fixed frequency (omitted from the figure for simplicity). The robot successfully explores this complex and cluttered environment and reconstructs the surface densely, with minimal artifacts such as small holes. We have also conducted surface reconstruction in an outdoor space and an engine museum (Figure 18) and volumetric exploration in a cafeteria and another laboratory environment (Figure 19). These experiments demonstrate the feasibility and effectiveness of running the full planning and perception pipeline onboard the robot across diverse environments.

Figure 18.

Autonomous surface reconstruction using surface frontiers as the gain. The robot’s trajectory is visualized with a red arrow indicating its direction of movement. The mesh reconstructed from the captured RGB-D images is also shown, with boundary artifacts removed for clarity.

Figure 19.

Autonomous volumetric exploration using unknown voxels as the gain. Experiments were conducted in both a cafeteria and a lab environment.

9. Limitations and future directions

While our approach demonstrates strong performance across extensive simulation and hardware experiments, several limitations remain. Our method is validated through empirical evaluation, and we do not provide a formal proof of optimality. Our quantitative evaluation also focuses primarily on indoor environments, as supported by the HSSD dataset, even though we show qualitative hardware results across a broader range of settings (see Figures 1, 18, and 19). Our current framework addresses only semantic-agnostic active perception, and extending it to more challenging semantic tasks is a natural direction for future work. Examples include object-goal navigation, where a robot must find objects in an unknown environment based on natural language instructions (Chang et al., 2024; Gervet et al., 2023; Qu et al., 2024; Shah et al., 2023), and semantic mapping (Fu et al., 2026; Gu et al., 2024; Jatavallabhula et al., 2023; Takmaz et al., 2023; Zhang et al., 2025). In these settings, designing a semantic-aware gain function becomes nontrivial, and it would be compelling to evaluate how our planner performs under such task-driven, semantics-heavy objectives. The present implementation also relies exclusively on ESDF-based mapping. We aim to generalize the planning framework to additional mapping paradigms, particularly 3D Gaussian Splatting (Chen et al., 2025; Goli et al., 2024; Jiang et al., 2023; Kerbl et al., 2023), and to incorporate learned information-gain predictors (Sun et al., 2025). Moreover, while we account for gain correlations in point-collection tasks, we do not model them in volumetric exploration and surface reconstruction. This is because accurately computing inter-node correlations for voxel-counting-based gains is computationally prohibitive—requiring per-voxel tracking to avoid double-counting—and would compromise real-time planning performance. Developing an efficient method to explicitly model correlations between nodes could relax the common additive node-wise gain assumption, thereby improving gain estimation and overall planning performance (Yu et al., 2016).

10. Conclusion

We present NBS, a beam-per-node strategy designed to address the challenges of active perception in mobile robotics. Through extensive empirical evaluation, we show that NBS consistently outperforms classical TSP and SPT baselines as well as naive beam search in graph-based environments. We further demonstrate that the expected gain selection criterion—by incorporating frontier information—achieves a better balance between exploration and exploitation, leading to higher task performance. To bridge abstract graph representations with active perception, we introduce the RRAG, which preserves orientation density and ensures graph connectivity in cluttered environments. In extensive simulations, NBS surpasses state-of-the-art planners across three representative tasks by a significant margin. We validate the practicality of our approach through hardware experiments in different scenarios using onboard computation. All proposed algorithms are available in the open-source OpenApero library to promote reproducibility and serve as a foundation for future active perception research.

Footnotes

Acknowledgments

The authors would like to thank Jie Tan for the initial discussions, Changan Chen for helpful input, and Boyang Sun and Jiaxu Xing for proofreading the manuscript.

ORCID iDs

Kaixian Qu

Han Wang

Victor Klemm

Cesar Cadena

Marco Hutter

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Swiss National Science Foundation through the National Centre of Competence in Digital Fabrication (NCCR dfab), by Huawei Tech R&D (UK) through a research funding agreement, and by an ETH RobotX research grant funded through the ETH Zurich Foundation.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

Ahuja

Batra

Gupta

(1983) Combinatorial optimization with rational objective functions: a communication. Mathematics of Operations Research 8(2): 314. https://doi.org/10.1287/moor.8.2.314

Arora

Scherer

(2017) Randomized algorithm for informative path planning with budget constraints. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 4997–5004.

Asgharivaskasi

Koga

Atanasov

(2022) Active mapping via gradient ascent optimization of shannon mutual information over continuous se (3) trajectories. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 12994–13001.

Bai

Wang

Chen

, et al. (2016) Information-theoretic exploration with bayesian optimization. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 1816–1822.

Barraquand

Kavraki

Latombe

, et al. (1997) A random sampling scheme for path planning. International Journal of Robotics Research 16(6): 759–774. https://doi.org/10.1177/027836499701600604

Batra

Gokaslan

Kembhavi

, et al. (2020) Objectnav revisited: on evaluation of embodied agents navigating to objects. arXiv preprint arXiv:2006.13171.

Bellman

(1962) Dynamic programming treatment of the travelling salesman problem. Journal of the ACM (JACM) 9(1): 61–63. https://doi.org/10.1145/321105.321111

Bentley

(1975) Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9): 509–517. https://doi.org/10.1145/361002.361007

Bircher

Alexis

Burri

, et al. (2015) Structural inspection path planning via iterative viewpoint resampling with application to aerial robotics. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 6423–6430.

10.

Bircher

Kamel

Alexis

, et al. (2016a) Three-dimensional coverage path planning via viewpoint resampling and tour optimization for aerial robots. Autonomous Robots 40: 1059–1078. https://doi.org/10.1007/s10514-015-9517-1

11.

Bircher

Kamel

Alexis

, et al. (2016b) Receding horizon “next-best-view” planner for 3d exploration. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 1462–1468.

12.

Bircher

Kamel

Alexis

, et al. (2018) Receding horizon path planning for 3d exploration and surface inspection. Autonomous Robots 42: 291–306. https://doi.org/10.1007/s10514-016-9610-0

13.

Blanco

Rai

(2014) nanoflann: a C++ header-only fork of FLANN, a library for nearest neighbor (NN) with kd-trees. https://github.com/jlblancoc/nanoflann

14.

Cao

Low

Dolan

(2013) Multi-robot informative path planning for active sensing of environmental phenomena: a tale of two algorithms. arXiv Preprint arXiv:1302.0723.

15.

Cao

Zhu

Choset

, et al. (2021) Tare: a hierarchical framework for efficiently exploring complex 3d environments. Robotics: Science and Systems 5: 2.

16.

Cao

Wang

Vashisth

, et al. (2023) Catnipp: context-aware attention-based network for informative path planning. In: Conference on Robot Learning. PMLR, pp. 1928–1937.

17.

Chang

Gervet

Khanna

, et al. (2024) Goat: go to any thing. Robotics: Science and Systems XX.

18.

Chen

Khardon

Liu

(2024a) Adaptive robotic information gathering via non-stationary Gaussian processes. The International Journal of Robotics Research 43(4): 405–436. https://doi.org/10.1177/02783649231184498

19.

Chen

Liu

Khardon

(2024b) Poam: probabilistic online attentive mapping for efficient robotic information gathering. arXiv Preprint arXiv:2406.03669.

20.

Chen

Zhan

Chen

, et al. (2025) Activegamer: active Gaussian mapping through efficient rendering. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 16486–16497.

21.

Cherkassky

Goldberg

(1999) Negative-cycle detection algorithms. Mathematical Programming 85(2): 277–311. https://doi.org/10.1007/s101070050058

22.

Cieslewski

Kaufmann

Scaramuzza

(2017) Rapid exploration with multi-rotors: a frontier selection method for high speed flight. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 2135–2142.

23.

Conway

Sloane

NJA

(2013) Sphere packings, lattices and groups. Springer Science & Business Media, Vol. volume 290.

24.

Cormen

Leiserson

Rivest

, et al. (2022) Introduction to Algorithms. MIT press.

25.

Dang

Tranzatto

Khattak

, et al. (2020) Graph-based subterranean exploration path planning using aerial and legged robots. Journal of Field Robotics 37(8): 1363–1388. https://doi.org/10.1002/rob.21993

26.

Dantzig

Blattner

Rao

(1966) Finding a Cycle in a Graph with Minimum Cost to Time Ratio with Application to a Ship Routing Problem. Defense Technical Information Center.

27.

Duberg

Jensfelt

(2022) Ufoexplorer: fast and scalable sampling-based exploration with a graph-based planning structure. IEEE Robotics and Automation Letters 7(2): 2487–2494. https://doi.org/10.1109/lra.2022.3142923

28.

Zurbrügg

, et al. (2026) Funfact: building probabilistic functional 3d scene graphs via factor-graph reasoning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.

29.

Galhotra

Mazumdar

Pal

, et al. (2019) Connectivity of random annulus graphs and the geometric block model. Approximation, Randomization, and Combinatorial Optimization. In: Algorithms and Techniques.

30.

Galhotra

Mazumdar

Pal

, et al. (2019) Connectivity of random annulus graphs and the geometric block model. Approximation, Randomization, and Combinatorial Optimization. In: Algorithms and Techniques (APPROX/RANDOM 2019). Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.53

31.

Ghaffari Jadidi

Valls Miro

Dissanayake

(2019) Sampling-based incremental information gathering with applications to robotic exploration and environmental monitoring. The International Journal of Robotics Research 38(6): 658–685. https://doi.org/10.1177/0278364919844575

32.

Glover

Laguna

(1998) Tabu Search. Springer.

33.

Goli

Reading

Sellán

, et al. (2024) Bayes’ rays: uncertainty quantification for neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20061–20070.

34.

Google (2025) Or-tools - google optimization tools. https://developers.google.com/optimization/ (Accessed 17 03 2025).

35.

Kuwajerwala

Morin

, et al. (2024) Conceptgraphs: open-vocabulary 3d scene graphs for perception and planning. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 5021–5028.

36.

Hardgrave

Nemhauser

(1962) On the relation between the traveling-salesman and the longest-path problems. Operations Research 10(5): 647–657. https://doi.org/10.1287/opre.10.5.647

37.

Held

Karp

(1962) A dynamic programming approach to sequencing problems. Journal of the Society for Industrial and Applied Mathematics 10(1): 196–210. https://doi.org/10.1137/0110015

38.

Hollinger

Sukhatme

(2014) Sampling-based robotic information gathering algorithms. The International Journal of Robotics Research 33(9): 1271–1287. https://doi.org/10.1177/0278364914533443

39.

Hsu

Latombe

Motwani

(1997) Path planning in expansive configuration spaces. Proceedings of International Conference on Robotics and Automation 3: 2719–2726.

40.

Huang

Zhou

Fan

, et al. (2023) Fael: fast autonomous exploration for large-scale environments with a mobile robot. IEEE Robotics and Automation Letters 8(3): 1667–1674. https://doi.org/10.1109/lra.2023.3236573

41.

Hutter

Gehring

Jud

, et al. (2016) Anymal-a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 38–44.

42.

Jakkala

Akella

(2025) Fully differentiable adaptive informative path planning. In: 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 5431–5437.

43.

Jatavallabhula

Kuwajerwala

, et al. (2023) Conceptfusion: Open-set multimodal 3d mapping. Robotics: Science and Systems (RSS).

44.

Jiang

Lei

Daniilidis

(2023) Fisherrf: active view selection and uncertainty quantification for radiance fields using Fisher information. arXiv Preprint arXiv:2311.17874.

45.

Karaman

Frazzoli

(2010) Incremental sampling-based algorithms for optimal motion planning. Robotics Science and Systems VI 104(2): 267–274.

46.

Karaman

Frazzoli

(2011) Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research 30(7): 846–894. https://doi.org/10.1177/0278364911406761

47.

Kavraki

Svestka

Latombe

, et al. (1996) Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12(4): 566–580. https://doi.org/10.1109/70.508439

48.

Kerbl

Kopanas

Leimkühler

, et al. (2023) 3d Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph 42(4): 139–141.

49.

Khanna

Mao

Jiang

, et al. (2024) Habitat synthetic scenes dataset (hssd-200): an analysis of 3d scene scale and realism tradeoffs for objectgoal navigation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16384–16393.

50.

Kirkpatrick

Gelatt

JCD

Vecchi

(1983) Optimization by simulated annealing. Science 220(4598): 671–680. https://doi.org/10.1126/science.220.4598.671

51.

Kompis

Bartolomei

Mascaro

, et al. (2021) Informed sampling exploration path planner for 3d reconstruction of large scenes. IEEE Robotics and Automation Letters 6(4): 7893–7900. https://doi.org/10.1109/lra.2021.3101856

52.

Kuffner

LaValle

(2000) Rrt-connect: an efficient approach to single-query path planning. Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065) 2: 995–1001.

53.

LaValle

(1998) Rapidly-exploring random trees: a new tool for path planning. Research Report 3: 9811.

54.

LaValle

Kuffner

JJJ

(2001) Randomized kinodynamic planning. The International Journal of Robotics Research 20(5): 378–400. https://doi.org/10.1177/02783640122067453

55.

Lawler

(1972) Optimal cycles in graphs and the minimal cost-to-time ratio problem. In: Periodic Optimization: Volume I: Course Held at the Department of Automation and Information, Vol. 1972, 37–60.

56.

Lee

Lin

(1986) Computational complexity of art gallery problems. IEEE Transactions on Information Theory 32(2): 276–282. https://doi.org/10.1109/tit.1986.1057165

57.

Lichev

Mihaylov

(2024) Annulus graphs in R d. Discrete Computational Geometry. Springer, 72(1). 379--401.

58.

Lim

Hsu

Lee

(2016) Adaptive informative path planning in metric spaces. The International Journal of Robotics Research 35(5): 585–598. https://doi.org/10.1177/0278364915596378

59.

Liu

Heidarsson

, et al. (2018) Data-driven learning and planning for environmental sampling. Journal of Field Robotics 35(5): 643–661. https://doi.org/10.1002/rob.21767

60.

Meister

Cotterell

Vieira

(2020) If beam search is the answer, what was the question? Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2173–2185.

61.

Meng

Qin

Chen

, et al. (2017) A two-stage optimized next-view planning framework for 3-d unknown environment exploration, and structural reconstruction. IEEE Robotics and Automation Letters 2(3): 1680–1687. https://doi.org/10.1109/lra.2017.2655144

62.

Moon

Chatterjee

Scherer

(2022) Tigris: an informed sampling-based algorithm for informative path planning. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 5760–5766.

63.

Oleynikova

Taylor

Fehr

, et al. (2017) Voxblox: incremental 3d euclidean signed distance fields for on-board mav planning. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 1366–1373.

64.

Ott

Kochenderfer

Boyd

(2024) Approximate sequential optimization for informative path planning. Robotics and Autonomous Systems 182: 104814. https://doi.org/10.1016/j.robot.2024.104814

65.

Otte

Frazzoli

(2016) Rrtx: asymptotically optimal single-query sampling-based motion planning with quick replanning. The International Journal of Robotics Research 35(7): 797–822. https://doi.org/10.1177/0278364915594679

66.

Popović

Hitz

Nieto

, et al. (2017) Online informative path planning for active classification using uavs. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 5753–5758.

67.

Popović

Vidal-Calleja

Hitz

, et al. (2020) An informative path planning framework for uav-based terrain monitoring. Autonomous Robots 44(6): 889–911. https://doi.org/10.1007/s10514-020-09903-2

68.

Tan

Zhang

, et al. (2024) Ippon: common sense guided informative path planning for object goal navigation. arXiv Preprint arXiv:2410.19697.

69.

Radzik

(2013) Fractional combinatorial optimization. In: Handbook of Combinatorial Optimization. Springer, pp. 1311–1355.

70.

Rückin

Jin

Popović

(2022) Adaptive informative path planning using deep reinforcement learning for uav-based active sensing. In: 2022 International Conference on Robotics and Automation (ICRA). IEEE, pp. 4473–4479.

71.

Rush

Chang

Collins

(2013) Optimal beam search for machine translation. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 210–221.

72.

Savva

Kadian

Maksymets

, et al. (2019) Habitat: a platform for embodied ai research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, pp. 9339–9347.

73.

Schmid

Pantic

Khanna

, et al. (2020) An efficient sampling-based method for online informative path planning in unknown environments. IEEE Robotics and Automation Letters 5(2): 1500–1507. https://doi.org/10.1109/lra.2020.2969191

74.

Shah

Equi

Osiński

, et al. (2023) Navigation with large language models: semantic guesswork as a heuristic for planning. In: Conference on Robot Learning. PMLR, pp. 2683–2699.

75.

Singh

Krause

Guestrin

, et al. (2009) Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research 34: 707–755. https://doi.org/10.1613/jair.2674

76.

Sucan

Moll

Kavraki

(2012) The open motion planning library. IEEE Robotics & Automation Magazine 19(4): 72–82. https://doi.org/10.1109/mra.2012.2205651

77.

Sun

Chen

Leutenegger

, et al. (2025) Frontiernet: learning visual cues to explore. IEEE Robotics and Automation Letters 10: 914.

78.

Takmaz

Fedele

Sumner

, et al. (2023) Openmask3d: open-vocabulary 3d instance segmentation. Advances in Neural Information Processing Systems 36: 68367–68390.

79.

Vijayakumar

Cogswell

Selvaraju

, et al. (2018) Diverse beam search for improved description of complex scenes. Proceedings of the AAAI Conference on Artificial Intelligence 32: 12340. https://doi.org/10.1609/aaai.v32i1.12340

80.

Voudouris

(1997) Guided Local Search for Combinatorial Optimisation Problems. University of Essex.

81.

Witting

Fehr

Bähnemann

, et al. (2018) History-aware autonomous exploration in confined environments using mavs. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 1–9.

82.

Deng

Shimada

(2021) Autonomous uav exploration of dynamic environments via incremental sampling and probabilistic roadmap. IEEE Robotics and Automation Letters 6(2): 2729–2736. https://doi.org/10.1109/lra.2021.3062008

83.

Yadav

Krantz

Ramrakhya

, et al. (2023) Habitat challenge 2023. https://aihabitat.org/challenge/2023/

84.

Yamauchi

(1997) A frontier-based approach for autonomous exploration. In: Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA’97.’Towards New Computational Principles for Robotics and Automation. IEEE, pp. 146–151.

85.

Yoder

Scherer

(2016) Autonomous exploration for infrastructure modeling with a micro aerial vehicle. In: Field and Service Robotics: Results of the 10th International Conference. Springer, pp. 427–440.

86.

Schwager

Rus

(2016) Correlated orienteering problem and its application to persistent monitoring tasks. IEEE Transactions on Robotics 32(5): 1106–1118. https://doi.org/10.1109/tro.2016.2593450

87.

Shen

, et al. (2023) Echo: an efficient heuristic viewpoint determination method on frontier-based autonomous exploration for quadrotors. IEEE Robotics and Automation Letters 8(8): 5047–5054. https://doi.org/10.1109/lra.2023.3282783

88.

Zhang

Patil

, et al. (2025) Tag map: a text-based map for spatial reasoning and navigation with large language models Conference on Robot Learning. PMLR, pp. 2120–2146.

89.

Zhou

Zhang

Chen

, et al. (2021) Fuel: fast uav exploration using incremental frontier structure and hierarchical planning. IEEE Robotics and Automation Letters 6(2): 779–786. https://doi.org/10.1109/lra.2021.3051563

90.

Zhou

Shen

(2023) Racer: rapid collaborative exploration with a decentralized multi-uav system. IEEE Transactions on Robotics 39(3): 1816–1835. https://doi.org/10.1109/tro.2023.3236945