Efficiently closing loops in LiDAR-based SLAM using point cloud density maps

Abstract

Consistent maps are key for most autonomous mobile robots, and they often use SLAM approaches to build such maps. Loop closures via place recognition help to maintain accurate pose estimates by mitigating global drift, and are thus key for realizing an effective SLAM system. This paper presents a robust loop closure detection pipeline for outdoor SLAM with LiDAR-equipped robots. Our method handles various LiDAR sensors with different scanning patterns, fields of view, and resolutions. It generates local maps from LiDAR scans and aligns them using a ground alignment module to handle both planar and non-planar motion of the LiDAR, ensuring applicability across platforms. The method uses density-preserving bird’s-eye-view projections of these local maps and extracts ORB feature descriptors for place recognition. It stores the feature descriptors in a binary search tree for efficient retrieval, and self-similarity pruning addresses perceptual aliasing in repetitive environments. Extensive experiments on public and self-recorded datasets demonstrate accurate loop closure detection, long-term localization, and cross-platform multi-map alignment, agnostic to the LiDAR scanning patterns, fields of view, and motion profiles. We provide the code for our pipeline as open-source software at https://github.com/PRBonn/MapClosures.

Keywords

SLAM, localization, mapping, LiDAR-based place recognition

1. Introduction

Mobile robots must navigate their surroundings safely and efficiently. They need to know their location within the environment to successfully navigate to a desired place or explore new areas. Accurate ego-motion estimation helps robots generate accurate maps of the environment, which they can use for navigation. Traditionally, robots often localize themselves using data acquired through exteroceptive sensors like cameras and laser range sensors (LiDAR), proprioceptive sensors like wheel odometers and IMUs, or a combination. Depending on the application and availability, outdoor systems often also exploit GNSS for global positioning.

LiDAR sensors are frequently used in robotics due to their accurate and dense 3D range data. Many previous studies have advanced the field of sequential pose estimation using LiDAR sensors (Dellenbach et al., 2022; Ferrari et al., 2024; Fontana et al., 2016; Guadagnino et al., 2022; Vizzo et al., 2023). Such sequential pose estimation alone, also called sensor odometry, suffers from drifting pose estimates over time due to inherent noise in the robot motion and sensor data, dynamics in the environment, and non-trivial data association problems.

We can compensate for such drift and improve pose estimation by recognizing places previously visited by the robot. This task is referred to as loop closing. It allows the use of geometric information from the LiDAR observations at revisited locations to correct the pose drift, for example, through pose-graph optimization. Robust loop closure detection is paramount in simultaneous localization and mapping (SLAM) systems. Robots must recognize that they have returned to a previously visited place to close a loop.

Place recognition is a key sub-task within loop closure detection, which performs data associations between the robot’s current view and a database of previously seen places. Generating a database of places is a non-trivial task that requires crafting a description of the environment that is as unique as possible and invariant to changes in viewpoint. Also, such a database should filter dynamic objects appearing in the sensor data to achieve robust place recognition across long time intervals. Perceptual aliasing is another challenge, as distinct places with similar structures can confuse place recognition algorithms. This could lead to false-positive loop closures that can adversely impact global SLAM estimates (Bailey and Durrant-Whyte, 2006; Blanco et al., 2013; Ramos et al., 2007). Furthermore, to perform place recognition for loop closing in a SLAM system, the database should encode the scene’s geometry to allow for relative pose estimation between different revisit viewpoints for the pose-graph optimization.

Global feature descriptors computed over the entire point clouds are easy to store in a database and allow for quick retrieval of revisit candidates (Kim et al., 2021; Lu et al., 2022; Luo et al., 2025; Uy and Lee, 2018; Xu et al., 2023). However, such methods often do not provide an initial geometric alignment between the detected loop closures and require an additional point cloud registration step to perform such an alignment. In contrast, feature descriptors computed over local 3D patches within a point cloud can aid the geometric alignment of revisited locations (Blanco et al., 2013; Steder et al., 2011; Yang et al., 2017). However, they require a nuanced approach to store the feature descriptors efficiently in a database.

Due to the algorithmic complexity of processing 3D point clouds to obtain local or global feature descriptors, several methods perform a bird’s-eye-view (BEV) projection (Kim et al., 2021; Li and Li, 2021; Luo et al., 2023) or a cylindrical projection (Chen et al., 2020; Ma et al., 2022; Steder et al., 2010) of the point clouds. This results in a 2D representation of the point cloud data, allowing faster feature detection and matching for online loop closure detection in SLAM.

The main contribution of this article is a robust loop closure detection pipeline that works with various LiDAR sensors having different motion profiles, invariant to their scanning pattern, field of view (FoV), and resolution. We generate local maps by aggregating consecutive scans and create a density-preserving BEV projection of the local maps as an intermediate 2D representation for computing local features describing the structural information in the 3D local maps. Since the ground planes can be consistently identified in outdoor environments during revisits, we use them as a reference plane to ensure consistent BEV projections across revisits. We propose a ground alignment module to identify such a ground plane in each local map and transform the local map such that this ground plane coincides with the local xy-plane of the reference frame. This simplifies the BEV projection, restricts the loop closure alignment to two dimensions on the common ground plane, and makes our approach applicable to various motion profiles of LiDAR sensors. We store binary ORB feature descriptors from the BEV projection into a Hamming distance embedding binary search tree (HBST). We design our approach to be robust against the effects of scene similarity, also known as perceptual aliasing, by performing a self-similarity pruning of the feature descriptors before inserting them in the database. Using a Hamming distance metric, we obtain loop closure candidates by matching feature descriptors from query local maps against this database. Our subsequent random sample consensus (RANSAC) based geometric validation step provides loop closures along with a 2D alignment between BEV projections of the local maps. When combined with the ground alignment estimate of each local map, we obtain a complete 3D estimate of the global alignment between the local maps.
We provide an extensive experimental evaluation on multiple datasets, with sequences recorded with a wide range of LiDAR sensors mounted on different mobile platforms, testing the accuracy and robustness of our pipeline in challenging scenarios.

In sum, we make six key claims, which we support with the paper and our experimental evaluation. Our approach

(1) Detects loop closures between local maps generated from various LiDAR sensors with different scanning patterns, FoVs, and resolutions;

(2) Performs multi-session loop closure detection and alignment with long-term revisits;

(3) Works with handheld platforms having non-planar motion in the LiDAR sensor frame;

(4) Is robust against perceptual aliasing in environments with repetitive structures;

(5) Provides a complete 3D rigid-body transform to align the detected loop closures;

(6) Detects loop closures between sequences having minimal overlap, recorded with different LiDAR sensor platforms, enabling cross-platform multi-map alignment.

This article extends our previous conference paper (Gupta et al., 2024), which proposed loop closure detection using local maps by detecting local feature descriptors from their BEV projections and generating a binary search tree database. It extends the earlier work in the following critical ways: (1) relaxation of the planar-motion assumption through a new ground alignment module; (2) a complete 3D alignment estimate of the detected loop closures; (3) a pruning strategy on the ORB feature descriptors computed from the BEV density images to mitigate the issues with perceptual aliasing; (4) an extensive experimental evaluation on multiple datasets, with sequences recorded with a wide range of LiDAR sensors mounted on different mobile platforms, testing the accuracy and robustness of our pipeline in challenging scenarios.

The open-source implementations of our previous approach and the proposed approach are available at https://github.com/PRBonn/MapClosures.

2. Related work

Zhang et al. (2024) provide a thorough literature review on place recognition for loop closures in 3D LiDAR SLAM and Yin et al. (2024) on LiDAR-based global localization. This section briefly overviews key approaches for LiDAR-based loop closure detection in SLAM.

The most straightforward approach to loop closure detection in SLAM is using the individual scans and their poses obtained from odometry. The similarity of poses between non-consecutive scans within a search radius is used to detect loop closures (Rottmann et al., 2019; Shan et al., 2020). S4-SLAM (Zhou et al., 2021) and the approach by Mendes et al. (2016) use the odometry information to compute the overlap between point clouds to check if they are recorded from the same location. Chen et al. (2020) use convolutional neural networks to calculate the overlap between the range image representations of point clouds to detect loops. These methods primarily perform a place retrieval task, often requiring a subsequent point cloud registration step to validate the loop closures. Furthermore, they can be sensitive to the magnitude of drift present in the odometry pose estimates, requiring a search radius proportional to the drift.

2.1. Methods using 3D features from point clouds

Place recognition has been widely studied in the context of camera images (Cummins and Newman, 2008; Galvez-Lopez and Tardos, 2012; Mur-Artal et al., 2015; Vysotska and Stachniss, 2016, 2019). Consequently, much research on LiDAR place recognition for loop closing has been motivated by approaches in the camera vision domain. In particular, many approaches have extended the idea of local feature descriptors to 3D point clouds (Johnson and Hebert, 1999; Rusu et al., 2009; Salti et al., 2014), discretizing the neighborhood around 3D features into a geometrical grid. They compute a descriptor from the points in this neighborhood based on their height (Bosse and Zlot, 2013), density (Frome et al., 2004; Tombari et al., 2010), or distance and angle (Rusu et al., 2008). These local feature descriptor-based methods estimate a complete 3D relative pose between the loop closures. However, identifying such feature descriptors in 3D point clouds is computationally expensive. These methods are also sensitive to the radially increasing sparsity of point clouds obtained from a standard spinning LiDAR.

Several methods (Kim et al., 2024; Magnusson et al., 2009; Röhling et al., 2015; Uy and Lee, 2018) take an orthogonal approach by computing one global descriptor per LiDAR scan. Magnusson et al. (2009) superimpose a voxel grid on the point clouds and approximate a normal distribution over points in each grid cell. They compute a global descriptor for the point clouds by computing a histogram over these normal distributions. More recently, Kim et al. (2024) proposed a lightweight global descriptor with SOLiD by representing point clouds in polar coordinates and discretizing them into range-elevation and azimuth-elevation directions, respectively, with each discrete bin storing the number of points. Such global descriptors are easy to store in a simple database, resulting in a faster matching process to retrieve loop closure candidates. However, global descriptor-based methods typically require revisits within small error margins around the reference pose to limit the variation in the global descriptor. They often do not generalize when detecting loop closures across LiDARs with different scanning patterns and resolutions.

2.2. Two-dimensional projection-based methods

Many recent works focus on speeding up the recognition by describing 3D point clouds by a 2D projection (Kim and Kim, 2018; Li and Li, 2021; Luo et al., 2023; Xu et al., 2023; Yuan et al., 2024). Even though such a projection results in loss of geometric information, these methods perform comparably and sometimes even better than their full 3D counterparts, especially in outdoor scenes. Among the two-dimensional projection-based methods, the cylindrical projection into range images (Chen et al., 2020; Shan et al., 2021; Steder et al., 2010, 2011) and the orthogonal projection into BEV images (Kim et al., 2021; Lu et al., 2022; Luo et al., 2021; Wang et al., 2020; Yuan et al., 2023) are widely used 2D projections.

Range image projections are equivariant to azimuthal rotations due to the underlying cylindrical projection. They can recover the relative yaw between loop closures but suffer from scale distortions due to lateral shifts in the LiDAR viewpoint. On the other hand, BEV projections preserve 2D geometry along the local ground plane, which is vital for autonomous ground robots. Among the BEV projection methods, the elevation map is widely used (Kim et al., 2021; Kim and Kim, 2018; Luo et al., 2023; Yuan et al., 2023). An elevation map preserves the maximum elevation within each discrete pillar in the BEV representation.

Scan Context by Kim and Kim (2018) is a popular LiDAR-based loop closure approach. It computes a global descriptor for each LiDAR scan using the elevation map in polar coordinates. The polar coordinates make the descriptor equivariant to in-plane rotations, thereby allowing the loop closure alignment module to estimate the relative yaw between LiDAR scans from the same location. However, the polar coordinate representation can be sensitive to in-plane translations. Scan Context++ (Kim et al., 2021) improves upon this drawback of Scan Context and computes elevation maps in Cartesian coordinates to augment the polar elevation maps. However, preserving the maximum elevation of the points after the BEV projection makes the two approaches sensitive to the vertical resolution and FoV of the LiDAR.

In contrast, BVMatch (Luo et al., 2021) proposes a density-preserving BEV projection of point clouds and a Log Gabor filtering step. It requires training a bag-of-words model to retrieve loop closure candidates, making it less suitable for real-time applications such as SLAM. Our approach also uses a density-preserving BEV projection. It generates a database of local feature descriptors from the BEV image online using the HBST data structure (Schlegel and Grisetti, 2018) for efficient operation. The use of local feature descriptors for loop closure detection has also been previously explored in the 2D LiDAR domain with traditional occupancy grid maps (Blanco et al., 2013).

Yuan et al. (2023) propose STD, which utilizes a set of geometric primitives, that is, triangles with unique side lengths, to describe a point cloud. They obtain the feature vertices of such triangles by a local BEV projection of points within voxels that lie at the boundaries of large planar regions in the environment. BTC (Yuan et al., 2024) improves upon this work by proposing a binary descriptor for a detailed representation of local geometry. They combine the triangle descriptors and the newly proposed binary descriptors to perform loop closure retrieval. The works by Yuan et al. (2023, 2024) use a local map representation of the environment by accumulating a fixed number of consecutive scans. This helps them tackle the sparsity issue present with rotating 3D LiDARs. The local maps also make their approach generalizable to LiDAR sensors with non-repetitive scanning patterns and small FoVs. We also use a local map representation to detect loop closures. However, unlike STD and BTC, we accumulate scans until a minimum displacement of the platform to ensure that the local maps capture a sufficient portion of the scene.

2.3. Learning-based approaches

The recent popularity of deep neural networks, attributed to the widespread availability of specialized hardware and software to train such networks, has generated much interest from a LiDAR-based loop closing perspective (Chen et al., 2020; Dubé et al., 2018; Komorowski, 2021; Ma et al., 2022, 2024). PointNet (Qi et al., 2017b) and PointNet++ (Qi et al., 2017a) use multi-layer perceptron networks to detect local feature descriptors from point clouds. PointNetVLAD (Uy and Lee, 2018) proposes an end-to-end trained neural network that combines the local feature descriptors obtained from PointNet into a global descriptor using NetVLAD (Arandjelovic et al., 2016). SegMap (Dubé et al., 2018) segments point clouds using a region-growing algorithm and computes data-driven descriptors of these segments to detect loop closures.

BEV projections are also popular with learning-based approaches (Kim et al., 2019; Luo et al., 2023, 2025; Xu et al., 2021b). Kim et al. (2019) repurpose the Scan Context (Kim and Kim, 2018) image as a three-channel image using a jet colormap over a range of structural heights. They train a convolutional neural network using these Scan Context images and perform localization as a classification task. Xu et al. (2021b) propose an encoder-decoder network to extract descriptors from differentiable Scan Context images.

Vidanapathirana et al. (2022) use a global descriptor obtained from a sparse convolutional neural network to perform loop closures. They propose a local-consistency loss to train their approach to obtain consistent local features across revisits, which they later combine into a global descriptor using a pooling and normalization strategy.

BEVPlace (Luo et al., 2023) applies group convolutions corresponding to the $SO(2)$ rotational group to obtain rotation equivariant local features on BEV density images and computes a global descriptor for place retrieval using NetVLAD. BEVPlace++ (Luo et al., 2025) improves upon BEVPlace by proposing a novel rotation equivariant module, which passes rotation-warped BEV density images through a convolutional neural network to extract the local features.

Recently, Ramezani et al. (2024) proposed a place recognition method based on an attentional graph neural network representation. Their approach leverages the topological relationships between consecutive LiDAR scans within a pose-graph SLAM system by generating sub-graphs according to a travel distance criterion. They use a multi-layer perceptron to encode the odometry poses and global descriptor associated with every scan within each of the sub-graphs. For a candidate pair of sub-graphs, they construct a fully connected multiplex graph from the encoded nodes and process it through an attentional graph neural network, P-GAT, to perform place recognition.

Even though learning-based approaches achieve impressive performance regarding loop closure precision and recall metrics, they typically require a computationally expensive offline pre-training step on a GPU, affecting their application to real-time loop closure detection in SLAM.

Our method also exploits the spatio-temporal information across consecutive LiDAR scans by constructing local maps as the primary representation for loop closure detection, rather than relying on individual scans. Following our earlier work (Gupta et al., 2024), we generate local maps based on a travel displacement criterion, in contrast to Yuan et al. (2023, 2024), who accumulate a fixed number of scans. This strategy helps us overcome the sparsity of rotating LiDARs and makes our method agnostic to the LiDAR sensor’s scanning pattern, FoV, and resolution. We use a density-preserving BEV projection of the local maps on the local ground plane to reduce dimensionality for computational efficiency. Subsequently, we compute ORB feature descriptors (Rublee et al., 2011) from the BEV density images and store them in an HBST database (Schlegel and Grisetti, 2018) for place recognition. We mitigate the negative impact of perceptual aliasing on our feature descriptor-matching algorithm by performing a self-similarity pruning of the ORB feature descriptors (Bosse et al., 2004). Our loop closure pipeline provides a complete 3D global alignment between the local maps.

3. Our methodology

We propose an approach to detect loop closures between local maps using their BEV representation that works with both vehicle-mounted and handheld LiDAR sensor platforms. It also provides a complete 3D initial guess of the relative pose between local maps involved in the loop closures. We present an overview of our pipeline in Figure 1.

Figure 1.

Overview of our pipeline for loop closure detection and alignment. Given a local map $M_{m}$ generated using a travel displacement criterion: (1) We estimate a ground alignment transform ${}^{g}T_{m}$ that aligns the ground plane with the xy-plane of the reference frame. (2) We project the ground-aligned local map $M_{m}^{g}$ onto the local xy-plane to obtain a 2D density image $I_{m}$. (3) From this density image, we extract ORB feature descriptors $D_{m}$ and apply self-similarity pruning to remove redundant features. (4) We incrementally insert the pruned feature descriptors into a binary search tree database, which returns feature matches between the query local map $M_{m}^{g}$ with feature descriptors $D_{m}$ and reference local maps $\{M_{r}^{g}\}$ with feature descriptors $\{D_{r}\}$. These matches form the initial loop closure candidates. (5) We verify candidates geometrically using RANSAC, obtaining the number of inliers and the corresponding 2D transform $T_{BEV}$. We classify candidates as loop closures if the number of inliers exceeds a threshold. (6) Finally, our pipeline outputs a relative 3D transform ${}^{m}T_{r}$ that provides an initial alignment of the two local maps, which a subsequent pose-graph optimizer can refine.

We explain in Section 3.1 the procedure to generate local maps from LiDAR point clouds based on a travel displacement criterion. Then, in Section 3.2, we present an approach to detect and align the ground plane in the local maps to the xy-plane of the local reference frame. This allows us to compute a BEV representation even in scenarios with a non-planar motion of the LiDAR sensor, as explained in Section 3.3. We compute binary descriptors for point features on these BEV images, as presented in Section 3.4. To enable place recognition using these binary feature descriptors, we store them in an HBST database (Schlegel and Grisetti, 2018) for subsequent matching with query local maps as presented in Section 3.5. We obtain a set of candidate loop closures for new feature descriptors from a query local map by matching these feature descriptors to the database. We perform loop closure validation and geometric alignment in 2D between the matched feature descriptors using a RANSAC (Fischler and Bolles, 1981) scheme described in Section 3.6. This 2D alignment, together with the ground alignment for each local map, provides the relative 3D transform aligning each pair of loop-closed local maps.

3.1. Creation of local maps

Our approach uses local point cloud maps of the environment to perform loop closures. By aggregating downsampled LiDAR scans over time, we exploit the local consistency of sequential odometry estimates to generate these local maps. We use KISS-ICP (Vizzo et al., 2023) to obtain LiDAR odometry for our pipeline.

Given an online stream of sequential point clouds $\{P_{1}, \dots, P_{n}\}$ in the LiDAR frame and their corresponding 3D pose estimates $\{{}^{w}T_{1}, \dots, {}^{w}T_{n}\}$ in the odometry frame w, we first transform the point clouds into the odometry frame, resulting in point clouds $\{P_{1}^{w}, \dots, P_{n}^{w}\}$.

Starting from a point cloud with index i, we consider k subsequent scans until the displacement $\| {}^{w}t_{i + k - 1} - {}^{w}t_{i} \|_{2}$ exceeds a threshold $τ_{c}$, where ${}^{w}t_{i} \in \mathbb{R}^{3}$ is the translational component of the pose estimate ${}^{w}T_{i} \in SE(3)$. Once we reach the displacement threshold $τ_{c}$, we aggregate this subset of scans $\{P_{i}^{w}, \dots, P_{i + k - 1}^{w}\}$ into a local map $M_{m}^{w}$, where m is the local map index. Specifically, we use a voxel grid-based downsampling strategy to generate the local map, using a resolution of $ν_{map}$ meters per voxel. Such a downsampling strategy also ensures a uniform spatial distribution of scanned points by retaining at most 20 points per voxel. We repeat this process of local map generation sequentially, thereby generating mutually exclusive sets of consecutive point clouds, as shown in Figure 2.
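The segmentation and downsampling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation in our pipeline: the function names `split_by_displacement` and `voxel_downsample` are hypothetical, trailing scans that never reach the threshold are simply left pending, and the first points that fall into a voxel are the ones retained.

```python
import math


def split_by_displacement(translations, tau_c):
    """Group consecutive scans into local maps by travel displacement.

    translations: list of (x, y, z) odometry positions, one per scan.
    Returns half-open index ranges (start, end). Each range ends with the
    first scan whose distance from the range's first scan exceeds tau_c.
    """
    segments = []
    start = 0
    for j in range(len(translations)):
        if math.dist(translations[j], translations[start]) > tau_c:
            segments.append((start, j + 1))
            start = j + 1  # the next local map starts at the following scan
    return segments  # scans from `start` onward are still pending


def voxel_downsample(points, voxel_size, max_points_per_voxel=20):
    """Keep at most `max_points_per_voxel` points in each cubic voxel."""
    voxels = {}
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        bucket = voxels.setdefault(key, [])
        if len(bucket) < max_points_per_voxel:
            bucket.append(p)  # assumption: first-come points are retained
    return [p for bucket in voxels.values() for p in bucket]
```

The ranges returned by `split_by_displacement` are mutually exclusive by construction, which mirrors the non-overlapping local maps shown in Figure 2.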

Figure 2.

A block diagram showcasing the composition of a local map $M_{m}$, generated using point clouds $\{P_{i}, \dots, P_{i + k - 1}\}$ registered with corresponding odometry pose estimates $\{{}^{w}T_{i}, \dots, {}^{w}T_{i + k - 1}\}$. The local map is referenced to the local coordinate frame of the first scan in this local map, as highlighted in blue.

We transform each local map $M_{m}^{w}$ into the local reference frame ${}^{w}T_{i}$ of the first scan in the sub-sequence as follows:

$$M_{m} = {}^{w}T_{i}^{-1} M_{m}^{w}. \tag{1}$$
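For reference, the inverse in equation (1) has a simple closed form. Writing the pose ${}^{w}T_{i}$ in terms of its rotation ${}^{w}R_{i} \in SO(3)$ and translation ${}^{w}t_{i}$ (the block notation is introduced here for illustration only), each point $p^{w}$ of $M_{m}^{w}$ maps to the local frame as

$${}^{w}T_{i}^{-1} = \begin{bmatrix} {}^{w}R_{i}^{T} & -{}^{w}R_{i}^{T}\, {}^{w}t_{i} \\ 0^{T} & 1 \end{bmatrix}, \qquad p = {}^{w}R_{i}^{T} \left( p^{w} - {}^{w}t_{i} \right).$$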

Aggregating consecutive LiDAR scans into local maps provides richer structural information than individual scans, as illustrated in Figure 3. This larger spatial context is particularly beneficial in multi-session scenarios, where local environmental changes may otherwise hinder place recognition. Furthermore, local maps generated at the same location exhibit higher similarity across different LiDAR resolutions and scanning patterns, whereas inter-LiDAR place recognition with individual scans remains a challenging data association problem, even for humans. As a result, the use of local maps improves robustness for multi-session and cross-platform loop closure detection.

Figure 3.

A comparison of data from LiDARs with different scanning patterns and FoVs. The first row shows a single scan from an Ouster OS2-128 LiDAR with a 360° × 22.5° FoV and its corresponding local map. The second row shows a Livox Avia LiDAR with a 70° × 77° FoV and its corresponding local map. The similar structural features across local maps are highlighted with ellipses.

3.2. Ground plane detection and alignment

After generating the point cloud local map $M_{m}$, we perform a BEV projection. To ensure consistency across multiple revisits, we must project the point cloud onto a consistent physical plane in the environment. The ground plane provides a natural choice for this plane and enables the computation of a stable basis for the BEV projection. In our earlier work (Gupta et al., 2024), we assumed planar sensor motion so that the ground plane remained parallel to the xy-plane of the local map’s reference frame. However, when the LiDAR follows a non-planar trajectory, the local ground plane tilts relative to the xy-plane, as shown in Figure 4. To address this, we explicitly identify the ground plane and align it with the xy-plane of the reference frame.

Figure 4.

Effect of ground alignment on a local map generated from a handheld LiDAR sensor, recorded in a forest, with non-planar motion. The colors of the points represent their z-coordinates in the local reference frame. Left: Our ground alignment approach samples ground points $G_{m}$ (highlighted in red) from the local map $M_{m}$ and computes an alignment ${}^{g}T_{m}$ to the xy-plane. Right: The ground-aligned local map $M_{m}^{g}$.

The first step identifies candidate ground points in the local map. Semantic segmentation networks can label ground in point clouds (Paigwar et al., 2020; Xu et al., 2021a), but they require offline GPU training and add significant computational costs. Real-time CPU-based methods such as Patchwork (Lim et al., 2021) and Patchwork++ (Lee et al., 2022) operate on individual scans, so they require per-scan segmentation and careful parameter tuning. In contrast, our BEV projection requires only one approximate estimate of the local ground plane to align the entire local map at once, not a precise segmentation for every scan.

We adopt a simple sampling strategy: in a 3D LiDAR local map, the lowest-elevation points usually correspond to the ground (Hu et al., 2014). This assumption generally holds for typical outdoor sensing platforms with known extrinsics, where the ground plane lies within the LiDAR’s field of view. It also remains valid when the local map’s xy-plane is not perfectly aligned with the ground plane, since a LiDAR cannot scan below the ground surface.

We divide the local map $M_{m}$ into a 2D grid over the xy-plane with 1.0 m cells. In each cell, we retain only the point with the minimum z-coordinate, yielding a sparse set of candidate ground points, as shown in Figure 4. This set may still contain non-ground points where the ground is occluded or where the xy-plane is misaligned with the ground plane. To filter these, we run principal component analysis (PCA) on the normal vectors associated with the sampled points and retain only samples whose normals have a cosine similarity larger than 0.95 with the first principal component. This step removes most non-ground points and yields the ground sample set $G_{m}$. The first principal component also provides a strong initial estimate of the ground orientation.
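The sampling and filtering steps above can be illustrated with the following Python sketch. It is a simplified illustration under several assumptions: the function names are hypothetical, the unit normals are taken as given and oriented upward (how they are estimated is not covered here), and the first principal component is approximated as the dominant eigenvector of the uncentered second-moment matrix of the normals, computed by power iteration.

```python
import math


def lowest_point_per_cell(points, cell_size=1.0):
    """Keep the minimum-z point in every xy grid cell."""
    lowest = {}
    for p in points:
        key = (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))
        if key not in lowest or p[2] < lowest[key][2]:
            lowest[key] = p
    return list(lowest.values())


def dominant_direction(normals, iterations=50):
    """Dominant eigenvector of the second-moment matrix via power iteration."""
    s = [[sum(n[i] * n[j] for n in normals) for j in range(3)] for i in range(3)]
    v = [0.0, 0.0, 1.0]  # assumption: start from the z-axis of the local frame
    for _ in range(iterations):
        w = [sum(s[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def filter_ground_samples(samples, normals, min_cosine=0.95):
    """Keep samples whose unit normal agrees with the dominant direction."""
    direction = dominant_direction(normals)
    kept = [p for p, n in zip(samples, normals)
            if sum(a * b for a, b in zip(n, direction)) > min_cosine]
    return kept, direction
```

Here, `filter_ground_samples` returns both the retained samples, playing the role of $G_{m}$, and the dominant direction that serves as the initial estimate of the ground orientation.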

We convert this PCA estimate into an initial transform $T_{init} \in SE(3)$ for ground alignment. Specifically, we compute an axis–angle rotation that aligns the first principal component with the z-axis of the local map frame:

$$a = υ_{pc} \times [0 \;\; 0 \;\; 1]^{T}, \tag{2}$$

$$θ = \arccos (υ_{pc} \cdot [0 \;\; 0 \;\; 1]^{T}), \tag{3}$$

where $υ_{pc}$ denotes the first principal component, $a$ the axis of rotation, and $θ$ the rotation angle. We convert this axis–angle pair into a rotation matrix $R_{init} \in SO(3)$ using Rodrigues’ formula. We also compute a mean z-shift:

$$z_{init} = r_{2} \cdot \bar{g}_{m}, \tag{4}$$

where $r_{2}$ is the last row of $R_{init}$ and $\bar{g}_{m}$ is the mean of $G_{m}$. We combine $R_{init}$ and $z_{init}$ into the initial transform $T_{init}$.
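As an illustration of equations (2) to (4), the following Python sketch builds $R_{init}$ with Rodrigues’ formula and evaluates the mean z-shift. It is a sketch under assumptions: the function names are hypothetical, the rotation axis is normalized before use, and the degenerate case of a principal component already parallel or anti-parallel to the z-axis is handled explicitly. How $z_{init}$ enters $T_{init}$ is not spelled out here, so the sketch only returns the two ingredients.

```python
import math


def rotation_to_z(v_pc):
    """Rotation matrix that maps the direction v_pc onto the z-axis."""
    norm = math.sqrt(sum(c * c for c in v_pc))
    vx, vy, vz = (c / norm for c in v_pc)
    ax, ay = vy, -vx  # equation (2): a = v_pc x [0 0 1]^T, with a_z = 0
    theta = math.acos(max(-1.0, min(1.0, vz)))  # equation (3)
    s = math.hypot(ax, ay)
    if s < 1e-12:
        # Degenerate axis: identity if aligned, half turn about x if opposed.
        sign = 1.0 if vz > 0 else -1.0
        return [[1.0, 0.0, 0.0], [0.0, sign, 0.0], [0.0, 0.0, sign]]
    kx, ky = ax / s, ay / s
    # Skew-symmetric matrix of the unit axis k = (kx, ky, 0).
    K = [[0.0, 0.0, ky], [0.0, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2.
    return [[(1.0 if i == j else 0.0) + sin_t * K[i][j] + (1.0 - cos_t) * K2[i][j]
             for j in range(3)] for i in range(3)]


def mean_z_shift(R, ground_points):
    """Equation (4): last row of R dotted with the mean ground point."""
    n = len(ground_points)
    mean = [sum(p[i] for p in ground_points) / n for i in range(3)]
    return sum(R[2][i] * mean[i] for i in range(3))
```

Applying `rotation_to_z(v)` to the normalized `v` returns the unit z-axis, which is a quick sanity check for the sign conventions of the axis and the angle.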

To make the ground alignment robust to outliers, we refine $T_{init}$ with a least-squares optimization:

$${}^{g}T_{m} = \underset{T_{init}}{\operatorname{argmin}} \sum_{\forall g_{k} \in G_{m}} w_{k} \left( \hat{n}_{xy} \cdot (T_{init} g_{k}) \right)^{2}, \tag{5}$$

where $T_{init} g_{k} = [x_{k}' \;\; y_{k}' \;\; z_{k}']^{T}$ denotes the $SE(3)$ group action on $\mathbb{R}^{3}$, $\hat{n}_{xy} = [0 \;\; 0 \;\; 1]^{T}$ is the normal vector of the xy-plane, and $w_{k}$ is a Gaussian weight that reduces the influence of spurious outliers. We iterate this optimization, stopping after 10 iterations or once the pose correction between iterations falls below a threshold.

We compute the Jacobian $J$ of the residual term $\hat{n}_{xy} \cdot (T_{init} g_{k})$ in equation (5) with respect to a perturbation $Δx = [Δt^{T} \;\; Δω^{T}]^{T}$ around $T_{init}$ on the $SE(3)$ manifold:

$$J = [0 \;\; 0 \;\; 1 \;\; y_{k}' \;\; -x_{k}' \;\; 0]. \tag{6}$$

The Jacobian indicates that the optimization only updates the z-axis translation and the roll (x-axis) and pitch (y-axis), which directly correct the ground alignment.

Finally, we apply the ground-aligning transform ${}^{g}T_{m}$ to the local map $M_{m}$ to obtain the aligned local map $M_{m}^{g}$ whose ground plane lies on the xy-plane. The alignment step increases robustness to non-planar LiDAR motion profiles, particularly for handheld platforms. It establishes the ground plane as a robust reference plane for BEV projections during long-term revisits, since the ground surface typically remains consistent over time. We show an example of this alignment in Figure 4 and provide a detailed analysis in Section 5.3.

3.3. Density-preserving bird’s-eye-view projection

After establishing the primary representation of the environment in Sections 3.1 and 3.2, the next step is to extract informative yet distinct local feature descriptors to perform place recognition for loop closures. Rather than computing features directly on the 3D point clouds of local maps, which is computationally expensive given the spatial extent of such point clouds, we use a 2D BEV projection of the local maps as an intermediate representation for loop closure detection.

We perform the BEV projection by simply dropping the z-coordinate of all the individual points in the ground-aligned local maps $M_{m}^{g}$ . Furthermore, we discretize the $R^{2}$ projection space with a resolution of ν_b meters. This results in a 2D Cartesian grid $N_{m} (u, v)$ of size W_m × H_m, where

\begin{align} W_{m} & = ⌊ \frac{x_{m}^{u}}{ν_{b}} ⌋ - ⌊ \frac{x_{m}^{l}}{ν_{b}} ⌋ + 1, \end{align}

(7)

\begin{align} H_{m} & = ⌊ \frac{y_{m}^{u}}{ν_{b}} ⌋ - ⌊ \frac{y_{m}^{l}}{ν_{b}} ⌋ + 1, \end{align}

(8)

[\begin{matrix} x_{m}^{u} \\ y_{m}^{u} \end{matrix}] = \max_{x, y} M_{m}^{g}; [\begin{matrix} x_{m}^{l} \\ y_{m}^{l} \end{matrix}] = \min_{x, y} M_{m}^{g} .

(9)

However, the dimensionality reduction comes at the cost of losing the complete 3D information about the scene. Many traditional BEV projection approaches store the maximum elevation of the points in each cell (Kim et al., 2021; Kim and Kim, 2018; Li and Li, 2021; Luo et al., 2022). The maximum elevation, however, is sensitive to the distance between the scanner and the surface being scanned, as well as the LiDAR sensor’s FoV. In our pipeline, we instead store the point density in each discrete 2D cell after the projection, which is less sensitive to viewpoint changes (Luo et al., 2021).

Therefore, each cell in this grid $N_{m} (u, v) \in ℕ_{0}$ stores the count of projected points within that cell. The BEV image $I_{m} (u, v)$ of the local map $M_{m}^{g}$ is then defined as the relative point density in each cell of the grid as follows:

I_{m} (u, v) = \frac{N_{m} (u, v) - N_{min}}{N_{max} - N_{min}} \in ℝ^{W_{m} \times H_{m}},

(10)

N_{max} = \max N_{m} (u, v); N_{min} = \min N_{m} (u, v) .

(11)

To mitigate the undesired influence of dynamic objects that accumulate during local map generation, we explicitly set all image pixels $I_{m} (u, v)$ with a relative density lower than 5% to zero. This choice reflects the fact that most dynamic objects commonly found in urban environments, such as vehicles, pedestrians, and cyclists, have small vertical footprints and therefore produce low-density values.
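The projection in equations (7) to (11), together with the 5% density filter, can be sketched in a few lines. This is an illustrative sketch under the assumption that the local map is given as an (N, 3) array of ground-aligned points; the function name `density_bev_image` is hypothetical.

```python
import numpy as np


def density_bev_image(points, resolution=0.5, min_density=0.05):
    """Project ground-aligned points to a normalized point-density image."""
    # Drop z and discretize xy into integer cell indices.
    cells = np.floor(points[:, :2] / resolution).astype(np.int64)
    lower = cells.min(axis=0)
    width, height = cells.max(axis=0) - lower + 1    # equations (7) and (8)

    # Count the projected points per cell, N_m(u, v).
    counts = np.zeros((width, height), dtype=np.int64)
    np.add.at(counts, (cells[:, 0] - lower[0], cells[:, 1] - lower[1]), 1)

    n_min, n_max = counts.min(), counts.max()
    image = (counts - n_min) / max(n_max - n_min, 1)  # equation (10)
    image[image < min_density] = 0.0                  # suppress low-density cells
    return image
```

For example, a cell holding 3 points next to a cell holding 100 points has a relative density of 0.03 and is therefore set to zero.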

3.4. Feature detection and pruning strategy

The BEV projection reduces the dimensionality of the local map, making feature detection for place recognition computationally efficient. The image-like BEV density representation enables us to apply well-established computer vision techniques to extract distinctive feature descriptors. Since these BEV density images preserve the geometrical, floorplan-like structure of the environment, we use ORB (Rublee et al., 2011) feature descriptors to capture relevant features. Unlike camera images, BEV images generated via orthographic projection have no scale ambiguity. We take advantage of this property by computing the ORB feature descriptors $D_{m}$ without applying scale-invariance adjustments, further improving computational efficiency.

Figure 5 shows an example of ORB features extracted from a BEV density image. We observe that most ORB features concentrate around high-density regions with strong gradients. These strong responses typically arise from static structures with large vertical footprints, such as building facades, trees, and light poles. In contrast, low-density regions, often corresponding to small bushes, vegetation, or dynamic objects that occasionally pass through the low-density filter, contribute few ORB features to the database. As a result, dynamic objects in the environment do not adversely affect our algorithm.

Figure 5.

A BEV density image of a local map, where darker pixels indicate higher point density. Red dots mark the ORB features extracted from the image.

ORB feature descriptors remain salient within a local neighborhood of the density image, but not necessarily across the entire image. As a result, environments with repetitive structures can generate self-similar feature descriptors that can confuse the feature-matching stage and cause false positives.

To improve robustness against perceptual aliasing, we prune the ORB feature descriptors within each density image using a self-similarity check inspired by Bosse et al. (2004). Specifically, we compute the nearest-neighbor match for each ORB feature descriptor based on the Hamming distance and discard any feature whose distance to its closest match lies below a threshold of τ_pr = 35 bits. This procedure eliminates self-similar feature descriptors within the same density image. This approach resembles Lowe’s ratio test (Lowe, 2004), but we apply it intra-image rather than inter-image, which avoids the cost of finding second-best matches across a larger feature descriptor database.
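The self-similarity check can be illustrated with a brute-force computation of all pairwise Hamming distances within one image. The sketch assumes 256-bit descriptors packed into 32 bytes, as produced by common ORB implementations; the function name `prune_self_similar` is hypothetical and the exhaustive search is chosen for clarity rather than speed.

```python
import numpy as np


def prune_self_similar(descriptors, tau_pr=35):
    """Drop descriptors whose nearest neighbor in the same image is too close.

    descriptors: (N, 32) uint8 array, i.e. 256-bit descriptors packed into bytes.
    Returns the retained descriptors and the boolean keep mask.
    """
    bits = np.unpackbits(descriptors, axis=1).astype(np.int32)  # (N, 256) of 0/1
    # Hamming distance between all pairs: number of differing bits.
    dist = bits @ (1 - bits).T + (1 - bits) @ bits.T
    # Exclude the trivial match of each descriptor with itself.
    np.fill_diagonal(dist, bits.shape[1] + 1)
    nearest = dist.min(axis=1)
    keep = nearest >= tau_pr
    return descriptors[keep], keep
```

Note that both members of a self-similar pair are removed, since each is the other's nearest neighbor.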

This pruning step filters out features from repetitive structures in the environment, such as repeating trusses on a bridge, as shown in Figure 6. This helps reduce false-positive feature matches across density images resulting from perceptual aliasing, thereby lowering the risk of incorrect loop closure detections. We present a detailed analysis of this pruning strategy and its impact on loop closure detection in Section 5.4.

Figure 6.

An example of self-similarity feature pruning applied to ORB feature descriptors computed on a BEV density image of a bridge with repetitive mechanical structures. Our algorithm prunes the features shown in red and retains those in green.

3.5. Feature database

Once we have the unique features within a BEV density image, we create a database to serve as a reference for place recognition. We leverage the binary domain of the ORB descriptors, using the Hamming distance embedding binary search tree (HBST) (Schlegel and Grisetti, 2018) to store the set of feature descriptors $D_{m}$ obtained from each density image $I_{m}$ , along with their corresponding map index m.

The depth of the HBST is limited by the number of bits in the binary descriptor. As a result, a query descriptor will require at most 256 bitwise comparisons with the tree’s nodes before reaching a leaf node. Each leaf node can hold a maximum of 100 descriptors, ensuring efficient feature matching. These design choices constrain the computational time for feature matching without imposing practical limitations on the usability of our approach. Unlike a clustering-based bag of words or a neural representation-based database, we use the HBST as an incrementally updated database, requiring no offline pre-training.

After obtaining a set of descriptors $D_{m}$ from the query local map’s BEV density image $I_{m}$ , we search the HBST database to find the closest match for each descriptor $d_{j, m} \in D_{m}$ . Matches are determined using a Hamming distance threshold of 50 bits between ORB descriptors. Since the binary tree links each stored descriptor to an index m corresponding to its local map, we compile a list of feature matches between the query local map and the reference local maps in the database. These matches are subsequently verified geometrically. Figure 7 visualizes the corresponding set of feature matches between two density images, as obtained from the HBST database.
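To make the matching rule concrete, the following sketch replaces the HBST with an exhaustive nearest-neighbor search. It is a stand-in for illustration only and does not reproduce the tree structure or its complexity. It assumes packed 256-bit descriptors, treats a distance of at most 50 bits as a match, and groups the matches by the local map index of the matched reference descriptor. The function name `match_descriptors` is hypothetical.

```python
import numpy as np


def match_descriptors(query, reference, reference_map_ids, max_distance=50):
    """Brute-force stand-in for the database query (not an HBST).

    query: (Q, 32) uint8, reference: (R, 32) uint8 packed descriptors.
    reference_map_ids[i] is the local map index of reference descriptor i.
    Returns {map index: [(query index, reference index), ...]}.
    """
    q = np.unpackbits(query, axis=1).astype(np.int32)
    r = np.unpackbits(reference, axis=1).astype(np.int32)
    dist = q @ (1 - r).T + (1 - q) @ r.T  # pairwise Hamming distances
    best = dist.argmin(axis=1)            # closest reference per query descriptor
    matches = {}
    for qi, ri in enumerate(best):
        if dist[qi, ri] <= max_distance:
            matches.setdefault(int(reference_map_ids[ri]), []).append((qi, int(ri)))
    return matches
```

The per-map lists produced here correspond to the candidate correspondences that are passed on to geometric verification.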

Figure 7.

ORB descriptor matches between features from a reference and a query BEV density image obtained from the HBST database. The inlier correspondences obtained from the RANSAC-based geometric verification are shown with green lines, and the outlier correspondences with red lines.

3.6. Loop detection and map alignment

The geometric validation step entails performing a 2D alignment of the matched ORB features. This involves computing a 2D rigid-body transformation that optimally aligns the matched features from the binary search tree using a distance metric. Unlike the general image-alignment problem, this process is constrained to an $S E (2)$ transformation rather than a homography. To handle outlier associations caused by the limitations of binary tree-based matching, we employ a RANSAC-based alignment strategy.

We design a RANSAC-based approach that randomly selects two feature pairs from the set of matches between a query ( $I_{m}$ ) and a reference ( $I_{r}$ ) density image. We use the Kabsch-Umeyama algorithm (Kabsch, 1976; Umeyama, 1991) to compute the relative 2D alignment between the feature pairs. We use this alignment to compute the point-wise Euclidean distance between each pair of feature matches. Feature matches with distances larger than 1.5 m (3 pixels for ν_b = 0.5 m) are considered outliers. The RANSAC process runs for a fixed number of iterations N_ransac. We require a minimum number of inliers γ from the RANSAC alignment stage to decide whether two local maps belong to the same location.

Finally, after the N_ransac iterations, we compute a Kabsch-Umeyama 2D alignment over the entire set of inlier correspondences. It provides us with a rotation matrix $R \in S O (2)$ and a translation vector $t \in R^{2}$ , which we represent as a homogeneous transformation $T_{BEV} \in S E (3)$ , having translation components only along the xy-plane, and a rotation component only about the z-axis. We scale the translation vector by the voxel size (ν_b) since the features are computed from the discretized density images.
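The two-point RANSAC scheme and the final refit can be sketched as below. This is an illustrative sketch, not the released code. It assumes that the matched keypoints are given in pixel coordinates, that the transform maps reference keypoints onto query keypoints, and that at least two matches exist. The iteration count and the random seed are placeholders, and the function names are hypothetical.

```python
import numpy as np


def kabsch_2d(src, dst):
    """Least-squares rotation R and translation t with dst ~ R @ src + t (no scale)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation.
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst_mean - R @ src_mean


def ransac_align(query_px, ref_px, iterations=100, max_error_px=3.0, seed=0):
    """Estimate the 2D rigid alignment from matched keypoints with RANSAC."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(query_px), dtype=bool)
    for _ in range(iterations):
        # Two correspondences fully determine a 2D rigid-body transform.
        i, j = rng.choice(len(query_px), size=2, replace=False)
        R, t = kabsch_2d(ref_px[[i, j]], query_px[[i, j]])
        err = np.linalg.norm(ref_px @ R.T + t - query_px, axis=1)
        inliers = err <= max_error_px
        if inliers.sum() > best.sum():
            best = inliers
    # Final refit over all inliers of the best hypothesis.
    R, t = kabsch_2d(ref_px[best], query_px[best])
    return R, t, best
```

The number of `True` entries in the returned mask is the inlier count that is compared against γ.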

Although $T_{BEV}$ only aligns the ground-aligned local maps, we can recover the complete $S E (3)$ pose to align the original local maps by composing $T_{BEV}$ with the ground-aligning transforms of the individual local maps, as shown in equation (12).

{}^{m}T_{r} = {}^{g}T_{m}^{- 1} T_{BEV} {}^{g}T_{r} .

(12)
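Equation (12) amounts to a product of three homogeneous matrices. The sketch below first lifts the 2D alignment into a 4x4 matrix and then composes it with the two ground-aligning transforms. It assumes that the 2D translation is expressed in pixels relative to a common grid origin for both images, so that multiplying by ν_b yields meters; the function names are hypothetical.

```python
import numpy as np


def lift_bev_to_se3(R2, t2_px, resolution=0.5):
    """Embed a 2D rotation and pixel translation into a 4x4 transform T_BEV."""
    T = np.eye(4)
    T[:2, :2] = R2                             # rotation about the z-axis only
    T[:2, 3] = resolution * np.asarray(t2_px)  # pixels to meters
    return T


def map_to_map_transform(T_g_m, T_bev, T_g_r):
    """Equation (12): transform from reference map r to query map m."""
    return np.linalg.inv(T_g_m) @ T_bev @ T_g_r
```

Read from right to left, the product moves a point from the reference map into its ground-aligned frame, applies the planar alignment, and then undoes the ground alignment of the query map.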

This 3D alignment serves as an initial estimate for a fine-grained registration of local maps during typical pose-graph optimization in a SLAM pipeline. Figure 8 shows a qualitative example of this initial alignment. Notably, one of the local maps (in blue) in this visualization contains a large trace from a dynamic object (highlighted by the green ellipse), showing that our pipeline can detect and align loop closures even in the presence of such strong dynamics.

Figure 8.

Two local maps (in red and blue) detected as loop closure and aligned using the initial estimate ${}^{m}T_{r}$ provided by our pipeline. Dynamic objects in the scene, as seen in the area highlighted by a green ellipse, do not affect the alignment quality provided by our approach. The red ellipse highlights the overlap of windowpanes on a building.

4. Experimental setup for evaluation

This section presents our experimental setup for evaluating our method and comparing it to existing loop closure detection approaches. We begin by introducing the datasets we use for evaluation and benchmarking. Next, we provide details about the implementation of the baseline methods we include in our comparison. Finally, we describe the quantitative metrics we use to assess the performance of our loop closure detection pipeline and the competing baselines.

4.1. Datasets

We conduct a comprehensive evaluation of our approach on datasets collected using a variety of LiDAR sensors with different resolutions and scanning patterns. These datasets include multiple sequences recorded in diverse urban environments across various mobile platforms.

4.1.1. Public datasets

We use the MulRan (Kim et al., 2020) and HeLiPR (Jung et al., 2024) public datasets, both recorded in urban environments using LiDAR sensors mounted on car rooftops. These datasets are well-suited for intra-session loop closure evaluation due to frequent in-sequence revisits with varying orientations and lane shifts. In addition, both datasets include at least three sequences per environment, making them ideal for evaluating inter-session loop closure detection.

The MulRan dataset was recorded using a 64-beam rotating LiDAR (Ouster OS1-64) operating at 10 Hz. Its horizontal FoV is limited to 290° due to occlusion from a radar sensor mounted behind it. We use sequences from three environments: KAIST, Riverside, and Sejong.

The HeLiPR dataset features multiple LiDAR sensors with different FoVs, scan patterns, and resolutions mounted on a single vehicle, enabling the evaluation of inter-LiDAR loop closure detection. From this dataset, we use three of the four available LiDARs: (i) a 128-beam spinning LiDAR (Ouster OS2-128) with 360° × 22.5° FoV, (ii) a hybrid solid-state LiDAR (Livox Avia) with 70° × 77° FoV and an unusual non-repetitive scanning pattern, and (iii) a solid-state LiDAR (Aeva Aeries II) with 120° × 19.2° FoV. We exclude the fourth sensor (Velodyne VLP-16) due to self-occlusion caused by surrounding sensors. The Bridge sequences in this dataset pose an additional challenge due to strong perceptual aliasing from repetitive structures in certain regions.

Moreover, both MulRan and HeLiPR datasets include sequences from the KAIST and Riverside environments, recorded approximately 4 years apart. This allows for a realistic evaluation of long-term inter-session loop closure detection, including cross-dataset comparisons between sequences captured with different LiDAR sensors. The KAIST and Riverside sequences also have some spatial overlap, providing an opportunity to evaluate multi-map alignment in a more challenging setting.

Additionally, we use two sequences from the NCLT dataset (Carlevaris-Bianco et al., 2016), recorded using a Velodyne HDL-32E LiDAR mounted on a Segway robot. The motion of the LiDAR in these sequences is non-planar due to the inverted pendulum-like dynamics of the Segway platform. We select two sequences recorded in the same environment but 1 year apart to assess the robustness of our loop closure detection approach under viewpoint and temporal variation.

4.1.2. Self-recorded datasets

In addition to the public datasets described above, we evaluate our approach on sequences recorded using our own instrumented vehicle and sensor platform. The IPB-Car sequence was recorded using an Ouster OS1-128 LiDAR mounted on the rooftop of an instrumented vehicle. We collected this data by driving through a hilly, semi-urban area with forest-lined roads and performing multiple revisits from varying viewpoints.

The IPB-Backpack sequence was captured using a sensor platform mounted on a backpack and carried through urban streets. This setup uses a Hesai Pandar-128 LiDAR mounted approximately 2 m above ground level. Unlike vehicle-mounted systems, the walking motion introduces non-planar sensor trajectories and dynamic changes in the viewpoint. This sequence presents a challenging test case for evaluating our ground alignment strategy under non-planar motion.

The two sequences share a small physical overlap in their environments, which we utilize to evaluate inter-session loop closure detection.

We generate near-ground-truth pose information using the offline LiDAR bundle adjustment method by Wiesmann et al. (2024), with manual inspection of the results to ensure alignment accuracy. This method incorporates RTK-GNSS data for geo-referencing, information that is not provided to any loop closure system under evaluation. The process begins with initial pose estimates obtained from pose-graph fusion of LiDAR odometry (Vizzo et al., 2023), aided by either manually added loop closures (for the IPB-Backpack sequence) or RTK-GNSS data (for the IPB-Car sequence) to enforce global consistency. The final bundle adjustment refines these initial poses by aligning all scans with each other in an offline fashion.

4.2. Baseline methods used for comparison

We compare our approach against seven baseline methods, all of which provide publicly available implementations that we use for evaluation. To ensure a fair comparison, we modify their source code only to return multiple loop closure candidates per query scan rather than just the top-ranked one. Unless stated otherwise, we retain the default parameter settings in each baseline’s implementation.

4.2.1. Our approach

For our approach, we limit the maximum range of each LiDAR scan to 100 m. We set the travel displacement threshold τ_c for generating local maps to 100 m as well. For KISS-ICP (Vizzo et al., 2023) odometry, we use their latest open-source version (v1.2.3). We use a voxel size of ν_map = 0.5 m for the voxel grid used to generate the local maps, and the resolution of the BEV density images ν_b is set to 0.5 m. We use the default parameters for ORB feature descriptors, except that we turn off the scale invariance since the BEV images are orthographic projections. We obtain feature matches from HBST using a Hamming distance threshold of 50 bits and leave all other HBST parameters at their defaults. Finally, we classify two local maps as loop closures if the RANSAC-based alignment yields more than γ = 5 inliers.

For multi-session experiments, we save the HBST database from the reference session and use it as the database for the query session. We apply the same configuration to evaluate our previous approach, referred to as MapClosure (Gupta et al., 2024).
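For reference, the parameter values stated in Sections 3 and 4.2.1 can be collected in a single configuration object. The sketch below is only a convenient summary; the class and field names are hypothetical and do not correspond to the option names of the released software. Parameters whose values are not stated in the text, such as N_ransac, are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Parameter values as stated in the text; all names are illustrative."""
    max_range_m: float = 100.0            # maximum LiDAR range
    travel_displacement_m: float = 100.0  # tau_c, local map criterion
    map_voxel_size_m: float = 0.5         # nu_map
    bev_resolution_m: float = 0.5         # nu_b
    ground_cell_size_m: float = 1.0       # grid for ground sampling
    ground_min_cosine: float = 0.95       # normal similarity threshold
    ground_max_iterations: int = 10       # ground alignment refinement
    min_relative_density: float = 0.05    # BEV low-density filter
    orb_scale_invariance: bool = False    # orthographic images need no scale handling
    self_similarity_bits: int = 35        # tau_pr
    hamming_threshold_bits: int = 50      # database matching threshold
    hbst_max_leaf_size: int = 100         # descriptors per leaf
    ransac_inlier_threshold_m: float = 1.5
    min_inliers: int = 5                  # gamma, a closure needs more than this
```

With these values, the RANSAC inlier threshold of 1.5 m corresponds to 3 pixels in the BEV image.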

4.2.2. Scan context (SC)

Scan Context (Kim and Kim, 2018) is a widely used, state-of-the-art approach for LiDAR-based loop closure detection. However, it is specifically designed for LiDARs with a large horizontal FoV and a cylindrical scanning pattern. Therefore, we omit evaluation on the Livox Avia and Aeva Aeries II sensors from the HeLiPR dataset.

4.2.3. Stable triangle descriptors and binary triangle combined descriptors

STD by Yuan et al. (2023) introduces a triangle descriptor for point cloud local maps, which BTC (Yuan et al., 2024) extends with a binary descriptor for efficient storage and fast retrieval of loop closure candidates. Both methods accumulate 10 consecutive scans into a local map to compute descriptors. We use KISS-ICP to obtain the pose estimates for constructing these local maps. We adopt the default parameter settings from their implementations. For a fair comparison, we also evaluate STD and BTC on the same local maps used in our approach, generated with a 100 m travel displacement criterion. We refer to these variants as STD-100 and BTC-100.

4.2.4. Spatially organized and lightweight global descriptor (SOLiD)

SOLiD (Kim et al., 2024) is a recent loop closure detection method for 3D LiDAR sensors that is designed to operate across a wide variety of sensors, irrespective of their scan pattern, FoV, or resolution. The method employs a kd-tree-based search strategy to identify candidate loop closures. However, the authors do not provide an implementation of this search procedure in the publicly available codebase. Consequently, we approximate the candidate set by selecting all scans recorded at least 100 frames prior to the current query scan. Following the procedure outlined in their manuscript, we compute the evaluation metrics using cosine distances between the descriptors of the query scan and each candidate.

4.2.5. LoGG3D-Net

LoGG3D-Net (Vidanapathirana et al., 2022) is a learning-based method that employs a 3D sparse convolutional network to extract consistent local features across different viewpoints. For our evaluation, we use the checkpoint trained on sequences from the MulRan dataset, as provided by the authors, and we adopt the default parameter settings from their implementation.

4.2.6. BEVPlace++

BEVPlace++ by Luo et al. (2025) is a learning-based method that computes a global descriptor for the BEV projection of the input scan using a rotation equivariant module. In our evaluation, we use the checkpoint trained on sequence 00 from the KITTI dataset (Geiger et al., 2013), as provided by the authors, and we adopt the default parameter settings from their implementation.

4.3. Reference loop closures between local maps

Since our method computes loop closures between local maps, we also define reference loop closures at the map level for evaluation. Prior works (Jiang and Shen, 2023; Kim et al., 2021, 2024; Kim and Kim, 2018; Yuan et al., 2023) typically rely on simple distance-based criteria, but such heuristics are insufficient in our setting. The distance-based criterion breaks down in scenarios where other objects occlude the previously seen area, the sensor has a limited FoV, or the revisits are from a significantly different viewpoint. Additionally, there is inherent ambiguity in selecting the appropriate reference frame for measuring distances between local maps.

To overcome these limitations, we define reference loop closures based on the volumetric overlap of local maps, similar to Gupta et al. (2024) and Yuan et al. (2024). We generate reference local maps using the ground-truth pose information provided by the respective datasets. We ensure that each reference local map contains the same set of scans as the corresponding local maps produced by our method. We then identify reference loop closures by selecting all pairs of local maps that exhibit more than 25% voxel-level overlap. To avoid trivial short-range closures, we skip the three consecutive local maps preceding the query local map.

We adopt the same volumetric overlap strategy for cross-sequence evaluation, but reduce the overlap threshold to 10% to accommodate potential misalignments in the global pose information between sequences.
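The overlap criterion can be sketched with sets of occupied voxels. The text does not specify how the overlap percentage is normalized or which voxel size is used, so the sketch assumes a normalization by the smaller of the two maps and a voxel size of 0.5 m; both choices, as well as the function names, are assumptions made for illustration. Local maps are assumed to be (N, 3) arrays expressed in a common ground-truth frame.

```python
import numpy as np


def voxel_set(points, voxel_size=0.5):
    """Set of integer voxel indices occupied by an (N, 3) point cloud."""
    return set(map(tuple, np.floor(points / voxel_size).astype(np.int64)))


def overlap_ratio(map_a, map_b, voxel_size=0.5):
    """Shared voxels divided by the voxel count of the smaller map (assumed)."""
    a, b = voxel_set(map_a, voxel_size), voxel_set(map_b, voxel_size)
    return len(a & b) / min(len(a), len(b))


def reference_closures(maps, threshold=0.25, skip=3):
    """All (reference, query) index pairs whose overlap exceeds the threshold."""
    closures = []
    for q in range(len(maps)):
        # Skip the `skip` local maps that directly precede the query.
        for r in range(q - skip):
            if overlap_ratio(maps[q], maps[r]) > threshold:
                closures.append((r, q))
    return closures
```

For cross-sequence evaluation, the same function applies with `threshold=0.10` and without the skip window, since the maps then come from different sessions.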

4.4. Scan-level to map-level conversion of loop closures for baselines

The baseline methods in our evaluation operate at different scales; some detect loop closures at the individual scan level (Scan Context and SOLiD), while others work on local maps with a fixed number of scans (STD and BTC). As a result, we cannot directly evaluate them using the reference loop closures defined between local maps generated based on a travel displacement criterion. Therefore, we convert their outputs into equivalent map-level loop closures.

For scan-level approaches, we treat a loop closure between scans $P_{i}$ and $P_{j}$ as a loop closure between local maps $M_{k}$ and $M_{l}$ , where scan $P_{i}$ belongs to map $M_{k}$ and scan $P_{j}$ belongs to map $M_{l}$ .

For methods such as STD and BTC, which use smaller local maps, we first extract all scan pairs between these local maps, $S_{m}$ and $S_{n}$ , that form a loop closure. We treat each scan pair as an individual scan-level loop closure and convert it into a corresponding map-level loop closure using the same method described above.
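The conversion reduces to a lookup from scan index to local map index. The sketch below assumes that each local map covers a contiguous range of scans and is described by the index of its first scan. Discarding pairs that fall into the same local map is an additional assumption, and the function names are hypothetical.

```python
from bisect import bisect_right


def scan_to_map_index(scan_index, map_start_scans):
    """Index k of the local map M_k containing the given scan.

    map_start_scans[k] is the index of the first scan in M_k, sorted ascending.
    """
    return bisect_right(map_start_scans, scan_index) - 1


def to_map_level(scan_closures, map_start_scans):
    """Convert scan-level closures (i, j) into unique map-level closures (k, l)."""
    closures = set()
    for i, j in scan_closures:
        k = scan_to_map_index(i, map_start_scans)
        l = scan_to_map_index(j, map_start_scans)
        if k != l:  # assumed: pairs within one local map are not closures
            closures.add((min(k, l), max(k, l)))
    return sorted(closures)
```

Several scan-level detections between the same two local maps collapse into a single map-level closure.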

4.5. Evaluation metrics

4.5.1. Precision, recall, and F1 score

We use the reference closures computed in Section 4.3 to evaluate our approach and the baseline methods quantitatively. Specifically, we compute precision-recall curves by varying the threshold γ on the number of inlier feature descriptor matches from our pipeline’s RANSAC-based geometric validation stage. For the baseline methods, we generate their respective precision-recall curves by varying the key thresholding parameter described in their publications. In addition to the precision-recall curves, we report the average precision (AP) (area under the precision-recall curve), the maximum recall at 100% precision (R@1), and the maximum F1 score (F1_m). We specifically choose to report R@1 as the maximum recall at 100% precision to emphasize the need to avoid false loop closures in a SLAM pipeline that could lead to catastrophic failures (Lowry et al., 2016).
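The three metrics can be computed from a list of detections, each with a score and a flag stating whether it is a reference closure. The sketch below sweeps the threshold over all observed scores and accepts detections with a score of at least the threshold. It approximates the area under the curve by a step-wise sum, which is an assumption since the integration scheme is not specified in the text. The function name `pr_metrics` is hypothetical.

```python
import numpy as np


def pr_metrics(scores, is_true, num_reference):
    """Return (AP, R@1, maximum F1) for scored loop closure detections.

    scores: e.g. inlier counts; is_true: detection is a reference closure;
    num_reference: total number of reference closures.
    """
    scores = np.asarray(scores)
    is_true = np.asarray(is_true, dtype=bool)
    precision, recall = [], []
    for gamma in np.unique(scores):
        accepted = scores >= gamma
        tp = np.sum(accepted & is_true)
        precision.append(tp / accepted.sum())
        recall.append(tp / num_reference)
    precision, recall = np.array(precision), np.array(recall)

    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    perfect = recall[precision == 1.0]
    r_at_1 = perfect.max() if perfect.size else None  # None mirrors "-" in the tables

    # Step-wise area under the curve, best precision first at equal recall.
    order = np.lexsort((-precision, recall))
    ap = np.sum(np.diff(recall[order], prepend=0.0) * precision[order])
    return ap, r_at_1, f1.max()
```

A method that never reaches 100% precision yields `None` for R@1, which corresponds to the "-" entries in Tables 1 and 2.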

4.5.2. Absolute pose error

We evaluate the effectiveness of our approach in correcting drift within a SLAM pipeline through an offline pose-graph optimization using the g2o optimizer (Kümmerle et al., 2011). We directly incorporate the detected loop closures between local maps as constraints in the pose-graph, using the initial transformation estimates from our pipeline.

We assess performance by computing the root mean square (RMS) absolute pose error (APE) in translation, with respect to ground-truth poses, both before and after the pose-graph optimization. To directly evaluate the accuracy of the detected loop closures and their alignments, we do not apply any robust kernel for outlier rejection during optimization.

We further evaluate the accuracy of the initial alignment estimate $({}^{m}T_{r})$ between loop-closed local maps using the RMS APE in translation and rotation.
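As a sketch of these error measures, the RMS translational APE and the rotational error of a single alignment can be computed as below. The sketch assumes that estimated and ground-truth poses are 4x4 homogeneous matrices already expressed in the same frame, so no trajectory alignment is performed; the function names are hypothetical.

```python
import numpy as np


def rms_ape_translation(estimated, ground_truth):
    """RMS of the translational error over (N, 4, 4) pose arrays."""
    diff = estimated[:, :3, 3] - ground_truth[:, :3, 3]
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def rotation_error_deg(T_est, T_gt):
    """Angle of the residual rotation between two 4x4 poses, in degrees."""
    R = T_gt[:3, :3].T @ T_est[:3, :3]
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))
```

The RMS over several alignments follows by collecting the per-pair rotation errors and taking the root of their mean square.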

5. Experimental evaluation

The primary focus of this work is an accurate and effective loop closure detection pipeline that works with various LiDAR sensors, invariant to their scanning pattern, FoV, and resolution. We present our experiments to show the capabilities of our method and support our key claims. Our approach (1) detects loop closures between local maps generated from various LiDAR sensors with different scanning patterns, FoVs, and resolutions; (2) performs multi-session loop closure detection and alignment with long-term revisits; (3) works with handheld platforms having non-planar motion in the LiDAR sensor frame; (4) is robust against perceptual aliasing in environments with repetitive structures; (5) provides a complete 3D rigid-body transform to align the detected loop closures; (6) detects loop closures between sequences having minimal overlap, recorded with different LiDAR sensor platforms, enabling cross-platform multi-map alignment.

5.1. Intra-session loop closure detection

In this experiment, we evaluate the performance of our approach on intra-session loop closure detection. We compare against several state-of-the-art baselines by converting their detected closures to local map-level closures, as described in Section 4.4. In addition, we include a comparison with our previous conference publication, MapClosure (Gupta et al., 2024), which forms the foundation of this work. We also evaluate the local map-based baselines STD and BTC using the same 100 m travel displacement-based local maps as in our approach, referred to as STD-100 and BTC-100. However, we do not evaluate the scan-based baselines using similar local maps since scan-based methods like Scan Context and SOLiD depend on a single scan center as the reference frame, which local maps inherently lack. Handling large lateral shifts would require sampling many artificial scan centers, changing the methods’ intended use.

The quantitative results in Tables 1 and 2 show the effectiveness of our approach across multiple public and in-house datasets. Our method achieves the highest average precision in 15 of the 24 sequences and ranks second in eight of the remaining nine. In terms of recall, it achieves the best performance at 100% precision in 15 sequences and the second-best in nine others. This balance between precision and recall is also reflected in the F1 scores, where our method ranks first in 18 sequences and second in six others.

Table 1.

Average precision (AP), maximum recall at 100% precision (R@1), and maximum F1 scores (F1_m) of state-of-the-art baselines and our approach for intra-session loop closure detection. Larger values indicate better performance. The best values are in bold, and the second-best values are in italics. The R@1 field is marked with “-” when the baseline cannot achieve a 100% precision upon varying its corresponding thresholding parameter.

(a) HeLiPR dataset → Ouster OS2-128
Datasets	Bridge01			Bridge02			Roundabout01			Town02			Town03
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.344	0.049	0.460	0.300	-	0.469	0.316	0.200	0.420	0.293	0.059	0.421	0.373	0.037	0.538
STD-100	0.006	-	0.048	0.024	-	0.121	0.008	0.005	0.078	0.005	0.008	0.076	0.017	-	0.133
BTC	0.733	0.005	0.730	0.644	0.009	0.719	0.627	0.220	0.689	0.651	0.034	0.703	0.584	0.022	0.678
BTC-100	0.029	-	0.171	0.247	-	0.393	0.446	0.031	0.566	0.098	-	0.286	0.013	-	0.135
SOLiD	0.136	0.079	0.211	0.181	0.009	0.226	0.072	0.015	0.121	0.098	0.008	0.167	0.106	-	0.167
LoGG3D	0.125	0.007	0.199	0.127	0.014	0.188	0.077	-	0.162	0.065	-	0.128	0.059	-	0.112
BEVPlace++	0.347	0.025	0.445	0.273	0.009	0.423	0.167	0.020	0.227	0.211	0.068	0.263	0.168	-	0.262
SC	0.186	-	0.489	0.166	-	0.558	0.217	0.144	0.412	0.151	-	0.382	0.104	-	0.351
MapClosure	0.662	0.219	0.720	0.752	0.470	0.781	0.731	0.472	0.788	0.568	0.119	0.690	0.694	0.343	0.743
Ours	0.688	0.332	0.794	0.685	0.614	0.797	0.892	0.508	0.890	0.607	0.424	0.750	0.711	0.388	0.749

(b) HeLiPR dataset → Aeva Aeries II
Datasets	Bridge01			Bridge02			Roundabout01			Town02			Town03
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.191	-	0.376	0.170	0.005	0.340	0.087	-	0.252	0.082	0.055	0.177	0.242	0.010	0.369
STD-100	0.012	-	0.080	0.056	0.005	0.168	0.021	0.009	0.082	0.000	-	0.058	0.032	-	0.113
BTC	0.609	0.076	0.722	0.511	0.010	0.679	0.377	0.176	0.483	0.259	-	0.429	0.371	0.020	0.494
BTC-100	0.033	-	0.192	0.201	-	0.392	0.157	-	0.367	0.029	-	0.150	0.033	-	0.180
SOLiD	0.378	0.166	0.451	0.156	-	0.404	0.211	0.148	0.277	0.068	0.027	0.171	0.235	-	0.338
LoGG3D	0.041	0.003	0.103	0.063	0.010	0.162	0.038	-	0.083	0.030	-	0.071	0.058	0.020	0.106
BEVPlace++	0.265	-	0.414	0.240	0.005	0.417	0.082	0.018	0.139	0.035	-	0.098	0.121	-	0.168
MapClosure	0.814	0.488	0.818	0.834	0.493	0.785	0.478	0.139	0.607	0.196	0.192	0.356	0.465	0.202	0.584
Ours	0.711	0.424	0.796	0.683	0.488	0.765	0.584	0.259	0.654	0.206	0.110	0.358	0.543	0.172	0.642

(c) HeLiPR dataset → Livox Avia
Datasets	Bridge01			Bridge02			Roundabout01			Town02			Town03
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.056	-	0.167	0.073	0.005	0.178	0.104	0.024	0.208	0.062	0.035	0.171	0.290	0.111	0.357
STD-100	0.002	-	0.025	0.013	0.010	0.065	0.015	0.012	0.056	0.000	-	0.028	0.032	-	0.107
BTC	0.111	0.040	0.293	0.211	-	0.476	0.241	0.146	0.394	0.133	0.035	0.330	0.346	0.022	0.500
BTC-100	0.011	-	0.098	0.105	-	0.288	0.050	0.037	0.175	0.000	0.018	0.034	0.010	0.022	0.079
SOLiD	0.365	0.052	0.455	0.400	0.038	0.462	0.223	0.024	0.346	0.064	-	0.098	0.212	-	0.358
LoGG3D	0.013	-	0.045	0.020	-	0.057	0.026	0.012	0.059	0.023	0.018	0.053	0.036	-	0.084
BEVPlace++	0.205	-	0.302	0.200	0.014	0.321	0.052	-	0.112	0.030	-	0.092	0.078	-	0.144
MapClosure	0.512	0.222	0.609	0.562	0.229	0.605	0.133	0.073	0.252	0.102	0.105	0.265	0.431	0.367	0.576
Ours	0.549	0.150	0.698	0.543	0.300	0.688	0.278	0.158	0.442	0.141	0.088	0.269	0.464	0.267	0.587

Table 2.

(a) MulRan dataset → Ouster OS1-64
Datasets	KAIST01			KAIST02			Riverside01			Riverside02			Sejong01
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.545	0.030	0.655	0.462	0.258	0.640	0.329	-	0.488	0.443	0.067	0.619	0.016	-	0.143
STD-100	0.097	0.007	0.262	0.080	-	0.232	0.203	0.104	0.318	0.103	0.045	0.202	0.000	0.000	0.000
BTC	0.680	0.007	0.756	0.605	0.103	0.736	0.916	0.865	0.927	0.645	-	0.854	0.289	0.364	0.909
BTC-100	0.293	0.007	0.502	0.224	0.010	0.447	0.615	0.292	0.706	0.456	0.214	0.609	0.000	0.000	0.000
SOLiD	0.356	0.104	0.462	0.457	0.433	0.604	0.505	0.271	0.582	0.471	0.326	0.558	0.010	-	0.133
LoGG3D	0.246	0.022	0.362	0.290	-	0.353	0.337	0.021	0.412	0.225	0.022	0.319	0.001	-	0.018
BEVPlace++	0.387	0.111	0.421	0.418	0.237	0.464	0.490	0.062	0.556	0.428	0.168	0.451	0.195	0.091	0.444
SC	0.176	-	0.550	0.169	0.402	0.636	0.240	0.417	0.684	0.190	0.404	0.587	0.162	0.182	0.424
MapClosure	0.671	0.437	0.777	0.790	0.505	0.828	0.902	0.885	0.946	0.831	0.708	0.876	0.083	0.182	0.421
Ours	0.733	0.400	0.838	0.852	0.588	0.876	0.885	0.896	0.945	0.843	0.629	0.877	0.364	0.546	0.706

(b) NCLT dataset → Velodyne HDL-32E; IPB-Car dataset → Ouster OS1-128; IPB-Backpack dataset → Hesai Pandar-128
Datasets	NCLT 2012-01-08			NCLT 2013-04-05			IPB-Car			IPB-Backpack
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.429	0.286	0.621	0.219	0.417	0.588	0.219	0.068	0.344	0.261	0.139	0.323
STD-100	0.000	0.018	0.035	0.000	0.000	0.000	0.013	0.016	0.042	0.000	-	0.053
BTC	0.333	-	0.654	0.104	0.083	0.333	0.470	0.036	0.607	0.302	0.236	0.383
BTC-100	0.319	-	0.544	0.000	0.083	0.154	0.331	0.112	0.448	0.000	0.083	0.154
SOLiD	0.092	0.018	0.153	0.092	0.018	0.153	0.078	0.012	0.120	0.216	0.014	0.365
LoGG3D	0.070	-	0.161	0.019	-	0.060	0.014	-	0.045	0.215	-	0.355
BEVPlace++	0.154	-	0.274	0.101	0.167	0.286	0.196	0.028	0.297	0.370	0.125	0.398
SC	0.144	-	0.495	0.031	-	0.444	0.168	-	0.426	0.253	0.083	0.358
MapClosure	0.321	0.089	0.500	0.307	-	0.571	0.899	0.272	0.906	0.169	0.181	0.306
Ours	0.703	0.321	0.796	0.506	0.500	0.727	0.929	0.404	0.909	0.361	0.361	0.540

Our method provides a robust solution for loop closure detection within a SLAM pipeline, where false positives must be minimized (Bailey and Durrant-Whyte, 2006; Blanco et al., 2013; Lowry et al., 2016). The precision-recall curves in Figure 9 show that our approach (red) consistently outperforms baselines across datasets with varied sensor setups. It maintains high precision over a broad range of recall values, eliminating the need for careful tuning of the inlier threshold in RANSAC or additional outlier rejection schemes in the pose-graph optimization.

Figure 9.

The precision-recall curves of state-of-the-art baselines and our approach for intra-session loop closure detection.

The evaluations on the HeLiPR dataset demonstrate the versatility of our method across different LiDAR sensors in varied urban environments. Our method consistently ranks among the top two methods across all metrics. In several sequences, our current approach ranks second only to MapClosure. However, the precision-recall curves in Figure 9 reveal a key advantage: our method maintains a higher precision than MapClosure, ensuring that detected loop closures remain accurate.

The HeLiPR Bridge sequences, which exhibit strong perceptual aliasing, highlight the effectiveness of our pruning strategy. While overall metrics in Table 1 may appear comparable between our method and MapClosure, our precision-recall curves in the first row of Figure 9 terminate earlier at a higher precision value than MapClosure, which also holds in comparison to other baselines. This behavior reflects our deliberate choice to prioritize precision over recall in perceptually ambiguous scenes. By doing so, we improve robustness, reduce sensitivity to the inliers threshold (γ), and lower the risk of false loop closures. The Town sequences present additional challenges with narrow alleyways, wide boulevards, and numerous dynamic objects such as pedestrians and cars. Even under these conditions, our approach ranks among the top two methods, showing its resilience in complex urban environments.

Additionally, the results on HeLiPR Bridge sequences provide insight into how the baselines perform under different levels of dynamic activity in the scene. The Bridge01 sequence, recorded at night, contains relatively few dynamic objects, whereas Bridge02, recorded during the day, includes substantially more dynamic participants. Since both sequences capture the same driving environment, we can directly compare their results to assess how dynamic objects affect loop closure detection. Our method performs consistently in terms of average precision and F1 score, highlighting its robustness to such dynamic elements. As expected, scan-based methods also show little to no variation in the metrics across these sequences, since individual scans are not significantly affected by dynamic objects due to the short temporal window of a single LiDAR scan.

The ground alignment strategy introduced in Section 3.2 further strengthens performance on datasets with non-planar LiDAR motion, such as the NCLT dataset and our self-recorded IPB-Backpack sequence. As shown in Table 2, our method outperforms Scan Context, BEVPlace++, and MapClosure, which assume planar motion for the BEV projection. We achieve the best values across all three metrics, except for average precision on the IPB-Backpack dataset, where the performance gap to BEVPlace++ remains marginal (0.361 vs. 0.370). Consistent BEV projection onto the local ground plane during revisits, even under varying pitch and roll, drives this performance. On the IPB-Car dataset, recorded in hilly terrain with vegetation and elevation changes, our method achieves an average precision of 0.929 with a near-perfect precision-recall curve, confirming its robustness to complex ground plane variations.

The results also confirm the importance of local maps for extracting meaningful structural information for place recognition and loop closure detection. Among the baselines, only STD, BTC, and MapClosure perform competitively, and all rely on local maps. Nonetheless, our evaluation of STD and BTC with the same local maps as our pipeline, referred to as STD-100 and BTC-100, shows that our method’s advantage does not stem solely from larger local map sizes, but from the full algorithmic design.

In contrast, scan-based global descriptor approaches such as Scan Context and SOLiD fail to achieve consistent performance. Scan Context, designed for rotating LiDAR sensors, performs poorly in the HeLiPR Ouster OS2-128 and MulRan sequences due to its strong dependence on viewpoint. SOLiD, designed to handle LiDARs with different FoVs, performs better on the HeLiPR Aeva Aeries II and Livox Avia sequences but still falls short of local map-based methods. These trends highlight the inherent difficulty in constructing global descriptors for single LiDAR scans.

The weak performance of LoGG3D further illustrates this limitation. Without spatio-temporal aggregation or dimensionality reduction, LoGG3D remains highly sensitive to the point density within each scan. Trained exclusively on sequences from the MulRan dataset with Ouster OS1-64, it performs well only on that dataset and generalizes poorly elsewhere. It fails to reach 100% precision even at low recall values in most sequences, underscoring the need to retrain such methods for each sensor setup.

BEVPlace++, also a learning-based method, performs more robustly than LoGG3D does due to its density-preserving BEV projection. However, it struggles on NCLT and IPB-Backpack sequences because of its assumption of planar motion for BEV projection.

Overall, these experiments confirm our core claim: our method delivers state-of-the-art performance in intra-session loop closure detection across datasets and LiDAR platforms. By leveraging local maps as the core representation, our pipeline balances precision and recall, remains robust under perceptual aliasing, adapts to non-planar motion, and generalizes across different sensor characteristics, all without requiring changes to the pipeline parameters.

5.2. Inter-session and inter-LiDAR loop closure detection

In this experiment, we evaluate the performance of our approach for inter-session and inter-LiDAR loop closure detection and compare it against state-of-the-art baselines introduced in Section 4.2. We select sequences spanning revisit intervals from a few weeks to several years to test the robustness under diverse temporal conditions.

Our pipeline architecture remains fundamentally the same in this scenario, with only minor modifications. We store the ground alignment transformations for each local map together with the HBST database from the reference sessions. For a given query session, we match the ORB features against the reference database without adding any features from the query session to the database. Then, using the ground alignment transforms of the corresponding local map from the reference session and the query local map, we compute a complete $SE(3)$ pose estimate to align the loop closure between the two sessions.
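
The composition of the two ground alignment transforms with the planar alignment can be sketched as follows. This is a minimal illustration, not the released implementation: the function names are hypothetical, and we assume that the 2D alignment (yaw, tx, ty) maps the ground-aligned query map into the ground-aligned reference map, so that the full estimate is the inverse reference ground transform, times the lifted 2D alignment, times the query ground transform.

```python
import math
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[0] + [trans[0]],
            r_t[1] + [trans[1]],
            r_t[2] + [trans[2]],
            [0.0, 0.0, 0.0, 1.0]]


def lift_se2(yaw: float, tx: float, ty: float) -> Matrix:
    """Embed a planar alignment into SE(3), acting in the ground plane."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def compose_closure(ground_T_ref: Matrix, ground_T_query: Matrix,
                    yaw: float, tx: float, ty: float) -> Matrix:
    """Assumed convention: ref_T_query = inv(g_T_ref) * SE2 * g_T_query."""
    planar = lift_se2(yaw, tx, ty)
    return matmul(invert_rigid(ground_T_ref), matmul(planar, ground_T_query))
```

With identity ground transforms, the result reduces to the planar alignment itself, which corresponds to the planar-motion assumption.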

We first evaluate inter-session loop closures using the same LiDAR sensor. The precision-recall curves in Figure 10 and the metrics reported in Table 3 reveal trends consistent with the intra-session results. Our method consistently achieves the highest recall at 100% precision across different LiDARs and revisit periods, confirming its reliability in challenging conditions. The precision-recall curves highlight this robustness, as they terminate earlier but maintain high precision. Our method also ranks among the top two in average precision and maximum F1 scores, with BTC being the only baseline matching our overall performance. Other baselines often fail to reach 100% precision or achieve it only at very low recall, except in the MulRan KAIST01–KAIST03 sequences.
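
For reference, the three metrics discussed above can be computed from a precision-recall sweep roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the step-wise integration used for average precision is one common choice that may differ from the rule used in our evaluation scripts.

```python
from typing import List, Optional, Tuple


def pr_metrics(
    points: List[Tuple[float, float]],
) -> Tuple[float, Optional[float], float]:
    """Compute (AP, R@1, F1_m) from (recall, precision) pairs.

    The pairs are assumed to come from sweeping a detection threshold.
    R@1 is None when no operating point reaches 100% precision,
    which corresponds to the "-" entries in the tables.
    """
    if not points:
        return 0.0, None, 0.0
    pts = sorted(points)  # ascending recall

    # Average precision as a step-wise area under the curve (assumed rule).
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in pts:
        ap += (recall - prev_recall) * precision
        prev_recall = recall

    # Maximum recall among operating points with perfect precision.
    perfect = [r for r, p in pts if p >= 1.0]
    recall_at_full_precision = max(perfect) if perfect else None

    # Maximum F1 score over all operating points.
    f1_max = max((2.0 * p * r / (p + r)) if (p + r) > 0 else 0.0
                 for r, p in pts)
    return ap, recall_at_full_precision, f1_max
```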

Figure 10.

The precision-recall curves of state-of-the-art baselines and our approach for inter-session loop closure detection.

Table 3.

Average precision (AP), maximum recall at 100% precision (R@1), and maximum F1 scores (F1_m) of state-of-the-art baselines and our approach for inter-session loop closure detection between sequences recorded with the same LiDAR sensor. Larger values indicate better performance. Best values are in bold, and the second-best values are in italics. The R@1 field is marked with “-” when the baseline cannot achieve a 100% precision upon varying its corresponding thresholding parameter.

(a) Conventional spinning LiDAR sensors
Datasets	MulRan						NCLT			HeLiPR
Revisit period	≈2 months						$>$ 1 year			2 weeks			4 weeks
Reference seq.	KAIST01			Sejong02			2012-01-08			Bridge02			Town01
Query seq.	KAIST03			Sejong03			2013-04-05			Bridge03			Town03
LiDAR	Ouster OS1-64			Ouster OS1-64			Velodyne HDL-32E			Ouster OS2-128			Ouster OS2-128
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.451	0.223	0.529	0.095	-	0.224	0.297	0.241	0.408	0.220	-	0.396	0.266	0.084	0.360
BTC	0.635	0.309	0.740	0.378	0.005	0.526	0.465	0.311	0.605	0.490	0.001	0.647	0.461	0.259	0.636
SOLiD	0.472	0.085	0.435	0.022	-	0.055	0.153	-	0.307	0.157	0.004	0.223	0.239	0.024	0.276
LoGG3D	0.342	0.027	0.424	0.017	-	0.069	0.179	-	0.312	0.152	0.005	0.205	0.149	0.002	0.222
BEVPlace++	0.547	0.276	0.575	0.053	-	0.149	0.238	0.017	0.308	0.335	0.002	0.468	0.316	0.025	0.350
SC	0.160	0.445	0.643	0.039	-	0.140	0.252	-	0.426	0.205	-	0.500	0.247	-	0.479
Ours	0.725	0.633	0.835	0.203	0.037	0.346	0.464	0.411	0.637	0.472	0.300	0.638	0.497	0.437	0.664

(b) Unconventional hybrid solid-state and solid-state, non-spinning LiDAR sensors
Datasets	HeLiPR
Revisit period	2 weeks						4 weeks
Reference seq.	Bridge02						Town01
Query seq.	Bridge03						Town03
LiDAR	Livox Avia			Aeva Aeries II			Livox Avia			Aeva Aeries II
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.087	-	0.168	0.171	0.001	0.319	0.238	0.066	0.283	0.226	0.088	0.270
BTC	0.161	0.001	0.368	0.391	0.008	0.593	0.295	0.198	0.442	0.415	0.068	0.549
SOLiD	0.151	0.002	0.209	0.163	0.013	0.212	0.218	0.058	0.248	0.224	0.066	0.254
LoGG3D	0.039	-	0.125	0.065	-	0.158	0.114	0.002	0.194	0.152	0.002	0.249
BEVPlace++	0.207	0.009	0.302	0.279	0.002	0.408	0.224	0.078	0.268	0.252	0.054	0.298
Ours	0.328	0.223	0.494	0.404	0.060	0.580	0.319	0.218	0.485	0.349	0.215	0.519

The Sejong02–Sejong03 sequences from the MulRan dataset were recorded as opposite-direction traversals on a long highway, which produces sparse structural geometry throughout the sequences. The restricted FoV of the dataset’s LiDAR further increases the difficulty of inter-session loop closure detection. As shown in Table 3a, all baselines perform substantially worse on this pair of sequences, especially when we compare their results to those on the KAIST01–KAIST03 sequences from the same dataset, recorded with the same setup but in a dense urban environment.

The Bridge sequences from the HeLiPR dataset underscore the benefit of our self-similarity pruning strategy. Strong perceptual aliasing causes most baselines to either miss 100% precision entirely or reach it only at negligible recall. Even BTC, which performs comparably to our method in most other scenarios, fails to maintain its high precision on the Bridge sequences because it lacks an explicit mechanism to handle perceptual aliasing. In contrast, our approach detects accurate loop closures at comparatively higher recall, as reflected in the clear separation of our precision-recall curves from the baselines.

Additionally, we achieve better performance across different LiDAR sensors on the same sequences without changing any parameters in our pipeline. As shown in Table 3b, the Aeva Aeries II LiDAR experiences a sharp drop in maximum recall at 100% precision on the Bridge02–Bridge03 sequences. This drop arises from the overlap threshold used to define ground-truth loop closures: a few valid closures fall below the threshold and are therefore marked as false positives in ground-truth local maps, which lowers the maximum recall at 100% precision. Nevertheless, the average precision and the corresponding precision-recall curve in Figure 10 demonstrate that our method still performs strongly, maintaining near-100% precision across a substantially larger recall range.

On the NCLT dataset, which includes a revisit after more than 1 year with a low-resolution Velodyne HDL-32E LiDAR, our approach again achieves the best recall at 100% precision and maximum F1 score. STD and BTC are the only baselines that perform comparably in this scenario. This emphasizes the importance of having explicit knowledge about the ground plane under non-planar LiDAR motion.

These results also highlight the challenges that scan-based global descriptor methods such as Scan Context, SOLiD, and LoGG3D face in long-term revisit scenarios. In such scenarios, significant lateral shifts, viewpoint changes during revisits, and scene changes can all alter the global descriptor substantially. In contrast, local map-based local descriptor methods like ours, STD, and BTC avoid these issues by aggregating multiple scans to reduce viewpoint dependence and by focusing on local regions of the scene that remain similar across revisits, rather than compressing the local context into a single global descriptor.

The results on the HeLiPR sequences further highlight the impact of the LiDAR type on long-term place recognition. Even among the stronger baselines, we observe a noticeable drop in performance when switching from a 360°-horizontal-FoV Ouster OS2-128 LiDAR to the limited-horizontal-FoV LiDAR sensors such as Aeva Aeries II and Livox Avia, even when evaluating the same pair of sequences.

We next evaluate inter-LiDAR loop closure detection across multiple sessions. As shown in Table 4, our approach ranks second overall to BTC but achieves the best recall at 100% precision on 5 of 10 sequences. The precision-recall curves in Figure 11 highlight the same trend: our method often detects fewer closures but preserves high precision, ensuring reliable results without extensive parameter tuning.

Table 4.

Average precision (AP), maximum recall at 100% precision (R@1), and maximum F1 scores (F1_m) of state-of-the-art baselines and our approach for inter-session loop closure detection between sequences recorded with different LiDAR sensors. Larger values indicate better performance. The best values are in bold, and the second-best values are in italics. The R@1 field is marked with “-” when the baseline cannot achieve a 100% precision upon varying its corresponding thresholding parameter.

(a)
Datasets	HeLiPR
Revisit interval	2 weeks			4 weeks			≈2 weeks						2 weeks
Reference seq.	Bridge02			Bridge01			Roundabout01						Town02
Query seq.	Bridge03			Bridge03			Roundabout02						Town03
Reference LiDAR	Ouster OS2-128			Ouster OS2-128			Ouster OS2-128			Aeva Aeries II			Ouster OS2-128
Query LiDAR	Aeva Aeries II			Aeva Aeries II			Aeva Aeries II			Livox Avia			Aeva Aeries II
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.086	0.003	0.208	0.061	-	0.188	0.109	0.003	0.229	0.019	-	0.134	0.165	0.017	0.250
BTC	0.398	0.033	0.592	0.352	0.004	0.520	0.499	0.152	0.620	0.361	0.002	0.504	0.478	0.243	0.612
SOLiD	0.048	-	0.097	0.028	-	0.072	0.102	-	0.228	0.099	0.004	0.198	0.105	-	0.251
LoGG3D	0.010	-	0.084	0.006	-	0.061	0.086	-	0.232	0.043	-	0.169	0.063	-	0.204
BEVPlace++	0.079	0.003	0.215	0.016	0.002	0.108	0.010	0.001	0.065	0.058	-	0.188	0.055	-	0.221
Ours	0.156	0.032	0.280	0.082	-	0.176	0.332	0.176	0.502	0.067	0.039	0.130	0.141	0.093	0.251

(b)
Datasets	HeLiPR						MulRan × HeLiPR						Self-recorded
Revisit interval	2 weeks			4 weeks			$>$ 4 years						2 weeks
Reference seq.	Town02			Town01			MulRan KAIST01						IPB-Car
Query seq.	Town03			Town03			HeLiPR KAIST05						IPB-Backpack
Reference LiDAR	Livox Avia			Livox Avia			Ouster OS1-64			Ouster OS1-64			Ouster OS1-128
Query LiDAR	Aeva Aeries II			Aeva Aeries II			Ouster OS2-128			Aeva Aeries II			Hesai Pandar-128
Metrics	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m	AP	R@1	F1_m
STD	0.159	0.045	0.229	0.159	0.034	0.226	0.124	0.014	0.278	0.108	-	0.271	0.012	0.012	0.052
BTC	0.325	0.002	0.468	0.297	0.004	0.437	0.585	0.100	0.713	0.410	0.087	0.579	0.156	0.167	0.286
SOLiD	0.163	-	0.242	0.171	0.002	0.250	0.171	0.002	0.266	0.156	-	0.252	0.005	-	0.025
LoGG3D	0.073	-	0.187	0.113	0.002	0.186	0.027	-	0.162	0.051	-	0.194	0.013	-	0.050
BEVPlace++	0.083	0.003	0.215	0.105	-	0.236	0.151	0.002	0.359	0.140	-	0.303	0.054	0.012	0.192
SC	N/A	N/A	N/A	N/A	N/A	N/A	0.251	0.202	0.432	N/A	N/A	N/A	0.031	-	0.107
Ours	0.145	0.098	0.259	0.143	0.044	0.256	0.229	0.106	0.377	0.091	0.054	0.200	0.149	0.167	0.362

Figure 11.

The precision-recall curves of state-of-the-art baselines and our approach for inter-LiDAR loop closure detection.

The performance of scan-based methods like Scan Context, SOLiD, LoGG3D, and BEVPlace++ is adversely affected by the difference in the scanning patterns, FoVs, and resolutions across LiDAR sensors. The local map-based methods achieve better results in such cross-LiDAR scenarios.

On the Roundabout01–Roundabout02, Town01–Town03, and Town02–Town03 sequences, our pipeline achieves the best recall at 100% precision among all baselines, demonstrating robustness to opposite traversals and constrained sensor views. However, on Bridge01–Bridge03 and Bridge02–Bridge03 sequences, the feature-pruning strategy limits recall to very low values, leaving room for improvement.

Cross-dataset evaluations between MulRan and HeLiPR push this further, with revisit intervals of over 4 years and different LiDAR sensors. These scenarios introduce structural changes in addition to sensor differences, making them highly challenging. Even so, our method achieves the second-best recall at 100% precision, while all methods, including ours, achieve lower recall overall.

Finally, in self-recorded datasets collected 2 weeks apart with platforms having different motion profiles and LiDARs, our method performs comparably to BTC across all metrics, confirming its ability to generalize across diverse sensor setups.

Overall, these evaluations demonstrate that our approach effectively detects loop closures across multiple sessions for different LiDAR sensors and revisit intervals, thereby supporting our second key claim. At the same time, they highlight opportunities for further research on the highly challenging problem of inter-LiDAR loop closure detection.

5.3. Analysis and evaluation of the ground alignment module

In this section, we present both qualitative and quantitative evaluations of the ground alignment stage, which enhances the detection and alignment of loop closures. As we describe in Section 3.2, this stage is needed when the LiDAR experiences non-planar motion, because it maintains a consistent reference plane for the BEV projection across revisits.

When we turn off the ground alignment module, our approach projects local maps onto the xy-plane, assuming planar motion. Consequently, using the RANSAC-based validation, it can align the local maps only along the xy-plane. We illustrate such a 2D alignment in the top-left image of Figure 12, where two local maps appear well-aligned from a top-down view. However, this alignment can be misleading. In the top-right image of Figure 12, a different perspective reveals an apparent misalignment in height and tilt between the two local maps, caused by non-planar LiDAR motion.

Figure 12.

Visual comparison of loop-closed local map alignments with and without ground alignment. The top-down view (xy) shows both local maps to be aligned in either case, as RANSAC operates in 2D. However, the side view (xz) reveals significant misalignment without ground alignment due to uncorrected pitch and roll differences, as highlighted by the black ellipse. Applying ground alignment provides consistent 3D alignment by projecting the local maps onto a common physical ground plane.

By applying explicit ground alignment, we ensure that the local maps involved in a loop closure align correctly in 3D. This improvement enables our method to perform complete 3D global alignment rather than relying solely on a 2D assumption. The second row of Figure 12 shows a more accurate initial alignment between the local map pairs, even before any point cloud registration takes place.

We further validate this improvement through quantitative analysis. For each detected loop closure, we compute the overlap between the corresponding local maps after applying our initial transformation estimate ${}^{m}T_{r}$. We evaluate this on two sequences with non-planar LiDAR motion: the 2013-04-05 sequence from the NCLT dataset (Carlevaris-Bianco et al., 2016) and the IPB-Backpack sequence recorded using our in-house setup. In Table 5, we report the number of true positive loop closures with at least five inliers and the root mean square error of the overlap between aligned local maps, compared to the overlap between their reference counterparts generated with ground-truth poses.
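
The overlap error reported here can be illustrated with a short sketch. The overlap definition below (shared occupied voxels divided by the smaller voxel count) and the voxel size are assumptions made purely for illustration, since this section does not restate the exact definition; the function names are likewise hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Map each point to the integer index of the voxel that contains it."""
    return {(math.floor(x / voxel_size),
             math.floor(y / voxel_size),
             math.floor(z / voxel_size)) for x, y, z in points}


def overlap(map_a: Iterable[Point], map_b: Iterable[Point],
            voxel_size: float = 1.0) -> float:
    """Assumed overlap measure: shared voxels over the smaller voxel set."""
    voxels_a = voxelize(map_a, voxel_size)
    voxels_b = voxelize(map_b, voxel_size)
    if not voxels_a or not voxels_b:
        return 0.0
    return len(voxels_a & voxels_b) / min(len(voxels_a), len(voxels_b))


def rms_overlap_error(estimated: List[float], reference: List[float]) -> float:
    """RMS difference between overlaps under the estimated and reference poses."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0
```

Here `estimated` would hold one overlap value per detected closure after applying ${}^{m}T_{r}$, and `reference` the corresponding overlap computed with ground-truth poses.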

Table 5.

Evaluation of the ground alignment strategy for loop closure detection and local map alignment. We report the number of correctly identified loop closures and the root mean square (RMS) error in the overlap between aligned local maps using our pose estimate ${}^{m}T_{r}$ and the overlap between the reference local maps. We report results with and without the proposed ground alignment module.

Dataset	NCLT 2013-04-05		IPB-Backpack
Ground alignment	OFF	ON	OFF	ON
Correct loop closures ↑	6	6	10	26
RMS error in overlap ↓	0.280	0.110	0.468	0.104
Recall ↑	0.500	0.500	0.139	0.361

As seen in Table 5, the ground alignment stage significantly reduces the overlap error between local maps. This improvement occurs without applying any local point cloud registration. This advantage becomes evident in datasets with strong non-planar motion, such as the IPB-Backpack setup, where roll and pitch angles vary approximately between −30° and 30°. In such cases, the proposed ground alignment module enables our system to detect substantially more loop closures. This is because the BEV density images remain consistent across revisits when we project them onto the actual ground plane in the environment rather than onto the xy-plane of the local map frame.

We next evaluate the ability of our ground alignment module to correct misalignments between the ground plane and the map’s local xy-plane. To test this, we manually apply rotations in the range (10°–80°) about 10 random axes lying in the xy-plane of each reference local map. Our method then estimates the transform ${}^{g}T_{m}$ that realigns the ground plane with the map’s xy-plane. We perform this evaluation on the MulRan KAIST01 and Sejong01 sequences, as well as the HeLiPR Bridge01 sequence, using the Aeva Aeries II and Livox Avia LiDARs. These sequences feature near-planar LiDAR motion, so their reference local maps already have the ground plane aligned with the xy-plane.

In Figure 13, we report the mean and standard deviation of the absolute error in the predicted rotation magnitude, averaged across all 10 axes for each local map, as a function of the applied misalignment. The ground alignment module reliably corrects misalignments up to 60°, with a mean error below 1° and a standard deviation below 8°. Even at initial misalignments of around 80°, the mean absolute error typically remains below 10°, except in the Livox Avia sequence, where the observed failures stem from the sensor’s unusual reflective artifacts. These artifacts produce misleading, consistently oriented below-ground points that disrupt the sampling of low-lying points and the PCA filtering under artificially large rotations. This is a sensor-specific quirk rather than a flaw in the alignment method. Moreover, since real-world misalignments rarely approach such extreme values, this level of performance is sufficient for practical use.
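
The evaluation protocol can be mimicked on synthetic data as in the sketch below. It tilts a flat ground patch about an axis lying in the xy-plane and measures how well the tilt magnitude is recovered. Note that the least-squares plane fit is only a stand-in for our ground estimation (which samples low-lying points and applies PCA filtering), and that all function names and the synthetic ground patch are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_about_axis(p: Point, axis: Point, angle: float) -> Point:
    """Rodrigues rotation of p about a unit axis by angle (radians)."""
    kx, ky, kz = axis
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * x + ky * y + kz * z
    cross = (ky * z - kz * y, kz * x - kx * z, kx * y - ky * x)
    return (x * c + cross[0] * s + kx * dot * (1.0 - c),
            y * c + cross[1] * s + ky * dot * (1.0 - c),
            z * c + cross[2] * s + kz * dot * (1.0 - c))


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def estimate_tilt_deg(points: List[Point]) -> float:
    """Fit z = a*x + b*y + c and return the plane tilt from the z-axis.

    Stand-in estimator; it is only valid for tilts below 90 degrees.
    """
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    normal_eq = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal_eq)

    def cramer(col: int) -> float:
        m = [row[:] for row in normal_eq]
        for i in range(3):
            m[i][col] = rhs[i]
        return det3(m) / d

    a, b = cramer(0), cramer(1)
    # The plane normal is (-a, -b, 1), so tan(tilt) = sqrt(a^2 + b^2).
    return math.degrees(math.atan(math.hypot(a, b)))


def alignment_error_deg(applied_deg: float, axis_heading: float) -> float:
    """Absolute error of the recovered tilt for one applied misalignment."""
    axis = (math.cos(axis_heading), math.sin(axis_heading), 0.0)
    ground = [(float(x), float(y), 0.0)
              for x in range(-5, 6) for y in range(-5, 6)]
    tilted = [rotate_about_axis(p, axis, math.radians(applied_deg))
              for p in ground]
    return abs(estimate_tilt_deg(tilted) - applied_deg)
```

On this noise-free patch the error is numerically zero; the errors in Figure 13 arise from real scene structure and sensor artifacts, which this sketch does not model.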

Figure 13.

Quantitative evaluation of the ground alignment accuracy. The azimuthal axis indicates the absolute initial misalignment magnitude between the ground plane and xy-plane of the local maps. The radial axis (log scale) shows the absolute error of the predicted ground alignment magnitude, reported as mean values (dots) and standard deviation (bars), across different sequences and LiDARs from the MulRan and HeLiPR datasets.

These results support our third claim that the proposed ground alignment module enables robust handling of handheld or non-planar LiDAR motion and also provides a strong initial pose estimate for the 3D alignment of loop-closed local maps.

5.4. Analysis and evaluation of the self-similarity pruning strategy

In this section, we evaluate the effectiveness of the self-similarity pruning strategy in detecting loop closures within environments characterized by highly repetitive structures. Specifically, we focus on the Bridge sequences from the HeLiPR dataset, which feature long bridges composed of repeating mechanical elements. These conditions present significant challenges due to perceptual aliasing, which can lead to false loop closures.

Figure 14 shows a zoomed-in precision-recall curve for different self-similarity pruning thresholds based on the Hamming distance between ORB descriptors. The mark X on the curve represents the performance at an inlier threshold of 5, the default setting in our pipeline. Lower threshold values prune only strictly similar features, leading to higher recall values but a drop in precision, similar to the performance without pruning. Conversely, higher thresholds prune more features, leading to a drop in recall but maintaining high precision. The threshold value of 35 provides a desirable trade-off, maintaining a recall of around 0.6 with minimal loss in precision.
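
The effect of the pruning threshold can be illustrated with a brute-force sketch. It assumes that descriptors are compared within a single local map and that a feature is discarded when any other descriptor lies within the Hamming threshold; the actual strategy is described in Section 3.4, and the function names here are hypothetical. ORB descriptors are represented as Python integers holding their 256 bits.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def prune_self_similar(descriptors: List[int], threshold: int = 35) -> List[int]:
    """Return indices of descriptors with no near-duplicate in the same set.

    A descriptor is pruned if another descriptor is within `threshold`
    bits of it. A larger threshold therefore prunes more features.
    """
    kept = []
    for i, desc in enumerate(descriptors):
        is_distinctive = all(
            hamming(desc, other) > threshold
            for j, other in enumerate(descriptors)
            if j != i
        )
        if is_distinctive:
            kept.append(i)
    return kept
```

The quadratic loop is for clarity only; a practical implementation would reuse a binary search structure for the nearest-neighbor queries.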

Figure 14.

Precision-recall curves for various choices of pruning thresholds in terms of the Hamming distance between ORB descriptors. We visualize a zoomed-in precision-recall curve for better differentiation between different curves.

Without the pruning strategy, our method detects false loop closures in these Bridge sequences. We show one example at the bottom-left of Figure 15, where a false match occurs with a near-perfect alignment between local maps. This alignment is supported by a sufficient number of inlier correspondences after a 2D RANSAC alignment of their BEV features (right-hand side of Figure 15), primarily caused by repetitive structures on the bridge. We successfully eliminate these matches between similar structures by applying the feature pruning strategy described in Section 3.4, thereby preventing the false loop closure.

Figure 15.

Precision-recall curves evaluating the impact of the pruning strategy for loop closure detection on the Bridge sequences from the HeLiPR dataset. These sequences are recorded in an environment with strong perceptual aliasing.

We further compare the performance of our pipeline with and without feature pruning in Figure 16. The precision-recall curves for both configurations, across the two Bridge sequences and three different LiDAR sensors, show that our approach with pruning maintains high precision throughout rather than trading precision for recall. In contrast, our approach without pruning achieves slightly higher recall at the cost of a significant drop in precision.

Figure 16.

Visualization of perceptual aliasing in the HeLiPR Bridge sequence and the impact of feature pruning on loop closure detection. Left: The repetitive semi-circular arches of the bridge (top) lead to structurally similar local maps (below) as highlighted by the yellow ellipses, causing a false loop closure. Right: Feature correspondence plots before and after pruning. Without pruning, RANSAC mistakenly accepts many inlier correspondences (green lines), resulting in a false positive. After pruning, only outlier matches (red lines) remain, correctly preventing erroneous loop closure. Feature pruning effectively removes ambiguous, repetitive features that lead to perceptual aliasing.

To underscore the importance of precision, we conduct a pose-graph optimization experiment using the loop closures detected under both settings. As shown in Table 6, loop closures obtained with pruning lead to markedly better absolute trajectory estimates despite detecting fewer closures. In contrast, disabling pruning results in more false positives, which degrade the final trajectory estimate, even surpassing the error of odometry-only results.

Table 6.

The effect of feature pruning on loop closure accuracy, measured by RMS APE w.r.t. translation after pose-graph optimization. We report the results for three LiDAR sensors across two sequences from the HeLiPR dataset.

Metric	RMS APE w.r.t. translation (m) ↓
Sequence	HeLiPR Bridge01			HeLiPR Bridge02
LiDAR	Ouster	Aeva	Livox	Ouster	Aeva	Livox
Odometry	178.8	124.4	223.0	52.4	149.3	114.9
w/o pruning	331.1	282.7	695.9	572.6	314.1	457.7
w/ pruning	23.9	28.7	51.0	16.6	23.5	56.4

Overall, this study supports our fourth claim that the self-similarity feature pruning strategy introduced in Section 3.4 is essential for maintaining robustness in loop closure detection under high structural repetition. By prioritizing precision, our method avoids catastrophic failures in pose estimation, ultimately leading to more reliable mapping and localization.

5.5. Evaluation of the 3D alignment estimates

In this section, we evaluate the accuracy of the initial alignment $({}^{m}T_{r})$ for the loop closures detected by our approach across the HeLiPR, NCLT, and IPB-Car datasets. To obtain a reliable reference for evaluation, we refine the local map alignments using Open3D’s (Zhou et al., 2018) point-to-point ICP registration, initialized with the dataset ground-truth pose. We use these refined results strictly as a proxy ground-truth alignment, since directly comparing against the dataset poses would conflate alignment error with odometry drift accumulated within the local maps that we generate using LiDAR odometry.

In Table 7a, we report the number of correctly identified loop closures and the RMS APE in translation and rotation for the pose estimates $({}^{m}T_{r})$ produced by our method. Our loop closure identification approach consistently yields initial alignments within a couple of meters of translation and a few degrees of rotation relative to the reference alignment. This performance holds across all evaluated LiDARs, despite their differing scanning patterns, resolutions, and FoVs. These estimates can thus serve as the initial guess for ICP-based fine alignment.
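
The translation and rotation errors can be computed per closure as sketched below, assuming the standard definition based on the relative transform between the estimated and reference alignments. The function names are hypothetical, and transforms are plain 4x4 row-major lists.

```python
import math
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def translation_error(t_est: Matrix, t_ref: Matrix) -> float:
    """Norm of the translation part of inv(t_ref) * t_est.

    Rotations preserve norms, so this equals the distance between
    the two translation vectors.
    """
    return math.sqrt(sum((t_est[i][3] - t_ref[i][3]) ** 2 for i in range(3)))


def rotation_error_deg(t_est: Matrix, t_ref: Matrix) -> float:
    """Rotation angle of R_ref^T * R_est in degrees."""
    # trace(R_ref^T R_est) is the element-wise product sum of both rotations.
    trace = sum(t_ref[i][j] * t_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def rms(values: List[float]) -> float:
    """Root mean square of the per-closure errors."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0
```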

Table 7.

Evaluation of the loop closure alignment accuracy. We report accuracy in terms of RMS APE in translation (m) and rotation (°) compared to the proxy ground-truth alignment obtained via Open3D point-to-point ICP. Best results are in bold.

(a) We report the number of true loop closures and the accuracy of the initial pose estimates $({}^{m}T_{r})$ produced by our method. Results from KISS-Matcher (KM) are also shown to contextualize the quality of our estimates; KISS-Matcher’s values correspond to only successful alignments.
Sequence	LiDAR	# Closures (KM)	# Closures (Ours)	RMS APE translation (m), KM	RMS APE translation (m), Ours	RMS APE rotation (°), KM	RMS APE rotation (°), Ours
HeLiPR Bridge01	Ouster OS2-128	215	216	5.38 ± 5.34	5.57 ± 5.40	0.87 ± 0.83	1.80 ± 1.47
	Livox Avia	122	122	15.68 ± 14.95	2.07 ± 1.24	2.09 ± 1.87	2.74 ± 1.91
	Aeva Aeries II	213	214	1.65 ± 1.43	2.17 ± 1.50	1.06 ± 0.87	1.90 ± 1.44
HeLiPR Town02	Ouster OS2-128	42	42	0.30 ± 0.20	0.89 ± 0.44	0.31 ± 0.17	1.11 ± 0.84
	Livox Avia	5	5	1.07 ± 0.62	1.46 ± 0.73	0.65 ± 0.26	0.82 ± 0.44
	Aeva Aeries II	9	9	0.55 ± 0.23	1.31 ± 0.60	1.32 ± 0.96	1.62 ± 0.90
HeLiPR Roundabout01	Ouster OS2-128	125	139	0.33 ± 0.22	0.72 ± 0.33	0.33 ± 0.23	0.56 ± 0.33
	Livox Avia	12	13	0.44 ± 0.27	2.52 ± 1.88	0.56 ± 0.40	2.04 ± 1.45
	Aeva Aeries II	44	51	1.02 ± 0.74	1.63 ± 0.90	1.14 ± 0.83	1.14 ± 0.61
IPB-Car	Ouster OS1-128	216	230	0.49 ± 0.40	0.84 ± 0.49	0.50 ± 0.36	1.15 ± 0.80
NCLT 2012-01-08	Velodyne HDL-32E	14	26	1.06 ± 0.87	1.23 ± 0.85	1.06 ± 0.86	1.28 ± 0.97

(b) We refine the pose estimates from both methods using Open3D point-to-point ICP, and report the RMS APE in translation and rotation relative to the reference alignment. Both methods converge to similarly accurate solutions, demonstrating that our initial pose estimates are sufficiently precise to serve as reliable initialization for ICP-style refinement within a SLAM pipeline.
Sequence	LiDAR	RMS APE translation (m), KM + ICP	RMS APE translation (m), Ours + ICP	RMS APE rotation (°), KM + ICP	RMS APE rotation (°), Ours + ICP
HeLiPR Bridge01	Ouster OS2-128	5.05 ± 5.03	5.05 ± 5.05	1.12 ± 1.11	0.73 ± 0.73
	Livox Avia	15.67 ± 14.98	0.47 ± 0.43	2.02 ± 1.90	0.70 ± 0.69
	Aeva Aeries II	1.51 ± 1.41	0.60 ± 0.56	0.72 ± 0.70	0.40 ± 0.39
HeLiPR Town02	Ouster OS2-128	0.02 ± 0.01	0.02 ± 0.01	0.03 ± 0.01	0.03 ± 0.01
	Livox Avia	0.04 ± 0.02	0.04 ± 0.02	0.02 ± 0.01	0.02 ± 0.01
	Aeva Aeries II	0.02 ± 0.01	0.02 ± 0.01	0.03 ± 0.01	0.03 ± 0.01
HeLiPR Roundabout01	Ouster OS2-128	0.02 ± 0.01	0.02 ± 0.01	0.02 ± 0.01	0.02 ± 0.01
	Livox Avia	0.23 ± 0.22	0.17 ± 0.16	0.45 ± 0.43	0.23 ± 0.22
	Aeva Aeries II	0.48 ± 0.47	0.26 ± 0.25	0.49 ± 0.48	0.25 ± 0.25
IPB-Car	Ouster OS1-128	0.30 ± 0.29	0.28 ± 0.28	0.17 ± 0.17	0.17 ± 0.16
NCLT 2012-01-08	Velodyne HDL-32E	0.63 ± 0.61	0.94 ± 0.89	0.65 ± 0.62	0.61 ± 0.57

We also compare our method to KISS-Matcher by Lim et al. (2025) (KM), a state-of-the-art global point cloud registration method based on 3D features. For KISS-Matcher, we report the number of loop closures it successfully aligns (i.e., more than five inliers) and compute the corresponding RMS error metrics with respect to the reference alignment. We include this comparison not as a competing baseline, but to contextualize the quality of our initial alignment, and demonstrate how close our pose estimates are to those produced by a dedicated global registration system.

As shown in Table 7a, KISS-Matcher, a technique designed to globally register point clouds, achieves better alignment accuracy on most sequences. This is expected, as KISS-Matcher computes 3D features from the 3D point cloud and estimates the pose using a maximally consistent correspondence set. In contrast, our method combines 2D feature-based pose estimates with the approximate ground alignment estimates to obtain a complete 3D pose. Despite this fundamental difference, our RMS APE values in most cases remain within a few decimeters in translation and a couple of degrees in rotation of those of KISS-Matcher. Notably, KISS-Matcher performs considerably worse than our method on the Livox Avia data of the HeLiPR Bridge01 sequence (15.68 m versus 2.07 m RMS APE in translation) because it lacks an explicit mechanism to handle perceptual aliasing during alignment.

To further demonstrate that our pose estimates are well-suited as initialization for a fine ICP-style alignment stage within a SLAM system, we explicitly run a point-to-point ICP refinement using Open3D, initialized with the pose estimates from our method as well as KISS-Matcher. The results in Table 7b show that both methods converge to solutions that are similarly accurate, and close to the reference alignment.

Our refined results also outperform KISS-Matcher on several sequences. The apparent performance inversion between KISS-Matcher and our pose estimation accuracy before and after ICP refinement may seem counterintuitive at first. A plausible explanation lies in the non-linear least-squares optimization underlying the ICP algorithm, which is sensitive to initialization and can converge to different local minima depending on the initial pose estimate.

Furthermore, two local maps generated from the same location using LiDAR odometry will generally not align perfectly, as they accumulate different amounts of drift and may contain artifacts due to imperfect scan deskewing. This could also lead to a different convergence for different initializations. It is also important to note that the discrepancy in performance between the two methods in Table 7b is, in most cases, on the order of only a few centimeters. The overall trend is that both methods converge to similarly accurate solutions after ICP refinement; therefore, we make no claims about the superiority of one method over the other in terms of final alignment accuracy.
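The role of the initial guess in the refinement discussed above can be illustrated with a deliberately simplified sketch. It is a toy 2D point-to-point ICP with brute-force nearest-neighbor matching and a closed-form rigid alignment step. It is not the Open3D implementation used in our experiments, and all names and values are hypothetical.

```python
import math

def apply(pose, points):
    """Apply a 2D rigid transform (theta, tx, ty) to a list of (x, y) points."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def best_rigid_2d(src, dst):
    """Closed-form least-squares rigid alignment of paired 2D points."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # rotation maximizing the point correlation
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(source, target, init_pose, iterations=20):
    """Alternate nearest-neighbor matching and closed-form alignment."""
    pose = init_pose
    for _ in range(iterations):
        moved = apply(pose, source)
        matches = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        pose = best_rigid_2d(source, matches)
    return pose

# L-shaped toy "map" with 1 m point spacing and a known ground-truth pose.
source = [(float(x), 0.0) for x in range(6)] + [(0.0, float(y)) for y in range(1, 4)]
true_pose = (0.3, 2.0, -1.0)
target = apply(true_pose, source)

# Initial guess a few degrees and about a decimeter away from the true pose.
init_pose = (0.3 + math.radians(3.0), 2.1, -1.1)
refined = icp_2d(source, target, init_pose)
```

With this initial guess, every nearest-neighbor match is the correct one, so the refinement recovers the true pose. A much coarser guess can produce wrong correspondences and let the iteration settle in a different local minimum, which is the sensitivity referred to above.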

Overall, these results support our fifth claim that the proposed approach yields accurate and complete 3D rigid-body transforms that align detected loop closures and are sufficiently precise to serve as initial guesses for fine-grained ICP-style registration within a SLAM back-end.

5.6. Multi-map alignment

This final experiment evaluates our pipeline’s ability to detect loop closures between the Riverside03 sequence and three KAIST sequences from the MulRan dataset. Although these trajectories share only minimal spatial overlap, it is sufficient to align the two scenes. These sequences also vary in temporal separation, with revisits ranging from the same day to a few months apart.

We first compare our method against baseline approaches using the precision-recall curves shown in Figure 17. Our approach consistently outperforms the baselines, achieving the highest recall while maintaining 100% precision across all three cases. Most other methods, except Scan Context, struggle in this challenging setting, with precision often falling below 50%.
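For readers who want to reproduce this kind of comparison, the following sketch shows how a precision-recall curve and the maximum recall at 100% precision can be derived from scored closure candidates. The confidence score, the candidate list, and the function names are assumptions made for illustration; which quantity is swept for each baseline is not specified here.

```python
def precision_recall_curve(candidates, num_true_closures):
    """candidates: list of (score, is_correct); a higher score means more confident."""
    curve = []
    for threshold in sorted({score for score, _ in candidates}, reverse=True):
        accepted = [ok for score, ok in candidates if score >= threshold]
        true_positives = sum(accepted)
        precision = true_positives / len(accepted)
        recall = true_positives / num_true_closures
        curve.append((threshold, precision, recall))
    return curve

def max_recall_at_full_precision(curve):
    """Largest recall among the operating points with no false positives."""
    return max((r for _, p, r in curve if p == 1.0), default=0.0)

# Hypothetical candidates: (confidence score, agrees with ground truth?).
candidates = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.5, True)]
curve = precision_recall_curve(candidates, num_true_closures=5)
best_recall = max_recall_at_full_precision(curve)
```

In this toy example, accepting the three most confident candidates keeps precision at 100%, while lowering the threshold further admits a false positive.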

Figure 17.

Precision-recall curves of our approach and state-of-the-art baselines for loop closure detection across sequences from the MulRan dataset with limited overlap.

Next, we compute the optimized trajectory for each sequence using in-session loop closure constraints detected by our method, followed by an additional pose-graph optimization step to align these optimized sequences across sessions using the inter-session closures identified by our pipeline.

We show in Figure 18 the final aligned trajectories and corresponding local maps, illustrating the successful alignment in the overlapping region. We report the precision, recall, and F1 scores for the inter-session loop closures in Table 8, using the default parameters detailed in Section 4.2. Despite the limited overlap, our method achieves perfect precision in all cases.

Figure 18.

We align three KAIST sequences to the Riverside03 sequence from the MulRan dataset using our loop closure pipeline. Despite the minimal spatial overlap, highlighted by the dashed rectangle, our approach accurately detects loop closures and aligns the corresponding local maps.

Table 8.

Precision, recall, and F1 score for multi-session loop closures between the Riverside03 and KAIST sequences from the MulRan dataset. We report the RMS error in the overlap between aligned local maps using our pose estimate ${}^{m}T_{r}$ and the overlap between the reference local maps.

Reference session	Riverside03
Query session	KAIST01	KAIST02	KAIST03
Revisit interval	2 months	Same day	1 week
Precision ↑	1.0	1.0	1.0
Recall ↑	0.120	0.108	0.150
F1 ↑	0.214	0.195	0.261
RMS error in overlap ↓	0.043	0.055	0.085
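As a quick consistency check, the F1 scores in Table 8 follow from the reported precision and recall through the harmonic mean. The sketch below recomputes them; the dictionary layout is only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall) per query session, copied from Table 8.
table8 = {
    "KAIST01": (1.0, 0.120),
    "KAIST02": (1.0, 0.108),
    "KAIST03": (1.0, 0.150),
}
f1 = {name: round(f1_score(p, r), 3) for name, (p, r) in table8.items()}
```

Because precision is 1.0 in all three cases, the F1 score is determined entirely by recall, which is why it stays low despite the absence of false positives.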

To assess the quality of the predicted alignment transformation ${}^{m}T_{r}$, we compute the RMS error of the overlap between aligned local maps, compared to the overlap between their reference counterparts generated with ground-truth poses. The last row of Table 8 demonstrates that our alignment closely matches ground-truth performance.

Additionally, we perform a similar cross-session multi-map alignment between the two self-recorded datasets, IPB-Car and IPB-Backpack, captured with different LiDAR sensors mounted on separate mobile platforms. These sequences exhibit varying motion dynamics and sensor configurations but still contain overlapping regions. We show the resulting aligned trajectories and detected loop closures in Figure 19.

Figure 19.

An example of loop closures detected between two sequences recorded with different LiDAR sensor platforms with a revisit interval of 2 weeks. In blue is the IPB-Backpack sequence, recorded on the campus of the University of Bonn with a Hesai Pandar-128 LiDAR mounted on a backpack. In red is the IPB-Car sequence, recorded in the city of Bonn with an Ouster OS1-128 LiDAR. Both trajectories were individually optimized through a pose-graph with in-session loop closure constraints obtained from our pipeline. We used our loop closure pipeline to detect inter-session loop closures between these two sequences, which have little overlap, and aligned the two trajectories using the multi-session loop closure constraints. The overlapping areas are highlighted by small rectangles, and the corresponding loop closures from these areas are shown in the enlarged rectangles.

This experiment supports our final claim that our approach can successfully detect and align loop closures in challenging scenarios with little overlap. This capability is critical for tasks such as multi-robot map alignment, collaborative mapping, and long-term change detection.

5.7. Runtime evaluation of our approach

In this additional experiment, we evaluate the runtime of our approach. We run all the experiments on an Intel i9-10980XE @ 3.00 GHz CPU with 64 GB RAM. We implement our approach in C++ with a single-thread design. We report the runtime of different components of our pipeline in Table 9, including the mean and standard deviation for performing ground alignment for a local map, detecting loop closures using BEV density images, and validating detected loop closures through RANSAC. We also provide the average number of scans in each local map. The time required to generate each local map is not reported, as it depends directly on the LiDAR odometry algorithm used, namely KISS-ICP.

Table 9.

Runtime evaluation of our approach. We report the mean and standard deviation of the execution times for each key component of our pipeline.

Sequence	Avg. scans per local map	Ground alignment (ms)	Loop closure detection (ms)	Loop closure validation (ms)
MulRan Sejong01	119	4.27 ± 1.59	17.74 ± 10.85	0.31 ± 0.11
HeLiPR Bridge01	96	7.44 ± 4.02	32.11 ± 21.21	0.34 ± 0.20
NCLT 2012-01-08	664	14.94 ± 7.21	63.79 ± 50.49	0.38 ± 0.31
IPB-Car	137	10.38 ± 6.04	50.96 ± 28.40	0.35 ± 0.33
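To put the component timings of Table 9 into perspective, the sketch below sums the mean execution times of the three components for each sequence. The mean of a sum equals the sum of the means, so the totals are valid mean costs per local map; the standard deviations are not additive and are therefore left out. The dictionary layout is an illustrative assumption.

```python
# Mean execution times in ms from Table 9:
# (ground alignment, loop closure detection, loop closure validation).
table9_ms = {
    "MulRan Sejong01": (4.27, 17.74, 0.31),
    "HeLiPR Bridge01": (7.44, 32.11, 0.34),
    "NCLT 2012-01-08": (14.94, 63.79, 0.38),
    "IPB-Car": (10.38, 50.96, 0.35),
}

# Mean total cost of the loop closure components per local map, in ms.
total_ms = {seq: round(sum(parts), 2) for seq, parts in table9_ms.items()}
```

The totals range from roughly 22 ms to 79 ms per local map, excluding the odometry that generates the local maps.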

Additionally, in Table 10, we compare the average runtimes of all baselines in terms of frames processed per second (FPS) across three sequences with dense, 360° FoV LiDARs. Scan Context achieves the highest processing speed on all sequences, followed by SOLiD and our method. Notably, the FPS values for STD, BTC, and our method include the runtime of KISS-ICP for scan registration, which generally makes them slower than scan-based methods such as SOLiD and Scan Context, which do not require registration. The learning-based methods, LoGG3D and BEVPlace++, are the slowest, often running below the typical LiDAR frame rate of 10 FPS.

Table 10.

Average runtime evaluation of all the baselines. We report the average number of frames processed per second (FPS) for each baseline on three sequences with dense, 360° FoV LiDARs.

Sequence	MulRan KAIST01	HeLiPR Bridge01	NCLT 2012-01-08
LiDAR	Ouster OS1-64	Ouster OS2-128	Velodyne HDL-32E
STD	24	9	13
BTC	13	8	17
Scan Context	144	101	111
SOLiD	35	24	40
LoGG3D	12	7	15
BEVPlace++	7	5	5
Ours	27	15	29
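The FPS values in Table 10 can be converted to an average processing time per frame and compared against the typical 10 FPS sensor rate mentioned above. The sketch below does this for all baselines; the variable names are illustrative.

```python
SENSOR_RATE_HZ = 10.0  # typical LiDAR frame rate quoted in the text

# FPS per method on (MulRan KAIST01, HeLiPR Bridge01, NCLT 2012-01-08), from Table 10.
table10_fps = {
    "STD": (24, 9, 13),
    "BTC": (13, 8, 17),
    "Scan Context": (144, 101, 111),
    "SOLiD": (35, 24, 40),
    "LoGG3D": (12, 7, 15),
    "BEVPlace++": (7, 5, 5),
    "Ours": (27, 15, 29),
}

def ms_per_frame(fps):
    """Average processing time per frame in milliseconds."""
    return 1000.0 / fps

# Methods that keep up with the sensor on all three sequences.
real_time_methods = [method for method, fps in table10_fps.items()
                     if all(f >= SENSOR_RATE_HZ for f in fps)]

# Slowest case of our method: 15 FPS on HeLiPR Bridge01.
worst_case_ms_ours = ms_per_frame(min(table10_fps["Ours"]))
```

Under this criterion, only Scan Context, SOLiD, and our method stay at or above the sensor rate on all three sequences, and our slowest case corresponds to about 67 ms per frame.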

Overall, our approach achieves a favorable balance between speed and effectiveness, outperforming most conventional baselines while remaining faster than the other local map-based methods (STD, BTC) and the learning-based methods (LoGG3D, BEVPlace++) in Table 10. It sustains 15 to 29 FPS across the three sequences, consistently above the typical sensor frame rate of 10 FPS.

In summary, the experiments presented above support all our claims and showcase the applicability of our approach across various scenarios. The first experiment shows the robustness of our approach to various scanning patterns, FoVs, and resolutions of the LiDAR sensor. The second experiment illustrates the applicability of our approach to detect loop closures between multiple sessions with short-term and long-term revisits of the same environment. The two subsequent experimental evaluations support the newly proposed modules in our loop closure pipeline: we show that our approach can be used on LiDAR platforms with non-planar motion and is also robust to perceptual aliasing. The fifth experiment confirms that our pipeline produces accurate and complete 3D pose estimates for aligning loop closures. We further demonstrate that our approach can align multiple trajectories with minimal physical scene overlap. Finally, we show that our pipeline can efficiently operate above sensor frame rates, even on dense 3D LiDAR sensors.

6. Limitations

Our approach demonstrates robustness across multiple datasets, but it has certain limitations. The pipeline relies on locally consistent LiDAR odometry to generate local maps, which constitutes its most critical limitation. When the scan matching algorithm produces degenerate maps, loop closure performance also degrades. Incorporating more robust odometry that fuses inertial or visual data with LiDAR scans can help alleviate this limitation.

The method also depends on consistent ground plane detection for BEV projection, which may not hold in cluttered indoor, forested, or off-road scenarios. In such environments, approaches tailored to specific settings, such as ForestLPR (Shen et al., 2025), perform better. For aerial data, the density-preserving BEV projection can undersample vertical structures; a maximum-height BEV may provide a stronger alternative.

Although our method consistently performs better than existing baselines in intra-session and inter-session scenarios, inter-LiDAR scenarios remain challenging. Opposite-direction revisits with low-FoV LiDAR sensors, such as Livox Avia and Aeva Aeries II, are particularly difficult due to limited scene overlap, even after aggregation into local maps.

In summary, our method performs robustly across diverse urban datasets but is constrained by odometry quality, the assumption of a detectable ground plane, and reduced recall in inter-LiDAR loop closure detection. Addressing these limitations will broaden its applicability to more diverse environments.

7. Conclusion

In this paper, we present a novel and robust approach to detect loop closures for LiDAR-based SLAM and provide 3D alignments between the detected closures. Our method leverages a density-preserving bird’s-eye-view projection of local maps generated from local odometry estimates. We use the local ground plane as a common reference plane across revisits and align the local maps such that the ground plane coincides with the xy-plane of the local map’s reference frame. This ensures a consistent BEV representation across diverse mobile platforms with varying LiDAR motion profiles. We extract ORB feature descriptors on these BEV projections and apply a self-similarity pruning strategy to reduce incorrect closures in repetitive environments.
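To make the density-preserving projection concrete, the following minimal sketch bins the points of a ground-aligned local map into a 2D grid on the xy-plane and stores the normalized point count per cell. The grid resolution, the normalization by the maximum count, and the function name are assumptions for illustration and may differ from the released implementation.

```python
import math

def bev_density_image(points, resolution):
    """Project (x, y, z) points onto the xy-plane and count points per cell.

    points: iterable of (x, y, z) tuples in a ground-aligned frame.
    resolution: cell size in meters (assumed value, chosen by the caller).
    Returns a row-major image (list of rows) with densities in [0, 1].
    """
    counts = {}
    for x, y, _z in points:
        cell = (math.floor(x / resolution), math.floor(y / resolution))
        counts[cell] = counts.get(cell, 0) + 1
    if not counts:
        return []
    min_u = min(u for u, _ in counts)
    min_v = min(v for _, v in counts)
    width = max(u for u, _ in counts) - min_u + 1
    height = max(v for _, v in counts) - min_v + 1
    max_count = max(counts.values())
    image = [[0.0] * width for _ in range(height)]
    for (u, v), n in counts.items():
        # Assumed normalization: divide by the densest cell.
        image[v - min_v][u - min_u] = n / max_count
    return image

# Tiny made-up local map: two points share a cell, two fall into other cells.
points = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.5, 0.2, 0.5), (-0.4, 1.2, 0.0)]
image = bev_density_image(points, resolution=1.0)
```

Because the height coordinate is discarded and only the point count per cell is kept, the image encodes how much vertical structure accumulates above each ground cell, which is what the ORB features are then extracted from.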

We implement and extensively evaluate our approach on several public and self-recorded datasets featuring different LiDAR sensors mounted on various mobile platforms. We compare our method with state-of-the-art baselines and demonstrate its effectiveness in detecting loop closures in diverse urban settings across LiDAR sensors with different scanning patterns and fields of view. Furthermore, we illustrate the capability of our approach to detect long-term loop closures across multiple sessions. Finally, our runtime evaluations indicate the real-time feasibility of our approach.

To facilitate further research on place recognition and loop closure detection in LiDAR-SLAM, we release our software as open-source.

Footnotes

ORCID iDs

Saurabh Gupta

Benedikt Mersch

Niklas Trekel

Meher V. R. Malladi

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has partially been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy, EXC-2070 – 390732324 – PhenoRob, by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under STA 1051/5-1 – AID4Crops within the FOR 5351, by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070405 – DigiForest, and by the German Federal Ministry of Research, Technology and Space (BMFTR) under the Robotics Institute Germany (RIG).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

References

Arandjelovic, Gronat, Torii, et al. (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp. 1437–1451.

Bailey, Durrant-Whyte (2006) Simultaneous Localization and Mapping (SLAM): part II. IEEE Robotics & Automation Magazine 13(3): 108–117. https://doi.org/10.1109/mra.2006.1678144

Blanco, Jiménez, Fernández-Madrigal (2013) A robust, multi-hypothesis approach to matching occupancy grid maps. Robotica 31: 687–701. https://doi.org/10.1017/s0263574712000732

Bosse, Zlot (2013) Place recognition using keypoint voting in large 3D Lidar datasets. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA), Karlsruhe, Germany, 06–10 May 2013.

Bosse, Newman, Leonard, et al. (2004) Simultaneous localization and map building in large-scale cyclic environments using the Atlas framework. International Journal of Robotics Research (IJRR) 23(12): 1113–1139. https://doi.org/10.1177/0278364904049393

Carlevaris-Bianco, Ushani, Eustice (2016) University of Michigan north campus long-term vision and LiDAR dataset. International Journal of Robotics Research (IJRR) 35(9): 1023–1035. https://doi.org/10.1177/0278364915614638

Chen, Läbe, Milioto, et al. (2020) OverlapNet: loop closing for LiDAR-based SLAM. In: Proceedings of Robotics: Science and Systems (RSS), Corvallis, OR, USA, 12–16 July 2020.

Cummins, Newman (2008) FAB-MAP: probabilistic localization and mapping in the space of appearance. International Journal of Robotics Research (IJRR) 27(6): 647–665. https://doi.org/10.1177/0278364908090961

Dellenbach, Deschaud, Jacquet, et al. (2022) CT-ICP: real-time elastic LiDAR odometry with loop closure. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Dubé, Cramariuc, Dugas, et al. (2018) SegMap: 3D segment mapping using data-driven descriptors. In: Proceedings of Robotics: Science and Systems (RSS), Pittsburgh, PA, USA, 26–30 June 2018.

Ferrari, Giammarino, Brizi, et al. (2024) MAD-ICP: it is all about matching data – robust and informed LiDAR odometry. IEEE Robotics and Automation Letters 9(11): 9175–9182. https://doi.org/10.1109/lra.2024.3456509

Fischler, Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6): 381–395. https://doi.org/10.1145/358669.358692

Fontana, Agamennoni, Siegwart, et al. (2016) Point clouds registration with probabilistic data association. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Frome, Huber, Kolluri, et al. (2004) Recognizing objects in range data using regional point descriptors. In: Proceedings of the European Conference on Computer Vision (ECCV). Springer.

Galvez-Lopez, Tardos (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28(5): 1188–1197. https://doi.org/10.1109/tro.2012.2197158

Geiger, Lenz, Stiller, et al. (2013) Vision meets robotics: the KITTI dataset. International Journal of Robotics Research (IJRR) 32(11): 1231–1237. https://doi.org/10.1177/0278364913491297

Guadagnino, Chen, Sodano, et al. (2022) Fast sparse LiDAR odometry using self-supervised feature selection on intensity images. IEEE Robotics and Automation Letters 7(3): 7597–7604. https://doi.org/10.1109/lra.2022.3184454

Gupta, Guadagnino, Mersch, et al. (2024) Effectively detecting loop closures using point cloud density maps. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Rodríguez FSA, Gepperth (2014) A multi-modal system for road detection and segmentation. In: Proceedings of the IEEE Intelligent Vehicles Symposium. IEEE.

Jiang, Shen (2023) Contour context: abstract structural distribution for 3D LiDAR loop detection and metric pose estimation. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Johnson, Hebert (1999) Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5): 433–449. https://doi.org/10.1109/34.765655

Jung, Yang, Lee, et al. (2024) HeLiPR: heterogeneous LiDAR dataset for inter-LiDAR place recognition under spatiotemporal variations. International Journal of Robotics Research (IJRR) 43(12): 1867–1883. https://doi.org/10.1177/02783649241242136

Kabsch (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallographica - Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 32(5): 922–923. https://doi.org/10.1107/s0567739476001873

Kim (2018) Scan context: egocentric spatial descriptor for place recognition within 3D point cloud map. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Kim, Park, Kim (2019) 1-day learning, 1-year localization: long-term LiDAR localization using scan context image. IEEE Robotics and Automation Letters 4(2): 1948–1955. https://doi.org/10.1109/lra.2019.2897340

Kim, Park, Cho, et al. (2020) MulRan: multimodal range dataset for urban place recognition. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Kim, Choi, Kim (2021) Scan context++: structural place recognition robust to rotation and lateral variations in urban environments. IEEE Transactions on Robotics 38(2): 21–27. https://doi.org/10.1109/tro.2021.3116424

Kim, Choi, Sim, et al. (2024) Narrowing your FOV with SOLiD: spatially organized and lightweight global descriptor for FOV-constrained LiDAR place recognition. IEEE Robotics and Automation Letters 9(11): 9645–9652. https://doi.org/10.1109/lra.2024.3440089

Komorowski (2021) MinkLoc3D: point cloud based large-scale place recognition. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE.

Kümmerle, Grisetti, Strasdat, et al. (2011) g2o: a general framework for graph optimization. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Lee, Lim, Myung (2022) Patchwork++: fast and robust ground segmentation solving partial under-segmentation using 3D point cloud. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

(2021) LiDAR-based initial global localization using two-dimensional (2D) submap projection image (SPI). In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Lim, Minho, Myung (2021) Patchwork: concentric zone-based region-wise ground segmentation with ground likelihood estimation using a 3D LiDAR sensor. IEEE Robotics and Automation Letters 6(4): 6458–6465. https://doi.org/10.1109/lra.2021.3093009

Lim, Kim, Shin, et al. (2025) KISS-Matcher: fast and robust point cloud registration revisited. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Lowe (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2): 91–110. https://doi.org/10.1023/b:visi.0000029664.99615.94

Lowry, Sunderhauf, Newman, et al. (2016) Visual place recognition: a survey. IEEE Transactions on Robotics 32(1): 1–19. https://doi.org/10.1109/tro.2015.2496823

Yin, et al. (2022) One RING to rule them all: Radon Sinogram for place recognition, orientation and translation estimation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Luo, Cao, Han, et al. (2021) BVMatch: Lidar-based place recognition using bird’s-eye view images. IEEE Robotics and Automation Letters 6(3): 6076–6083. https://doi.org/10.1109/lra.2021.3091386

Luo, Cao, Sheng, et al. (2022) LiDAR-based global localization using histogram of orientations of principal normals. IEEE Transactions on Intelligent Vehicles 7(3): 771–782. https://doi.org/10.1109/tiv.2022.3169153

Luo, Zheng, et al. (2023) BEVPlace: learning LiDAR-based place recognition using bird’s eye view images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE.

Luo, Cao, et al. (2025) BEVPlace++: fast, robust, and lightweight LiDAR global localization for unmanned ground vehicles. IEEE Transactions on Robotics 41: 4479–4498. https://doi.org/10.1109/tro.2025.3585385

Zhang, et al. (2022) OverlapTransformer: an efficient and yaw-angle-invariant transformer network for LiDAR-based place recognition. IEEE Robotics and Automation Letters 7(3): 6958–6965. https://doi.org/10.1109/lra.2022.3178797

Xiong, et al. (2024) CVTNet: a cross-view transformer network for LiDAR-based place recognition in autonomous driving environments. IEEE Transactions on Industrial Informatics 20(3): 4039–4048. https://doi.org/10.1109/tii.2023.3313635

Magnusson, Andreasson, Nuechter, et al. (2009) Appearance-based loop detection from 3D laser data using the normal distributions transform. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Mendes, Koch, Lacroix (2016) ICP-based pose-graph SLAM. In: Proceedings of the IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE.

Mur-Artal, Montiel, Tardos (2015) ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics 31(5): 1147–1163. https://doi.org/10.1109/tro.2015.2463671

Paigwar, Erkent, Sierra-Gonzalez, et al. (2020) GndNet: fast ground plane estimation and point cloud segmentation for autonomous vehicles. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

et al. (2017a) PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the Conference on Neural Information Processing Systems (NeurIPS). Curran Associates Inc.

et al. (2017b) PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.

Ramezani, Wang, Knights, et al. (2024) Pose-graph attentional graph neural network for lidar place recognition. IEEE Robotics and Automation Letters 9(2): 1182–1189. https://doi.org/10.1109/lra.2023.3341766

Ramos, Nieto, Durrant-Whyte (2007) Recognising and modelling landmarks to close loops in outdoor SLAM. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Röhling, Mack, Schulz (2015) A fast histogram-based similarity measure for detecting loop closures in 3-D LIDAR data. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Rottmann, Bruder, Schweikard, et al. (2019) Loop closure detection in closed environments. In: Proceedings of the European Conference on Mobile Robotics (ECMR). IEEE.

Rublee, Rabaud, Konolige, et al. (2011) ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). IEEE.

Rusu, Blodow, Marton, et al. (2008) Aligning point cloud views using persistent feature histograms. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Rusu, Blodow, Beetz (2009) Fast Point Feature Histograms (FPFH) for 3D registration. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Salti, Tombari, Stefano (2014) SHOT: unique signatures of histograms for surface and texture description. Journal of Computer Vision and Image Understanding (CVIU) 125: 251–264.

Schlegel, Grisetti (2018) HBST: a hamming distance embedding binary search tree for visual place recognition. IEEE Robotics and Automation Letters 3(4): 3741–3748. https://doi.org/10.1109/lra.2018.2856542

Shan, Englot, Meyers, et al. (2020) LIO-SAM: tightly-coupled Lidar inertial odometry via smoothing and mapping. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Shan, Englot, Duarte, et al. (2021) Robust place recognition using an imaging Lidar. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Shen, Tuna, Hutter, et al. (2025) ForestLPR: LiDAR place recognition in forests attentioning multiple BEV density images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.

Steder, Grisetti, Burgard (2010) Robust place recognition for 3D range data based on point features. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

Steder, Ruhnke, Grzonka, et al. (2011) Place recognition in 3D scans using a combination of bag of words and point feature based relative pose estimation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

Tombari, Salti, Stefano (2010) Unique shape context for 3D data description. In: Proceedings of the ACM Workshop on 3D Object Retrieval.

Umeyama (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4): 376–380. https://doi.org/10.1109/34.88573

Lee (2018) PointNetVLAD: deep point cloud based retrieval for large-scale place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.

67.

Vidanapathirana

Ramezani

Moghadam

, et al. (2022) LoGG3D-Net: locally guided global descriptor learning for 3D place recognition. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

68.

Vizzo

Guadagnino

Mersch

, et al. (2023) KISS-ICP: in defense of point-to-point ICP – simple, accurate, and robust registration if done the right way. IEEE Robotics and Automation Letters 8(2): 1029–1036. https://doi.org/10.1109/lra.2023.3236571

69. Vysotska, Stachniss (2016) Lazy data association for image sequences matching under substantial appearance changes. IEEE Robotics and Automation Letters 1(1): 213–220. https://doi.org/10.1109/lra.2015.2512936

70. Vysotska, Stachniss (2019) Effective visual place recognition using multi-sequence maps. IEEE Robotics and Automation Letters 4(2): 1730–1736. https://doi.org/10.1109/lra.2019.2897160

71. Wang, Sun, et al. (2020) LiDAR Iris for loop-closure detection. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.

72. Wiesmann, Marks, Gupta, et al. (2024) Efficient LiDAR bundle adjustment for multi-scan alignment utilizing continuous-time trajectories. ArXiv Preprint arXiv:2412.11760.

73. Zhang, Dou, et al. (2021a) RPVNet: a deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE.

74. Yin, Chen, et al. (2021b) DiSCO: differentiable scan context with orientation. IEEE Robotics and Automation Letters 6(2): 2791–2798. https://doi.org/10.1109/lra.2021.3060741

75. et al. (2023) RING++: Roto-translation invariant gram for global localization on a sparse scan map. IEEE Transactions on Robotics 39(6): 4616–4635. https://doi.org/10.1109/tro.2023.3303035

76. Yang, Zhang, Xiao, et al. (2017) TOLDI: an effective and robust approach for 3D local shape description. Pattern Recognition 65: 175–187. https://doi.org/10.1016/j.patcog.2016.11.019

77. Yin, et al. (2024) A survey on global LiDAR localization: challenges, advances and open problems. International Journal of Computer Vision (IJCV) 132: 1–33. https://doi.org/10.1007/s11263-024-02019-5

78. Yuan, Lin, Zou, et al. (2023) STD: stable triangle descriptor for 3D place recognition. In: Proceedings of the IEEE International Conference on Robotics & Automation (ICRA). IEEE.

79. Yuan, Lin, Liu, et al. (2024) BTC: a binary and triangle combined descriptor for 3-D place recognition. IEEE Transactions on Robotics 40: 1580–1599. https://doi.org/10.1109/tro.2024.3353076

80. Zhang, Shi (2024) LiDAR-based place recognition for autonomous driving: a survey. ACM Computing Surveys 57(4): 1–36. https://doi.org/10.1145/3707446

81. Zhou, Park, Koltun (2018) Open3D: a modern library for 3D data processing. ArXiv Preprint arXiv:1801.09847.

82. Zhou, Qian, et al. (2021) S4-SLAM: a real-time 3D LIDAR SLAM system for ground/watersurface multi-scene outdoor applications. Autonomous Robots 45(1): 77–98. https://doi.org/10.1007/s10514-020-09948-3
