EgoExo++: Integrating on-demand exocentric visuals with 2.5D ground surface estimation for interactive teleoperation of underwater ROVs

Abstract

Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict the human operator’s field of view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) by estimating a piecewise-planar 2.5D ground surface on the fly and augmenting it into the exocentric views. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as assessing clearance and following terrain-based navigation markers. All computations are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes the approach portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments in 2-DOF indoor navigation and 6-DOF underwater cave exploration under challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation with EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, a 5-fold reduction in path deviation ratio, and fewer collision events (2 vs. 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.
The source packages for EgoExo++ are available at: https://github.com/uf-robopi/EgoExo.

Keywords

subsea telerobotics; field robotics; XR in robotics

1. Introduction

Unmanned submersible vehicles such as ROVs (Remotely Operated Vehicles) play a crucial role in subsea inspection, remote surveillance, and underwater cave exploration (Rumson, 2021; Siegel et al., 2023; Wishnak, 2022). They are particularly useful in inspecting deep-water structures and surveying confined spaces that are beyond the reach of human scuba divers (Buzzacott et al., 2009; Joshi et al., 2022). In a typical mission, ROVs are controlled by human operators from a surface vessel, who are responsible for the safe and efficient maneuvering of the vehicle (Kennedy et al., 2019; Konoplin et al., 2019). The control consoles for teleoperation typically offer real-time data such as the egocentric video feed, pose, velocity, depth, etc. State-of-the-art (SOTA) ROVs can also include autonomous features for atomic tasks such as hovering (Jin et al., 2022), following navigation guidelines inside underwater caves and overhead structures (Abdullah et al., 2024a; Mohammadi et al., 2023; Yu et al., 2023a), object manipulation (Chen et al., 2025; Manjunatha et al., 2018), trajectory estimation, etc.

While subsea industries and agencies such as NOAA and naval defense teams deploy underwater ROVs with high-end cameras, sonars, and IMUs (Elor et al., 2021; Wishnak, 2022), safe and efficient teleoperation remains a challenge in adverse visibility conditions and around complex or sensitive structures. The typical first-person feed from an ROV camera provides very limited information in landmark-deprived underwater scenes. Operators on the surface see only the egocentric view, often without global or peripheral semantic information around the ROV (Lensgraf et al., 2023; Thatipelli et al., 2025). Although ROVs can use artificial lights to enhance visibility in low-light scenes, the light is reflected and back-scattered by suspended particles directly toward the front camera (Yu et al., 2023a), creating glare and large blind spots for the operator. Additionally, the autonomous and semi-autonomous features of ROVs become unreliable without peripheral positioning information under such noisy sensing conditions.

In this paper, we address these issues by introducing an AR (augmented reality) inspired ROV teleoperation interface that generates third-person (exocentric) perspectives and provides interactive control over viewpoint selection. As shown in Figure 1, the proposed console can generate multiple exocentric views from past egocentric images, with a virtual ROV model projected onto the images as if they were taken by a third person following the robot. Our early work introduced the idea of EOB (Eye On the Back) visuals (Islam, 2024), envisioning a single third-person view from immediately behind the ROV to facilitate better teleoperation. Our recent work materialized this idea in EgoExo (Abdullah et al., 2024c) by formalizing an AR-based framework that generates on-demand exocentric imagery from any EOB viewpoint, with geometrically accurate ROV positioning in those views. This work further advances this line of research by introducing EgoExo++, a dynamic 2.5D exocentric visualization, analogous to a bird’s-eye view in terrestrial contexts, that offers an interactive and semantically enriched perspective of the environment. Importantly, our approach is closed-form and purely geometry-driven, ensuring accuracy and real-time efficiency without reliance on data-driven methods or training biases. The envisioned interface supports both fore-aft transitions across multiple EOB views and an interactive, rotatable 360° exocentric perspective, enabling safer and more informed ROV teleoperation.

Figure 1.

The proposed teleoperation interface is demonstrated for an underwater cave exploration scenario with an ROV. The traditional console interfaces are based on egocentric views (top left), which are limiting and disorienting to a surface operator in noisy low-light conditions. Our EgoExo solution (Abdullah et al., 2024c) offers on-demand exocentric views from a fixed EOB (eye on the back) viewpoint, that is, third-person views from behind the ROV (bottom left). In EgoExo++, we further integrate dynamic 2.5D exocentric views, with the ROV rendered above a textured ground surface. These interactive view options are integrated into a standard BlueROV2 console (by Blue Robotics Inc.) for a significantly improved teleoperation experience.

Specifically, we introduce an efficient framework, integrated into a visual SLAM system, for generating exocentric visual perspectives from egocentric views during underwater ROV teleoperation. The base EgoExo algorithm keeps track of the ROV camera poses and exploits a buffer of egocentric views for exocentric view synthesis. We then transform and project a pre-sampled 3D model of the ROV, in the form of a point cloud, into those views to generate realistic augmented visuals with more peripheral information. In parallel, the EgoExo++ pipeline uses SLAM-generated feature points to identify ground regions and fuses them into a piecewise-planar ground surface whose pixel colors are transferred from the corresponding image regions. This 2.5D ground reconstruction is particularly meaningful in seabed/structure inspection tasks as well as in underwater cave missions, where navigation cues such as the caveline, arrows, and cookies (Abdullah et al., 2024a) are located on or near the floor. In our implementation, we employ a temporal fuse-and-stack strategy to preserve historical ground evidence, while the 3D ROV model is projected into the same spatial context. The resulting 2.5D perspective enables operators to interact with the scene using dynamic viewpoints in real time. As illustrated in Figure 1, these views provide operators with a globally informed and semantically rich snapshot of the surrounding environment. In addition to supporting interactive viewpoint control, the SLAM backend delivers real-time updates on camera pose and environmental mapping to better assist with atomic tasks such as obstacle avoidance, object following, next-best-view planning, and object manipulation (Cai et al., 2020; Chen et al., 2025; Palomeras et al., 2019).

We validate the proposed method through a series of analytical, simulation-based, and real-world experiments conducted in both terrestrial and underwater settings. For the base EgoExo system, we first quantify the geometric accuracy of view augmentation with a TurtleBot4 ground robot through reprojection error analysis of known reference points in indoor scenes. We then conduct underwater cave trials with a BlueROV2 at various geographical locations, which pose unique challenges such as low visibility, turbid water, and moving shadows. In these challenging scenarios, 15 human subjects rate the utility of EgoExo visuals using the System Usability Scale (SUS) (Brooke, 1996); EgoExo achieves an average SUS score of 77.5, which corresponds to Good usability. We further discuss various challenging cases of caveline detection and following in noisy low-light conditions inside multiple underwater caves. These findings establish EgoExo’s trailing views as a useful teleoperation aid, and also highlight the critical role of maintaining altitude clearance and detecting on-ground navigation markers, which motivates the semantic ground representation of EgoExo++.

EgoExo to EgoExo++: new capabilities. Our base EgoExo framework leverages the SLAM-estimated 6-DoF pose to synthesize a trailing third-person view by projecting the ROV into a past egocentric image within a global 3D frame. EgoExo++ advances beyond this “pose-only trajectory” visualization to a semantically enriched global representation by introducing two new capabilities:

(1) EgoExo++ transforms the semantically empty world representation of EgoExo into a 2.5D ground-referenced altitude map. It leverages the already available sparse SLAM features to identify and reconstruct locally observable ground patches from each local view, which are then temporally fused in the global frame.

(2) It advances the visualization from a trajectory-anchored, EOB perspective to an anchor-free aerial viewpoint, allowing operators to view the ROV from arbitrary vantage points above the reconstructed ground surface and to reason more effectively about altitude clearance and navigation cues.

Extended evaluation of EgoExo++. To demonstrate the new capabilities, we present a detailed technical pipeline for ground surface reconstruction, along with comprehensive experiments. First, we perform geometric analyses to quantify ground segmentation reliability, including plane fitting error, inlier fraction, plane normal consistency, and altitude drift. We then conduct a new user study involving 15 participants, where users teleoperate an ROV in a Gazebo-based underwater industrial facility (Abdullah et al., 2025a) to follow pipelines and detect target markers. We report both subjective workload (NASA-TLX (Hart and Staveland, 1988)) and objective metrics, including mission completion time, path deviation, and collision count. We also conduct a supplementary counterbalanced evaluation in which the experiment order is reversed for a subset of participants to assess potential order bias between the baseline and EgoExo++ trials. The results show consistent performance trends, indicating the usability and benefit of our system.

Notably, EgoExo++ reduces perceived workload while achieving approximately 16% faster mission completion, 5× lower Path Deviation Ratio (PDR), and fewer collisions compared to the baseline (egocentric only) teleoperation system. Together, these additional evaluations demonstrate both technical soundness and operational relevance of the proposed framework. Finally, we discuss the broader potential of the EgoExo++ paradigm for shared autonomy and digital twin-based training, as well as its dependency on SLAM and the practical limitations that arise in challenging underwater environments.

2. Background and Related Work

2.1. Third-person views for ROV teleoperation

A common issue reported by ROV operators is that using a remote vision platform for teleoperation is like looking through a “soda straw” (Islam, 2024; Woods et al., 2004). This is because typical ROV controller interfaces are based on egocentric, first-person camera views, which provide no peripheral vision and thus significantly reduce situational awareness (Casper and Murphy, 2003; Zollmann et al., 2014). Researchers have explored both fixed (Ferland et al., 2009; Lager et al., 2018) and dynamic (Nguyen et al., 2001; Okura et al., 2013) viewpoint augmentation methods in contemporary human-machine interface studies (Abdullah et al., 2024b; Xia et al., 2022).

Two primary approaches are used for generating exocentric views in unmanned ground and aerial vehicles. The first leverages external cameras to capture the vehicle’s motion from a distance; examples include fixed ground cameras (Jangir et al., 2022), UAV-mounted overhead views (Erat et al., 2018; Gawel et al., 2018; Inoue et al., 2023; Saakes et al., 2013), elevated on-robot mounts (Shiroma et al., 2004), camera-equipped follower ROVs (Nagatani et al., 2011), and fisheye lenses for top-down perspectives (Hing et al., 2010; Sato et al., 2013). The second method utilizes additional onboard sensors, such as LiDAR (Light Detection and Ranging), to generate a point cloud of the surrounding environment (Ferland et al., 2009; Lager et al., 2018) and use it to create an augmented/virtual reality for interfacing and teleoperation (Hing et al., 2010; Livatino et al., 2021; Thomason et al., 2019; Xia et al., 2023b).

Adapting the aforementioned methods from terrestrial or aerial domains to underwater environments presents inherent challenges. First, deploying diver-robot teams (Islam et al., 2021) is not always an option in complex deep-water missions, which constitute the majority of ROV use cases. Second, UGVs that utilize past egocentric views (Ito et al., 2008; Murata et al., 2014) primarily rely on GPS-based localization, which is unavailable in GPS-denied underwater environments. Moreover, unlike underwater ROVs, ground vehicles generally operate on a 2D plane with limited pitch and roll variations over rough terrain (Yoon et al., 2025). Third, installing an external visual system requires significant hardware modifications: the cameras need to be ruggedized and pressure-sealed, the vehicle needs to be recalibrated for buoyancy and motion dynamics, and additional tether integration is needed for high-speed exocentric data transfer. Even with all these structural modifications, an external camera provides only a single additional third-person perspective.

A range of AR/VR-based teleoperation systems have been developed to enhance operator immersion and augment visual feedback for subsea tasks such as object grasping and manipulation (Bruno et al., 2018; Chen et al., 2025; Girbes-Juan et al., 2020; Xia et al., 2022), inspection (Blow et al., 2025; Zhou et al., 2023), and navigation (Xia et al., 2023c). These systems commonly support third-person perspectives by embedding the operator within an XR (extended reality) environment that incorporates a digital twin of the ROV (Xia et al., 2023a) and reconstructs the surrounding scene using 3D models. While such immersive interfaces improve situational awareness and control, they often require extensive sensory augmentation (e.g., visual, auditory, haptic) at both the ROV and operator ends (Chen et al., 2025; Xia et al., 2023b; Zhou et al., 2023), which increases hardware demands and complicates real-time deployment.

2.2. 2.5D exocentric view generation

Generating 2.5D/3D third-person views from a front-facing camera is critical for scene understanding, both for human teleoperators and for autonomous vehicles. The challenges lie in the extreme viewpoint shift and the lack of direct depth cues in monocular inputs. Recent efforts in 2.5D view synthesis fall into two main categories: homography-based geometric projections (Abbas and Zisserman, 2019; Wang et al., 2019) and generative models using encoder-decoder, adversarial, or transformer-based learning (Li et al., 2024; Luo et al., 2024).

The geometry-guided CNN proposed by Abbas and Zisserman (2019) warps frontal images to the top view using a fitted homography matrix. While efficient in structured environments, their approach is limited by the flat-ground assumption and struggles with non-planar surfaces. Zhu et al. (2018) introduce an intermediate homography view from a generative adversarial network (GAN) to reduce the difficulty of a pure geometric transformation. Transformer models such as BEVFormer (Li et al., 2024) and BEVDepth (Li et al., 2023) integrate temporal or multi-view cues to improve the realism of synthesized views at the cost of high computation. Other learning-based approaches fuse multiple camera views or additional sensors (e.g., LiDAR) to generate semantic aerial views (Reiher et al., 2020; Samani et al., 2023), diverging from monocular egocentric setups. Unlike these data-driven approaches, we propose a lightweight geometric solution, integrated into a visual SLAM pipeline, that offers real-time, interactive third-person perspectives without relying on multi-modal sensory augmentation or additional hardware.

3. EgoExo++: Problem Formulation

We formulate EgoExo++ as a 3D geometric algorithm that generates an on-demand EOB view, reconstructs the ground surface, and then projects the ROV model in both 2D and 2.5D contexts for augmented rendering of the scene; see Figure 2. The proposed method has the following computational components.

Figure 2.

The computational pipeline is shown. From historical egocentric views and SLAM-derived poses, EgoExo computes a 2D exocentric image by applying pose geometry to project the ROV model; a sparse map of the environment is also constructed using SLAM-derived feature points. EgoExo++ reuses the feature points to fit a ground plane via RANSAC, then generates a textured 2.5D ground surface, and augments the ROV mesh to produce interactive exocentric views.

3.1. Curating ROV pose and image buffer

A monocular SLAM algorithm such as ORB-SLAM3 (Campos et al., 2021) continuously estimates and tracks camera poses from a sequence of monocular images. We use an ORB-SLAM3-based framework to obtain the camera pose at each keyframe and thereby construct the trajectory map of the teleoperated robot. In our implementation, the SLAM pipeline initiates trajectory estimation by building a pose buffer of length n: ${}^{w}T ≜ [{}_{i}^{w}T, {}_{i - 1}^{w}T, \cdots, {}_{i - n + 1}^{w}T]$, where ${}_{i}^{w}T = [{}_{i}^{w}R_{3 \times 3} | {}_{i}^{w}t_{3 \times 1}]$ denotes the camera pose at instance i in the global (world) frame of reference. The corresponding raw egocentric views are stored in a queue $I ≜ [I_{i}, I_{i-1}, \cdots, I_{i-n+1}]$. These memory buffers are updated online as the robot’s pose changes during teleoperation. To avoid unnecessary updates (e.g., when the robot is static), an update is triggered only when the pose change exceeds an empirically tuned threshold.
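A minimal Python sketch of the buffer logic described above, assuming 4 × 4 world-from-camera pose matrices. The class name `PoseImageBuffer`, the rotation threshold, and the combined translation/rotation change test are illustrative assumptions rather than the released implementation; the text only states that an empirically tuned pose-change threshold gates updates (Section 4.1 reports n = 100 and a 0.001-unit separation threshold).

```python
from collections import deque

import numpy as np


class PoseImageBuffer:
    """Hypothetical fixed-length buffer of (pose, image) pairs, newest first.

    Index 0 holds frame i (current); index n-1 holds frame i-n+1.
    Poses are 4x4 world-from-camera matrices [R | t].
    """

    def __init__(self, n=100, trans_thresh=0.001, rot_thresh_deg=0.5):
        self.poses = deque(maxlen=n)      # oldest entry is dropped when full
        self.images = deque(maxlen=n)
        self.trans_thresh = trans_thresh                # SLAM units (up to scale)
        self.rot_thresh = np.deg2rad(rot_thresh_deg)    # assumed rotation gate

    def _moved_enough(self, T_new):
        if not self.poses:
            return True
        T_last = self.poses[0]
        dt = np.linalg.norm(T_new[:3, 3] - T_last[:3, 3])
        # Angle of the relative rotation R_last^T R_new, recovered from its trace.
        R_rel = T_last[:3, :3].T @ T_new[:3, :3]
        cos_a = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
        return dt > self.trans_thresh or np.arccos(cos_a) > self.rot_thresh

    def update(self, T_wc, image):
        """Store the keyframe only if the pose changed significantly."""
        T_wc = np.asarray(T_wc, dtype=float)
        if not self._moved_enough(T_wc):
            return False
        self.poses.appendleft(T_wc)
        self.images.appendleft(image)
        return True

    def get(self, k):
        """Return (pose, image) from k frames back; k = 0 is the current frame."""
        return self.poses[k], self.images[k]
```

Using `deque(maxlen=n)` with `appendleft` keeps the newest frame at index 0 and silently discards frame i − n when the buffer is full, so selecting an EOB view f frames back is a constant-time lookup.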

3.2. Generating 2D Exo image

Given the pose memory ${}^{w}T$ and egocentric views I, we formulate the EgoExo problem as estimating an exocentric view from a reference location r looking toward the robot’s current location c, where r, c ∈ [i − n + 1, i] and r < c. Typically, c is set to i (the most recent available frame), while r remains a free variable with n known samples in memory, which mimics EOB viewpoint generation.

We use the ROV point cloud model $P_{rov}$ of size $3 \times m$ as a prior. These m points are transformed from the current camera pose ${}_{c}^{w}T$ to the reference camera pose ${}_{r}^{w}T$ using:

{\tilde{P}}_{rov} = ({}_{r}^{w}R^{- 1} {}_{c}^{w}R) \cdot P_{rov} + ({}_{c}^{w}t - {}_{r}^{w}t),

(1)

where $[{}_{c}^{w}R | {}_{c}^{w}t]$ and $[{}_{r}^{w}R | {}_{r}^{w}t]$ represent the ROV pose at the current and reference (target) locations in world coordinates, respectively. The transformed point cloud ${\tilde{P}}_{rov}$ is then projected onto the target image plane using the camera intrinsics K:

{[\begin{matrix} u & v & 1_{m×1} \end{matrix}]}^{T} = λ_{1} K \cdot {\tilde{P}}_{rov} .

(2)

Here, the vectors u and v denote the pixel locations (u, v) of the projected points on image $I_r$, and λ₁ is the scale.
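A minimal NumPy sketch of Equations (1) and (2), assuming 4 × 4 world-from-camera poses and the OpenCV pinhole convention (z pointing forward); the function name is hypothetical. The sketch applies the standard rigid change of frame, ${}_{r}^{w}R^{-1}({}_{c}^{w}R\,P_{rov} + {}_{c}^{w}t - {}_{r}^{w}t)$, which also rotates the translation difference into the reference camera frame; it coincides with Equation (1) when ${}_{r}^{w}R$ is the identity. Dividing by depth plays the role of the per-point scale λ₁.

```python
import numpy as np


def project_rov_into_reference(P_rov, T_wc_cur, T_wc_ref, K, width, height):
    """Project ROV model points into a past (reference) egocentric image.

    P_rov: 3 x m points expressed in the current camera frame c.
    T_wc_cur, T_wc_ref: 4x4 world-from-camera poses of frames c and r.
    Returns (uv, valid): 2 x m pixel coordinates and a visibility mask.
    """
    R_c, t_c = T_wc_cur[:3, :3], T_wc_cur[:3, 3:4]
    R_r, t_r = T_wc_ref[:3, :3], T_wc_ref[:3, 3:4]
    # Current camera -> world -> reference camera (R_r^{-1} = R_r^T).
    P_ref = R_r.T @ (R_c @ P_rov + t_c - t_r)
    z = P_ref[2]
    in_front = z > 1e-6
    # Pinhole projection; dividing by depth plays the role of lambda_1.
    uvw = K @ P_ref
    uv = uvw[:2] / np.where(in_front, z, 1.0)
    inside = (uv[0] >= 0) & (uv[0] < width) & (uv[1] >= 0) & (uv[1] < height)
    return uv, in_front & inside
```

Points behind the reference camera (for example, when r is chosen ahead of c along a turning path) are masked out rather than drawn, which avoids mirrored ROV renderings.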

3.3. Generating 2.5D Exo views

In EgoExo++, we reuse the SLAM-generated visual features to estimate the ground surface and synthesize a lightweight terrain-aware 2.5D perspective. We focus on reconstructing only the ground surface rather than the full volumetric scene since the ground structure provides altitude awareness and a stable spatial anchor for teleoperation. Moreover, navigation markers in underwater caves, such as cavelines and arrows, are typically located on the ground, making ground reconstruction more relevant for guided navigation. The process involves four stages: (i) selecting candidate feature points for the ground surface, (ii) fitting the ground plane, (iii) translating texture from image pixels to the estimated surface, and (iv) fusing multiple frames over time for real-time visualization.

Due to the lack of a visible horizon line in open-water settings and the uneven geometry of confined underwater spaces (e.g., caves), we incorporate geometric priors based on the camera orientation to initialize the ground region estimation. In the nominal case with zero pitch and roll, the ground lies within the bottom half of the image, separated by a horizontal line at v = H/2 (where H is the image height). As the camera pitches downward, this line shifts upward, since a larger portion of the ground enters the camera’s FOV, and vice versa. A camera roll rotates this dividing line on the image plane accordingly. By computing this orientation-adjusted imaginary horizon from the known camera pose, we restrict candidates to points that fall on the “ground side” of the image. This prior ensures that no 3D point projecting above the horizon (e.g., from cave walls and ceiling) is selected as ground.
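One way to compute the orientation-adjusted horizon is as the vanishing line of horizontal world directions, $l = K^{-\top} g_c$, where $g_c$ is the world “up” direction expressed in the camera frame (obtainable from the SLAM pose, e.g. $g_c = {}_{i}^{w}R^{\top} g_w$). The sketch below is an illustrative assumption rather than the authors’ code; it uses the OpenCV camera convention (x right, y down, z forward), under which a level camera with principal point $c_y = H/2$ recovers the v = H/2 line. Function names are hypothetical.

```python
import numpy as np


def horizon_line(K, up_cam):
    """Vanishing line l = K^{-T} g_c of horizontal world directions.

    up_cam: world 'up' direction expressed in the camera frame
            (OpenCV convention: x right, y down, z forward).
    A pixel (u, v) lies on the horizon when l @ [u, v, 1] == 0.
    """
    return np.linalg.inv(K).T @ np.asarray(up_cam, dtype=float)


def ground_side_mask(uv, K, up_cam):
    """True for pixels (2 x N) whose viewing ray points below the horizontal."""
    uv1 = np.vstack([uv, np.ones(uv.shape[1])])
    # l @ [u, v, 1] equals up_cam . K^{-1}[u, v, 1]; negative means a downward ray.
    return horizon_line(K, up_cam) @ uv1 < 0.0
```

For a level camera, `up_cam = [0, -1, 0]`; for a downward pitch α with zero roll, `up_cam = [0, -cos α, -sin α]`, which moves the horizon up to $v = c_y - f_y \tan α$, matching the qualitative behavior described above. Roll enters through the x-component of `up_cam` and tilts the line.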

Let ${}^{w}P = \{p_{j} \in \mathbb{R}^{3}\}_{j = 1}^{J}$ be the set of SLAM feature points in the world frame, associated with camera pose ${}_{i}^{w}T$ at time instance i. After imposing the geometric prior and pre-selecting candidate points, we fit a plane π: n^⊤x + d = 0 via RANSAC (Fischler and Bolles, 1981):

\min_{n, d} \sum_{j} ρ (|n^{⊤} p_{j} + d|),

(3)

where ρ(⋅) is an inlier loss with threshold τ. To enforce stability, we apply a prior that constrains the plane normal n to lie within ±θ_max of the expected vertical direction (−y in the camera frame). Given the plane π and a reference anchor x₀ (the point on π closest to the camera center), we define an orthonormal basis {e_u, e_v, n} on the plane. Each 3D point is expressed in local coordinates as:
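A compact NumPy sketch of the RANSAC fit in Equation (3) with the normal prior. RANSAC maximizes the inlier count, which corresponds to a truncated 0/1 choice of ρ; the final least-squares refit on the inliers is a common addition and an assumption here. Default values mirror those reported in Section 4.1 (τ = 0.10, 2000 iterations, at least 50 inliers, early exit above 80%, θ_max = 60°). The reference direction is −y of the camera frame, as in the text; since a plane normal’s sign is arbitrary, the sketch flips it toward that direction before the angular test. The function name is hypothetical.

```python
import numpy as np


def fit_ground_plane(P, tau=0.10, iters=2000, min_inliers=50,
                     early_frac=0.80, max_angle_deg=60.0, seed=0):
    """RANSAC fit of a plane n.x + d = 0 to candidate points P (N x 3, camera frame).

    Hypotheses whose normal deviates more than max_angle_deg from the
    expected vertical direction (-y in the camera frame) are rejected.
    Returns (n, d, inlier_mask) or None if no acceptable plane is found.
    """
    N = len(P)
    if N < 3:
        return None
    rng = np.random.default_rng(seed)
    ref = np.array([0.0, -1.0, 0.0])
    cos_max = np.cos(np.deg2rad(max_angle_deg))
    best_mask = None
    for _ in range(iters):
        a, b, c = P[rng.choice(N, 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                      # collinear sample, no unique plane
        n = n / norm
        if n @ ref < 0:
            n = -n                        # orient the normal toward ref
        if n @ ref < cos_max:
            continue                      # violates the normal prior
        d = -n @ a
        mask = np.abs(P @ n + d) < tau
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
            if mask.sum() > early_frac * N:
                break                     # enough candidates explained
    if best_mask is None or best_mask.sum() < min_inliers:
        return None
    # Least-squares refit on the inliers: normal = smallest right singular vector.
    Q = P[best_mask]
    centroid = Q.mean(axis=0)
    n = np.linalg.svd(Q - centroid, full_matrices=False)[2][-1]
    n = -n if n @ ref < 0 else n
    d = -n @ centroid
    return n, d, np.abs(P @ n + d) < tau
```

Rejecting hypotheses inside the loop, rather than filtering the final plane, keeps steep wall planes from ever winning the inlier vote in cave-like scenes.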

[\begin{matrix} u_{j} & v_{j} & h_{j} \end{matrix}] = {(p_{j} - x_{0})}^{⊤} [\begin{matrix} e_{u} & e_{v} & n \end{matrix}] .

(4)
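The anchor and basis used in Equation (4) can be built as follows. The anchor is the orthogonal projection of the camera center onto π; the in-plane axes e_u and e_v are only defined up to a rotation about n, so the seed vector used below is an arbitrary (assumed) choice. Function names are hypothetical.

```python
import numpy as np


def plane_frame(n, d, cam_center):
    """Anchor x0 and orthonormal basis B = [e_u, e_v, n] for the plane n.x + d = 0."""
    s = np.linalg.norm(n)
    n, d = n / s, d / s                           # unit normal, consistent offset
    x0 = cam_center - (n @ cam_center + d) * n    # foot of the perpendicular
    # e_u: any direction orthogonal to n (the seed choice is arbitrary).
    seed = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 0.0, 1.0])
    e_u = seed - (seed @ n) * n
    e_u /= np.linalg.norm(e_u)
    e_v = np.cross(n, e_u)                        # completes a right-handed frame
    return x0, np.column_stack([e_u, e_v, n])


def to_plane_coords(P, x0, B):
    """Equation (4): each row [u_j, v_j, h_j] = (p_j - x0)^T [e_u e_v n]."""
    return (P - x0) @ B
```

Because B is orthonormal, in-plane distances are preserved, and h_j is the signed height of each point above the fitted plane along n.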

A rectangular grid (ξ, η) is constructed on the ground plane, and the sparse heights {h_j} are interpolated to obtain a smooth elevation field h(ξ, η). Each grid vertex

q (ξ, η) = x_{0} + ξ e_{u} + η e_{v} + h (ξ, η) n

(5)

is then reprojected to the image using intrinsics K:

{[\begin{matrix} u^{'} & v^{'} & 1 \end{matrix}]}^{T} = λ_{2} K {}_{i}^{w}T^{- 1} q (ξ, η) .

(6)

Image colors I(u′, v′) are sampled with bilinear interpolation to texture the grid, producing a dense 2.5D ground surface. To extend the ground beyond a single camera frame, all historical ground patches are accumulated in the global frame while maintaining their local plane normals; we do not assume a single, globally flat ground plane. Patches are merged using voxel decimation and Delaunay triangulation (Lee and Schachter, 1980) to avoid redundancy while preserving continuity. The fused mesh forms a 2.5D exocentric perspective with natural terrain variation and coloring consistent with the egocentric imagery.
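A sketch of Equations (5) and (6) followed by bilinear color lookup. It assumes ${}_{i}^{w}T$ is a 4 × 4 world-from-camera pose, so its inverse maps world points into the camera frame, and that pixel (u, v) indexes `img[v, u]`; the helper names are hypothetical. Points that fall behind the camera or outside the image are flagged invalid rather than textured.

```python
import numpy as np


def grid_vertices(x0, B, xi, eta, h):
    """Equation (5): q = x0 + xi*e_u + eta*e_v + h*n for each grid node."""
    e_u, e_v, n = B[:, 0], B[:, 1], B[:, 2]
    return x0 + xi[..., None] * e_u + eta[..., None] * e_v + h[..., None] * n


def sample_colors(img, Q_w, T_wc, K):
    """Equation (6) plus bilinear lookup for world points Q_w (M x 3)."""
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    Q_c = (Q_w - t) @ R                       # row-wise R^T (q - t): world -> camera
    uvw = Q_c @ K.T
    z = np.where(uvw[:, 2] > 1e-9, uvw[:, 2], 1.0)
    u, v = uvw[:, 0] / z, uvw[:, 1] / z       # division by depth plays lambda_2
    H, W = img.shape[:2]
    valid = (uvw[:, 2] > 1e-9) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    a, b = (u - u0)[:, None], (v - v0)[:, None]
    im = img.reshape(H, W, -1).astype(float)  # (H, W, C), also for grayscale
    col = ((1 - a) * (1 - b) * im[v0, u0] + a * (1 - b) * im[v0, u1]
           + (1 - a) * b * im[v1, u0] + a * b * im[v1, u1])
    return col, valid
```

Bilinear weighting makes the texture vary smoothly across grid nodes even when several nodes project inside the same image pixel.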

3.4. ROV model rendering and scene update

While the SLAM system constructs a sparse map of the surroundings, the proposed algorithm simultaneously renders the 3D ROV point cloud (or mesh, for EgoExo++) in the same spatial context. The ROV points P_rov are transformed to the current camera location using the current pose ${}_{i}^{w}T$ as follows:

{\tilde{P}}_{map} = λ_{3} {}_{i}^{w}R \cdot P_{rov} + {}_{i}^{w}t .

(7)

Here, λ₃ is the scaling factor for the ROV model. Note that our mapping and projection method is defined only up to scale, as in all monocular SLAM-based systems (Kazerouni et al., 2022; Macario Barros et al., 2022). While the scale can be resolved with additional sensor fusion, the augmented visuals of Equation (7) are sufficient for teleoperation.
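For completeness, Equation (7) amounts to a scaled rigid placement of the ROV model at the current camera pose. In this hypothetical helper, `lam3` is the empirically tuned model scale and `T_wc` a 4 × 4 world-from-camera pose; the model points are assumed to be expressed relative to the camera center, as Equation (7) implies.

```python
import numpy as np


def place_rov_in_map(P_rov, T_wc, lam3):
    """Equation (7): lam3 * R_i @ P_rov + t_i for model points P_rov (3 x m)."""
    R, t = T_wc[:3, :3], T_wc[:3, 3:4]
    return lam3 * (R @ P_rov) + t
```

Because monocular SLAM recovers the map only up to scale, `lam3` is what makes the rendered vehicle appear at a plausible size relative to the reconstructed surroundings.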

4. Implementation & Evaluation

4.1. Implementation details

The framework is implemented using ROS Noetic in an Ubuntu 20.04 environment, running on an Intel Core i9 processor with 16 GB of RAM. A ROS node for ORB-SLAM3 is integrated as the monocular SLAM backbone. Note that we adopt the North-East-Down (NED) frame convention used by Manderson et al. (2016), which is local to the SLAM origin (not aligned with Earth’s North/East). The scaling parameters λ₁, λ₂, and λ₃ are empirically tuned once for each test sequence according to the scale of the map and the approximate physical dimensions of the ROV to ensure visually realistic projections. Once chosen, they remain fixed throughout the mission and do not require retuning during operation.

EgoExo view augmentation. We maintain a buffer of past egocentric frames with a queue size of n = 100; the frame separation threshold is set to 0.001 units (up to scale). The ROV point clouds are generated by sampling 3D mesh models of BlueROV2 and TurtleBot4; 10,000 points are sampled for each model.
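The text does not state how the 10,000 model points are drawn from each mesh. A common choice, assumed here, is area-weighted uniform surface sampling: pick triangles with probability proportional to their area, then draw barycentric coordinates with the square-root trick so that points are uniform within each triangle. Mesh loading is omitted; `vertices` and `faces` are plain NumPy arrays, and the function name is hypothetical.

```python
import numpy as np


def sample_mesh_points(vertices, faces, m=10_000, seed=0):
    """Area-weighted uniform sampling of m surface points from a triangle mesh.

    vertices: (V, 3) float array; faces: (F, 3) integer vertex indices.
    Returns a 3 x m array, matching the P_rov layout used in Equation (1).
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                   # (F, 3, 3) corners
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(faces), size=m, p=areas / areas.sum())
    # sqrt trick: barycentric weights that are uniform over each triangle.
    s, r = np.sqrt(rng.random(m)), rng.random(m)
    w0, w1, w2 = 1.0 - s, s * (1.0 - r), s * r
    pts = (w0[:, None] * tri[idx, 0] + w1[:, None] * tri[idx, 1]
           + w2[:, None] * tri[idx, 2])
    return pts.T
```

Area weighting avoids over-representing finely tessellated parts of the CAD model (e.g., thrusters) relative to large flat panels.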

EgoExo++ ground reconstruction. We extract the ground patch from each egocentric image using a RANSAC-based plane estimation process. We use a 0.10-unit point-to-plane inlier threshold, up to 2000 iterations, and a minimum of 50 inliers, with early termination if more than 80% of the candidates are explained. The fitted plane normal in camera coordinates is restricted to within 60° of the negative camera y-axis (downward). All points within a 0.20-unit distance band around the fitted plane that are (i) in front of the camera, (ii) inside the image, (iii) in the bottom band, and (iv) geometrically below the camera are finally labeled as ground.

We then express these inliers in a local plane frame and rasterize them onto a regular 2D grid with a cell size of 0.01 units. Heights on this grid are obtained with a LinearNDInterpolator over the inlier samples, with a NearestNDInterpolator used to fill holes where linear interpolation is undefined. The resulting dense 3D surface is triangulated as a regular mesh (two triangles per grid cell), keeping a triangle only if all three of its vertices back-project to valid image pixels inside the convex hull of the ground inliers; for efficiency and decimation, we cap the number of triangles per frame at 20000.
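A sketch of the height-grid and triangulation step using SciPy’s `LinearNDInterpolator` and `NearestNDInterpolator`, which the text names. The per-vertex `valid` mask (image-bounds and convex-hull checks) is assumed to be computed elsewhere, and truncating to the first `max_tris` triangles is a simplification, since the text does not say which triangles are dropped when the cap is reached. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator, NearestNDInterpolator


def height_field(uvh, cell=0.01):
    """Rasterize plane-frame samples (N x 3: u, v, h) onto a regular height grid."""
    uv, h = uvh[:, :2], uvh[:, 2]
    xi = np.arange(uv[:, 0].min(), uv[:, 0].max() + cell, cell)
    eta = np.arange(uv[:, 1].min(), uv[:, 1].max() + cell, cell)
    XI, ETA = np.meshgrid(xi, eta)                  # shape (len(eta), len(xi))
    H = LinearNDInterpolator(uv, h)(XI, ETA)        # NaN outside the convex hull
    holes = np.isnan(H)
    if holes.any():                                 # fill gaps with nearest sample
        H[holes] = NearestNDInterpolator(uv, h)(XI[holes], ETA[holes])
    return XI, ETA, H


def grid_triangles(valid, max_tris=20000):
    """Two triangles per grid cell, kept only if all three vertices are valid."""
    rows, cols = valid.shape
    idx = np.arange(rows * cols).reshape(rows, cols)   # flat vertex indices
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    tris = np.concatenate([np.stack([a, b, d], axis=1), np.stack([a, d, c], axis=1)])
    keep = valid.ravel()[tris].all(axis=1)
    return tris[keep][:max_tris]
```

The linear interpolator is exact for locally planar patches inside the hull of the samples, while the nearest-neighbor fallback only affects the thin border outside it.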

4.2. Proof of concept: 2D indoor navigation

Experimental setup. The proof-of-concept experiments are conducted with TurtleBot4, a 2D ground robot that can be teleoperated with egocentric views from its front-facing monocular camera. It has only two degrees of freedom (DOF) for linear and angular velocity, which simplifies the motion kinematics for tracking its instantaneous position and orientation. We teleoperate it to collect visual data with a USB camera at 640 × 480 p resolution in office, laboratory, and hallway scenarios. The experiments are designed to validate the proposed algorithm by evaluating ground plane estimation and reprojection errors.

Geometric validation: reprojection error analysis. We first evaluate the reprojection errors of known reference points using the generated EgoExo views and the estimated ROV pose. We use standard checkerboard corners as reference points in egocentric views and then evaluate the reprojection errors for those points in exocentric views. This test is repeated over different sets of past egocentric images, each corresponding to a different EOB distance. As shown in Figure 3, a checkerboard is viewed from different EOB distances (further back into the past), indicated by the parameter f; specifically, f is the number of frames between the current egocentric view and the selected EOB view. The corresponding reprojection error is plotted in Figure 3(b), which shows that the estimate is accurate for small values of f and gradually degrades for f > 100. This is consistent with our visual observation of the projected ROV point cloud, which lies on the ground plane with an accurate orientation based on the SLAM trajectory estimates.

Figure 3.

We conduct 2D indoor navigation experiments with a TurtleBot4 to validate the geometric accuracy of our algorithm. (a) The TurtleBot4 trajectory during teleoperation is shown; here, the f numbers indicate the EOB distance from current to reference frame used for the generated EgoExo views. (b) Reprojection errors for reference points (checkerboard corners) are evaluated for different EOB distances (f). The estimated ground surface is shown as a convex hull of inlier points; the surface normal is overlaid for better visualization.

Geometric validation: ground plane estimation. We adopt four metrics to evaluate the quality of ground plane estimation: inlier fraction, plane RMSE, temporal drift in plane normal, and temporal drift in altitude. The inlier fraction for each frame reports the ratio of inliers to total candidate points (N) after RANSAC fit:

η = \frac{1}{N} \sum_{j = 1}^{N} 1 (| n^{⊤} p_{j} + d | < τ),

(8)

where n, d are the fitted plane parameters, p_j are the candidate 3D points, and τ is the distance threshold (see Equation (3)). A higher η indicates that the majority of candidate points are consistent with a single ground plane. Subsequently, the point-to-plane distances of the inlier set $\mathcal{S}_{in}$ (containing $N_{in}$ points) are used to quantify the fitting residual as the root-mean-square error (RMSE):

RMSE = \sqrt{\frac{1}{N_{in}} \sum_{j \in \mathcal{S}_{in}} {(n^{⊤} p_{j} + d)}^{2}} .

(9)

A lower RMSE reflects a tighter fit around the estimated plane. Next, to assess temporal consistency, we compute the angular difference between consecutive plane normals:

Δ θ_{i} = \arccos (\frac{n_{i}^{⊤} n_{i - 1}}{‖ n_{i} ‖ ‖ n_{i - 1} ‖}) .

(10)

The mean angular drift across frames is reported, where low values indicate temporal stability. Finally, the altitude at instance i is computed as the vertical distance of the camera center c_i to the estimated plane:

h_{i} = \frac{n_{i}^{⊤} c_{i} + d_{i}}{‖ n_{i} ‖} .

(11)

In the absence of ground-truth measurements, the computed (scaled) altitude is not meaningful in absolute terms; however, a low deviation across frames indicates that the synthesized 2.5D ground remains consistent for visualization.
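The per-frame quantities in Equations (8) to (11) can be computed as below. RMSE is taken over the inliers, as the text describes, and all distances are divided by ‖n‖, which is identical to the equations for a unit normal. How the reported altitude drift summarizes the per-frame altitudes is not specified, so a summary such as `np.std(altitudes(...))` is only one plausible (assumed) choice. Function names are hypothetical.

```python
import numpy as np


def plane_metrics(P, n, d, tau):
    """Inlier fraction eta (Eq. 8) and inlier point-to-plane RMSE for one frame."""
    r = (P @ n + d) / np.linalg.norm(n)    # signed distances (|n| = 1 in Eq. 8)
    inl = np.abs(r) < tau
    rmse = np.sqrt(np.mean(r[inl] ** 2)) if inl.any() else float("nan")
    return inl.mean(), rmse


def normal_drift_deg(normals):
    """Mean angle (Eq. 10) between consecutive plane normals (F x 3), in degrees."""
    N = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos = np.clip(np.sum(N[1:] * N[:-1], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()


def altitudes(centers, normals, ds):
    """Eq. (11): signed distance of each camera center to its frame's plane."""
    return (np.sum(normals * centers, axis=1) + ds) / np.linalg.norm(normals, axis=1)
```

Clipping the cosine before `arccos` guards against values marginally outside [−1, 1] from floating-point round-off when consecutive normals are nearly identical.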

In the 2D indoor setup, the robot’s camera is rigidly mounted with negligible roll and pitch variation, so the estimated ground plane normal is expected to align with the camera’s vertical axis and remain stable across frames. Consequently, the inlier fraction should be consistently high and the residual error should approach zero. The results obtained from several trials in office, laboratory, and hallway scenarios are summarized in Table 1. The high inlier fraction and low normal drift across frames confirm the robustness of our approach under such structured conditions; see the next section for further evaluation in unstructured settings.

Table 1.

Evaluation of ground plane estimation for indoor UGV operation. Plane RMSE and altitude drift are in up-to-scale SLAM map units.

# Ego frames	Inlier fraction (↑)	Plane RMSE (↓)	Normal drift (↓)	Altitude drift (↓)
476	99.4%	0.005	4.67°	0.058

Figure 3(b) illustrates two representative examples from an office scene: one for f = 70 with a low reprojection error and another for f = 260 with a high error. For f = 70, the estimated ground plane normal is well aligned, which confirms the geometric accuracy of the synthesized view. In contrast, the misaligned plane normal for f = 260 reveals the underlying error in pose estimation and, consequently, in the reprojection. In short, the geometric accuracy of our algorithm depends on the pose estimation performance of the SLAM system.

Computational efficiency. We evaluate the runtime and memory cost of the proposed algorithm under different configurations to verify that it can run in real time on the resource-constrained edge devices onboard standard ROV platforms. Table 2 shows the memory requirement for different buffer sizes; the footprint is 274 MB for a buffer of 180 frames, which is modest for such hardware. Table 3 shows that the base EgoExo framework maintains a consistent output rate of over 25 FPS (frames per second). The additional ground estimation in EgoExo++ reduces the scene update rate, but it remains above 20 FPS, which is suitable for integration into existing teleoperation engines.

Table 2.

Memory requirement of the proposed framework for different buffer lengths.

Buffer (# of frames)	50	100	200	300	400
Memory usage (MB)	65	142	301	455	609

Table 3.

End-to-end computational speed of the proposed framework; the rows report: (i) ROS node publish rate of the imagery; and (ii) the global scene update rate.

Method	SLAM	SLAM + EgoExo	SLAM + EgoExo++
Image update	26 FPS	25.1 FPS	25.1 FPS
Scene update	26 FPS	25.3 FPS	20.2 FPS

4.3. Field deployment: 3D underwater caves

Experimental setup. We extend our experiments to underwater cave exploration, where the ROV performs full 6-DOF motion. Although roll motion is limited on the standard BlueROV2, we consider all 6 DOF during teleoperation because buoyancy changes and the pressure imbalance caused by water flow at cave openings perturb the vehicle. For remote teleoperation, we consider scenarios in which human operators maneuver an underwater ROV from the surface by following the caveline and other navigation markers as guides (Abdullah et al., 2024a). The mission objective is to navigate the ROV 75–300 feet deep inside the cave through its complex structures and then return it safely to the surface. The videos are recorded at 1920 × 1080 resolution with a GoPro11 camera mounted on the BlueROV2 and then resized to 640 × 480 within our framework. In addition to evaluating geometric accuracy, we assess how informative the generated views are compared to traditional ROV teleoperation consoles.

Real-time map update and teleoperation. In addition to exocentric view generation and ROV pose rendering, the EgoExo framework simultaneously updates a sparse map with feature points extracted by the SLAM system. Figure 4 shows an ROV trajectory mapped during our trial in an underwater cave in Peacock Springs, Florida. As seen, the generated EgoExo views embed more peripheral information about the scene. The exocentric view of the ROV pose and its relative distance from cave walls or overhead obstacles helps surface operators avoid obstacles and make efficient decisions. Additionally, the 3D map shows the ROV’s past trajectory and current pose, which helps operators analyze mission progress; this is not possible with traditional teleoperation consoles. Such a global view of the trajectory is also useful during emergency evacuation and recovery. Beyond cave exploration, these features are expected to be valuable in ROV-based subsea surveillance and search-and-rescue operations.

Figure 4.

EgoExo and EgoExo++ views are shown for field trials conducted in the Peacock Springs cave system, Florida. (a) The EgoExo pipeline generates 2D exocentric imagery from directly behind the ROV, along with a sparse 3D map of the environment. Pop-ups show: (i-ii) Ego and Exo views with rendered ROV pose; and (iii-iv) updated camera poses and Exo view of the 3D map. (b) EgoExo++ extends this by reconstructing the ground surface and offering full 360° exocentric viewpoints. Pop-ups show operator-selected viewpoints above the ground surface: (i) back, (ii) front, (iii) top, and (iv) side.

Validation: homographic projection. Because of the complex scene geometry inside underwater caves, we validate performance with a homography estimation approach. As shown in Figure 5, AprilTag (Olson, 2011) corners serve as reference points for reprojection. Specifically, we compute the homography between the egocentric and synthesized exocentric views to visualize the reprojection errors. We then project a sample 2D logo onto the reference AprilTag surface using this homography; an unskewed planar projection indicates accurate pose estimation and point cloud rendering.
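The homography step can be illustrated with a minimal NumPy implementation of the direct linear transform (DLT), assuming four AprilTag corner correspondences. This is not the authors' code: the function names and the tag corner pixel coordinates below are hypothetical, the coordinates are not normalized (Hartley normalization is advisable for noisy detections), and in practice a library routine such as OpenCV's `findHomography` would typically be used instead.

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate H (3x3, H[2,2] = 1) with dst ~ H @ src from >= 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Each correspondence gives two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project(H, pts):
    """Apply H to (N, 2) points and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]


# Map a unit-square logo onto four (hypothetical) detected tag corners.
logo_corners = [[0, 0], [1, 0], [1, 1], [0, 1]]
tag_corners = [[310, 220], [402, 228], [398, 318], [305, 312]]
H = homography_dlt(logo_corners, tag_corners)
```

Any other logo pixel can then be mapped with `project(H, ...)`; a visibly skewed result on the tag surface would point to pose or rendering error.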

Figure 5.

A snapshot from our cave exploration scenario: (a) egocentric view with detected reference points; and (b) synthesized EgoExo view with projected ROV point cloud. We use a sample logo for homographic projection on the reference surface to demonstrate the accuracy in pose estimation.

Validation: ground plane and 2.5D view. The ground plane estimation in the field is assessed with the same method as the indoor validation; Table 4 summarizes the results for trials performed at two different cave systems. In addition to the four metrics defined in Equations (8)–(11), we also report a success rate, since a valid ground plane cannot always be recovered in unstructured cave scenes. The success rate is the percentage of segments in which a plane can be reliably fitted from the sparse SLAM feature points.

Table 4.

Evaluation of ground plane estimation is presented for field trials conducted in two distinct cave systems in FL, USA.

Field trials	Peacock Springs	Devil’s Springs
# Ego segments	485	1153
Success rate (↑)	85.8%	97.17%
Inlier fraction (↑)	94.0%	90.7%
Plane RMSE (↓)	0.04	0.08
Normal drift (↓)	6.3°	20.49°
Altitude drift (↓)	0.29	0.47

The results in Table 4 show that although the Devil’s Springs trials have a higher success rate in detecting the ground plane, the plane quality in Peacock Springs is consistently better. We attribute this difference to the more complex cave structure and the more challenging ROV trajectory in Devil’s Springs, where several obstacles (e.g., large rocks) appeared directly in front of the ROV, forcing the operator to ascend and maneuver around them. The terrain there also had large elevation variations, composed of rocks, boulders, and scattered pebbles, in contrast to the relatively smooth sedimentary floor observed in Peacock Springs. The reconstructed ground map from Peacock Springs shows consistent elevation and orientation (see Figure 4(b)), supporting the quantitative results. More snapshots from the two sites are provided in Figure 8. As seen, the rocky terrain in Devil’s Springs and the resulting jerky vehicle motion led to larger ground plane fitting errors, greater deviations in the fitted normal, and higher variability in the estimated altitude.

Observations: strengths and limitations. Our experiments reveal several key strengths of the proposed teleoperation framework. First, the generated exocentric views closely resemble the actual EOB views along smooth trajectories, which are typical of subsea exploration and surveying tasks. Second, the buffer memory acts as a backup during temporary SLAM failures, which typically occur at turning corners or during abrupt motion. In such cases, our algorithm retains historical poses and their associated egocentric images in its buffer. Unlike the SLAM map and last known poses, which are geometrically sparse and semantically empty, the buffered images and the 2.5D terrain map provide richer spatial context, helping the operator understand the recent scene layout and safely hold position or pause the mission until tracking is restored. On the other hand, the heavy dependency on the SLAM backbone leads to some inherent limitations. Feature-based monocular SLAM systems often fail in feature-deprived, noisy underwater scenes, which leads to inaccurate pose tracking and, in turn, inaccurate EgoExo view synthesis. Tracking 6-DOF ROV motion from monocular vision is particularly challenging without additional sensors to recover metric scale (Joshi et al., 2019; Wu et al., 2023), and we observe some instances where the estimated ROV pose is incorrectly scaled in the rendering. To address this, multi-sensor fusion-based underwater SLAM backbones (Lago et al., 2024; Mo et al., 2021; Rahman et al., 2022) can be used in more critical applications.
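The buffering behavior described above can be illustrated with a fixed-length ring buffer. This is a hypothetical sketch, not the released EgoExo code: the class and field names are ours, the default capacity of 180 frames simply mirrors the configuration quoted in the computational-efficiency analysis, and we assume the EOB distance f counts frames backward from the newest tracked frame. Frames are appended only while SLAM reports valid tracking, so during a tracking loss the last good history stays available for display.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional

import numpy as np


@dataclass
class BufferedFrame:
    stamp: float        # capture time in seconds
    pose: np.ndarray    # 4x4 camera-to-world transform reported by SLAM
    image: Any          # egocentric image, e.g. an H x W x 3 array


class EgoFrameBuffer:
    """Fixed-length history of tracked egocentric frames (hypothetical sketch)."""

    def __init__(self, max_frames: int = 180):
        # A deque with maxlen discards the oldest frame when a new one arrives.
        self._frames = deque(maxlen=max_frames)

    def push(self, stamp: float, pose: np.ndarray, image: Any, tracking_ok: bool) -> None:
        # Frames without a valid SLAM pose are dropped, so during a tracking
        # loss the buffer keeps its last good history instead of overwriting it.
        if tracking_ok:
            self._frames.append(BufferedFrame(stamp, pose, image))

    def reference(self, f: int) -> Optional[BufferedFrame]:
        """Frame f steps behind the newest buffered frame, or None if unavailable."""
        if f < 0 or f >= len(self._frames):
            return None
        return self._frames[-1 - f]

    def __len__(self) -> int:
        return len(self._frames)
```

A renderer would call `reference(f)` each cycle and fall back to the most recent available frame when it returns `None`; the deque keeps memory bounded, which is consistent with the linear growth in Table 2.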

4.4. User study #1: SUS

The goal of this user study is to assess interface-level usability using real mission data. Hence, the study uses data from multiple underwater cave exploration missions collected during our field trials, in which a BlueROV2 recorded egocentric video feeds at penetrations of up to 100 m inside the caves. The recorded sessions are then played back to 15 participants, aged 21–32, with little or no prior teleoperation experience. They evaluate the ease of operation of our EgoExo console and compare it to traditional consoles. Their feedback is recorded using the System Usability Scale (SUS) (Brooke, 1996); our interface achieves an average SUS score of 77.5. We also formulate an additional set of questions on the operators’ preferences for the novel features of our method. The individual questions and corresponding scores are presented in Table 5. Key observations from this study are listed below.

(1) The obtained SUS score is well above the commonly cited average of 68 and falls in the Good range for user experience, slightly below the Excellent category (score: 80.3).

(2) Post-operation feedback from our ROV operators suggests that the exocentric views are more useful for safe ROV maneuvers.

(3) The synthesized 3D map provides a better sense of the ROV’s global location and improves spatial awareness of the teleoperators.

(4) The operators report a lower perceived workload (cognitive load) when conducting complex tasks such as object following and structure mapping.

Table 5.

In our study, 15 participants answer two sets of questions: (i) the first 10 questions are from SUS (Brooke, 1996); and (ii) the remaining three questions are custom-designed. Each response is rated from 1 (strongly disagree) to 5 (strongly agree).

#	Questions	Mean, SD
1	I think that I would like to use this system frequently.	4.3, 0.6
2	I found the system unnecessarily complex.	2.0, 0.7
3	I thought the system was easy to use.	4.3, 0.4
4	I think that I would need the support of a technical person to be able to use this system.	2.0, 0.6
5	I found the various functions in this system were well integrated.	4.0, 0.8
6	I thought there was too much inconsistency in this system.	2.3, 0.7
7	I would imagine that most people would learn to use this system very quickly.	4.4, 0.5
8	I found the system very cumbersome to use.	2.0, 0.6
9	I felt very confident using the system.	3.7, 0.7
10	I needed to learn a lot of things before I could get going with this system.	1.4, 0.5
11	The proposed exocentric view is beneficial for ROV teleoperation.	4.5, 0.5
12	I found the EOB distance tuning feature useful to get the best view.	4.5, 0.5
13	The generated 3D map provides a better understanding of the ROV’s global location and its surroundings.	4.6, 0.9
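As a consistency check, the standard SUS scoring rule (Brooke, 1996) can be applied to the item means of questions 1–10 in Table 5: positively worded odd items contribute (score − 1), negatively worded even items contribute (5 − score), and the sum is multiplied by 2.5. Because this rule is linear, scoring the item means gives the same value as averaging per-participant scores, provided every participant answered all ten items. The helper name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """SUS score (0-100) from ten responses on a 1-5 scale (Brooke, 1996)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0.0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return 2.5 * total


# Item means for questions 1-10 from Table 5.
table5_means = [4.3, 2.0, 4.3, 2.0, 4.0, 2.3, 4.4, 2.0, 3.7, 1.4]
print(round(sus_score(table5_means), 1))  # 77.5
```

The result matches the average SUS score of 77.5 reported for the interface.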

4.5. User study #2: NASA-TLX

Unlike the SUS study, this experiment involves active teleoperation, enabling objective assessment of performance metrics such as mission completion time, path deviation ratio, and collision count. Such metrics are difficult to measure in real cave environments due to the lack of reliable ground-truth localization and trajectory data. Hence, this user study is conducted in a Gazebo-simulated environment of a 60 m deep, 25 m × 15 m underwater industrial facility containing pipelines, subsea pods (Abdullah et al., 2025a; Blow et al., 2025), and visual fiducial markers (see Figure 6). Participants are tasked with teleoperating the NemeSys robot in ROV mode (Abdullah et al., 2025b) in a lawnmower pattern along the three green pipelines while visually detecting the front- and back-facing markers attached to the pods with the ROV’s front-facing camera. The autopilot assists in maintaining the intended depth, so the operator primarily controls surge and yaw motion in the horizontal plane; however, depth and roll control remain available for collision avoidance if needed. The participant pool consists of 15 individuals (aged 25–36), including 4 females and 11 males, with prior teleoperation experience distributed as 3 experts (significant ROV teleoperation experience), 4 intermediate users, and 8 novices (no teleoperation experience). Each participant performs the same mission twice: first using only egocentric camera views and then using the proposed EgoExo++ system.

Figure 6.

A Gazebo-simulated underwater facility with five subsea pods is used to evaluate teleoperation performance. Participants drive the ROV by following the green pipelines and visually identify fiducial markers mounted on the pods. Each operator repeats the mission twice: first using the egocentric camera feed and then using the proposed EgoExo++ interface.

Subjective evaluation. Subjective workload is assessed using NASA-TLX (Hart and Staveland, 1988) on an unweighted scale from 0 (very low) to 20 (very high); the results are summarized in Table 6. EgoExo++ reduces perceived workload across all six dimensions (for the success item, a higher score is better). For instance, participants report a 34% reduction in mental demand and rate their success 19% higher when using our teleoperation console. These results suggest that EgoExo++ reduces cognitive effort and stress while improving perceived task performance.

Table 6.

NASA-TLX subjective workload is compared between egocentric-only teleoperation and EgoExo++. Scores are collected from 15 participants on the standard unweighted scale from 0 (very low) to 20 (very high); the mean and standard deviation are reported.

#	Questions	Ego-only (baseline)	EgoExo++ (proposed)
1	How mentally demanding was the task?	9.3, 4.9	6.1, 3.5
2	How physically demanding was the task?	6.2, 5.8	4.1, 3.7
3	How hurried or rushed was the pace of the task?	8.1, 4.8	7.3, 5.5
4	How successful were you in accomplishing what you were asked to do?	14.3, 3.2	17.0, 2.5
5	How hard did you have to work to accomplish your level of performance?	9.2, 4.4	5.4, 3.1
6	How insecure, discouraged, irritated, stressed, and annoyed were you?	3.9, 2.8	2.3, 2.1
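The percentage changes quoted in the subjective evaluation follow directly from the Table 6 means. The small helper below (a hypothetical name) makes the arithmetic explicit. Item 4 asks about success, so an increase there is an improvement, whereas a decrease is an improvement for the other five items.

```python
def relative_change_pct(baseline: float, proposed: float) -> float:
    """Percent change from baseline to proposed; negative values are decreases."""
    return 100.0 * (proposed - baseline) / baseline


# Table 6 means: (Ego-only baseline, EgoExo++ proposed)
tlx_means = {
    "mental demand": (9.3, 6.1),
    "physical demand": (6.2, 4.1),
    "temporal demand": (8.1, 7.3),
    "success": (14.3, 17.0),  # higher is better for this item
    "effort": (9.2, 5.4),
    "frustration": (3.9, 2.3),
}

for item, (ego, egoexo) in tlx_means.items():
    print(f"{item:16s} {relative_change_pct(ego, egoexo):+6.1f}%")
```

Mental demand comes out at about −34% and success at about +19%, matching the text.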

Objective evaluation. We report three objective metrics to assess task performance; the results are summarized in Table 7. Mission time measures the duration required to navigate past all 5 pods and detect all 10 target markers. Path Deviation Ratio (PDR) represents the normalized overhead distance beyond the ideal path length (81.75 m); it is calculated as: PDR = (L_actual − L_ideal)/L_ideal × 100%.
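As a minimal sketch of how PDR could be computed from a logged trajectory, assuming the simulator's ROV positions are sampled densely enough that the polyline length approximates the distance traveled. The function names are hypothetical; the 81.75 m default is the ideal route length stated above.

```python
import numpy as np

IDEAL_PATH_M = 81.75  # ideal route length of the simulated mission (from the text)


def path_length(traj):
    """Polyline length of a (T, 2) or (T, 3) array of logged positions."""
    traj = np.asarray(traj, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))


def path_deviation_ratio(traj, ideal_length=IDEAL_PATH_M):
    """PDR = (L_actual - L_ideal) / L_ideal * 100, in percent."""
    return 100.0 * (path_length(traj) - ideal_length) / ideal_length
```

By this definition, a trajectory that cuts corners of the ideal route can yield a negative PDR, so PDR measures extra path length rather than lateral distance from the route.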

Table 7.

Task performance for 15 users is compared between the traditional egocentric teleoperation interface and EgoExo++. Scores for mean and standard deviation are reported for the first two metrics, while the final row reports the total collision count across all participants.

Metric	Ego-only (baseline)	EgoExo++ (proposed)
Average mission time (sec. ↓)	145.3 ± 31	121.3 ± 20
PDR: Path Deviation Ratio (% ↓)	12.3 ± 6.0	2.5 ± 9.3
Total collision count (↓)	5	2

A PDR of 0 indicates that the traveled path is exactly as long as the ideal route, while positive values indicate proportionally longer, more deviating paths. Collisions are counted as the number of physical contacts with the seabed, pipelines, or pod structures. Compared to egocentric-only teleoperation, EgoExo++ achieves 16% faster mission completion and a 5× lower PDR. The relatively large standard deviation in PDR reflects variability in operator expertise, particularly among novice participants who made wide turns at the sharp corners. Additionally, only 2 collisions were observed with EgoExo++ compared to 5 with egocentric-only teleoperation, suggesting safer and more confident navigation.

Order bias and counterbalanced validation. To reduce potential order bias and learning effects between the two consecutive trials, we include a practice (familiarization) phase before the formal experiments. Specifically, we brief the participants on the facility layout, mission objective, and target locations. They are then allowed to freely teleoperate the ROV using any of the available visualization modes (Ego-only, trailing EgoExo, and ground-referenced map). This practice session lets participants become comfortable with the joystick controls and the visual interfaces before data collection begins. Furthermore, we conduct a supplementary evaluation in which 5 participants (2 new and 3 from the previous pool) perform the trials in reversed order (EgoExo++ first, then Ego-only). The results show trends consistent with the main study: EgoExo++ again achieves faster mission completion (approximately 18% improvement) and a lower path deviation ratio (2.6%) than the Ego-only baseline (9.0%). These results suggest that the observed performance gains are not driven by the order in which the trials are performed, although this supplementary sample is small.

Qualitative insights. The aggregated trajectory overlays in Figure 7 further support the quantitative findings: EgoExo++ trajectories are more compact and aligned with the intended route, whereas ego-only trajectories exhibit wider turns and more lateral deviations. Moreover, qualitative feedback from participants provides two key insights:

(1) The EgoExo trailing view helps operators anticipate turning points and, using the added peripheral information, align the ROV efficiently to identify pod markers; and

(2) The 2.5D EgoExo++ map offers a global situational context, allowing operators to maintain a clear sense of “where am I?”, particularly during U-turns where no pipeline is visible.

Figure 7.

ROV trajectories for egocentric and EgoExo++ teleoperation are visually compared. Egocentric-only missions (red) exhibit wider turning radii and more lateral deviation (shaded red regions). In contrast, EgoExo++ trajectories (blue) are more compact and aligned with the ideal pipeline route.

Nevertheless, a commonly reported limitation of our system is perceptual disorientation caused by occasional latency in the augmented view. Aside from this effect, the operators preferred EgoExo++ for more confident ROV teleoperation.

5. Improved Underwater ROV Teleoperation: Strengths, Challenges, and Limitations

Multiple augmented viewpoints. We further validate the utility of the proposed teleoperation interface on additional underwater cave exploration data. Our expedition in cave segments at Devil’s Springs, Florida, reveals that when ROVs move slowly against strong currents, extending the exocentric viewpoint distance can significantly improve teleoperation. This is achieved by tuning the queue parameters r, c, and n in the proposed interface. We consistently find that exocentric views rendered from roughly 5–10 seconds behind the ROV’s current position are the most informative during navigation. The multiple preceding views offered by our interface are particularly useful for mapping large structures such as newly discovered cave segments or shipwrecks (Chatzispyrou et al., 2025, 2026; Eustice, 2005). As Figure 8 shows, the synthesized viewpoints provide more spatial context, enabling operators to control the ROV efficiently around complex underwater structures.

Figure 8.

EgoExo and EgoExo++ views are shown from field trials at two different cave systems. In EgoExo, the operator slides across the EOB distance f to find the preferred Exo view, for example, f = 100 for the first case. EgoExo++ further enables free 360° viewpoint control, allowing the ROV to be visualized from arbitrary perspectives such as top, back, front, and side views. A video demonstration is provided in the supplementary files; it can also be seen here: https://youtu.be/xpvnzIJ_YbM.

2D and 2.5D exocentric view. Our EgoExo pipeline synthesizes third-person views as flat 2D projections of the robot model onto a reference egocentric view from the past. The EgoExo++ advances from this purely image-based rendering to a 2.5D representation by recovering and texturing a dense ground surface in 3D space. This addition provides both altitude awareness and geometric context relative to the terrain, enabling operators to maintain safe clearance above uneven ground (Hing et al., 2010). Insights from our prior user study also emphasized the value of adjustable viewpoints, such as bird’s-eye or side perspectives. While earlier EOB viewpoints offered multiple exocentric views, they remained anchored to the robot’s trajectory and fixed reference frames. The EgoExo++ view is no longer tied to a fixed reference image; the virtual viewpoint can be freely adjusted, offering better situational awareness during ROV teleoperation (Lager et al., 2018).
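To illustrate how a free virtual viewpoint could be parameterized, the following sketch places a virtual camera on a sphere around the ROV's SLAM position and orients it to look at the vehicle. It is a hypothetical illustration rather than the EgoExo++ renderer: the function name, the z-up world frame, the OpenCV-style camera axes (x right, y down, z forward), and the world-frame azimuth are all assumptions. Under these assumptions, a trailing "back" view corresponds to an azimuth equal to the ROV heading plus 180°, and a top view to an elevation of 90°.

```python
import numpy as np


def orbit_camera_pose(target, azimuth_deg, elevation_deg, radius):
    """4x4 camera-to-world pose of a virtual camera orbiting `target` and looking at it.

    Assumes a z-up world frame and an OpenCV-style camera frame
    (x right, y down, z forward).
    """
    target = np.asarray(target, dtype=float)
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    # Camera center on a sphere of the given radius around the target.
    center = target + radius * np.array(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)]
    )
    forward = (target - center) / np.linalg.norm(target - center)   # camera z axis
    right = np.cross(forward, [0.0, 0.0, 1.0])                      # camera x axis
    if np.linalg.norm(right) < 1e-9:
        # Straight-down (top) or straight-up view: world up is degenerate here.
        right = np.cross(forward, [0.0, 1.0, 0.0])
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)                                 # camera y axis
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = right, down, forward, center
    return pose
```

Inverting the returned pose gives the world-to-camera transform used to render the textured ground and ROV model; because the viewpoint is computed from the current SLAM position rather than a stored reference image, it can be moved freely around the vehicle.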

Efficient teleoperation in complex missions. We conducted extensive field trials across multiple underwater cave systems, including Orange Grove, Devil’s Springs, and Peacock Springs, as well as inside a grotto system in Hudson, Florida. We observe that maneuvering the robot along the caveline with egocentric views alone is challenging because little or no ambient light penetrates underwater caves. Even with powerful lights, moving shadows and light backscatter create significant blind spots (Gupta et al., 2025). Consequently, tracking and following the caveline or other navigation markers (Abdullah et al., 2024a) without any peripheral view is extremely disorienting for the operator. In some cases, the caveline blends into the texture and features of the cave walls in noisy conditions; see Figure 9. In such scenarios, shifting to exocentric views makes it easier to identify the caveline against the surrounding and overhead cave walls. Additionally, the augmented 3D map displays the robot’s pose, allowing much safer maneuvering of the vehicle to its desired orientation (Stewart et al., 2016).

Figure 9.

Three challenging scenarios are shown for ROV teleoperation inside underwater caves: (i) caveline is not visible, that is, blended with the background; (ii) caveline is not in the FOV; and (iii) front camera-light interactions with suspended particles are causing hazy egocentric views. In all cases, our augmented visuals are clearer and more informative to a surface operator.

Safer navigation in hazy low-light conditions. Underwater caves contain distinctive formations of silt and sediment on their floors, which result from erosion over extended periods. The silt is easily disturbed by external factors (Massone et al., 2024), such as the motion of underwater ROVs or the turbulence generated by their propellers. Although ROV operators take care to avoid contact with the floor and cave walls, contact is often unavoidable due to buoyancy imbalance and strong water flow. Dislodged sediment produces cloudy or hazy conditions that obscure visibility (Yu et al., 2023b). Bright lights from the ROV reflect off these suspended particles, making it even harder to capture clear imagery of the surroundings. In such cases, third-person EOB views from behind the ROV offer a clearer and more informative perspective for navigation, as shown in Figure 9. This improves spatial awareness and helps the operator move the ROV safely away from the sediment formations toward open, accessible areas without obstructing other scuba divers in the process (Abdullah et al., 2024a; Islam et al., 2024).

Operator-ROV shared autonomy. In subsea teleoperation, collaborative decision-making frameworks split responsibilities so that humans set high-level goals while the vehicle plans and autonomously executes low-level actions (Chen et al., 2025). EgoExo++ augmented visuals strengthen this “human-in-the-loop” paradigm by providing an interactive shared scene in which operator intent (e.g., safe altitude, keep-out zones) can be expressed directly and the ROV can confirm its execution. For instance, the 2.5D view helps the operator determine a safe ground clearance, which the ROV can then maintain autonomously. Higher-level human-robot interactions can also be integrated into the EgoExo++ interface: the operator can draw an intended path directly on the exocentric view (Lee et al., 2016), and the ROV can then plan and follow an optimal path accordingly. Beyond navigation, augmented visuals are in high demand for shared telemanipulation tasks such as delicate object grasping (Chapman et al., 2008), valve control (Zhang et al., 2024), and artifact collection (Bruno et al., 2018). Our 360° views provide the operator with critical situational cues in such tasks; for instance, a side-view perspective helps position the ROV with respect to the target, after which the operator can switch to a close-up egocentric view for precise manipulation (Chen et al., 2025; Phung et al., 2024). Overall, EgoExo++ serves as a shared perceptual layer: it enhances situational awareness with explicit geometric cues and allows the operator to specify intent more precisely, promising safer and more effective teleoperation and telemanipulation.

Digital Twins and shadows. A digital shadow creates a virtual replica of the robot and its environment, enabling operators to practice missions, rehearse manipulation tasks, and refine control strategies (Jones et al., 2020). Whereas a shadow is a passive replica, a digital twin (DT) offers a bidirectional data pipeline and predictive simulation, thereby closing the feedback loop (Sjarov et al., 2020). In this context, EgoExo++ complements DT-based training by providing geometrically consistent exocentric perspectives and interactive Real2Sim 2.5D reconstructions derived from mission SLAM data. During rehearsal, such views allow operators to anticipate spatial challenges, practice navigation in cluttered cave-like terrains, and visualize manipulators reaching the target from third-person perspectives (Lim et al., 2022). EgoExo++ views can also be rendered on HMIs, where identical head motions may be mapped to different outcomes depending on the selected visualization mode (Candeloro et al., 2015). For instance, a head tilt in egocentric mode can directly control the ROV body, whereas the same movement in exocentric mode can control the virtual camera viewpoint with no impact on the ROV. Rehearsing such multi-visual feedback and control mappings in high-fidelity simulation engines is expected to substantially improve operator skills for high-risk, time-critical missions.

Challenges in ground estimation. Our field trials in diverse underwater caves and grotto systems reveal that irregular and deceptive terrain poses several unique challenges in estimating the ground surface. As illustrated in Figure 10, steep elevation slopes or sharp height discontinuities can exceed the fitting capability of plane detection algorithms, leading to fragmented or distorted ground estimates. Additionally, narrow passages create occlusions and limit visibility of the ground, causing gaps in both feature tracking and surface mapping. Large protruding structures with ground-like texture can also lead to incorrect segmentation of the actual ground surface. Together, these issues challenge our EgoExo++ pipeline and occasionally result in an unreliable representation of the terrain.

Figure 10.

Challenges in estimating uneven ground surface in unstructured environments: terrain complexities, such as elevation changes, narrow passages, and misleading planar obstacles, hinder the accurate ground surface reconstruction.

SLAM dependency and failure statistics. Since the EgoExo++ framework depends on the pose and visual features estimated by SLAM, the pose uncertainty and noisy visual conditions affect EgoExo view generation and EgoExo++ 2.5D ground reconstruction. In poor visual conditions or during abrupt motion, SLAM may temporarily lose tracking, in which case our method suspends map generation and waits for successful relocalization. Importantly, the historical ground-referenced trajectory map remains visible and provides operators with an intuitive sense of “where am I?”, which helps them steer toward previously mapped, visually richer regions where relocalization is more likely. Overall, EgoExo++ transforms SLAM-derived data into an interactive representation that bridges egocentric and exocentric awareness, allowing operators to reason about both local maneuvering and global mission context.

However, successful relocalization remains the responsibility of the operator and the SLAM system; EgoExo++ only supports this process passively through improved situational awareness. It operates as a lightweight visualization layer that deterministically consumes SLAM pose and sparse map without modeling or propagating uncertainty. This design choice prioritizes real-time operation and simplicity of integration with existing teleoperation engines.

Across more than 1500 mission segments collected from our field trials in three different cave systems, SLAM tracking loss occurred in less than 5% of the segments, with successful relocalization achieved in roughly half of these events. The major causes are abrupt vehicle motion or collisions that drive the ROV too close to an obstacle, producing feature-deprived imagery and disrupting tracking; an example is illustrated in Figure 11. Across datasets, the terrain estimation method achieves an average success rate of approximately 92% (see Table 4), implying that 8% of segments are rejected due to insufficient visual support. In such cases, EgoExo++ omits unreliable ground patches and leaves a hole in the reconstructed scene rather than rendering incorrect geometry.
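The rejection behavior described above can be sketched as a simple gate. This is a hypothetical illustration: this section does not state the actual rejection rule, so the function names and threshold values (minimum point count, τ, minimum inlier fraction) are assumptions expressed in SLAM units, and a plain least-squares (SVD) plane fit stands in for whatever robust fitting Equation (3) uses. Returning `None` corresponds to leaving a hole in the reconstructed ground rather than rendering an unreliable patch.

```python
import numpy as np


def fit_plane_svd(points):
    """Least-squares plane n^T p + d = 0 (unit n) through an (N, 3) point set."""
    centroid = points.mean(axis=0)
    # The normal is the direction of least variance: the last right singular vector.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, -float(n @ centroid)


def try_ground_patch(points, tau=0.05, min_points=20, min_inlier_frac=0.8):
    """Return (n, d) for a reliable patch, or None to leave a hole (hypothetical gate)."""
    points = np.asarray(points, dtype=float)
    if len(points) < min_points:
        return None  # too few SLAM features to support a plane
    n, d = fit_plane_svd(points)
    inlier_frac = np.mean(np.abs(points @ n + d) < tau)
    return (n, d) if inlier_frac >= min_inlier_frac else None


# A flat 5 x 5 patch of feature points at height z = 1.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
flat = np.column_stack([xs.ravel(), ys.ravel(), np.ones(25)])
```

A flat patch such as `flat` is accepted, whereas a sparse or strongly non-planar patch is rejected and simply left unrendered.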

Figure 11.

An example of SLAM failure during aggressive pitch maneuvering is illustrated. A “nose-down” dive drives the ROV too close to the cave floor, causing the scene to become feature-deprived and leading to a SLAM tracking loss.

6. Conclusion and Future Work

This work presents an AR-based framework that synthesizes exocentric camera views from egocentric feeds in real time for improved underwater ROV teleoperation. A closed-form solution based on pose geometry is formulated for the proposed EgoExo++ problem and integrated into a visual SLAM backbone. The end-to-end pipeline requires only a sequence of past egocentric views to generate 2D/2.5D exocentric views with the ROV model accurately projected onto them. The proof of concept is validated by ground plane estimation and reprojection error analyses in a series of 2D indoor navigation experiments. Subsequent field experiments demonstrate the effectiveness of 2.5D scene rendering in unstructured underwater cave scenarios. We further evaluate the system through two user studies, one using real underwater datasets and one in simulation. These studies demonstrate improved system usability (SUS), reduced perceived workload (NASA-TLX), and quantitative gains in teleoperation performance, including more efficient paths and faster mission completion compared to egocentric-only baselines. We are currently exploring more comprehensive multi-sensor fusion-based underwater SLAM backbones, such as SVIn2 (Rahman et al., 2022), for more accurate and robust estimation. We will further extend our simulation platform with interactive tools to enable advanced teleoperation studies such as confined-space navigation, SLAM-recovery behavior, close-up inspection, and complex maneuvering around structures.

Acknowledgments

The authors would like to acknowledge the help of the Woodville Karst Plain Project (WKPP), El Centro Investigador del Sistema Acuífero de Quintana Roo A.C. (CINDAQ), Global Underwater Explorers (GUE), Ricardo Constantino, and Project Baseline in providing access to challenging underwater caves. The authors are also grateful for equipment support from Halcyon Dive Systems, Teledyne FLIR LLC, and KELDAN GmbH (lights). We thank the participants of our user studies for their time and valuable feedback during the evaluation process. Finally, we thank the anonymous reviewers for their insightful suggestions, which significantly improved this paper.

Adnan Abdullah

Md Jahidul Islam

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported in part by the NSF grants 2330416, 2534503, and 2545370.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online. The source packages are available at: https://github.com/uf-robopi/EgoExo. The demo video can be seen here: .

References

Abbas, Zisserman (2019) A geometric approach to obtain a bird’s eye view from an image. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, pp. 4095–4104.

Abdullah, Barua, Tibbetts, et al. (2024a) CaveSeg: deep semantic segmentation and scene parsing for autonomous underwater cave exploration. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 3781–3788.

Abdullah, Blow, Chen, et al. (2024b) Human-machine interfaces for subsea telerobotics: from soda-straw to natural language interactions. arXiv preprint arXiv:2412.01753.

Abdullah, Chen, Rekleitis, et al. (2024c) Ego-to-Exo: interfacing third person visuals from egocentric views in real-time for improved ROV teleoperation. In: International Symposium on Robotics Research (ISRR).

Abdullah, Blow, Rampazzi, et al. (2025a) Active localization of close-range adversarial acoustic sources for underwater data center surveillance. Under review at the IEEE Journal of Oceanic Engineering (JOE). arXiv:2510.20122.

Abdullah, Gupta, Ramesh, et al. (2025b) Nemesys: toward online underwater exploration with remote operator-in-the-loop adaptive autonomy. arXiv preprint arXiv:2507.11889.

Blow, Abdullah, Sheldon, et al. (2025) Detection and localization of acoustic vulnerabilities of underwater data centers for remote surveillance. In: Ocean Sensing and Monitoring XVII. SPIE, Vol. 13482, pp. 72–82.

Brooke (1996) SUS: a quick and dirty usability scale. In: Usability Evaluation in Industry, pp. 189–194.

Bruno, Lagudi, Barbieri, et al. (2018) Augmented reality visualization of scene depth for aiding ROV pilots in underwater manipulation. Ocean Engineering 168: 140–154. https://doi.org/10.1016/j.oceaneng.2018.09.007

Buzzacott, Zeigler, Denoble, et al. (2009) American cave diving fatalities 1969–2007. International Journal of Aquatic Research and Education 3(2): 7. https://doi.org/10.25035/ijare.03.02.07

Cai, Zhang (2020) Three-dimensional obstacle avoidance for autonomous underwater robot. IEEE Sensors Letters 4(11): 1–4. https://doi.org/10.1109/lsens.2020.3034309

Campos, Elvira, Gómez, et al. (2021) ORB-SLAM3: an accurate open-source library for visual, visual-inertial and multi-map SLAM. IEEE Transactions on Robotics 37(6): 1874–1890. https://doi.org/10.1109/tro.2021.3075644

Candeloro, Valle, Miyazaki, et al. (2015) HMD as a new tool for telepresence in underwater operations and closed-loop control of ROVs. In: OCEANS 2015-MTS/IEEE Washington. IEEE, pp. 1–8.

Casper, Murphy (2003) Human-robot interactions during the robot-assisted urban search and rescue response at the World Trade Center. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 33(3): 367–385. https://doi.org/10.1109/TSMCB.2003.811794

Chapman, Roussel, Drap, et al. (2008) Virtual exploration of underwater archaeological sites: visualization and interaction in mixed reality environments. In: International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage.

Chatzispyrou, Horgan, Hwang, et al. (2025) Mapping the Catacombs: an underwater cave segment of the Devil’s Eye system. arXiv preprint arXiv:2507.06397.

Chatzispyrou, Horgan, Hwang, et al. (2026) Mapping Pamir: multi-session visual/inertial SLAM and 3D reconstruction of an underwater shipwreck. In: IEEE International Conference on Robotics and Automation (ICRA). Vienna, Austria. (accepted).

Chen, Blow, Abdullah, et al. (2025) SubSense: VR-haptic and motor feedback for immersive control in subsea telerobotics. In: OCEANS 2025 - Great Lakes. IEEE OES, pp. 1–8.

Elor, Thang, Hughes, et al. (2021) Catching jellies in immersive virtual reality: a comparative teleoperation study of ROVs in underwater capture tasks. In: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, pp. 1–10.

Erat, Isop, Kalkofen, et al. (2018) Drone-augmented human vision: exocentric control for drones exploring hidden areas. IEEE Transactions on Visualization and Computer Graphics 24(4): 1437–1446. https://doi.org/10.1109/TVCG.2018.2794058

Eustice (2005) Large-area visually augmented navigation for autonomous underwater vehicles. PhD Thesis, Massachusetts Institute of Technology.

Ferland, Pomerleau, Le Dinh, et al. (2009) Egocentric and exocentric teleoperation interface using real-time, 3D video projection. In: Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction, pp. 37–44.

Fischler, Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6): 381–395. https://doi.org/10.1145/358669.358692

Gawel, Lin, Koutros, et al. (2018) Aerial-ground collaborative sensing: third-person view for teleoperation. In: 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pp. 1–7.

Girbes-Juan, Schettino, Demiris, et al. (2020) Haptic and visual feedback assistance for dual-arm robot teleoperation in surface conditioning tasks. IEEE Transactions on Haptics 14(1): 44–56. https://doi.org/10.1109/TOH.2020.3004388

Gupta, Abdullah, et al. (2025) Demonstrating CavePI: autonomous exploration of underwater caves by semantic guidance. In: Robotics: Science and Systems (RSS).

Hart, Staveland (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Advances in Psychology. Elsevier, Vol. 52, pp. 139–183.

Hing, Sevcik (2010) Development and evaluation of a chase view for UAV operations in cluttered environments. Journal of Intelligent and Robotic Systems 57: 485–503. https://doi.org/10.1007/s10846-009-9356-4

Inoue, Takashima, Fujita, et al. (2023) BirdViewAR: surroundings-aware remote drone piloting using an augmented third-person perspective. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–19.

Islam (2024) Eye on the back: augmented visuals for improved ROV teleoperation in deep water surveillance and inspection. In: Ocean Sensing and Monitoring XVI. SPIE, Maryland, USA, Vol. 13061, pp. 21–25.

Islam, Sattar (2021) Robot-to-robot relative pose estimation using humans as markers. Autonomous Robots 45(4): 579–593. https://doi.org/10.1007/s10514-021-09985-6

Islam, Quattrini Li, Girdhar, et al. (2024) Computer vision applications in underwater robotics and oceanography. In: Computer Vision: Challenges, Trends, and Opportunities, pp. 173–196.

Ito, Sato, Sugimoto, et al. (2008) A teleoperation interface using past images for outdoor environment. In: 2008 SICE Annual Conference. IEEE, pp. 3372–3375.

Jangir, Hansen, Ghosal, et al. (2022) Look closer: bridging egocentric and third-person views with transformers for robotic manipulation. IEEE Robotics and Automation Letters 7(2): 3046–3053. https://doi.org/10.1109/lra.2022.3144512

Jin, Cho, Jiafeng, et al. (2022) Hovering control of UUV through underwater object detection based on deep learning. Ocean Engineering 253: 111321. https://doi.org/10.1016/j.oceaneng.2022.111321

Jones, Snider, Nassehi, et al. (2020) Characterising the digital twin: a systematic literature review. CIRP Journal of Manufacturing Science and Technology 29: 36–52. https://doi.org/10.1016/j.cirpj.2020.02.002

Joshi, Rahman, Kalaitzakis, et al. (2019) Experimental comparison of open source visual-inertial-based state estimation algorithms in the underwater domain. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7221–7227.

Joshi, Xanthidis, Roznere, et al. (2022) Underwater exploration and mapping. In: IEEE OES AUV Symposium. Singapore, pp. 1–7.

Kazerouni, Fitzgerald, Dooly, et al. (2022) A survey of state-of-the-art on visual SLAM. Expert Systems with Applications 205: 117734. https://doi.org/10.1016/j.eswa.2022.117734

Kennedy, Cantwell, Malik, et al. (2019) The unknown and the unexplored: insights into the Pacific deep-sea following NOAA CAPSTONE expeditions. Frontiers in Marine Science 6: 480. https://doi.org/10.3389/fmars.2019.00480

Konoplin, Filaretov (2019) Development of intellectual support system for ROV operators. In: IOP Conference Series: Earth and Environmental Science. IOP Publishing, Vol. 272, 032101. https://doi.org/10.1088/1755-1315/272/3/032101

Lager, Topp, Malec (2018) Remote operation of unmanned surface vessel through virtual reality: a low cognitive load approach. In: Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI).

Lago, Neves, Ventura, et al. (2024) Visual-inertial odometry for metric-scale mapping of underwater caves. In: 2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC). IEEE, pp. 183–188.

Lee, Schachter (1980) Two algorithms for constructing a Delaunay triangulation. International Journal of Computer & Information Sciences 9(3): 219–242. https://doi.org/10.1007/bf00977785

Lee, Mehmood, Ryu (2016) Development of the human interactive autonomy for the shared teleoperation of mobile robots. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 1524–1529.

Lensgraf, Balkcom, Quattrini Li (2023) Buoyancy enabled autonomous underwater construction with cement blocks. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 5207–5213.

et al. (2023) BEVDepth: acquisition of reliable depth for multi-view 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence 37: 1477–1485. https://doi.org/10.1609/aaai.v37i2.25233

Wang, et al. (2024) BEVFormer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2024.3515454

Lim, Huang, Chen, et al. (2022) Real2Sim2Real: self-supervised learning of physical single-step dynamic actions for planar robot casting. In: 2022 International Conference on Robotics and Automation (ICRA). IEEE, pp. 8282–8289.

Livatino, Guastella, Muscato, et al. (2021) Intuitive robot teleoperation through multi-sensor informed mixed reality visual aids. IEEE Access 9: 25795–25808. https://doi.org/10.1109/access.2021.3057808

Luo, Zhu, Zhai, et al. (2024) Intention-driven ego-to-exo video generation. arXiv preprint arXiv:2403.09194.

Macario Barros, Michel, Moline, et al. (2022) A comprehensive survey of visual SLAM algorithms. Robotics 11(1): 24. https://doi.org/10.3390/robotics11010024

Manderson, Shkurti, Dudek (2016) Texture-aware SLAM using stereo imagery and inertial information. In: 2016 13th Conference on Computer and Robot Vision (CRV). IEEE, pp. 456–463.

Manjunatha, Selvakumar, Godeswar, et al. (2018) A low cost underwater robot with grippers for visual inspection of external pipeline surface. Procedia Computer Science 133: 108–115. https://doi.org/10.1016/j.procs.2018.07.014

Massone, Druon, Triboulet (2024) A novel 3D reconstruction sensor using a diving lamp and a camera for underwater cave exploration. Sensors (Basel, Switzerland) 24(12): 4024. https://doi.org/10.3390/s24124024

Islam, Sattar (2021) Fast direct stereo visual SLAM. IEEE Robotics and Automation Letters 7(2): 778–785. https://doi.org/10.1109/lra.2021.3133860

Mohammadi, Huang, Barua, et al. (2023) Caveline detection at the edge for autonomous underwater cave exploration and mapping. In: IEEE International Conference on Machine Learning and Applications (ICMLA). Jacksonville, FL, USA, pp. 1392–1398.

Murata, Songtong, Mizumoto, et al. (2014) Teleoperation system using past image records for mobile manipulator. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4340–4345.

Nagatani, Kiribayashi, Okada, et al. (2011) Redesign of rescue mobile robot Quince. In: 2011 IEEE International Symposium on Safety, Security, and Rescue Robotics, pp. 13–18.

Nguyen, Bualat, Edwards, et al. (2001) Virtual reality interfaces for visualization and control of remote vehicles. Autonomous Robots 11: 59–68. https://doi.org/10.1023/a:1011208212722

Okura, Ueda, Sato, et al. (2013) Teleoperation of mobile robots by generating augmented free-viewpoint images. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 665–671.

Olson (2011) AprilTag: a robust and flexible visual fiducial system. In: 2011 IEEE International Conference on Robotics and Automation. IEEE, pp. 3400–3407.

Palomeras, Hurtós, Vidal, et al. (2019) Autonomous exploration of complex underwater environments using a probabilistic next-best-view planner. IEEE Robotics and Automation Letters 4(2): 1619–1625. https://doi.org/10.1109/lra.2019.2896759

Phung, Billings, Daniele, et al. (2024) A shared autonomy system for precise and efficient remote underwater manipulation. IEEE Transactions on Robotics 40: 4147–4159.

Rahman, Quattrini Li, Rekleitis (2022) SVIn2: a multi-sensor fusion-based underwater SLAM system. International Journal of Robotics Research 41(11-12): 1022–1042. https://doi.org/10.1177/02783649221110259

Reiher, Lampe, Eckstein (2020) A Sim2Real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird’s eye view. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, pp. 1–7.

Rumson (2021) The application of fully unmanned robotic systems for inspection of subsea pipelines. Ocean Engineering 235: 109214. https://doi.org/10.1016/j.oceaneng.2021.109214

Saakes, Choudhary, Sakamoto, et al. (2013) A teleoperating interface for ground vehicles using autonomous flying cameras. In: 2013 23rd International Conference on Artificial Reality and Telexistence (ICAT), pp. 13–19.

Samani, Tao, Dasari, et al. (2023) F2BEV: bird’s eye view generation from surround-view fisheye camera images for automated driving. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 9367–9374.

Sato, Moro, Sugahara, et al. (2013) Spatio-temporal bird’s-eye view images using multiple fish-eye cameras. In: Proceedings of the 2013 IEEE/SICE International Symposium on System Integration, pp. 753–758.

Shiroma, Sato, Chiu, et al. (2004) Study on effective camera images for mobile robot teleoperation. In: IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN), pp. 107–112.

Siegel, Stone, Richmond (2023) Robotic survey and 3-D mapping of underwater caves using a SUNFISH® autonomous underwater vehicle. LPI Contribution 2697: 1037.

Sjarov, Lechler, Fuchs, et al. (2020) The digital twin concept in industry: a review and systematization. In: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, Vol. 1, pp. 1789–1796. https://doi.org/10.1109/etfa46521.2020.9212089

Stewart, Ryden, Cox (2016) An interactive interface for multi-pilot ROV intervention. In: OCEANS 2016-Shanghai. IEEE, pp. 1–6.

Thatipelli, Roy-Chowdhury (2025) Egocentric and exocentric methods: a short survey. Computer Vision and Image Understanding: 104371.

Thomason, Ratsamee, Orlosky, et al. (2019) A comparison of adaptive view techniques for exploratory 3D drone teleoperation. ACM Transactions on Interactive Intelligent Systems 9: 1–19. https://doi.org/10.1145/3232232

Wang, Devin, Cai, et al. (2019) Monocular plan view networks for autonomous driving. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 2876–2883.

Wishnak (2022) New frontiers in ocean exploration: the Ocean Exploration Trust, NOAA Ocean Exploration, and Schmidt Ocean Institute 2021 field season.

Woods, Tittle, Feil, et al. (2004) Envisioning human-robot coordination in future operations. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 34(2): 210–218. https://doi.org/10.1109/tsmcc.2004.826272

Islam (2023) 3D reconstruction of underwater scenes using nonlinear domain projection. In: 2023 IEEE Conference on Artificial Intelligence (CAI). IEEE, pp. 359–361. Best Paper Award.

Xia, McSweeney, Wen, et al. (2022) Virtual telepresence for the future of ROV teleoperations: opportunities and challenges. In: SNAME Offshore Symposium. SNAME, p. D011S001R001.

Xia, McSweeney, Song, et al. (2023a) ROV teleoperation based on sensory augmentation and digital twins. In: Offshore Technology Conference. OTC, p. D031S041R004.

Xia, Song, et al. (2023b) Sensory augmentation for subsea robot teleoperation. Computers in Industry 145: 103836. https://doi.org/10.1016/j.compind.2022.103836

Xia, You (2023c) Visual-haptic feedback for ROV subsea navigation control. Automation in Construction 154: 104987. https://doi.org/10.1016/j.autcon.2023.104987

Yoon, Park, Ahn (2025) Learning viewpoint control from human-initiated transitions for teleoperation in construction. Advanced Engineering Informatics 68: 103665. https://doi.org/10.1016/j.aei.2025.103665

Tibbetts, Barua, et al. (2023a) Weakly supervised caveline detection for AUV navigation inside underwater caves. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 9933–9940.

Islam (2023b) UDepth: fast monocular depth estimation for visually-guided underwater robots. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 3116–3123.

Zhang, Xia, et al. (2024) The adaptive bilateral control of underwater manipulator teleoperation system with uncertain parameters and external disturbance. Electronics 13(6): 1122. https://doi.org/10.3390/electronics13061122

Zhou, Xia, et al. (2023) Embodied robot teleoperation based on high-fidelity visual-haptic simulator: pipe-fitting example. Journal of Construction Engineering and Management 149(12): 04023129. https://doi.org/10.1061/jcemd4.coeng-13916

Zhu, Yin, Shi, et al. (2018) Generative adversarial frontal view to bird view synthesis. In: 2018 International Conference on 3D Vision (3DV). IEEE, pp. 454–463.

Zollmann, Hoppe, Langlotz, et al. (2014) FlyAR: augmented reality supported micro aerial vehicle navigation. IEEE Transactions on Visualization and Computer Graphics 20(4): 560–568. https://doi.org/10.1109/TVCG.2014.24
