FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting

Under submission

Anonymous authors

Abstract

We present FlashSLAM, a novel SLAM approach that leverages 3D Gaussian Splatting (3DGS) for efficient and robust 3D scene reconstruction. Existing 3DGS-based SLAM methods often fall short in sparse-view settings and under large camera movements because they rely on gradient descent-based optimization, which is both slow and inaccurate. FlashSLAM addresses these limitations by combining 3DGS with a fast vision-based camera tracking technique that uses a pretrained feature matching model and point cloud registration to estimate precise poses in under 80 ms (a 90% reduction in tracking time compared to SplaTAM), without costly iterative rendering. In sparse settings, our method achieves up to a 92% improvement in average tracking accuracy over previous methods. It also accounts for noise in depth sensors, which improves robustness on unspecialized devices such as smartphones. Extensive experiments show that FlashSLAM performs reliably in both sparse and dense settings, on synthetic and real-world environments. Evaluations on benchmark datasets highlight its accuracy and efficiency, establishing FlashSLAM as a versatile, high-performance SLAM solution that advances the state of the art in 3D reconstruction across diverse applications.

Architecture overview

Our approach takes RGB-D inputs and produces an accurate 3D scene reconstruction. First, precise matches between consecutive frames are detected and used to track the camera pose as a rigid transformation. The pose is then refined with gradient-based optimization, using Gaussian alignment to register each new frame accurately against the existing 3D model. Finally, the mapping process updates and transforms the existing Gaussian splats in the 3D scene, yielding high-quality reconstructions with efficient alignment and optimization steps.
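
As an illustration of the tracking step described above, the following is a minimal sketch of estimating a rigid transformation from matched 3D points. It assumes the matches have already been produced by a feature matcher and lifted to 3D using the depth map and pinhole intrinsics. The closed-form Kabsch (SVD) solution is used here as a stand-in for the point cloud registration step, and the function names `backproject` and `estimate_rigid_transform` are hypothetical, not taken from the FlashSLAM code.

```python
import numpy as np


def backproject(pixels, depths, fx, fy, cx, cy):
    """Lift matched pixels (u, v) with metric depth to 3D camera-frame points.

    Assumes a pinhole camera model; fx, fy, cx, cy are the intrinsics.
    """
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    x = (pixels[:, 0] - cx) * depths / fx
    y = (pixels[:, 1] - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)


def estimate_rigid_transform(src, dst):
    """Least-squares R, t with dst approximately equal to src @ R.T + t.

    Kabsch algorithm: an illustrative choice, not necessarily the
    registration method used by the authors.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if the SVD solution would be a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

In practice the matches contain outliers and the depth is noisy, so a solver like this would typically be wrapped in a robust scheme such as RANSAC before the gradient-based refinement.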

Replica

Room 0 (r0)
Room 1 (r1)
Room 2 (r2)
Office 0 (o0)
Office 1 (o1)
Office 2 (o2)
Office 3 (o3)
Office 4 (o4)

TUM RGB-D

Freiburg1 desk (fr1/desk)
Freiburg1 desk2 (fr1/desk2)
Freiburg1 room (fr1/room)
Freiburg2 xyz (fr2/xyz)
Freiburg3 long office (fr3/office)

ScanNet++

8b5caf3398
b20a261fdf

Self-captured dataset

Visual comparison

Rendering comparison on TUM fr1/desk (columns: SplaTAM, MonoGS, Ours, GT)
Rendering comparison on TUM fr1/xyz (columns: SplaTAM, MonoGS, Ours, GT)
Rendering comparison on TUM fr1/office (columns: SplaTAM, MonoGS, Ours, GT)

Additional tracking results

Tracking performance on the ScanNet dataset, reported using the Absolute Trajectory Error (ATE) metric (cm).
Methods      0000   0059   0106   0169   0181   0207   Avg.
Vox-Fusion  68.84  24.18   8.41  27.28  23.30   9.41  26.90
NICE-SLAM   12.00  14.00   7.90  10.90  13.40   6.20  10.73
Point-SLAM  10.24   7.81   8.65  22.16  14.77   9.54  12.20
SplaTAM     12.83  10.10  17.72  12.08  11.10   7.46  11.88
Ours         9.86   7.80   8.02  14.25  13.36   8.71  10.33
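
For readers unfamiliar with the metric, here is a minimal sketch of how an ATE value can be computed. It assumes the common convention of reporting the root-mean-square error over camera positions, and it assumes the estimated and ground-truth trajectories are already time-associated and aligned in the same frame. Neither assumption is stated in the table caption, and the function name `ate_rmse` is hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position error between two aligned, time-associated trajectories.

    Each trajectory is a sequence of (x, y, z) camera positions in the same
    unit; the result is in that unit.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

If the positions are stored in meters, multiplying the result by 100 gives centimeters, the unit used in the table.
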
Tracking time on the Replica dataset, measured in seconds.
Methods    r0    r1    r2    o0    o1    o2    o3    o4  Avg.
SplaTAM  2.07  2.03  1.68  1.87  1.70  1.36  2.03  2.29  1.88
MonoGS   1.27  1.16  1.16  1.07  0.83  1.00  1.10  1.06  1.08
Ours     0.54  0.48  0.52  0.43  0.42  0.49  0.54  0.46  0.48
Tracking time on the TUM dataset, measured in seconds.
Methods  fr1/desk  fr1/desk2  fr1/room  fr2/xyz  fr3/office  Avg.
SplaTAM      3.46       3.02      3.57     3.85        5.16  3.81
MonoGS       0.89       0.93      0.89     0.59        0.81  0.82
Ours         0.50       0.48      0.50     0.51        0.52  0.50
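
To make the two timing tables easier to compare, the short sketch below turns the reported averages into relative time savings and speed-up factors. The numbers are copied from the "Avg." columns above; the helper name `relative_reduction` is introduced here only for illustration.

```python
# Average tracking times in seconds, copied from the "Avg." columns above.
avg_tracking_time = {
    "Replica": {"SplaTAM": 1.88, "MonoGS": 1.08, "Ours": 0.48},
    "TUM": {"SplaTAM": 3.81, "MonoGS": 0.82, "Ours": 0.50},
}


def relative_reduction(baseline, ours):
    """Fraction of the baseline time that is saved."""
    return (baseline - ours) / baseline


for dataset, times in avg_tracking_time.items():
    for method in ("SplaTAM", "MonoGS"):
        saved = relative_reduction(times[method], times["Ours"])
        speedup = times[method] / times["Ours"]
        print(f"{dataset} vs {method}: {saved:.1%} less time, {speedup:.2f}x faster")
```
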

Self-captured dataset

Rendering comparison on a self-captured dataset recorded with an iPhone camera (columns: SplaTAM, MonoGS, Ours, GT).

Novel view synthesis

Novel view synthesis results with depth for scene b20a261fdf from the ScanNet++ dataset. The left columns display RGB images, and the right columns show the corresponding depth maps.
Rows: SplaTAM, Ours, GT. Three views are shown, each with an RGB image and its depth map.