FlashSLAM

FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting

Under submission

Abstract

We present FlashSLAM, a novel SLAM approach that leverages 3D Gaussian Splatting (3DGS) for efficient and robust 3D scene reconstruction. Existing 3DGS-based SLAM methods often fall short in sparse-view settings and under large camera movements because they rely on gradient descent-based optimization, which is both slow and inaccurate. FlashSLAM addresses these limitations by combining 3DGS with a fast vision-based camera tracking technique that uses a pretrained feature matching model and point cloud registration to estimate pose precisely in under 80 ms, a 90% reduction in tracking time compared to SplaTAM, without costly iterative rendering. In sparse settings, our method achieves up to a 92% improvement in average tracking accuracy over previous methods. It also accounts for noise in depth sensors, which improves robustness on non-specialized devices such as smartphones. Extensive experiments show that FlashSLAM performs reliably in both sparse and dense settings, in synthetic and real-world environments. Evaluations on benchmark datasets highlight its superior accuracy and efficiency, establishing FlashSLAM as a versatile, high-performance SLAM solution that advances the state of the art in 3D reconstruction across diverse applications.

Architecture overview

Our approach takes RGB-D input and performs accurate 3D scene reconstruction. First, precise matches are detected between consecutive frames, and the camera pose is tracked by estimating a rigid transformation from them. The pose is then refined using gradient-based optimization, leveraging Gaussian alignment to register each new frame accurately with the existing 3D model. The mapping process updates and transforms the existing Gaussian splats in the 3D scene, producing high-quality reconstructions with efficient alignment and optimization steps.
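
The rigid-transformation step can be illustrated with a minimal sketch. It assumes that matched pixels in two consecutive frames are already available (for example from a feature matcher) and that each has a valid depth value. The function names `backproject` and `estimate_rigid_transform`, the pinhole intrinsics `fx, fy, cx, cy`, and the choice of the closed-form Kabsch solution are assumptions made for illustration; the page does not specify which registration algorithm FlashSLAM uses, and a practical pipeline would typically add outlier rejection (e.g. RANSAC) before this step.

```python
import numpy as np


def backproject(pixels, depths, fx, fy, cx, cy):
    """Lift (u, v) pixels with metric depth to 3D camera-frame points (pinhole model)."""
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    x = (pixels[:, 0] - cx) * depths / fx
    y = (pixels[:, 1] - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)


def estimate_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t (Kabsch algorithm).

    Assumes src[i] and dst[i] are corresponding 3D points and that the
    correspondences are free of gross outliers.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


if __name__ == "__main__":
    # Synthetic check: apply a known 20-degree rotation about z plus a shift,
    # then verify that the estimate recovers it.
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(50, 3))
    a = np.deg2rad(20.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.3])
    R, t = estimate_rigid_transform(points, points @ R_true.T + t_true)
    print(np.allclose(R, R_true), np.allclose(t, t_true))
```

Because the solution is closed-form, it needs no iterative rendering, which is consistent with the motivation for a fast tracking stage; the result would then serve as the initial pose for the gradient-based refinement.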

Replica

Room 0 (r0)

Room 1 (r1)

Room 2 (r2)

Office 0 (o0)

Office 1 (o1)

Office 2 (o2)

Office 3 (o3)

Office 4 (o4)

TUM RGB-D

Freiburg1 desk (fr1/desk)

Freiburg1 desk2 (fr1/desk2)

Freiburg1 room (fr1/room)

Freiburg2 xyz (fr2/xyz)

Freiburg3 long office (fr3/office)

ScanNet++

8b5caf3398

b20a261fdf

Self-captured dataset

Visual comparison

**Rendering comparison on TUM fr1/desk**
SplaTAM	MonoGS	Ours	GT

**Rendering comparison on TUM fr2/xyz**
SplaTAM	MonoGS	Ours	GT

**Rendering comparison on TUM fr3/office**
SplaTAM	MonoGS	Ours	GT

Additional tracking results

Tracking performance on the ScanNet dataset, reported as Absolute Trajectory Error (ATE) in cm (lower is better). Columns are ScanNet scene IDs.
Methods	0000	0059	0106	0169	0181	0207	Avg.
Vox-Fusion	68.84	24.18	8.41	27.28	23.30	9.41	26.90
NICE-SLAM	12.00	14.00	7.90	10.90	13.40	6.20	10.73
Point-SLAM	10.24	7.81	8.65	22.16	14.77	9.54	12.20
SplaTAM	12.83	10.10	17.72	12.08	11.10	7.46	11.88
Ours	9.86	7.80	8.02	14.25	13.36	8.71	10.33
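
For readers unfamiliar with the metric, the following is a minimal sketch of how an ATE value can be computed from estimated and ground-truth camera positions. It assumes the common convention of reporting the root-mean-square error after rigidly aligning the estimated trajectory to the ground truth; the page does not state which ATE variant or alignment is used for the table above, so this is illustrative only. The names `align_rigid` and `ate_rmse` are hypothetical.

```python
import numpy as np


def align_rigid(est, gt):
    """Rigidly align est onto gt (rotation + translation, least squares)."""
    est_mean = est.mean(axis=0)
    gt_mean = gt.mean(axis=0)
    # Kabsch algorithm on the centred trajectories.
    H = (est - est_mean).T @ (gt - gt_mean)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = gt_mean - R @ est_mean
    return est @ R.T + t


def ate_rmse(est_positions, gt_positions, align=True):
    """RMSE of per-frame position error, in the same unit as the inputs.

    Both inputs are (N, 3) arrays of camera positions at matching timestamps.
    """
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    if est.shape != gt.shape:
        raise ValueError("trajectories must have the same shape")
    if align:
        est = align_rigid(est, gt)
    errors = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-1.0, 1.0, size=(100, 3))        # positions in metres
    est = gt + rng.normal(scale=0.01, size=gt.shape)  # about 1 cm of noise per axis
    print(f"ATE RMSE: {100.0 * ate_rmse(est, gt):.2f} cm")
```

If the trajectories are stored in metres, multiplying the result by 100 gives centimetres, the unit used in the table.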

Tracking time on the Replica dataset, in seconds (lower is better).
Methods	r0	r1	r2	o0	o1	o2	o3	o4	Avg.
SplaTAM	2.07	2.03	1.68	1.87	1.70	1.36	2.03	2.29	1.88
MonoGS	1.27	1.16	1.16	1.07	0.83	1.00	1.10	1.06	1.08
Ours	0.54	0.48	0.52	0.43	0.42	0.49	0.54	0.46	0.48