Visual SLAM Overview

Visual SLAM (V-SLAM) uses cameras as the primary sensor to perform simultaneous localization and mapping without relying on LiDAR.

Technical Process

  1. Sensor input (monocular, stereo, RGB-D)
  2. Frontend (visual odometry):
    • Feature-based method (e.g., ORB-SLAM)
    • Direct method (LSD-SLAM, DSO)
  3. Backend optimization (graph optimization with g2o/GTSAM, or filtering with an EKF)
  4. Loop closure detection (DBoW2)
  5. Mapping (sparse/dense)
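The data flow through the stages above can be sketched as a minimal pipeline skeleton. This is an illustrative sketch only: every class and function name here (`Frame`, `MapState`, `frontend`, `backend`, `loop_closure`) is hypothetical and stands in for the much larger components of a real system.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the V-SLAM pipeline stages listed above.
# All class and method names are hypothetical, not from any real library.

@dataclass
class Frame:
    frame_id: int
    features: list          # e.g., ORB keypoints + descriptors

@dataclass
class MapState:
    poses: list = field(default_factory=list)       # estimated camera poses
    landmarks: list = field(default_factory=list)   # sparse 3-D points

def frontend(frame, state):
    """Visual odometry: track features and append a rough pose estimate."""
    state.poses.append(("pose", frame.frame_id))
    return state

def backend(state):
    """Placeholder for bundle adjustment / pose-graph optimization (g2o, GTSAM)."""
    return state

def loop_closure(frame, state):
    """Placeholder for appearance-based loop detection (e.g., DBoW2)."""
    return False

def process(frames):
    state = MapState()
    for f in frames:
        state = frontend(f, state)
        if loop_closure(f, state):
            state = backend(state)      # global correction on loop closure
    return backend(state)               # final refinement

result = process([Frame(i, []) for i in range(5)])
print(len(result.poses))  # 5 poses, one per input frame
```

The key structural point is that the frontend runs per frame while the backend runs selectively (on loop closure or at keyframe rate), which is what keeps the pipeline real-time.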

Classic Solutions

ORB-SLAM Series

  • Feature-based visual SLAM
  • Supports monocular, stereo, and RGB-D cameras
  • Centimeter-level localization accuracy
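ORB descriptors are 256-bit binary strings compared by Hamming distance, which is why feature-based matching is fast. The sketch below uses synthetic random descriptors as stand-ins (real ones would come from an ORB detector) and matches them by brute-force Hamming nearest neighbour.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for ORB descriptors: 256-bit strings packed as 32 bytes.
# Real descriptors would come from a detector such as OpenCV's ORB.
desc_a = rng.integers(0, 256, size=(20, 32), dtype=np.uint8)
desc_b = desc_a.copy()
# Flip one bit in each of image B's descriptors to simulate viewpoint change.
desc_b[:, 0] ^= 0x01

def hamming_matrix(a, b):
    """Pairwise Hamming distances between two sets of binary descriptors."""
    xor = a[:, None, :] ^ b[None, :, :]              # (Na, Nb, 32)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per pair

d = hamming_matrix(desc_a, desc_b)
matches = d.argmin(axis=1)            # nearest neighbour in B for each A
print((matches == np.arange(20)).all())  # True: every descriptor matched
```

Each perturbed descriptor is 1 bit away from its counterpart but roughly 128 bits away from unrelated descriptors, so the nearest-neighbour match is unambiguous; real systems add a ratio test and geometric verification on top.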

RTAB-Map

  • Real-time appearance-based mapping
  • Innovative memory management (WM/STM/LTM architecture)
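RTAB-Map bounds real-time cost by moving locations between short-term memory (recent frames), working memory (candidates for loop-closure search), and long-term memory (offloaded to a database). The toy class below mimics only that transfer policy; it is not RTAB-Map's implementation, whose transfers are driven by location weights and a time budget rather than the simple size limits used here.

```python
from collections import deque

class MemoryManager:
    """Toy WM/STM/LTM transfer policy inspired by RTAB-Map.

    Illustrative only: real RTAB-Map transfers are driven by location
    weights and a fixed time budget, not the plain counts used here.
    """

    def __init__(self, stm_size=3, wm_size=5):
        self.stm = deque()   # short-term memory: most recent locations
        self.wm = []         # working memory: loop-closure candidates
        self.ltm = []        # long-term memory: offloaded to a database
        self.stm_size = stm_size
        self.wm_size = wm_size

    def add_location(self, loc):
        self.stm.append(loc)
        if len(self.stm) > self.stm_size:
            self.wm.append(self.stm.popleft())   # STM -> WM when STM is full
        if len(self.wm) > self.wm_size:
            self.ltm.append(self.wm.pop(0))      # WM -> LTM to bound WM size

mm = MemoryManager()
for i in range(10):
    mm.add_location(i)
print(len(mm.stm), len(mm.wm), len(mm.ltm))  # 3 5 2
```

Because loop-closure search only scans the bounded working memory, detection time stays roughly constant no matter how large the map grows.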

VINS-Fusion

  • Tightly coupled visual-inertial SLAM
  • Sliding window optimization
  • Loop closure detection
  • Multi-sensor fusion
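Sliding window optimization keeps only the N most recent states and drops (marginalizes) the oldest, so the solve stays constant-size. The 1-D toy below shows just that window structure: scalar positions, noisy relative-displacement measurements, and a least-squares re-solve per step. Real VINS-style windows hold 6-DoF poses and IMU preintegration terms, and marginalization produces a prior rather than simply discarding the state.

```python
import numpy as np

rng = np.random.default_rng(1)

WINDOW = 4
true_x = np.arange(10, dtype=float)              # ground-truth 1-D trajectory
z = np.diff(true_x) + rng.normal(0, 0.01, 9)     # noisy relative measurements

window = [0.0]                                   # first state anchored at 0
for k, zk in enumerate(z):
    window.append(window[-1] + zk)               # predict the newest state
    if len(window) > WINDOW:
        window.pop(0)                            # "marginalize" oldest state
    # Re-optimize the states in the window: linear least squares over the
    # relative measurements connecting consecutive window states, plus a
    # prior anchoring the oldest state at its current estimate.
    n = len(window)
    A = np.zeros((n, n)); b = np.zeros(n)
    A[0, 0] = 1.0; b[0] = window[0]              # prior on oldest state
    for i in range(n - 1):
        A[i + 1, i] = -1.0; A[i + 1, i + 1] = 1.0
        b[i + 1] = z[k - (n - 2) + i]            # measurement for this edge
    window = list(np.linalg.lstsq(A, b, rcond=None)[0])

# Last window spans states x6..x9, so its extent is about 3.0.
print(round(window[-1] - window[0], 1))
```

The design trade-off this illustrates: the window gives near-batch accuracy over recent states at fixed cost, while global consistency is left to loop closure.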

LSD-SLAM / DSO

  • Direct method SLAM
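Direct methods skip feature extraction and minimize photometric error on raw pixel intensities. The toy below recovers a known integer shift between two 1-D intensity profiles by exhaustive search over candidate shifts; real systems (LSD-SLAM, DSO) minimize the same kind of residual over a 6-DoF camera pose using image gradients and Gauss-Newton rather than brute force.

```python
import numpy as np

rng = np.random.default_rng(2)
profile = rng.normal(size=100)          # stand-in for one image row
shifted = np.roll(profile, 3)           # second "image", moved by 3 pixels

def photometric_error(shift):
    """Sum of squared intensity differences under a candidate shift."""
    return np.sum((np.roll(profile, shift) - shifted) ** 2)

errors = {s: photometric_error(s) for s in range(-5, 6)}
best = min(errors, key=errors.get)
print(best)  # 3: the shift at which the photometric error vanishes
```

Because the residual is built from intensities rather than matched keypoints, direct methods can use weakly textured regions, but they are correspondingly sensitive to brightness changes, which is why DSO adds photometric calibration.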

OpenVSLAM / ORB-SLAM3

  • Multi-map support (ORB-SLAM3's Atlas system)
  • Improved visual-inertial fusion (ORB-SLAM3)

Application Scenarios

  • Indoor robots
  • AR/VR
  • UAVs
  • Service robot navigation
  • Autonomous driving visual positioning module

Technical Challenges

  • Dynamic object interference
  • Weak texture scenes
  • Balancing real-time performance against accuracy