Core Technology
Hardware Architecture
Mobile-ALOHA represents a significant advance in mobile manipulation, mounting dual collaborative robotic arms on a mobile base. The system pairs two 6-DOF (Degrees of Freedom) Interbotix arms with a differential-drive wheeled chassis; counting the 14 actuated arm joints plus the base's linear and angular velocity, the robot is commanded through a 16-dimensional whole-body action space, enabling precise control and safe human-robot interaction.
Data Collection
The system employs a “shadow mode” teleoperation approach in which the operator is physically tethered to the robot: the operator backdrives the mobile base while simultaneously manipulating the arms. This approach synchronously records multi-modal data, including RGB camera feeds, joint angles, and base odometry, providing rich training data for imitation learning algorithms.
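As a rough illustration, the synchronized recording described above can be sketched as a timestamped episode logger. The field names, camera names, and 50 Hz rate below are assumptions for the sketch, not the actual Mobile-ALOHA data schema.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One synchronized snapshot of all sensor streams (hypothetical schema)."""
    t: float                 # shared timestamp so modalities stay aligned
    joint_angles: list       # all actuated arm joint positions, both arms
    base_odometry: tuple     # (x, y, heading) of the mobile base
    rgb_frames: dict         # camera name -> image (placeholders here)

@dataclass
class Episode:
    steps: list = field(default_factory=list)

    def record(self, t, joint_angles, base_odometry, rgb_frames):
        # Stamping every modality with the same clock is what makes the
        # episode usable for imitation learning later.
        self.steps.append(Step(t, joint_angles, base_odometry, rgb_frames))

# Log three fake control steps at an assumed 50 Hz (0.02 s period).
ep = Episode()
for i in range(3):
    ep.record(t=i * 0.02,
              joint_angles=[0.0] * 14,
              base_odometry=(0.0, 0.0, 0.0),
              rgb_frames={"top": None, "left_wrist": None, "right_wrist": None})
```

In a real logger each stream arrives asynchronously and must be resampled to the shared clock; the sketch skips that step.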
Training Method
The “Dynamic-Static Co-Training” algorithm is the key innovation of Mobile-ALOHA. It combines the 50 dynamic demonstration episodes collected on Mobile-ALOHA with a much larger corpus of static manipulation episodes from the original ALOHA system. This co-training approach lets the policy leverage both scarce mobile-manipulation data and extensive static manipulation demonstrations, achieving superior performance on long-horizon tasks.
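A minimal sketch of such co-training is fixed-ratio batch sampling from the two demonstration pools. The 50/50 mix, the function name, and the dataset sizes below are illustrative assumptions, not taken from the released training code.

```python
import random

def cotrain_batch(dynamic_eps, static_eps, batch_size, mix=0.5, rng=random):
    """Draw one training batch from two demonstration sources.

    Sampling at a fixed ratio (`mix` from the small dynamic set, the rest
    from the static set) keeps the 50 mobile episodes from being drowned
    out by the far larger static dataset.  The 0.5 ratio is an assumption.
    """
    n_dyn = int(batch_size * mix)
    batch = [rng.choice(dynamic_eps) for _ in range(n_dyn)]
    batch += [rng.choice(static_eps) for _ in range(batch_size - n_dyn)]
    rng.shuffle(batch)  # mix the two sources within the batch
    return batch

dynamic = [("mobile", i) for i in range(50)]     # scarce mobile demos
static = [("static", i) for i in range(10000)]   # arbitrary size, for illustration
batch = cotrain_batch(dynamic, static, batch_size=8)
```

The design point is that the ratio is fixed per batch rather than proportional to dataset size, which is what gives the small dynamic set a strong influence on the learned policy.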
Typical Tasks
| Scenario | Success Rate |
|---|---|
| Kitchen | 87% |
| Office | 92% |
| Household | 83% |
Working Principles
Whole-body Teleoperation
The operator is physically connected to the system through a harness mechanism, so operator movement translates directly into robot movement: the operator backdrives the mobile base by walking with it while manipulating the dual arms. This intuitive control scheme enables natural skill transfer from human demonstrations to robot policies.
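Because the base is differential-drive (see Hardware Configuration), the demonstrated base trajectory can be recovered from wheel encoders even while the operator is backdriving it. The following is a generic odometry-integration sketch; the parameter names are illustrative and not from the Mobile-ALOHA codebase.

```python
import math

def diffdrive_odometry(x, y, th, d_left, d_right, wheel_base):
    """Integrate one differential-drive odometry step.

    d_left / d_right are the distances each wheel rolled since the last
    update (from encoders); wheel_base is the wheel separation.  Midpoint
    integration keeps the estimate accurate for small steps.
    """
    d = (d_left + d_right) / 2.0            # distance travelled along heading
    dth = (d_right - d_left) / wheel_base   # change in heading
    x += d * math.cos(th + dth / 2.0)
    y += d * math.sin(th + dth / 2.0)
    return x, y, th + dth

# Two example steps: rolling straight, then turning in place.
pose = diffdrive_odometry(0.0, 0.0, 0.0, 0.1, 0.1, wheel_base=0.5)
pose = diffdrive_odometry(*pose, -0.05, 0.05, wheel_base=0.5)
```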
Perception and Observation
The perception system combines multiple RGB cameras, positioned for comprehensive environment coverage, with proprioceptive sensing:
- Two wrist-mounted cameras for close-up manipulation viewing
- One top-down camera for workspace overview
- Joint state sensors for real-time position and velocity feedback
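Putting the streams above together, each control step's observation can be assembled as camera images plus a proprioceptive vector; appending the base's linear and angular velocity to the 14 arm joint positions gives the 16-dimensional whole-body representation. The dictionary layout below is an assumed sketch, not the actual observation format.

```python
def build_observation(images, arm_qpos, base_vel):
    """Pack one control step's sensors into a policy input (sketch).

    images: camera name -> frame, matching the three cameras listed above.
    arm_qpos: 14 joint positions (7 actuated positions per arm, gripper included).
    base_vel: (linear, angular) velocity of the mobile base.
    """
    assert set(images) == {"top", "left_wrist", "right_wrist"}
    assert len(arm_qpos) == 14 and len(base_vel) == 2
    # Appending base velocity to the joint vector yields the 16-D
    # whole-body representation.
    return {"images": dict(images), "qpos": list(arm_qpos) + list(base_vel)}

obs = build_observation(
    {"top": None, "left_wrist": None, "right_wrist": None},
    arm_qpos=[0.0] * 14,
    base_vel=(0.1, 0.0))
```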
Policy Learning
The system employs supervised behavior cloning: a neural network learns to predict short sequences of future actions (“action chunks”) from observed states. This approach, implemented through the ACT (Action Chunking with Transformers) algorithm, enables the robot to execute complex multi-step tasks by predicting sequences of actions rather than individual movements.
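A core detail of action chunking is that chunks predicted at successive steps overlap, so several past chunks each contain a prediction for the current timestep. ACT blends these with exponentially decaying weights (“temporal ensembling”); the sketch below illustrates that blending, with the weighting constant chosen for illustration.

```python
import math

def temporal_ensemble(predictions_for_now, m=0.01):
    """Blend overlapping chunk predictions for the current timestep.

    predictions_for_now[i] is the action that the chunk predicted i steps
    ago assigns to "now" (index 0 = oldest prediction).  Weighting
    prediction i by exp(-m * i) counts older predictions slightly more,
    which smooths the executed trajectory.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions_for_now))]
    return sum(w * a for w, a in zip(weights, predictions_for_now)) / sum(weights)

# When all overlapping chunks agree, the blended action is unchanged.
smoothed = temporal_ensemble([0.42, 0.42, 0.42])
```

In a full policy this averaging runs per action dimension at every control step, on top of the network that predicts each chunk.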
Hardware Configuration
- Dual Arms: Interbotix ViperX-300 (Robotis Dynamixel XM430 series)
- Mobile Base: AgileX Tracer differential drive
- Complete System Cost: Approximately $32,000
The relatively low cost compared to commercial alternatives makes Mobile-ALOHA accessible for research laboratories and educational institutions, democratizing advanced robotics research.
Key Resources
- Project Homepage: https://mobile-aloha.github.io/
- Paper: CoRL 2024
- Code: https://github.com/MarkFzp/mobile-aloha
- Training Code: act-plus-plus
- Commercial Kit: Trossen Robotics
Technical Significance
Mobile-ALOHA represents a breakthrough in practical robot learning by demonstrating that effective mobile manipulation can be achieved through imitation learning with relatively small datasets (as few as 50 mobile demonstrations per task). The system’s success validates several important principles:
- Data Efficiency: Combining dynamic and static data can compensate for limited demonstration collection
- Whole-body Coordination: Mobile base manipulation is learnable through behavior cloning
- Low-cost Hardware: Sub-$50k systems can achieve useful manipulation capabilities
- Open Source: Community-driven development accelerates research progress
The project has enabled researchers worldwide to pursue mobile manipulation research without requiring expensive custom hardware or extensive simulation-to-real transfer pipelines.