Core Technology
Hardware Architecture
Mobile-ALOHA represents a significant advance in mobile manipulation, mounting dual collaborative robotic arms on a mobile base. The system pairs two 6-DOF (Degrees of Freedom) Interbotix arms with a differential-drive wheeled chassis; counting the 14 actuated arm joints plus the base's linear and angular velocity, the robot is commanded through a 16-dimensional whole-body action space, enabling precise control and safe human-robot interaction.
Data Collection
The system employs a “shadow mode” teleoperation approach in which the operator is physically tethered to the robot: the operator backdrives the mobile base while simultaneously manipulating the arms. This approach synchronously records multi-modal data, including RGB camera feeds, joint angles, and base odometry, providing rich training data for imitation learning algorithms.
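As a rough illustration, the synchronized recording described above can be sketched as a timestamped episode logger. The field names, camera names, and 50 Hz rate below are assumptions for the sketch, not the actual Mobile-ALOHA data schema.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One synchronized snapshot of all sensor streams (hypothetical schema)."""
    t: float                 # shared timestamp so modalities stay aligned
    joint_angles: list       # all actuated arm joint positions, both arms
    base_odometry: tuple     # (x, y, heading) of the mobile base
    rgb_frames: dict         # camera name -> image (placeholders here)

@dataclass
class Episode:
    steps: list = field(default_factory=list)

    def record(self, t, joint_angles, base_odometry, rgb_frames):
        # Stamping every modality with the same clock is what makes the
        # episode usable for imitation learning later.
        self.steps.append(Step(t, joint_angles, base_odometry, rgb_frames))

# Log three fake control steps at an assumed 50 Hz (0.02 s period).
ep = Episode()
for i in range(3):
    ep.record(t=i * 0.02,
              joint_angles=[0.0] * 14,
              base_odometry=(0.0, 0.0, 0.0),
              rgb_frames={"top": None, "left_wrist": None, "right_wrist": None})
```

In a real logger each stream arrives asynchronously and must be resampled to the shared clock; the sketch skips that step.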
Training Method
The “Dynamic-Static Co-Training” algorithm is the key innovation of Mobile-ALOHA. It combines the 50 dynamic demonstration episodes collected on Mobile-ALOHA with a much larger corpus of static manipulation episodes from the original ALOHA system. This co-training approach lets the policy leverage both scarce mobile-manipulation data and extensive static manipulation demonstrations, achieving superior performance on long-horizon tasks.
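A minimal sketch of such co-training is fixed-ratio batch sampling from the two demonstration pools. The 50/50 mix, the function name, and the dataset sizes below are illustrative assumptions, not taken from the released training code.

```python
import random

def cotrain_batch(dynamic_eps, static_eps, batch_size, mix=0.5, rng=random):
    """Draw one training batch from two demonstration sources.

    Sampling at a fixed ratio (`mix` from the small dynamic set, the rest
    from the static set) keeps the 50 mobile episodes from being drowned
    out by the far larger static dataset.  The 0.5 ratio is an assumption.
    """
    n_dyn = int(batch_size * mix)
    batch = [rng.choice(dynamic_eps) for _ in range(n_dyn)]
    batch += [rng.choice(static_eps) for _ in range(batch_size - n_dyn)]
    rng.shuffle(batch)  # mix the two sources within the batch
    return batch

dynamic = [("mobile", i) for i in range(50)]     # scarce mobile demos
static = [("static", i) for i in range(10000)]   # arbitrary size, for illustration
batch = cotrain_batch(dynamic, static, batch_size=8)
```

The design point is that the ratio is fixed per batch rather than proportional to dataset size, which is what gives the small dynamic set a strong influence on the learned policy.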
Typical Tasks
| Scenario | Success Rate |
|---|---|
| Kitchen | 87% |
| Office | 92% |
| Household | 83% |
Working Principles
Whole-body Teleoperation
The operator is physically connected to the system through a harness mechanism, so operator movement translates directly into robot movement: the operator backdrives the mobile base by walking with it while manipulating the dual arms. This intuitive control scheme enables natural skill transfer from human demonstrations to robot policies.
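Because the base is differential-drive (see Hardware Configuration), the demonstrated base trajectory can be recovered from wheel encoders even while the operator is backdriving it. The following is a generic odometry-integration sketch; the parameter names are illustrative and not from the Mobile-ALOHA codebase.

```python
import math

def diffdrive_odometry(x, y, th, d_left, d_right, wheel_base):
    """Integrate one differential-drive odometry step.

    d_left / d_right are the distances each wheel rolled since the last
    update (from encoders); wheel_base is the wheel separation.  Midpoint
    integration keeps the estimate accurate for small steps.
    """
    d = (d_left + d_right) / 2.0            # distance travelled along heading
    dth = (d_right - d_left) / wheel_base   # change in heading
    x += d * math.cos(th + dth / 2.0)
    y += d * math.sin(th + dth / 2.0)
    return x, y, th + dth

# Two example steps: rolling straight, then turning in place.
pose = diffdrive_odometry(0.0, 0.0, 0.0, 0.1, 0.1, wheel_base=0.5)
pose = diffdrive_odometry(*pose, -0.05, 0.05, wheel_base=0.5)
```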
Perception and Observation
The perception system combines multiple RGB cameras, positioned for comprehensive environment coverage, with proprioceptive sensing:
- Two wrist-mounted cameras for close-up manipulation viewing
- One top-down camera for workspace overview
- Joint state sensors for real-time position and velocity feedback
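Putting the streams above together, each control step's observation can be assembled as camera images plus a proprioceptive vector; appending the base's linear and angular velocity to the 14 arm joint positions gives the 16-dimensional whole-body representation. The dictionary layout below is an assumed sketch, not the actual observation format.

```python
def build_observation(images, arm_qpos, base_vel):
    """Pack one control step's sensors into a policy input (sketch).

    images: camera name -> frame, matching the three cameras listed above.
    arm_qpos: 14 joint positions (7 actuated positions per arm, gripper included).
    base_vel: (linear, angular) velocity of the mobile base.
    """
    assert set(images) == {"top", "left_wrist", "right_wrist"}
    assert len(arm_qpos) == 14 and len(base_vel) == 2
    # Appending base velocity to the joint vector yields the 16-D
    # whole-body representation.
    return {"images": dict(images), "qpos": list(arm_qpos) + list(base_vel)}

obs = build_observation(
    {"top": None, "left_wrist": None, "right_wrist": None},
    arm_qpos=[0.0] * 14,
    base_vel=(0.1, 0.0))
```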
Policy Learning
The system employs supervised behavior cloning: a neural network learns to predict short sequences of future actions (“action chunks”) from observed states. This approach, implemented through the ACT (Action Chunking with Transformers) algorithm, enables the robot to execute complex multi-step tasks by predicting sequences of actions rather than individual movements.
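A core detail of action chunking is that chunks predicted at successive steps overlap, so several past chunks each contain a prediction for the current timestep. ACT blends these with exponentially decaying weights (“temporal ensembling”); the sketch below illustrates that blending, with the weighting constant chosen for illustration.

```python
import math

def temporal_ensemble(predictions_for_now, m=0.01):
    """Blend overlapping chunk predictions for the current timestep.

    predictions_for_now[i] is the action that the chunk predicted i steps
    ago assigns to "now" (index 0 = oldest prediction).  Weighting
    prediction i by exp(-m * i) counts older predictions slightly more,
    which smooths the executed trajectory.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions_for_now))]
    return sum(w * a for w, a in zip(weights, predictions_for_now)) / sum(weights)

# When all overlapping chunks agree, the blended action is unchanged.
smoothed = temporal_ensemble([0.42, 0.42, 0.42])
```

In a full policy this averaging runs per action dimension at every control step, on top of the network that predicts each chunk.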
Hardware Configuration
- Dual Arms: Interbotix ViperX-300 (Robotis Dynamixel XM430 series)
- Mobile Base: AgileX Tracer differential drive
- Complete System Cost: Approximately $32,000
The relatively low cost compared to commercial alternatives makes Mobile-ALOHA accessible for research laboratories and educational institutions, democratizing advanced robotics research.
Key Resources
- Project Homepage: https://mobile-aloha.github.io/
- Paper: CoRL 2024
- Code: https://github.com/MarkFzp/mobile-aloha
- Training Code: act-plus-plus
- Commercial Kit: Trossen Robotics
Technical Significance
Mobile-ALOHA represents a breakthrough in practical robot learning by demonstrating that effective mobile manipulation can be achieved through imitation learning with relatively small datasets (as few as 50 mobile demonstrations per task). The system’s success validates several important principles:
- Data Efficiency: Combining dynamic and static data can compensate for limited demonstration collection
- Whole-body Coordination: Mobile base manipulation is learnable through behavior cloning
- Low-cost Hardware: Sub-$50k systems can achieve useful manipulation capabilities
- Open Source: Community-driven development accelerates research progress
The project has enabled researchers worldwide to pursue mobile manipulation research without requiring expensive custom hardware or extensive simulation-to-real transfer pipelines.