Demonstration Video/Images
Video Recording Requirements
- Use an HD camera (1080p or higher resolution recommended)
- Ensure a stable shooting environment (mount the camera on a tripod)
- Maintain a consistent shooting angle and lighting conditions
- Record at 30 fps or higher
Expert Demonstration Standards
- Should cover the complete task workflow
- Each action must be clearly visible
- Shoot key actions from multiple angles where possible
- Close-up shots can be added to show fine-grained operations
Data Preprocessing
- Segment videos into keyframe sequences
- Annotate timestamps and action labels
- Optionally apply image enhancement (denoising, contrast adjustment)
- Background removal or object detection may also be required
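The segmentation step above can be sketched as a fixed-rate keyframe sampler; the function name and the 2 Hz default are illustrative assumptions, not a standard:

```python
def keyframe_indices(total_frames: int, fps: float, sample_hz: float = 2.0) -> list[int]:
    """Pick frame indices at a fixed sampling rate (e.g. 2 keyframes per second).

    `sample_hz` is an assumed default; tune it per task.
    """
    step = max(1, round(fps / sample_hz))
    return list(range(0, total_frames, step))

# A 3-second clip at 30 fps yields a keyframe every 15 frames:
indices = keyframe_indices(total_frames=90, fps=30.0)  # → [0, 15, 30, 45, 60, 75]
```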
Storage Formats
- Video: MP4 or AVI
- Image sequences: PNG or JPEG
- Store metadata (e.g. timestamps, task labels, camera parameters) alongside the media files
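A metadata record of the kind described above is often stored as JSON next to each clip; the schema below is illustrative, not a standard:

```python
import json

# Hypothetical per-clip metadata; all field names are illustrative.
metadata = {
    "clip_id": "demo_0001",
    "video_file": "demo_0001.mp4",
    "fps": 30,
    "resolution": [1920, 1080],
    "task_label": "pick_and_place",
    "keyframes": [
        {"frame": 0, "timestamp_s": 0.0, "action": "approach"},
        {"frame": 45, "timestamp_s": 1.5, "action": "grasp"},
    ],
}

serialized = json.dumps(metadata, indent=2)  # write this next to demo_0001.mp4
```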
Motion Trajectories
Articulated Robotic Arms
- Joint Space Trajectories: Record joint angle values at each timestep (θ1, θ2, …, θn)
- Cartesian Space Trajectories: Record end-effector pose (position x, y, z and orientation Rx, Ry, Rz)
- Velocity/Acceleration Trajectories: Record joint or end-effector velocities and accelerations at each timestep
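One way to store a timestep of the trajectories above is a small record type; the field names here are assumptions, and velocities can be recovered by finite differences when only angles were logged:

```python
from dataclasses import dataclass

@dataclass
class ArmTrajectoryPoint:
    """One timestep of an arm trajectory (field names are illustrative)."""
    t: float                   # timestamp in seconds
    joint_angles: list[float]  # joint-space values θ1..θn, in radians
    ee_pose: list[float]       # Cartesian [x, y, z, Rx, Ry, Rz]

def estimate_joint_velocity(p0: ArmTrajectoryPoint, p1: ArmTrajectoryPoint) -> list[float]:
    """Finite-difference joint velocity between two consecutive points."""
    dt = p1.t - p0.t
    return [(a1 - a0) / dt for a0, a1 in zip(p0.joint_angles, p1.joint_angles)]
```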
Mobile Robots
- Position Trajectories: x, y coordinate changes
- Orientation Trajectories: heading angle changes
- Velocity Trajectories: linear velocity and angular velocity
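The three trajectory types above are related: integrating the velocity trajectory recovers position and heading. A minimal sketch, assuming a unicycle motion model and simple Euler integration:

```python
import math

def integrate_unicycle(x: float, y: float, theta: float,
                       v: float, omega: float, dt: float) -> tuple[float, float, float]:
    """One Euler step of the unicycle model: linear velocity v, angular velocity omega."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Driving straight along the x-axis for one second at 1 m/s:
pose = integrate_unicycle(0.0, 0.0, 0.0, v=1.0, omega=0.0, dt=1.0)  # → (1.0, 0.0, 0.0)
```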
Trajectory Data Acquisition Methods
Human Teleoperation Collection:
- Force feedback controllers
- VR device control
- Teach pendant programming
Simulation Generation:
- Motion planning algorithms (RRT, PRM)
- Physics engine simulation (Gazebo, Unity)
- Machine learning generation
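As a concrete example of the planning-based generation above, here is a minimal 2D RRT sketch; the workspace bounds, step size, and goal bias are assumed values, not taken from any particular library:

```python
import math
import random

def rrt_2d(start, goal, is_free, step=0.5, goal_tol=0.5, max_iters=5000, seed=0):
    """Minimal 2D RRT: grow a tree from `start` toward random samples in an
    assumed [0, 10] x [0, 10] workspace until a node lands within `goal_tol`
    of `goal`. `is_free(p)` is the caller's collision check."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # 10% goal bias: occasionally sample the goal itself.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10), rng.uniform(0, 10))
        # Nearest existing node to the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        # Step a bounded distance toward the sample.
        new = (nx + (sample[0] - nx) / d * min(step, d),
               ny + (sample[1] - ny) / d * min(step, d))
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            # Backtrack through parents to recover the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None  # no path found within the iteration budget
```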
State-Action Pairs
State Representation
- Sensor Data: IMU readings, force/torque sensor values
- Environment Perception: RGB/RGBD images, LiDAR point clouds
- Proprioceptive State: Joint angles, end-effector poses
- Other Modalities: Voice commands, tactile feedback
Action Representation
- Low-level Control: Joint torque commands, velocity commands
- High-level Commands: Cartesian space end-effector poses, gripper force
- Discrete Actions: Switch commands, mode switching commands
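A single logged sample combining the state and action fields above might look like this; the schema is illustrative, since real datasets each define their own:

```python
from dataclasses import dataclass

@dataclass
class StateActionPair:
    """One synchronized (state, action) sample; field names are illustrative."""
    timestamp: float
    joint_angles: list[float]   # proprioceptive state
    ee_pose: list[float]        # [x, y, z, Rx, Ry, Rz]
    rgb_image_path: str         # environment perception, stored by reference
    action: list[float]         # e.g. a joint velocity command
    action_type: str = "joint_velocity"

sample = StateActionPair(
    timestamp=0.033,
    joint_angles=[0.0, 0.5, -0.2],
    ee_pose=[0.4, 0.0, 0.3, 0.0, 0.0, 0.0],
    rgb_image_path="frames/000001.png",
    action=[0.01, 0.0, 0.02],
)
```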
Data Collection Standards
- Time synchronization: all sensor streams need strict time alignment (error < 1 ms)
- Data augmentation: add random lighting variations to images and Gaussian noise to joint angles
Language Instructions
Collection Methods
- Manually written instructions
- Real-time voice recording (speech-to-text)
- Multi-language annotation
High-Quality Instruction Elements
- Object features (color, shape and other visual attributes)
- Spatial relationships (directional words like “left side”, “above”)
- Action details (“pick up slowly”, “rotate 90 degrees”)
- Environmental context (“in the kitchen area”, “avoid obstacles”)
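Instructions combining the elements above can be bootstrapped from templates before human editing; the templates and vocabularies here are illustrative:

```python
import itertools

# Hypothetical template and vocabularies covering object features
# and spatial relationships; extend per task.
templates = ["pick up the {color} {obj} on the {side} side"]
colors = ["red", "blue"]
objects_ = ["cube", "mug"]
sides = ["left", "right"]

instructions = [
    t.format(color=c, obj=o, side=s)
    for t, c, o, s in itertools.product(templates, colors, objects_, sides)
]
# 1 template x 2 colors x 2 objects x 2 sides → 8 instruction variants
```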
Other
Reward Function Feedback
- Numerical scores
- Binary feedback (success/failure)
- Multi-level ratings
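The three feedback types above are usually normalized to a single scalar reward before training; the [0, 1] convention below is an assumption, not a standard:

```python
def feedback_to_reward(kind: str, value) -> float:
    """Map heterogeneous human feedback to a scalar in [0, 1] (illustrative convention)."""
    if kind == "binary":   # success / failure
        return 1.0 if value else 0.0
    if kind == "rating":   # assumed 1-5 multi-level rating
        return (value - 1) / 4.0
    if kind == "score":    # assumed numerical score on a 0-100 scale
        return value / 100.0
    raise ValueError(f"unknown feedback kind: {kind}")
```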
Human Preference Comparisons
- Compare two policy execution segments and record which is preferred
- Common comparison dimensions: execution efficiency, safety, human-likeness
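Pairwise comparisons like these are commonly turned into a training signal via the Bradley-Terry model, where the probability of preferring segment A follows from the difference of their (learned) returns:

```python
import math

def preference_probability(return_a: float, return_b: float) -> float:
    """Bradley-Terry / logistic model: P(A preferred over B) = sigmoid(R_A - R_B)."""
    return 1.0 / (1.0 + math.exp(return_b - return_a))
```

Equal returns give a 50/50 preference; a large return gap pushes the probability toward 0 or 1.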
Common Datasets
- Open X-Embodiment: 22 robot types, 1 million+ trajectories
- RoboNet
- RLBench
- D4RL