Demonstration Video/Images

Video Recording Requirements

  • Use an HD camera (1080p or higher resolution recommended)
  • Ensure a stable shooting environment (mount the camera on a tripod)
  • Maintain consistent shooting angles and lighting conditions
  • Record at a frame rate of at least 30 fps

Expert Demonstration Standards

  • Each demonstration should capture the complete task workflow
  • Every action must be clearly visible
  • Record key actions from multiple angles where possible
  • Add close-up shots to show fine-grained operations

Data Preprocessing

  • Segment videos into keyframe sequences
  • Annotate timestamps and action labels
  • Apply image enhancement as needed (denoising, contrast adjustment)
  • Perform background removal or object detection preprocessing if required
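As a starting point for segmenting a clip into keyframes, a minimal sketch of uniform temporal subsampling (function name and frame counts are illustrative; content-aware selection based on motion or action boundaries would replace this in practice):

```python
import numpy as np

def sample_keyframes(num_frames: int, num_keyframes: int) -> np.ndarray:
    """Return evenly spaced frame indices for keyframe extraction."""
    # Uniform temporal subsampling; swap in motion- or event-based
    # selection for content-aware keyframes.
    return np.linspace(0, num_frames - 1, num_keyframes).astype(int)

# Example: pick 5 keyframes from a 300-frame (10 s at 30 fps) clip
indices = sample_keyframes(300, 5)
```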

Storage Formats

  • Video: MP4 or AVI
  • Image sequences: PNG or JPEG
  • Metadata: store alongside each recording (e.g., JSON sidecar files with task, robot, and timing information)
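A hypothetical per-episode metadata record, serialized with the standard `json` module (all field names and values here are illustrative, not a fixed schema); such a record would typically sit next to the MP4 or image files as a sidecar:

```python
import json

# Illustrative per-episode metadata record (hypothetical schema).
metadata = {
    "episode_id": "ep_0001",
    "task": "pick_and_place",
    "robot": "7dof_arm",
    "video_file": "ep_0001.mp4",
    "fps": 30,
    "resolution": [1920, 1080],
    "annotations": [
        {"t_start": 0.0, "t_end": 2.5, "label": "reach"},
        {"t_start": 2.5, "t_end": 4.0, "label": "grasp"},
    ],
}

# Serialize; in practice this string would be written to ep_0001.json
serialized = json.dumps(metadata, indent=2)
```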

Motion Trajectories

Articulated Robotic Arms

  1. Joint Space Trajectories: Record joint angle values at each timestep (θ1, θ2, …, θn)
  2. Cartesian Space Trajectories: Record end-effector pose (position x, y, z and orientation Rx, Ry, Rz)
  3. Velocity/Acceleration Trajectories: Record joint or end-effector velocities and accelerations at each timestep
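A minimal sketch of a joint-space trajectory logger for an n-DoF arm (class and method names are assumptions): it records the angles (θ1, …, θn) per timestep and derives velocities by finite differences, covering items 1 and 3 above.

```python
import numpy as np

class JointTrajectoryRecorder:
    """Logs joint angles at each timestep for an n-DoF arm (sketch)."""

    def __init__(self, num_joints: int, dt: float = 0.01):
        self.num_joints = num_joints
        self.dt = dt          # sampling period in seconds (100 Hz here)
        self.samples = []

    def record(self, joint_angles):
        assert len(joint_angles) == self.num_joints
        self.samples.append(list(joint_angles))

    def as_array(self) -> np.ndarray:
        # Shape (T, n): one row of angles (theta_1 .. theta_n) per timestep
        return np.asarray(self.samples)

    def velocities(self) -> np.ndarray:
        # Finite-difference joint velocities between consecutive samples
        return np.diff(self.as_array(), axis=0) / self.dt

# Example: record a 3-joint arm sweeping its first joint
rec = JointTrajectoryRecorder(num_joints=3)
for k in range(5):
    rec.record([0.1 * k, 0.0, 0.0])
```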

Mobile Robots

  1. Position Trajectories: x, y coordinate changes
  2. Orientation Trajectories: heading angle changes
  3. Velocity Trajectories: linear velocity and angular velocity
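The three mobile-robot trajectory types are tied together by the standard unicycle kinematic model: integrating linear velocity v and angular velocity ω yields the position (x, y) and heading θ trajectories. A minimal Euler-integration sketch (function name and step sizes are illustrative):

```python
import math

def integrate_unicycle(x, y, theta, v, omega, dt):
    """One Euler step of the unicycle model: velocities in,
    position and heading out (illustrative sketch)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Drive straight along +x at 1 m/s for 1 s (100 steps of 10 ms)
x, y, theta = 0.0, 0.0, 0.0
for _ in range(100):
    x, y, theta = integrate_unicycle(x, y, theta, v=1.0, omega=0.0, dt=0.01)
```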

Trajectory Data Acquisition Methods

Human Teleoperation Collection:

  • Force feedback controllers
  • VR device control
  • Teach pendant programming

Simulation Generation:

  • Motion planning algorithms (RRT, PRM)
  • Physics engine simulation (Gazebo, Unity)
  • Learned trajectory generation (e.g., sampling from a trained policy)
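To make the planning option concrete, a bare-bones RRT in an obstacle-free 2D plane (all parameters and the sampling region are assumptions; real planners such as those behind Gazebo pipelines add collision checking, Cartesian-space goals, and path smoothing):

```python
import random
import math

def rrt_2d(start, goal, step=0.5, max_iters=2000, goal_tol=0.5, seed=0):
    """Minimal obstacle-free RRT in the plane (illustrative sketch)."""
    rng = random.Random(seed)
    nodes = [start]
    parents = {0: None}
    for _ in range(max_iters):
        # Sample a random point, occasionally biased toward the goal
        sample = goal if rng.random() < 0.1 else (
            rng.uniform(0, 10), rng.uniform(0, 10))
        # Find the nearest tree node and extend one step toward the sample
        i = min(range(len(nodes)),
                key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        parents[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Walk parent pointers back to recover the path
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parents[k]
            return path[::-1]
    return None

path = rrt_2d(start=(0.0, 0.0), goal=(9.0, 9.0))
```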

State-Action Pairs

State Representation

  • Sensor Data: IMU readings, force/torque sensor values
  • Environment Perception: RGB/RGBD images, LiDAR point clouds
  • Proprioceptive State: Joint angles, end-effector poses
  • Other Modalities: Voice commands, tactile feedback

Action Representation

  • Low-level Control: Joint torque commands, velocity commands
  • High-level Commands: Cartesian space end-effector poses, gripper force
  • Discrete Actions: Switch commands, mode switching commands
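One way to bundle the state and action modalities above into a single per-timestep record is a dataclass; the schema below is purely illustrative (field names, shapes, and the 7-DoF assumption are not from any fixed standard):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StateActionPair:
    """One timestep of demonstration data (illustrative schema)."""
    timestamp: float
    joint_angles: np.ndarray   # proprioception, shape (n,)
    ee_pose: np.ndarray        # x, y, z, Rx, Ry, Rz -> shape (6,)
    rgb_image: np.ndarray      # H x W x 3 camera frame
    action: np.ndarray         # e.g. joint velocity command, shape (n,)
    language: str = ""         # optional instruction annotation

pair = StateActionPair(
    timestamp=0.0,
    joint_angles=np.zeros(7),          # assumed 7-DoF arm
    ee_pose=np.zeros(6),
    rgb_image=np.zeros((480, 640, 3), dtype=np.uint8),
    action=np.zeros(7),
    language="pick up the red cube",
)
```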

Data Collection Standards

  • Time synchronization: all sensor streams must be strictly time-aligned (error < 1 ms)
  • Data augmentation: Add random lighting variations to images, add Gaussian noise to joint angles
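The two augmentations named above can be sketched with NumPy (the noise scale and brightness range are illustrative defaults, not recommended values):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_joint_angles(angles: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Add zero-mean Gaussian noise (std in radians, assumed) to joint angles."""
    return angles + rng.normal(0.0, sigma, size=angles.shape)

def augment_brightness(image: np.ndarray, max_shift: int = 30) -> np.ndarray:
    """Random global brightness shift as a simple lighting variation."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.clip(image.astype(np.int16) + shift, 0, 255).astype(np.uint8)

noisy = augment_joint_angles(np.zeros(7))
jittered = augment_brightness(np.full((64, 64, 3), 128, dtype=np.uint8))
```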

Language Instructions

Collection Methods

  1. Manually written instructions
  2. Real-time voice recording (speech-to-text)
  3. Multi-language annotation

High-Quality Instruction Elements

  • Object features (color, shape and other visual attributes)
  • Spatial relationships (directional words like “left side”, “above”)
  • Action details (“pick up slowly”, “rotate 90 degrees”)
  • Environmental context (“in the kitchen area”, “avoid obstacles”)
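The four elements above can be combined with a simple template; this generator and its slot names are hypothetical, shown only to illustrate how the elements compose into one instruction:

```python
# Hypothetical template combining object features, spatial relations,
# action details, and environmental context into one instruction.
def make_instruction(obj: str, feature: str, relation: str,
                     action: str, context: str) -> str:
    return f"{action} the {feature} {obj} {relation}, {context}"

instr = make_instruction(
    obj="cube", feature="red", relation="on the left side of the table",
    action="slowly pick up", context="avoiding the obstacles nearby")
```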

Other

Reward Function Feedback

  • Numerical scores
  • Binary feedback (success/failure)
  • Multi-level ratings

Human Preference Comparisons

  • Compare the outcomes of two policy execution segments
  • Common dimensions: execution efficiency, safety, human-likeness
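Pairwise comparisons like these are commonly modeled with the Bradley-Terry formulation: the probability a human prefers segment A over B is a logistic function of the difference in their (latent) quality scores. A minimal sketch (mapping these scores to the dimensions above is an assumption):

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry model: P(A preferred over B) given scalar
    quality scores for the two execution segments."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

p = preference_probability(2.0, 1.0)   # A scores higher, so p > 0.5
```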

Common Datasets

  • Open X-Embodiment: 22 robot types, 1 million+ trajectories
  • RoboNet
  • RLBench
  • D4RL