Demonstration Video/Images

Video Recording Requirements

  • Use an HD camera (1080p or higher resolution recommended)
  • Ensure a stable shooting environment (mount the camera on a tripod)
  • Maintain consistent shooting angles and lighting conditions
  • Record at a frame rate of at least 30 fps

Expert Demonstration Standards

  • Each demonstration should capture the complete task workflow
  • Every action must be clearly visible
  • Record key actions from multiple angles where possible
  • Add close-up shots to show fine-grained operations

Data Preprocessing

  • Segment videos into keyframe sequences
  • Annotate timestamps and action labels
  • Apply image enhancement as needed (denoising, contrast adjustment)
  • Perform background removal or object detection preprocessing if required
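As a starting point for segmenting a clip into keyframes, a minimal sketch of uniform temporal subsampling (function name and frame counts are illustrative; content-aware selection based on motion or action boundaries would replace this in practice):

```python
import numpy as np

def sample_keyframes(num_frames: int, num_keyframes: int) -> np.ndarray:
    """Return evenly spaced frame indices for keyframe extraction."""
    # Uniform temporal subsampling; swap in motion- or event-based
    # selection for content-aware keyframes.
    return np.linspace(0, num_frames - 1, num_keyframes).astype(int)

# Example: pick 5 keyframes from a 300-frame (10 s at 30 fps) clip
indices = sample_keyframes(300, 5)
```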

Storage Formats

  • Video: MP4 or AVI
  • Image sequences: PNG or JPEG
  • Metadata: store alongside each recording (e.g., JSON sidecar files with task, robot, and timing information)
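A hypothetical per-episode metadata record, serialized with the standard `json` module (all field names and values here are illustrative, not a fixed schema); such a record would typically sit next to the MP4 or image files as a sidecar:

```python
import json

# Illustrative per-episode metadata record (hypothetical schema).
metadata = {
    "episode_id": "ep_0001",
    "task": "pick_and_place",
    "robot": "7dof_arm",
    "video_file": "ep_0001.mp4",
    "fps": 30,
    "resolution": [1920, 1080],
    "annotations": [
        {"t_start": 0.0, "t_end": 2.5, "label": "reach"},
        {"t_start": 2.5, "t_end": 4.0, "label": "grasp"},
    ],
}

# Serialize; in practice this string would be written to ep_0001.json
serialized = json.dumps(metadata, indent=2)
```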

Motion Trajectories

Articulated Robotic Arms

  1. Joint Space Trajectories: Record joint angle values at each timestep (θ1, θ2, …, θn)
  2. Cartesian Space Trajectories: Record end-effector pose (position x, y, z and orientation Rx, Ry, Rz)
  3. Velocity/Acceleration Trajectories: Record joint or end-effector velocities and accelerations at each timestep
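A minimal sketch of a joint-space trajectory logger for an n-DoF arm (class and method names are assumptions): it records the angles (θ1, …, θn) per timestep and derives velocities by finite differences, covering items 1 and 3 above.

```python
import numpy as np

class JointTrajectoryRecorder:
    """Logs joint angles at each timestep for an n-DoF arm (sketch)."""

    def __init__(self, num_joints: int, dt: float = 0.01):
        self.num_joints = num_joints
        self.dt = dt          # sampling period in seconds (100 Hz here)
        self.samples = []

    def record(self, joint_angles):
        assert len(joint_angles) == self.num_joints
        self.samples.append(list(joint_angles))

    def as_array(self) -> np.ndarray:
        # Shape (T, n): one row of angles (theta_1 .. theta_n) per timestep
        return np.asarray(self.samples)

    def velocities(self) -> np.ndarray:
        # Finite-difference joint velocities between consecutive samples
        return np.diff(self.as_array(), axis=0) / self.dt

# Example: record a 3-joint arm sweeping its first joint
rec = JointTrajectoryRecorder(num_joints=3)
for k in range(5):
    rec.record([0.1 * k, 0.0, 0.0])
```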

Mobile Robots

  1. Position Trajectories: x, y coordinate changes
  2. Orientation Trajectories: heading angle changes
  3. Velocity Trajectories: linear velocity and angular velocity
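The three mobile-robot trajectory types are tied together by the standard unicycle kinematic model: integrating linear velocity v and angular velocity ω yields the position (x, y) and heading θ trajectories. A minimal Euler-integration sketch (function name and step sizes are illustrative):

```python
import math

def integrate_unicycle(x, y, theta, v, omega, dt):
    """One Euler step of the unicycle model: velocities in,
    position and heading out (illustrative sketch)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Drive straight along +x at 1 m/s for 1 s (100 steps of 10 ms)
x, y, theta = 0.0, 0.0, 0.0
for _ in range(100):
    x, y, theta = integrate_unicycle(x, y, theta, v=1.0, omega=0.0, dt=0.01)
```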

Trajectory Data Acquisition Methods

Human Teleoperation Collection:

  • Force feedback controllers
  • VR device control
  • Teach pendant programming

Simulation Generation:

  • Motion planning algorithms (RRT, PRM)
  • Physics engine simulation (Gazebo, Unity)
  • Learned trajectory generation (e.g., sampling from a trained policy)
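To make the planning option concrete, a bare-bones RRT in an obstacle-free 2D plane (all parameters and the sampling region are assumptions; real planners such as those behind Gazebo pipelines add collision checking, Cartesian-space goals, and path smoothing):

```python
import random
import math

def rrt_2d(start, goal, step=0.5, max_iters=2000, goal_tol=0.5, seed=0):
    """Minimal obstacle-free RRT in the plane (illustrative sketch)."""
    rng = random.Random(seed)
    nodes = [start]
    parents = {0: None}
    for _ in range(max_iters):
        # Sample a random point, occasionally biased toward the goal
        sample = goal if rng.random() < 0.1 else (
            rng.uniform(0, 10), rng.uniform(0, 10))
        # Find the nearest tree node and extend one step toward the sample
        i = min(range(len(nodes)),
                key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        parents[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Walk parent pointers back to recover the path
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parents[k]
            return path[::-1]
    return None

path = rrt_2d(start=(0.0, 0.0), goal=(9.0, 9.0))
```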

State-Action Pairs

State Representation

  • Sensor Data: IMU readings, force/torque sensor values
  • Environment Perception: RGB/RGBD images, LiDAR point clouds
  • Proprioceptive State: Joint angles, end-effector poses
  • Other Modalities: Voice commands, tactile feedback

Action Representation

  • Low-level Control: Joint torque commands, velocity commands
  • High-level Commands: Cartesian space end-effector poses, gripper force
  • Discrete Actions: Switch commands, mode switching commands
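One way to bundle the state and action modalities above into a single per-timestep record is a dataclass; the schema below is purely illustrative (field names, shapes, and the 7-DoF assumption are not from any fixed standard):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StateActionPair:
    """One timestep of demonstration data (illustrative schema)."""
    timestamp: float
    joint_angles: np.ndarray   # proprioception, shape (n,)
    ee_pose: np.ndarray        # x, y, z, Rx, Ry, Rz -> shape (6,)
    rgb_image: np.ndarray      # H x W x 3 camera frame
    action: np.ndarray         # e.g. joint velocity command, shape (n,)
    language: str = ""         # optional instruction annotation

pair = StateActionPair(
    timestamp=0.0,
    joint_angles=np.zeros(7),          # assumed 7-DoF arm
    ee_pose=np.zeros(6),
    rgb_image=np.zeros((480, 640, 3), dtype=np.uint8),
    action=np.zeros(7),
    language="pick up the red cube",
)
```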

Data Collection Standards

  • Time synchronization: all sensor streams must be strictly time-aligned (error < 1 ms)
  • Data augmentation: Add random lighting variations to images, add Gaussian noise to joint angles
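The two augmentations named above can be sketched with NumPy (the noise scale and brightness range are illustrative defaults, not recommended values):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_joint_angles(angles: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Add zero-mean Gaussian noise (std in radians, assumed) to joint angles."""
    return angles + rng.normal(0.0, sigma, size=angles.shape)

def augment_brightness(image: np.ndarray, max_shift: int = 30) -> np.ndarray:
    """Random global brightness shift as a simple lighting variation."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.clip(image.astype(np.int16) + shift, 0, 255).astype(np.uint8)

noisy = augment_joint_angles(np.zeros(7))
jittered = augment_brightness(np.full((64, 64, 3), 128, dtype=np.uint8))
```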

Language Instructions

Collection Methods

  1. Manually written instructions
  2. Real-time voice recording (speech-to-text)
  3. Multi-language annotation

High-Quality Instruction Elements

  • Object features (color, shape and other visual attributes)
  • Spatial relationships (directional words like “left side”, “above”)
  • Action details (“pick up slowly”, “rotate 90 degrees”)
  • Environmental context (“in the kitchen area”, “avoid obstacles”)
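The four elements above can be combined with a simple template; this generator and its slot names are hypothetical, shown only to illustrate how the elements compose into one instruction:

```python
# Hypothetical template combining object features, spatial relations,
# action details, and environmental context into one instruction.
def make_instruction(obj: str, feature: str, relation: str,
                     action: str, context: str) -> str:
    return f"{action} the {feature} {obj} {relation}, {context}"

instr = make_instruction(
    obj="cube", feature="red", relation="on the left side of the table",
    action="slowly pick up", context="avoiding the obstacles nearby")
```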

Other

Reward Function Feedback

  • Numerical scores
  • Binary feedback (success/failure)
  • Multi-level ratings

Human Preference Comparisons

  • Compare the outcomes of two policy execution segments
  • Common dimensions: execution efficiency, safety, human-likeness
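Pairwise comparisons like these are commonly modeled with the Bradley-Terry formulation: the probability a human prefers segment A over B is a logistic function of the difference in their (latent) quality scores. A minimal sketch (mapping these scores to the dimensions above is an assumption):

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry model: P(A preferred over B) given scalar
    quality scores for the two execution segments."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

p = preference_probability(2.0, 1.0)   # A scores higher, so p > 0.5
```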

Common Datasets

  • Open X-Embodiment: 22 robot types, 1 million+ trajectories
  • RoboNet
  • RLBench
  • D4RL