Modern AI Methods Overview

  • Reinforcement Learning (RL)
  • Imitation Learning (IL / BC)
  • Large Transformer Models (ACT, VLA)
  • Multimodal Perception Fusion

Vision-Language-Action Models (VLA)

RT-1

  • Training data: 130,000 human demonstrations
  • Tasks: 700+ instructions in kitchen scenarios
  • Input: 6 consecutive images + a natural language instruction
  • Output: 11-dimensional discrete action vector
  • Success rate: 85%+
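
A discrete action vector like the one listed above is typically obtained by binning each continuous action dimension. The following is a minimal sketch of that idea, not RT-1's actual code: the 256 bins per dimension, the [-1, 1] action range, and the function names `discretize` and `undiscretize` are assumptions made for illustration.

```python
import numpy as np

NUM_BINS = 256    # assumption: number of bins per action dimension
ACTION_DIM = 11   # from the RT-1 summary above

def discretize(action, low, high, num_bins=NUM_BINS):
    """Map continuous values in [low, high] to integer bin indices."""
    action = np.clip(action, low, high)
    scaled = (action - low) / (high - low)            # now in [0, 1]
    bins = (scaled * num_bins).astype(np.int64)
    return np.minimum(bins, num_bins - 1)             # keep the upper edge in range

def undiscretize(bins, low, high, num_bins=NUM_BINS):
    """Map bin indices back to the centre of each bin."""
    return low + (bins + 0.5) / num_bins * (high - low)

# Hypothetical action limits and one example action.
low = -np.ones(ACTION_DIM)
high = np.ones(ACTION_DIM)
action = np.linspace(-1.0, 1.0, ACTION_DIM)

bins = discretize(action, low, high)
recovered = undiscretize(bins, low, high)
print(bins)
print(np.max(np.abs(recovered - action)))
```

The round trip loses at most half a bin width per dimension, which is the price paid for turning action prediction into a classification problem.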

RT-2

  • Parameters: up to 55B (PaLI-X backbone)
  • Innovation: Knowledge transfer from web-scale pretraining, actions discretized into text tokens, hybrid training on web and robot data
  • Improvement: +47% on open-vocabulary tasks, +60% on adaptation, +35% on complex instructions
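
Action discretization lets a language model emit robot actions as ordinary text. The sketch below shows one way such a round trip could look, assuming each action dimension has already been binned into an integer in [0, 255]. The helper names `action_to_text` and `text_to_action`, the 8-dimensional action, and the example values are hypothetical and are not taken from the RT-2 code.

```python
NUM_BINS = 256  # assumption: each action dimension is binned into 256 values

def action_to_text(bins):
    """Serialize integer bin indices as a space-separated string."""
    for b in bins:
        if not 0 <= int(b) < NUM_BINS:
            raise ValueError(f"bin index out of range: {b}")
    return " ".join(str(int(b)) for b in bins)

def text_to_action(text, expected_dim):
    """Parse model output back into bin indices, rejecting malformed strings."""
    parts = text.split()
    if len(parts) != expected_dim:
        raise ValueError(f"expected {expected_dim} tokens, got {len(parts)}")
    bins = [int(p) for p in parts]
    if any(not 0 <= b < NUM_BINS for b in bins):
        raise ValueError("bin index out of range")
    return bins

# Hypothetical 8-dimensional action, already discretized.
example_bins = [1, 128, 91, 241, 5, 101, 127, 217]
encoded = action_to_text(example_bins)
decoded = text_to_action(encoded, expected_dim=8)
print(encoded)
print(decoded)
```

Validating the parsed string matters in practice, because a generative model can produce the wrong number of tokens or a non-numeric token.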

Diffusion Policy

RDT-1B

  • 28-layer diffusion Transformer (about 1.2B parameters)
  • Generates 64-step dual-arm coordinated action sequences
  • Improvement: +23% on fine grasping, 37% less time on complex assembly, +41% in dynamic environments
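
Generating a 64-step action sequence with a diffusion model means starting from Gaussian noise and denoising it step by step. The sketch below uses a standard DDPM reverse loop in NumPy. It is not RDT-1B's implementation: the 14-dimensional action (two 7-DoF arms), the 50 diffusion steps, the linear beta schedule, and `oracle_noise_predictor` (a stand-in for the trained network that already knows the target trajectory) are all assumptions for illustration.

```python
import numpy as np

HORIZON = 64               # from the RDT-1B summary above
ACTION_DIM = 14            # assumption: two 7-DoF arms
NUM_DIFFUSION_STEPS = 50   # assumption

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the derived alpha terms."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def oracle_noise_predictor(x_t, t, target, alpha_bars):
    """Hypothetical stand-in for a trained network.

    If every demonstration equalled `target`, the exact noise in x_t would be
    (x_t - sqrt(alpha_bar_t) * target) / sqrt(1 - alpha_bar_t).
    """
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

def sample_actions(noise_predictor, target, rng):
    """DDPM reverse process: pure noise -> action sequence."""
    betas, alphas, alpha_bars = make_schedule(NUM_DIFFUSION_STEPS)
    x = rng.standard_normal((HORIZON, ACTION_DIM))
    for t in reversed(range(NUM_DIFFUSION_STEPS)):
        eps = noise_predictor(x, t, target, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Hypothetical target: a smooth half-sine trajectory shared by all joints.
target = np.tile(np.sin(np.linspace(0.0, np.pi, HORIZON))[:, None], (1, ACTION_DIM))
actions = sample_actions(oracle_noise_predictor, target, np.random.default_rng(0))
print(actions.shape)
print(np.max(np.abs(actions - target)))
```

Each loop iteration costs one network evaluation, so the number of diffusion steps directly sets the inference latency.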

Advantages

  • Naturally represents multimodal action distributions
  • Handles environment uncertainty
  • Strong sequence modeling capability
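
The first advantage can be illustrated with a toy case in which demonstrations pass an obstacle either on the left or on the right. In this hedged sketch the data are synthetic, and resampling from the demonstrations stands in for drawing from a learned generative policy. A regressor trained with squared error converges to the mean of the two modes, which matches neither demonstrated behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations: steer left (-1) or right (+1), with small noise.
modes = rng.choice([-1.0, 1.0], size=1000)
demos = modes + 0.05 * rng.standard_normal(1000)

# A deterministic policy trained with squared error converges to the mean,
# which lies between the two modes (straight into the obstacle).
mse_action = demos.mean()

# A generative policy samples from the action distribution instead.
# Resampling the demonstrations stands in for that learned sampler here.
sampled_actions = rng.choice(demos, size=5)

print(mse_action)
print(sampled_actions)
```

Every sampled action stays close to one of the two demonstrated modes, while the averaged action sits near zero.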

Challenges

  • Slow sampling speed
  • Mitigations: Knowledge distillation, Transformer-based architectures

Development Stages

  1. Early Stage: Pure reward optimization (RL)
  2. Middle Stage: Imitation Learning (IL)
  3. Current Stage: Large models + generative policies