Key Points

Architecture Improvements

Trillion-scale sparse mixture-of-experts (MoE) architecture with a context window of up to roughly 1 million tokens and native multimodal input: text, images, audio, video frames, and PDFs are processed in a single unified model.
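The model's actual routing scheme is not described here. As a generic illustration of what "sparse MoE" means, the sketch below shows top-k gating in plain Python. A gate scores every expert, but only the k highest-scoring experts run for a given input, so total parameter count can grow far beyond the compute spent per token. All names (`moe_layer`, `top_k_route`) and the toy linear experts are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts,
    with weights renormalized over the selected experts only."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, gate_weights, experts, k=2):
    """Hypothetical sparse MoE layer: returns the output and the experts used."""
    # One gate logit per expert (a dot product with that expert's gate row).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # only the k selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]

# Toy experts: each one scales its input by a different constant.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate = [[1, 0], [0, 1], [1, 1], [-1, -1]]
out, used = moe_layer([1.0, 2.0], gate, experts, k=2)
print(used)  # [2, 1]
```

With four experts and k=2, half of the expert computation is skipped for this input. Production MoE layers add load-balancing and capacity limits, which this sketch omits.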

Reasoning Engine

Introduces a thinking_level parameter for controlling reasoning depth, along with a Deep Think mode. On extremely difficult problems, the model dynamically allocates additional compute to explore multiple solution paths.
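How Deep Think actually explores solution paths is not specified here. One publicly known technique that fits the description is self-consistency voting: sample several independent reasoning paths and keep the most common answer. The sketch below combines it with adaptive escalation, sampling more paths only when the answers disagree. The `LEVEL_BUDGETS` mapping, the `solve` function, and the toy `hard_solver` are hypothetical stand-ins, not the real thinking_level semantics.

```python
from collections import Counter
from typing import Callable

# Hypothetical: each depth level caps how many solution paths may be sampled.
LEVEL_BUDGETS = {"low": (3,), "high": (3, 9, 27)}

def solve(problem: str,
          sample_path: Callable[[str, int], str],
          thinking_level: str = "low",
          min_agreement: float = 0.75) -> tuple[str, int]:
    """Return (majority answer, number of paths sampled).

    Escalates to the next budget only while the majority answer's share
    of votes stays below min_agreement."""
    answers: list[str] = []
    for budget in LEVEL_BUDGETS[thinking_level]:
        # Sample only the additional paths needed to reach this budget.
        answers += [sample_path(problem, seed) for seed in range(len(answers), budget)]
        answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= min_agreement:
            break
    return answer, len(answers)

def hard_solver(problem: str, seed: int) -> str:
    # Toy stand-in for one reasoning path: the first two attempts go wrong.
    return "41" if seed < 2 else "42"

print(solve("6 * 7", hard_solver, "low"))   # ('41', 3)
print(solve("6 * 7", hard_solver, "high"))  # ('42', 9)
```

At the low setting, the three paths disagree, but the budget is exhausted and the wrong majority wins. At the high setting, the same disagreement triggers escalation to nine paths, where the correct answer reaches the agreement threshold. An easy problem whose paths all agree stops after three samples at either level.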

Benchmark Performance

  • HLE (Humanity's Last Exam) accuracy: 37-38%
  • ARC-AGI-2: 31.1% (45.1% in Deep Think mode)
  • ScreenSpot-Pro: 72.7%
  • LiveCodeBench Elo: ~2439

User Experience

More concise, direct responses, with less excessive politeness. The million-token context window helps keep long conversations consistent.