Key Points
Architecture Improvements
The model uses a trillion-scale sparse Mixture-of-Experts (MoE) architecture and supports a context window of up to about 1 million tokens. It is natively multimodal, handling text, images, audio, video frames, and PDFs in a single unified pipeline.
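To make "sparse MoE" concrete, here is a minimal sketch of top-k expert routing, the general idea behind sparse Mixture-of-Experts layers. It is not the model's actual implementation: the function names (`route_top_k`, `moe_layer`), the toy scalar experts, and the choice of k=2 are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k experts with the highest router scores.

    Returns a list of (expert_index, weight) pairs, where the weights
    are a softmax over the selected experts' logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by weight."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out

# Toy setup (hypothetical): four experts that each scale the input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # router scores for one token

selected = route_top_k(logits, k=2)
y = moe_layer(1.0, experts, logits, k=2)
```

Only two of the four experts run for this token, which is why total parameter count can grow much faster than per-token compute in this kind of design.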
Reasoning Engine
Introduces a thinking_level parameter that controls reasoning depth, along with a Deep Think mode. On extremely difficult problems, the model dynamically allocates additional compute to explore multiple solution paths.
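The notes do not say how the extra compute is spent. One common way to "explore multiple solution paths" is to sample several candidate answers and take a majority vote (self-consistency), with the number of samples growing with the reasoning budget. The sketch below illustrates that swapped-in technique only. The level names, the `PATHS_PER_LEVEL` mapping, and `solve_with_budget` are hypothetical and are not the real parameter values or API.

```python
from collections import Counter

# Hypothetical mapping from a reasoning-depth setting to the number of
# solution paths explored. These names and numbers are assumptions.
PATHS_PER_LEVEL = {"low": 1, "high": 4, "deep_think": 16}

def solve_with_budget(candidate_paths, thinking_level):
    """Majority-vote over as many candidate answers as the budget allows.

    candidate_paths stands in for answers produced by independent
    reasoning attempts; a real system would generate them on demand.
    Returns (answer, votes_for_answer, paths_explored).
    """
    budget = PATHS_PER_LEVEL[thinking_level]
    explored = candidate_paths[:budget]
    answer, votes = Counter(explored).most_common(1)[0]
    return answer, votes, len(explored)

# Pre-computed stand-in answers from six hypothetical reasoning attempts.
paths = ["41", "42", "42", "42", "17", "42"]

shallow = solve_with_budget(paths, "low")
deeper = solve_with_budget(paths, "high")
```

With a budget of one path the result is whatever the first attempt produced. With four paths, the answer that most attempts agree on wins, which is the intuition behind spending more compute on harder problems.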
Benchmark Performance
- HLE (Humanity's Last Exam) accuracy: 37-38%
- ARC-AGI-2: 31.1% (45.1% in Deep Think mode)
- ScreenSpot-Pro: 72.7%
- LiveCodeBench Elo: ~2439
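As a quick reading aid for the ARC-AGI-2 figures above, this small calculation shows how much Deep Think mode adds, in absolute percentage points and relative to the standard score. It uses only the two numbers listed. The variable names are illustrative.

```python
# ARC-AGI-2 scores as listed in the notes (percent).
standard = 31.1
deep_think = 45.1

# Absolute gain in percentage points.
gain_points = round(deep_think - standard, 1)

# Relative gain as a percentage of the standard-mode score.
relative_gain = round(gain_points / standard * 100, 1)

print(f"Deep Think adds {gain_points} points ({relative_gain}% relative)")
```

The gain is 14.0 percentage points, about a 45% relative improvement over the standard-mode score.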
User Experience
Responses are more concise and direct, dropping excessive politeness. The million-token context window keeps long conversations consistent.