Tag: Deep Learning

28 articles

AI Research #135: Gemini 3 Pro Back on Top - MoE, Million-token Context and Deep Think

Explains Gemini 3 Pro's advantages through sparse MoE architecture, million-token context, native multimodal (text/image/video/PDF), thinking depth control (thinking_leve...

AI Research #130: Qwen2.5-Omni Practical Applications

Office assistant, education and training, programming and operations, search-enhanced RAG, device control/plugin agents, and companion entertainment.

AI Research #129: Qwen2.5-Omni-7B Key Specs - VRAM, Context and Deployment

Runs stably in FP16 at ~14GB VRAM; INT8/INT4 quantization (<4GB) enables deployment on consumer GPUs or edge devices.

AI Research #128: Qwen2.5-Omni Training Pipeline - Three-stage Multi-modal Training

Complete training pipeline breakdown for Qwen2.5-Omni: Thinker based on Qwen2.5, vision initialized from Qwen2.5-VL, audio from Whisper-large-v3.

AI Research #127: Qwen2.5-Omni Deep Dive - Thinker-Talker Dual-core Architecture

Engineering breakdown of Qwen2.5-Omni (2024-2025) Thinker-Talker dual-core architecture: unified Transformer decoder for text/image/video/audio fusion, TMRoPE.

AI Research #125: Tesla FSD Business Model and Competitive Landscape

As of the end of 2022, Tesla had ~$2.9 billion in FSD-related deferred revenue; in Q4 2022 it recognized $324 million in FSD revenue.

AI Research #124: Tesla FSD V14 Deep Analysis

Tesla FSD V14 real-world performance and road tests, comparing V13.2 on urban roads and highways: key disengagement metrics, lane changes/ramps, destination arrival...

AI Research #123: FSD V14 Deep Analysis - Vision-Only SDF vs V12

3D environment reconstruction; Precision: 10cm (3× improvement over V12's ~33cm resolution); Multi-frame spatiotemporal fusion for dynamic object tracking.

AI Research #121: DeepSeek-OCR Research Directions

Frontier approaches and engineering implementation for DeepSeek-OCR (2025, including 3B parameter direction).

AI Research #119: DeepSeek-OCR PyTorch FlashAttn 2.7.3 Inference and Deployment

Comprehensive guide for DeepSeek-OCR local/private deployment based on Python 3.12, PyTorch 2.6.0, Transformers 4.46.3 and FlashAttention 2.7.3.

AI Research #120: DeepSeek-OCR from 0 to 1 - Getting Started and Engineering Essentials

Complete getting started path and engineering essentials for DeepSeek-OCR (as of 2025), covering environment setup (Python/PyTorch 2.x, Transformers 4.

AI Research #118: Embodied AI Mobile-ALOHA - Mobile Base + Dual-Arm Collaboration

Mobile-ALOHA: An open-source mobile manipulation solution combining mobile chassis and dual-arm collaboration.

AI Research #116: Tesla HW3.0 vs HW4.0 - Camera Resolution, Compute and Perception Upgrade

Comprehensive comparison of Tesla HW3.0 and HW4.0 hardware: camera resolution upgraded from 1.2MP to 5MP with better HDR/night vision

AI Investigation #108: Complete Robot Model Training Pipeline - From Pre-training to Reinforcement Learning and Human Feedback

Full robot training pipeline: pre-training, fine-tuning (LoRA), reinforcement learning, imitation learning, and human feedback for safe autonomous decision-making.

AI Investigation #107: RL and Robot Training Data Format Analysis

Constructed in state-action-reward sequence form, supporting spatiotemporal understanding of models like Transformers.

AI Investigation #106: Robot Learning Data Collection Tools and Methods - Sensors, APIs, Teleoperation and Simulation

Core data collection methods and application scenarios, covering over ten methods including manual entry, sensor collection, web crawlers, API calls, and log collection.

AI Investigation #105: Robot Learning Data Collection - From Demonstration Videos to State-Action Pairs

Data collection is a critical step in robot learning development, covering demonstration video collection, trajectory recording, state-action pair generation...

AI Investigation #103: Embodied AI Technology Landscape

Comprehensive overview of embodied AI tech stack: hardware (GPU, sensors, actuators), software (ROS, simulation), and algorithms (deep learning, RL, VLA models).

AI Investigation #102: Intelligent Robotic Arms, Autonomous Driving and Humanoid Robots - Imitation Learning, Reinforcement Learning and Multimodal Fusion Trends

Robot types differ greatly in structure, tasks, and control methods, so AI algorithm adaptation strategies need to be tailored to each.

AI Investigation #101: Modern AI Methods - VLA, RT-1, RT-2 and Diffusion Models for Robot Control

Modern AI robot control methods are undergoing a major transition from reinforcement learning and imitation learning to multimodal agents driven by large models.

AI Investigation #100: Modern AI Methods - Reinforcement Learning, Imitation Learning and Transformers for Robot Control

Modern AI methods for robot control cover Reinforcement Learning (RL), Imitation Learning (IL), and Transformer-based large model methods.

AI Investigation #99: Sensor Fusion Technology - Camera, LiDAR, IMU and Radar Fusion

Sensor Fusion is a core technology in autonomous driving, robotics and smart security.

AI Investigation #98: Visual SLAM - ORB-SLAM, RTAB-Map and VINS-Fusion

Visual SLAM is a technology that achieves autonomous positioning and environment mapping without relying on LiDAR, using only cameras.

AI Investigation #97: SLAM Algorithm Comparison and Application Scenarios

Multi-sensor fusion and SLAM are core technologies for robot perception and navigation.

AI Investigation #96: Robot Scenario Testing - From Extreme Environments to Real-time Simulation

Complete guide to robot scenario testing, covering three dimensions: environment testing, load testing, and anomaly testing.

AI Investigation #95: Robot Scenario Testing - From Extreme Environment Simulation to Automated Fault Injection

Camera Instant Frame Loss: 5-100ms frame drop; LiDAR Noise Surge: random noise 5-20%; IMU Data Jump: 1-3x normal values.

AI Investigation #93: Robot Simulation Tools - Comprehensive Comparison from Gazebo to Isaac Sim

Simulation tools are an important part of robot R&D, enabling algorithm verification and system debugging in risk-free environments, accelerating iteration.

AI Investigation #92: Robot Motion Control - From Traditional Models to Deep Learning Methods

Robot motion control can be divided into two categories: traditional model-based methods and deep learning-based intelligent control.