Tag: Deep Learning (深度学习)

28 articles

AI Research #135: Gemini 3 Pro Back on Top - MoE, Million...

Explains Gemini 3 Pro's advantages through sparse MoE architecture, million-token context, native multimodal (text/image/video/PDF), thinking depth control (thinking_level), and Deep Think mode. St...

AI Research #130: Qwen2.5-Omni Practical Applications

Office assistant, education and training, programming and operations, search-enhanced RAG, device control/plugin agents, and companion entertainment. Covers...

AI Research #129: Qwen2.5-Omni-7B Key Specs - VRAM, Conte...

Runs stably at FP16 ~14GB VRAM, with INT8/INT4 quantization (<4GB) enabling deployment on consumer GPUs or edge devices. Combined with FlashAttention 2 and...
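
The VRAM figures above follow from a simple bytes-per-parameter estimate. A back-of-envelope sketch (the ~7B parameter count is taken from the model name; the calculation ignores activation and KV-cache overhead, which adds a few GB in practice):

```python
# Rough VRAM estimate for model weights: parameters x bytes per parameter.
# Activation / KV-cache memory is NOT included here.

def weight_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Return weight memory in GiB for a model with n_params parameters."""
    return n_params * bits_per_param / 8 / 1024**3

N = 7e9  # Qwen2.5-Omni-7B, ~7 billion parameters (assumed)

print(f"FP16: {weight_vram_gb(N, 16):.1f} GiB")  # ~13.0 GiB, matching the ~14GB figure
print(f"INT8: {weight_vram_gb(N, 8):.1f} GiB")   # ~6.5 GiB
print(f"INT4: {weight_vram_gb(N, 4):.1f} GiB")   # ~3.3 GiB, under the <4GB claim
```

The same arithmetic explains why INT4 quantization is the threshold for consumer GPUs and edge devices: weights alone drop below 4 GiB.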

AI Research #128: Qwen2.5-Omni Training Pipeline - Three-...

Complete training pipeline breakdown for Qwen2.5-Omni: Thinker based on Qwen2.5, vision initialized from Qwen2.5-VL, audio from Whisper-large-v3. Uses...

AI Research #127: Qwen2.5-Omni Deep Dive - Thinker-Talker...

Engineering breakdown of Qwen2.5-Omni (2024-2025) Thinker-Talker dual-core architecture: unified Transformer decoder for text/image/video/audio fusion, TMRoPE...

AI Research #125: Tesla FSD Business Model and Competitor...

FSD V14 (2025) business model and competitive landscape. Analyzes pricing logic for one-time purchase (~$15,000) vs subscription (~$199/month) and deferred...

AI Research #124: Tesla FSD V14 Deep Analysis

Tesla FSD V14 real-world performance and road tests, comparing V13.2 on urban roads and highways: key disengagement metrics, lane changes/ramps, destination arrival, and long-tail scenarios. V14 sh...

AI Research #123: FSD V14 Deep Analysis - Vision-Only SDF...

FSD V14 (2025) technical evolution compared to V12 (2023), focusing on vision-only approach, SDF (Signed Distance Field) occupancy reconstruction, end-to-end...

AI Research #121: DeepSeek-OCR Research Directions

Frontier approaches and engineering implementation for DeepSeek-OCR (2025, including 3B parameter direction). Summarizes research directions including...

AI Research #120: DeepSeek-OCR from 0 to 1 - Getting Star...

Complete getting started path and engineering essentials for DeepSeek-OCR (as of 2025), covering environment setup (Python/PyTorch 2.x, Transformers 4.x), model loading, output parsing, parameter e...

AI Research #119: DeepSeek-OCR PyTorch FlashAttn 2.7.3 In...

Comprehensive guide for DeepSeek-OCR local/private deployment based on Python 3.12, PyTorch 2.6.0, Transformers 4.46.3 and FlashAttention 2.7.3. Includes ~3B parameter model inference, deployment o...

AI Research #118: Embodied AI Mobile-ALOHA - Mobile + Dua...

Mobile-ALOHA: An open-source mobile manipulation solution combining mobile chassis and dual-arm collaboration. Uses whole-body teleoperation for low-cost...

AI Research #116: Tesla HW3.0 vs HW4.0 - Camera Resolutio...

Comprehensive comparison of Tesla HW3.0 and HW4.0 hardware: camera resolution upgraded from 1.2MP to 5MP with better HDR/night vision; FSD computing power...

AI Investigation #108: Complete Robot Model Training Proc...

Full robot training pipeline: pre-training, fine-tuning (LoRA), reinforcement learning, imitation learning, and human feedback for safe autonomous decision-making.
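
The LoRA fine-tuning step in the pipeline above can be sketched in NumPy. This is an illustrative sketch of the low-rank update W' = W + (α/r)·B·A, not the article's actual training code; the dimensions, rank, and scaling factor are arbitrary assumptions:

```python
import numpy as np

# LoRA: instead of updating the full weight W (d_out x d_in), train two
# small matrices B (d_out x r) and A (r x d_in) with rank r << min(d_in, d_out).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 4, 8

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# With B = 0 the LoRA branch contributes nothing: output equals the base model,
# so fine-tuning starts exactly from the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters: r*(d_in + d_out) vs the full d_out*d_in.
print(B.size + A.size, "trainable vs", W.size, "full")
```

The parameter count is the point: here only 768 values are trained instead of 8192, and the ratio improves further at real model scale.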

AI Investigation #107: RL and Robot Training Data Format ...

Data formats and development processes in robot and reinforcement learning systems, including time series trajectories, state-action pairs, offline RL data,...

AI Investigation #106: Robot Learning Data Collection Too...

Core data collection methods and application scenarios, covering over ten methods from manual entry, sensor collection, web crawlers, API calls, log collection...

AI Investigation #105: Robot Learning Data Collection - F...

Data collection is a critical step in robot learning development, covering demonstration video collection, trajectory recording, state-action pair generation, and data quality control strategies.

AI Investigation #103: Embodied AI Technology Landscape

Comprehensive overview of embodied AI tech stack: hardware (GPU, sensors, actuators), software (ROS, simulation), and algorithms (deep learning, RL, VLA models).

AI Investigation #102: Intelligent Robotic Arms, Autonomo...

Different robot types differ greatly in structure, tasks, and control methods, so AI algorithm adaptation strategies must be tailored accordingly.

AI Investigation #101: Modern AI Methods - VLA, RT-1, RT-...

Modern AI robot control methods are undergoing a major transition from reinforcement learning and imitation learning to multimodal agents driven by large models. The combination of Vision-Language-...

AI Investigation #100: Modern AI Methods - Reinforcement ...

Modern AI methods for robot control cover Reinforcement Learning (RL), Imitation Learning (IL), and Transformer-based large model methods. Reinforcement...
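
The RL side of the methods above can be illustrated with the tabular Q-learning update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. A textbook sketch on a toy corridor MDP of my own construction, not code from the article:

```python
import random

# Tabular Q-learning on a tiny deterministic corridor MDP:
# states 0..4, actions 0 = left / 1 = right, reward 1 on reaching state 4.
random.seed(0)
N_STATES, GOAL = 5, 4
alpha, gamma = 0.5, 0.9
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(300):
    s = 0
    while s != GOAL:
        a = random.randrange(2)  # random behavior policy; Q-learning is off-policy
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy w.r.t. the learned Q should go right in every state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

Because the update bootstraps off the greedy max over the next state, the optimal policy is learned even though the behavior policy acts uniformly at random.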

AI Investigation #99: Sensor Fusion Technology - Camera, ...

Sensor Fusion is a core technology in autonomous driving, robotics and smart security. Through multi-sensor data fusion of cameras, LiDAR, radar, IMU,...
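
A minimal illustration of the fusion idea (a toy 1D example with made-up numbers, not the article's pipeline): two noisy position measurements combined by inverse-variance weighting, which is the core of a Kalman measurement update.

```python
# Inverse-variance (Kalman-style) fusion of two noisy 1D measurements.
# Each sensor is weighted by its confidence (1/variance); the fused
# variance is always smaller than the best individual sensor's.

def fuse(z1: float, var1: float, z2: float, var2: float):
    w1, w2 = 1 / var1, 1 / var2
    z = (w1 * z1 + w2 * z2) / (w1 + w2)
    var = 1 / (w1 + w2)
    return z, var

# e.g. camera reads 10.0 m (variance 4.0), LiDAR reads 10.6 m (variance 1.0)
z, var = fuse(10.0, 4.0, 10.6, 1.0)
print(z, var)  # 10.48 0.8 -- pulled toward the more accurate LiDAR
```

The fused variance (0.8) beats even the LiDAR alone (1.0), which is why combining cameras, LiDAR, radar, and IMU outperforms any single sensor.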

AI Investigation #98: Visual SLAM - ORB-SLAM, RTAB-Map, V...

Visual SLAM is a technology that achieves autonomous positioning and environment mapping without relying on LiDAR, using only cameras. By extracting environmental features (corners, edges, textures...

AI Investigation #97: SLAM Algorithm Comparison and Appli...

Multi-sensor fusion and SLAM are core technologies for robot perception and navigation. By fusing IMU, GPS, wheel odometry, LiDAR, visual odometry and other...

AI Investigation #96: Robot Scenario Testing - From Extre...

Complete guide to robot scenario testing, covering three dimensions: environment testing, load testing, and anomaly testing. Traditional manual testing has...

AI Investigation #95: Robot Scenario Testing - From Extre...

Before robots enter practical applications, systematic scenario testing must be conducted, covering boundary conditions like extreme weather, complex terrain,...

AI Investigation #93: Robot Simulation Tools - Comprehens...

Simulation tools are an important part of robot R&D, enabling algorithm verification and system debugging in risk-free environments, accelerating iteration.

AI Investigation #92: Robot Motion Control - From Traditi...

Robot motion control can be divided into two categories: traditional model-based methods and deep learning-based intelligent control. The former emphasizes kinematics/dynamics modeling, trajectory ...