Gleam Lab · Tag Archive

Tag: Deep Learning

28 articles collected by topic for tutorials, cases, engineering practice, and research notes.

AI Research #135: Gemini 3 Pro Back on Top - MoE, Million-token Context and Deep Think

Explains Gemini 3 Pro's advantages through sparse MoE architecture, million-token context, native multimodal (text/image/video/PDF), thinking depth control (thinking_leve...

12/2/2025

AI Research #130: Qwen2.5-Omni Practical Applications

Office assistant, education and training, programming and operations, search-enhanced RAG, device control/plugin agents, and companion entertainment.

11/19/2025

AI Research #129: Qwen2.5-Omni-7B Key Specs - VRAM, Context and Deployment

Runs stably in FP16 with ~14GB of VRAM; INT8/INT4 quantization (<4GB) enables deployment on consumer GPUs or edge devices.

11/18/2025

AI Research #128: Qwen2.5-Omni Training Pipeline - Three-stage Multi-modal Training

Complete training pipeline breakdown for Qwen2.5-Omni: Thinker based on Qwen2.5, vision initialized from Qwen2.5-VL, audio from Whisper-large-v3.

11/17/2025

AI Research #127: Qwen2.5-Omni Deep Dive - Thinker-Talker Dual-core Architecture

Engineering breakdown of Qwen2.5-Omni (2024-2025) Thinker-Talker dual-core architecture: unified Transformer decoder for text/image/video/audio fusion, TMRoPE.

11/16/2025

AI Research #125: Tesla FSD Business Model and Competitive Landscape

As of the end of 2022, Tesla had ~$2.9 billion in FSD-related deferred revenue; in Q4 2022 it recognized $324 million in FSD revenue.

11/11/2025

AI Research #124: Tesla FSD V14 Deep Analysis

Tesla FSD V14 real-world performance and road tests, comparing V13.2 on urban roads and highways: key disengagement metrics, lane changes/ramps, destination arrival...

11/10/2025

AI Research #123: FSD V14 Deep Analysis - Vision-Only SDF vs V12

3D environment reconstruction: precision of 10cm (3× improvement over V12's ~33cm resolution); multi-frame spatiotemporal fusion for dynamic object tracking.

11/7/2025

AI Research #121: DeepSeek-OCR Research Directions

Frontier approaches and engineering implementation for DeepSeek-OCR (2025, including 3B parameter direction).

11/5/2025

AI Research #119: DeepSeek-OCR PyTorch FlashAttn 2.7.3 Inference and Deployment

Comprehensive guide for DeepSeek-OCR local/private deployment based on Python 3.12, PyTorch 2.6.0, Transformers 4.46.3 and FlashAttention 2.7.3.

11/4/2025

AI Research #120: DeepSeek-OCR from 0 to 1 - Getting Started and Engineering Essentials

Complete getting started path and engineering essentials for DeepSeek-OCR (as of 2025), covering environment setup (Python/PyTorch 2.x, Transformers 4.

11/4/2025

AI Research #118: Embodied AI Mobile-ALOHA - Mobile Base + Dual-Arm Collaboration

Mobile-ALOHA: An open-source mobile manipulation solution combining mobile chassis and dual-arm collaboration.

11/3/2025

AI Research #116: Tesla HW3.0 vs HW4.0 - Camera Resolution, Compute and Perception Upgrade

Comprehensive comparison of Tesla HW3.0 and HW4.0 hardware: camera resolution upgraded from 1.2MP to 5MP with better HDR/night vision

10/31/2025

AI Investigation #108: Complete Robot Model Training Pipeline - From Pre-training to Reinforcement Learning and Human Feedback

Full robot training pipeline: pre-training, fine-tuning (LoRA), reinforcement learning, imitation learning, and human feedback for safe autonomous decision-making.

10/20/2025

AI Investigation #107: RL and Robot Training Data Format Analysis

Training data is constructed as state-action-reward sequences, which supports spatiotemporal understanding in models such as Transformers.

10/18/2025

AI Investigation #106: Robot Learning Data Collection Tools and Methods - Sensors, APIs, Teleoperation and Simulation

Core data collection methods and application scenarios, covering over ten methods, including manual entry, sensor collection, web crawlers, API calls, and log collection.

10/17/2025

AI Investigation #105: Robot Learning Data Collection - From Demonstration Videos to State-Action Pairs

Data collection is a critical step in robot learning development, covering demonstration video collection, trajectory recording, state-action pair generation...

10/16/2025

AI Investigation #103: Embodied AI Technology Landscape

Comprehensive overview of embodied AI tech stack: hardware (GPU, sensors, actuators), software (ROS, simulation), and algorithms (deep learning, RL, VLA models).

10/14/2025

AI Investigation #102: Intelligent Robotic Arms, Autonomous Driving and Humanoid Robots - Imitation Learning, Reinforcement Learning and Multimodal Fusion Trends

Robot types differ greatly in structure, tasks, and control methods, so AI algorithm adaptation strategies must be tailored to each.

10/13/2025

AI Investigation #101: Modern AI Methods - VLA, RT-1, RT-2 and Diffusion Models for Robot Control

AI Investigation #100: Modern AI Methods - Reinforcement Learning, Imitation Learning and Transformers for Robot Control

AI Investigation #99: Sensor Fusion Technology - Camera, LiDAR, IMU and Radar Fusion

AI Investigation #98: Visual SLAM - ORB-SLAM, RTAB-Map and VINS-Fusion

AI Investigation #97: SLAM Algorithm Comparison and Application Scenarios

AI Investigation #96: Robot Scenario Testing - From Extreme Environments to Real-time Simulation

AI Investigation #95: Robot Scenario Testing - From Extreme Environment Simulation to Automated Fault Injection

AI Investigation #93: Robot Simulation Tools - Comprehensive Comparison from Gazebo to Isaac Sim

AI Investigation #92: Robot Motion Control - From Traditional Models to Deep Learning Methods