Tag: LLM
35 articles
LLM Application Engineering: Key Practices from Demo to Production
Core experience moving LLM applications from prototype to production: context management, error handling, cost control, observability. No basics, just real pitfalls.
Real-time Voice Interaction Pipeline Latency Optimization
When building voice interaction systems, latency is the core experience metric.
AI Research #135: Gemini 3 Pro Back on Top - MoE, Million-token Context and Deep Think
Explains Gemini 3 Pro's advantages through sparse MoE architecture, million-token context, native multimodal (text/image/video/PDF), thinking depth control (thinking_leve...
AI Research #130: Qwen2.5-Omni Practical Applications
Office assistant, education and training, programming and operations, search-enhanced RAG, device control/plugin agents, and companion entertainment.
AI Research #129: Qwen2.5-Omni-7B Key Specs - VRAM, Context and Deployment
Runs stably at FP16 ~14GB VRAM, with INT8/INT4 quantization (<4GB) enabling deployment on consumer GPUs or edge devices.
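The VRAM figures in this entry follow directly from parameter count × bytes per weight; a back-of-envelope check in plain Python (assuming a 7B-parameter model and ignoring activation and KV-cache overhead):

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: parameter count * bytes per weight."""
    return n_params * (bits_per_weight / 8) / 1e9

n = 7e9  # approximate parameter count of a 7B model
print(f"FP16: {weight_memory_gb(n, 16):.1f} GB")  # 14.0 GB
print(f"INT8: {weight_memory_gb(n, 8):.1f} GB")   # 7.0 GB
print(f"INT4: {weight_memory_gb(n, 4):.1f} GB")   # 3.5 GB, consistent with the <4GB claim
```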
AI Research #127: Qwen2.5-Omni Deep Dive - Thinker-Talker Dual-core Architecture
Engineering breakdown of Qwen2.5-Omni (2024-2025) Thinker-Talker dual-core architecture: unified Transformer decoder for text/image/video/audio fusion, TMRoPE.
AI Investigation #75: From LLM to LBM - Hierarchical Robot Control Architecture Driven by Large Models
Integrating Large Language Models (LLMs) with real-time robot control is driving intelligent upgrades in robotics.
AI Research 13 - LLM and Agent Research: The Rise and Development of LLM Agents
2024 is called the 'Year of Agents'. LLM trends show parallel development of 'bigger and stronger' and 'smaller and more specialized'.
AI Research 12 - LLM and Agent Research: Overview of Major LLM Application Directions
Major LLM application directions in 2024-2025 include enterprise applications (code assistance, customer service, knowledge management) and consumer applications.
LangChain-26 Custom Agent Complete Tutorial: Building a Custom Agent
A Custom Agent is an agent program tailored by users to specific requirements, capable of executing particular tasks or workflows.
LangChain-24 AgentExecutor Comprehensive Guide
This article shows how to use the LangChain library in Python to retrieve documents, load web content, configure OpenAIEmbeddings, and integrate GPT-3.
LangChain-25 ReAct Framework Detailed Explanation and Integration Practice
This article introduces ReAct, a framework that uses logical reasoning and action sequences to achieve goal-oriented tasks through LLM decision-making and operations.
LangChain-22 Text Embedding and FAISS Practical Explanation
Text Embedding maps high-dimensional data (text, images, etc.) into lower-dimensional dense vector spaces.
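The point of embedding-based retrieval is that semantic similarity becomes geometric proximity; a minimal cosine-similarity sketch in plain Python (the toy 3-dimensional vectors are illustrative, not real model embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; real models produce hundreds of dimensions.
query = [0.9, 0.1, 0.0]
docs = {"cats": [0.8, 0.2, 0.1], "finance": [0.0, 0.1, 0.9]}
best = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
print(best)  # "cats" is geometrically closest to the query
```

A vector database like FAISS does the same nearest-neighbor search, but with indexes that scale to millions of vectors.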
LangChain-23 Vector AI Semantic Search System: Vector Databases and Retrieval
Vector Storage, also known as Vector Database, is a database system specifically optimized for storing and retrieving high-dimensional vector data.
LangChain-20 Document Loaders: TextLoader, CSVLoader, PyPDFLoader and More
This article introduces various document loaders provided by the LangChain library, such as TextLoader, CSVLoader, DirectoryLoader, etc., demonstrating how to load and pr...
LangChain Text Splitter: Character, Word, HTML and Code-based Splitting
This article introduces various TextSplitters in the LangChain library, including character-based, word-based, HTML tag-based, and programming language-based splitters...
LangChain Cache Mechanism: InMemoryCache and SQLiteCache Explained
LangChain provides a comprehensive caching mechanism to significantly reduce LLM call latency and costs. Its core includes InMemoryCache (in-memory cache) and SQLiteCache...
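The caching idea behind both backends is simple: key the cache on the exact (prompt, model) pair and return the stored completion on a repeat call. A minimal in-memory sketch of the pattern (plain Python, not LangChain's actual InMemoryCache implementation):

```python
class SimpleLLMCache:
    """In-memory (prompt, model) -> response store, the pattern behind InMemoryCache."""
    def __init__(self):
        self._store = {}

    def lookup(self, prompt: str, model: str):
        return self._store.get((prompt, model))

    def update(self, prompt: str, model: str, response: str):
        self._store[(prompt, model)] = response

cache = SimpleLLMCache()

def cached_call(prompt: str, model: str, llm_fn) -> str:
    hit = cache.lookup(prompt, model)
    if hit is not None:
        return hit             # cache hit: no API call, no cost, near-zero latency
    response = llm_fn(prompt)  # cache miss: pay for the expensive LLM call once
    cache.update(prompt, model, response)
    return response
```

SQLiteCache follows the same lookup/update contract but persists entries to disk, so hits survive process restarts.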
LangChain-19 TokenUsage Callback Function Explained
Explains how to integrate an OpenAI GPT-3 model in Python through the LangChain library, demonstrating how to use the get_openai_callback function to obtain callbacks and execute...
LangChain-16 Using Tools: Mastering LLM Tool Calling
LangChain is a powerful open-source framework designed to help developers more efficiently build and deploy applications based on Large Language Models (LLMs).
LangChain-17 Function Calling AI Function Calling Explained
Function Calling is a core technology for Large Language Models (like GPT-4, Claude, Gemini) to interact with external systems.
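Function Calling works by giving the model schemas of the available tools, letting it emit a structured call, and executing that call locally; a minimal dispatch sketch (the tool, its registry, and the model's JSON reply are all hypothetical stand-ins):

```python
import json

# Hypothetical tool: in a real system this would hit a weather API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Tool registry: name -> callable, mirroring the schemas sent to the model.
TOOLS = {"get_weather": get_weather}

# A real LLM returns structured JSON like this after seeing the tool schemas;
# here we hard-code a plausible model reply.
model_reply = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_reply)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # Sunny in Berlin
```

In practice the result is then fed back to the model as a tool message so it can compose the final answer.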
LangChain-14 OpenAI Content Moderation (Moderation) Explained
Moderation refers to the process of reviewing and managing user-generated content (UGC) through manual or automated means.
LangChain-15 Intelligent Knowledge Retrieval: AgentExecutor Practice
Build an intelligent knowledge retrieval system using Wikipedia search plugin, AgentExecutor, and LangChain tools. Covers agent initialization, tool binding...
LangChain-12 Routing By Semantic Similarity
This article introduces a method using large models (such as OpenAI's) and Prompt templates to handle unexpected inputs in program design by calculating the similarity between...
LangChain-13 Memory ConversationBufferMemory: Conversation Context Management
This article introduces how to use tools in the LangChain library to manage conversation context of large models in Python.
LangChain-11 Code Writing FunctionCalling: Autoregressive Language Modeling
This article introduces how GPT models work based on autoregressive language modeling, which generates coherent text by predicting a probability distribution over the next token.
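Autoregressive generation reduces to repeatedly sampling the next token from a distribution conditioned on the tokens so far; a toy sketch with a hand-written bigram table (the probabilities are hypothetical, and a real GPT computes them with a Transformer over the full context, not a lookup table):

```python
import random

# Toy next-token distributions: P(next | previous token).
bigram = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate(seed: int = 0) -> list[str]:
    random.seed(seed)
    tokens, current = [], "<s>"
    while current != "</s>":
        dist = bigram[current]
        # Sample the next token proportionally to its probability.
        current = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(current)
    return tokens[:-1]  # drop the end-of-sequence marker

print(generate())
```

Greedy decoding, temperature, and top-p sampling are all just different rules for picking from this same per-step distribution.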
LangChain 09 - Query SQL DB with RUN GPT
RUN GPT provides a powerful database query function, allowing users to query database content in natural language.
LangChain 10 - Agents Langchainhub Guide
This article introduces how to use LangChainHub's Hub mechanism through Python code to easily access and share Prompts.
LangChain 07 - Multiple Chains
How to use Runnable and Prompts in LangChain to create chainable conversation flows for multi-stage question answering, with practical examples of sequential and parallel...
LangChain 08 - Query SQL DB with GPT
This article introduces how to use LangChain framework to import Chinook SQLite database through Python script and use GPT model to execute SQL queries, such as calculati...
LangChain 05 - RAG Enhanced Conversational Retrieval
Conversational Search is an intelligent search technology that combines natural language processing and context understanding capabilities.
LangChain 06 - RAG with Source Document
Retrieval-Augmented Generation (RAG) with Source Document is an AI technology framework that combines retrieval with large language model generation.
LangChain 03 - astream_events Streaming Output with FAISS Practice
This article introduces how to use DocArrayInMemorySearch to vectorize text data, combined with OpenAIEmbeddings and GPT-3.5 model, to implement relevant information retr...
LangChain 04 - RAG Retrieval-Augmented Generation
This article explains in detail how to use RAG technology in LangChain, combined with OpenAI's GPT-3.5 model, to improve text generation quality through retrieval and gen...
LangChain 01 - Getting Started: Quick Hello World Guide
This article introduces how to use the LangChain library with OpenAI API and GPT-3.5-turbo model to create a template for generating jokes about specific topics (like cat...
LangChain 02 - JsonOutputParser and Streaming JSON Data Processing Guide
This article explains how to install and use LangChain and the OpenAI API in Python, retrieving a specified country and its population data through async functions.