Tag: LLM

35 articles

LLM Application Engineering: Key Practices from Demo to P...

Core experience moving LLM applications from prototype to production: context management, error handling, cost control, observability. No basics, just real pitfalls.

3/10/2026

Real-time Voice Interaction Pipeline Latency Optimization...

Documents the process of building an ASR→LLM→TTS real-time voice pipeline: why latency is high, how pipeline concurrency reduces first-byte latency, VAD endpoint detection pitfalls, and practical co...

3/8/2026

AI Research #135: Gemini 3 Pro Back on Top - MoE, Million...

Explains Gemini 3 Pro's advantages through sparse MoE architecture, million-token context, native multimodal (text/image/video/PDF), thinking depth control (thinking_level), and Deep Think mode. St...

12/2/2025

AI Research #130: Qwen2.5-Omni Practical Applications

Office assistant, education and training, programming and operations, search-enhanced RAG, device control/plugin agents, and companion entertainment. Covers...

11/19/2025

AI Research #129: Qwen2.5-Omni-7B Key Specs - VRAM, Conte...

Runs stably at FP16 ~14GB VRAM, with INT8/INT4 quantization (<4GB) enabling deployment on consumer GPUs or edge devices. Combined with FlashAttention 2 and...

11/18/2025

AI Research #127: Qwen2.5-Omni Deep Dive - Thinker-Talker...

Engineering breakdown of Qwen2.5-Omni (2024-2025) Thinker-Talker dual-core architecture: unified Transformer decoder for text/image/video/audio fusion, TMRoPE...

11/16/2025

AI Investigation #75: From LLM to LBM - Robot Hierarchica...

The integration of Large Language Models (LLMs) with real-time robot control is driving intelligent upgrades in robotics. LLMs show great potential in...

9/11/2025

AI Research 13 - LLM and Agent Research: The Rise and Dev...

2024 is called the 'Year of Agents'. LLM trends show parallel development of 'bigger and stronger' and 'smaller and more specialized'. OpenAI o1 series, Claude, and other multimodal models continue...

6/24/2025

AI Research 12 - LLM and Agent Research: Overview of Majo...

Major LLM application directions in 2024-2025 include enterprise applications (code assistance, customer service, knowledge management) and consumer applications (general conversation, content crea...

6/18/2025

LangChain-26 Custom Agent Complete Tutorial Building a Cu...

This article demonstrates how to create a chat agent in Python with the LangChain library and the GPT-4 model by defining tool functions and integrating them with the LLM to achieve queries for informatio...

4/15/2024

LangChain-24 AgentExecutor Comprehensive Guide

This article introduces how to use the LangChain library in Python for document retrieval: loading web content, configuring OpenAIEmbeddings, and integrating the GPT-3.5-turbo model for Q&A. It demonstrates h...

4/14/2024

LangChain-25 ReAct Framework Detailed Explanation Integra...

This article introduces ReAct, a framework that uses logical reasoning and action sequences to achieve goal-oriented tasks through LLM decision-making and operations. The core components include Th...

4/14/2024

LangChain-22 Text Embedding and FAISS Practical Explanation

This article introduces the key role of TextEmbedding in NLP, how to convert text into real number vectors to represent semantic relationships, and how to combine OpenAIEmbeddings and FAISS for eff...

4/13/2024

LangChain-23 Vector AI Semantic Search System Vector Data...

This article introduces how to use Chroma vector database to process and retrieve high-dimensional vector embeddings from documents, vectorize them using...

4/13/2024

LangChain-20 Document Loaders TextLoader, CSVLoader, PyPD...

This article introduces the document loaders provided by the LangChain library, including TextLoader, CSVLoader, and DirectoryLoader, and demonstrates how to load and process data in various formats.

4/12/2024

LangChain Text Splitter: Character, Word, HTML and Code-b...

This article introduces various TextSplitters in the LangChain library, including character-based, word-based, HTML tag-based, and programming language-based splitters, as well as their application...

4/12/2024

LangChain Cache Mechanism: InMemoryCache and SQLiteCache ...

LangChain provides a comprehensive caching mechanism to significantly reduce LLM call latency and costs. Its core includes InMemoryCache (in-memory cache) and SQLiteCache (persistent cache).

4/11/2024

LangChain-19 TokenUsage Callback Function Explained

Explains how to call an OpenAI GPT-3 model from Python through the LangChain library, and how to use the `get_openai_callback` context manager to track token usage for the requests executed inside it.

4/11/2024

LangChain-16 Using Tools: Mastering LLM Tool Calling

LangChain is currently one of the most popular LLM application development frameworks, specifically designed for building intelligent assistants, automation...

4/10/2024

LangChain-17 Function Calling AI Function Calling Explained

Function Calling is a core technology for Large Language Models (like GPT-4, Claude, Gemini) to interact with external systems. It enables AI to not only understand language but also execute tasks,...

4/10/2024

LangChain-14 OpenAI Content Moderation (Moderation) Expla...

Content moderation is a core component of modern internet platform safety and compliance, used to identify, filter, and manage user-generated content (UGC) to prevent the spread of illegal, low-qua...

4/9/2024

LangChain-15 Intelligent Knowledge Retrieval: AgentExecut...

Build an intelligent knowledge retrieval system using Wikipedia search plugin, AgentExecutor, and LangChain tools. Covers agent initialization, tool binding, and multi-step reasoning workflows.

4/9/2024

LangChain-12 Routing By Semantic Similarity

This article introduces a routing method that uses large models (such as OpenAI's) and prompt templates to handle unexpected inputs: it calculates the similarity between the incoming query and a set of preset templates and routes the query to the closest match.

4/8/2024

LangChain-13 Memory ConversationBufferMemory: Conversatio...

This article introduces how to use tools in the LangChain library to manage conversation context of large models in Python. Through components like...

4/8/2024

LangChain-11 Code Writing FunctionCalling: Autoregressive...

This article introduces how to use the GPT-3.5-Turbo model to write Python code to solve users' abstract calculation problems, such as 2+2 and complex mathematical expressions, demonstrating the mo...

4/7/2024

LangChain 09 - Query SQL DB with RUN GPT

This article introduces how to use Python libraries like langchain and ChatOpenAI (GPT-3.5-turbo) combined with a SQLite database to create a program that executes SQL queries and returns results in natu...

4/6/2024

LangChain 10 - Agents Langchainhub Guide

This article introduces how to use LangChainHub's Hub mechanism through Python code to easily access and share Prompts. Although the project hasn't been...

4/6/2024

LangChain 07 - Multiple Chains

How to use Runnable and Prompts in LangChain to create chainable conversation flows for multi-stage question answering, with practical examples of sequential and parallel chain composition.

4/5/2024

LangChain 08 - Query SQL DB with GPT

This article introduces how to use the LangChain framework to import the Chinook SQLite database through a Python script and use a GPT model to execute SQL queries, such as calculating the employee count.

4/5/2024

LangChain 05 - RAG Enhanced Conversational Retrieval

This article introduces how to use tools in LangChain library, such as OpenAIEmbeddings and ChatModels, combined with document retrieval technology, to create a program that generates answers based...

4/4/2024

LangChain 06 - RAG with Source Document

Retrieval-Augmented Generation (RAG) with Source Document is an AI technology framework that combines retrieval with large language model generation. Its core...

4/4/2024

LangChain 03 - astream_events Streaming Output with FAISS...

This article introduces how to use DocArrayInMemorySearch to vectorize text data, combined with OpenAIEmbeddings and GPT-3.5 model, to implement relevant information retrieval and answer generation...

4/3/2024

LangChain 04 - RAG Retrieval-Augmented Generation

This article explains in detail how to use RAG technology in LangChain, combined with OpenAI's GPT-3.5 model, to improve text generation quality through retrieval and generation. Provides installat...

4/3/2024

LangChain 01 - Getting Started: Quick Hello World Guide

This article introduces how to use the LangChain library with OpenAI API and GPT-3.5-turbo model to create a template for generating jokes about specific topics (like cats). The author demonstrates...

4/2/2024

LangChain 02 - JsonOutputParser and Streaming JSON Data P...

This article explains how to install and use LangChain and the OpenAI API in Python, retrieve a specified country and its population data through async functions, and demonstrates the process of progress...

4/2/2024