Background Description

This article introduces a complex knowledge retrieval and processing workflow, including:

  • Using the Wikipedia search tool (WikipediaQueryRun with WikipediaAPIWrapper)
  • The AgentExecutor execution mechanism as the central controller
  • Controlling context length through token counting (num_tokens)
  • A content compression strategy for when the count exceeds a threshold (typically 2048 or 4096 tokens)

Installing Dependencies

pip install --upgrade --quiet langchain langchain-openai wikipedia

Main Code Implementation

  • Initialize the Wikipedia tool with top_k_results=5 and doc_content_chars_max=10_000
  • Create a prompt template containing system, user, and agent_scratchpad
  • Use GPT-4-Turbo for better accuracy
  • Assemble the agent chain, parsing model responses with OpenAIFunctionsAgentOutputParser

Context Token Control Implementation

from langchain_core.prompt_values import ChatPromptValue

def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
    """Trim older tool-call messages until the prompt fits the token budget."""
    messages = prompt.to_messages()
    num_tokens = llm.get_num_tokens_from_messages(messages)
    # The first two messages (system + user) are always preserved
    ai_function_messages = messages[2:]
    while num_tokens > 4_000:
        # Drop the oldest AI/function message pair and recount
        ai_function_messages = ai_function_messages[2:]
        num_tokens = llm.get_num_tokens_from_messages(
            messages[:2] + ai_function_messages
        )
    messages = messages[:2] + ai_function_messages
    return ChatPromptValue(messages=messages)

This function implements a sliding window approach:

  • Keeps the most recent conversation turns
  • Removes the earliest messages, in pairs, when the token count exceeds 4,000
  • Retains the system and user messages at the beginning
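The sliding-window behavior can be exercised in isolation with a stand-in token counter. The word-count "tokenizer" and the 250-token limit below are hypothetical stand-ins for llm.get_num_tokens_from_messages and the 4,000-token threshold, chosen so no API call is needed:

```python
# Stand-alone demonstration of the sliding-window trimming described above.
# count_tokens is a hypothetical stand-in for llm.get_num_tokens_from_messages.

def count_tokens(messages: list[str]) -> int:
    # Crude proxy: one token per whitespace-separated word
    return sum(len(m.split()) for m in messages)

def condense(messages: list[str], limit: int) -> list[str]:
    head, tail = messages[:2], messages[2:]  # always keep system + user
    while count_tokens(head + tail) > limit and tail:
        tail = tail[2:]  # drop the oldest AI/function message pair
    return head + tail

# Two short fixed messages followed by six 100-word tool-call turns
history = ["system prompt", "user question"] + ["word " * 100 for _ in range(6)]
trimmed = condense(history, limit=250)
print(len(trimmed))  # 4: system, user, and the two most recent messages
```

The extra `and tail` guard stops the loop once only the system and user messages remain, an edge case the version above does not handle (it would spin forever if those first two messages alone exceeded the limit).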

Key Application Scenarios

  • Open-domain question answering systems
  • Supplementing information during knowledge graph construction
  • Scenarios requiring balance between information completeness and processing efficiency

Key Points

The article emphasizes that excessive context drives up API costs (and can overflow the model's context window), and it provides practical code for controlling context length through token-based compression.