Background Description

This article introduces a complex knowledge retrieval and processing workflow, including:

  • Using the Wikipedia search tool (WikipediaQueryRun with WikipediaAPIWrapper)
  • The AgentExecutor execution mechanism as the central controller
  • Controlling context length through token counting (num_tokens)
  • A content compression strategy for when the count exceeds a threshold (typically 2048 or 4096 tokens)

Installing Dependencies

pip install --upgrade --quiet langchain langchain-openai wikipedia

Main Code Implementation

  • Initialize the Wikipedia tool with top_k_results=5 and doc_content_chars_max=10_000
  • Create a prompt template containing system, user, and agent_scratchpad
  • Use GPT-4-Turbo for better accuracy
  • Assemble the agent chain, parsing model responses with OpenAIFunctionsAgentOutputParser

Context Token Control Implementation

from langchain_core.prompt_values import ChatPromptValue

def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
    """Trim older tool-call messages until the prompt fits the token budget."""
    messages = prompt.to_messages()
    num_tokens = llm.get_num_tokens_from_messages(messages)
    # The first two messages (system + user) are always preserved
    ai_function_messages = messages[2:]
    while num_tokens > 4_000:
        # Drop the oldest AI/function message pair and recount
        ai_function_messages = ai_function_messages[2:]
        num_tokens = llm.get_num_tokens_from_messages(
            messages[:2] + ai_function_messages
        )
    messages = messages[:2] + ai_function_messages
    return ChatPromptValue(messages=messages)

This function implements a sliding window approach:

  • Keeps the most recent conversation turns
  • Removes the earliest messages, in pairs, when the token count exceeds 4,000
  • Retains the system and user messages at the beginning
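The sliding-window behavior can be exercised in isolation with a stand-in token counter. The word-count "tokenizer" and the 250-token limit below are hypothetical stand-ins for llm.get_num_tokens_from_messages and the 4,000-token threshold, chosen so no API call is needed:

```python
# Stand-alone demonstration of the sliding-window trimming described above.
# count_tokens is a hypothetical stand-in for llm.get_num_tokens_from_messages.

def count_tokens(messages: list[str]) -> int:
    # Crude proxy: one token per whitespace-separated word
    return sum(len(m.split()) for m in messages)

def condense(messages: list[str], limit: int) -> list[str]:
    head, tail = messages[:2], messages[2:]  # always keep system + user
    while count_tokens(head + tail) > limit and tail:
        tail = tail[2:]  # drop the oldest AI/function message pair
    return head + tail

# Two short fixed messages followed by six 100-word tool-call turns
history = ["system prompt", "user question"] + ["word " * 100 for _ in range(6)]
trimmed = condense(history, limit=250)
print(len(trimmed))  # 4: system, user, and the two most recent messages
```

The extra `and tail` guard stops the loop once only the system and user messages remain, an edge case the version above does not handle (it would spin forever if those first two messages alone exceeded the limit).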

Key Application Scenarios

  • Open-domain question answering systems
  • Supplementing information during knowledge graph construction
  • Scenarios requiring balance between information completeness and processing efficiency

Key Points

The article emphasizes that excessive context drives up API costs (and can overflow the model's context window), and it provides practical code for controlling context length through token-based compression.