Background Description
This article introduces a complex knowledge retrieval and processing workflow, including:
- Using Wikipedia search plugin (WikipediaQueryRun with WikipediaAPIWrapper)
- AgentExecutor execution mechanism as the central controller
- Controlling context length through token calculation (num_tokens)
- Content compression strategy when the context exceeds a threshold (4,000 tokens in the example below)
Installing Dependencies
pip install --upgrade --quiet langchain langchain-openai wikipedia
Main Code Implementation
- Initialize the Wikipedia tool with top_k_results=5 and doc_content_chars_max=10_000
- Create a prompt template containing system, user, and agent_scratchpad sections
- Use GPT-4-Turbo for better accuracy
- Build the agent chain with OpenAIFunctionsAgentOutputParser
Context Token Control Implementation
from langchain_core.prompt_values import ChatPromptValue

def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
    messages = prompt.to_messages()
    num_tokens = llm.get_num_tokens_from_messages(messages)
    # Everything after the leading system and user messages is
    # AI tool-call / tool-result history.
    ai_function_messages = messages[2:]
    while num_tokens > 4_000:
        # Drop the oldest AI/function message pair.
        ai_function_messages = ai_function_messages[2:]
        num_tokens = llm.get_num_tokens_from_messages(
            messages[:2] + ai_function_messages
        )
    messages = messages[:2] + ai_function_messages
    return ChatPromptValue(messages=messages)
This function implements a sliding window approach:
- Keeps the most recent conversation turns
- Removes the oldest AI/function message pairs, two at a time, once the token count exceeds 4,000
- Always retains the system and user messages at the beginning
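The trimming loop can be exercised without an LLM by swapping in a stub token counter. The sketch below is stdlib-only; count_tokens stands in for llm.get_num_tokens_from_messages, and "one word = one token" is a rough illustrative assumption:

```python
# Stdlib-only simulation of the sliding-window trimming above.
# count_tokens is a stub standing in for llm.get_num_tokens_from_messages;
# "one word = one token" is a rough illustrative assumption.
def count_tokens(messages):
    return sum(len(content.split()) for _, content in messages)

def condense(messages, max_tokens):
    head, tail = messages[:2], messages[2:]  # always keep system + user
    while count_tokens(head + tail) > max_tokens and tail:
        tail = tail[2:]  # drop the oldest AI/function pair
    return head + tail

history = [
    ("system", "You are a helpful assistant"),
    ("user", "Who was the 12th president of the United States?"),
    ("ai", "calling wikipedia with query twelfth president"),
    ("function", "long article text " * 5),
    ("ai", "calling wikipedia with query Zachary Taylor"),
    ("function", "more long article text " * 5),
]

trimmed = condense(history, max_tokens=45)
# The oldest AI/function pair is dropped; the system message, the user
# question, and the most recent pair remain.
```

Because messages are removed in pairs, a tool call is never separated from its result, which keeps the remaining history well-formed for the model.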
Key Application Scenarios
- Open-domain question answering systems
- Knowledge graph construction for information supplementation
- Scenarios requiring balance between information completeness and processing efficiency
Key Points
The article emphasizes that excessive context leads to high API costs and provides practical code for controlling context length through token-based compression.
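Since input tokens are billed linearly, the saving from trimming is easy to estimate. The rate below is a hypothetical example, not a quote of actual OpenAI pricing:

```python
# Back-of-envelope cost illustration. PRICE_PER_1K_INPUT_TOKENS is a
# hypothetical example rate, not actual OpenAI pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # USD, assumed for illustration

def prompt_cost(num_tokens: int) -> float:
    return num_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

# Capping a 16,000-token context at 4,000 tokens cuts the per-call
# input cost to a quarter.
```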