TokenUsage Callback Function Overview

TokenUsage is a callback handler for tracking and recording token consumption during language model calls. Note that LangChain does not ship a class under this exact name: the examples below implement it as a small custom handler on top of LangChain's callback mechanism (the built-in alternative, get_openai_callback, is covered at the end). It can help developers:

  1. Precisely monitor API call costs
  2. Optimize prompt design
  3. Analyze model usage efficiency

Core Functions

TokenUsage callback primarily records three types of token data:

  • Input tokens (prompt_tokens): Counts the number of prompt tokens sent to the model
  • Output tokens (completion_tokens): Counts the number of tokens in the model’s returned results
  • Total tokens (total_tokens): The sum of the above two
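As a concrete illustration, these three counts arrive in an OpenAI-style token_usage dictionary (the field names below follow the OpenAI API response format), and the total is always the sum of the other two:

```python
# Shape of the token_usage data attached to an OpenAI-style response
token_usage = {
    "prompt_tokens": 12,      # tokens in the prompt sent to the model
    "completion_tokens": 14,  # tokens in the model's reply
    "total_tokens": 26,       # sum of the two
}

# The invariant a token-tracking callback relies on
assert token_usage["total_tokens"] == (
    token_usage["prompt_tokens"] + token_usage["completion_tokens"]
)
```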

Implementation Methods

Basic Usage Example

from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI

# Minimal custom handler: accumulates the token counts the model reports
# in its (non-streaming) responses
class TokenUsage(BaseCallbackHandler):
    def __init__(self):
        self.reset()

    def on_llm_end(self, response, **kwargs):
        usage = (response.llm_output or {}).get("token_usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)

    def reset(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.total_tokens = 0

# Initialize callback
token_usage = TokenUsage()

# Pass callback when creating the LLM instance
llm = OpenAI(
    temperature=0,
    callbacks=[token_usage]
)

# Execute query
response = llm.invoke("Please introduce the Python language")

# Get token usage
print(f"Input tokens: {token_usage.prompt_tokens}")
print(f"Output tokens: {token_usage.completion_tokens}")
print(f"Total tokens: {token_usage.total_tokens}")

Usage in Chain Calls

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Create prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Please explain the following topic in detail: {topic}"
)

# Create chain
chain = LLMChain(
    llm=llm,
    prompt=prompt,
    callbacks=[token_usage]
)

# Execute chain
result = chain.run(topic="Machine Learning")

Advanced Application Scenarios

Batch Processing Monitoring

topics = ["Deep Learning", "Neural Networks", "Natural Language Processing"]
total_tokens = 0

for topic in topics:
    chain.run(topic=topic)
    total_tokens += token_usage.total_tokens
    token_usage.reset()  # Reset counter

print(f"Total token consumption for batch processing: {total_tokens}")

Cost Estimation Tool

def estimate_cost(total_tokens):
    # Illustrative pricing: assume a GPT-3.5 model at $0.002 per 1,000 tokens
    return total_tokens / 1000 * 0.002

cost = estimate_cost(total_tokens)
print(f"Estimated API call cost: ${cost:.4f}")

Notes

  1. Multiple Callbacks: Can be combined with other handlers (such as StdOutCallbackHandler)
  2. Async Environment: Ensure callback handlers are thread-safe when used in concurrent or async calls
  3. Reset Mechanism: Long-running applications should periodically reset counters
  4. Model Differences: Token calculation methods may vary slightly between different models
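For note 2, the sketch below shows what thread-safe accumulation could look like. ThreadSafeTokenUsage and its lock-based design are illustrative assumptions, not part of LangChain:

```python
import threading

# Illustrative sketch: a token counter guarded by a lock so that
# concurrent updates cannot interleave and lose increments.
# (ThreadSafeTokenUsage is a hypothetical name, not a LangChain class.)
class ThreadSafeTokenUsage:
    def __init__(self):
        self._lock = threading.Lock()
        self.total_tokens = 0

    def add(self, n):
        with self._lock:
            self.total_tokens += n

counter = ThreadSafeTokenUsage()
threads = [threading.Thread(target=counter.add, args=(10,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.total_tokens)  # 1000
```

The same lock would wrap the additions inside an on_llm_end hook if several chains share one handler instance.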

Best Practice Recommendations

  1. Always enable the TokenUsage callback in development environments
  2. Set token-consumption alerts for critical business operations
  3. Regularly analyze historical token usage data
  4. Optimize prompt structure based on measured token consumption
  5. Consider logging token usage to a monitoring system
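Recommendation 2 can be as simple as a threshold check. check_token_budget below is a hypothetical helper, shown only to make the idea concrete:

```python
import warnings

# Hypothetical helper: warn when cumulative token usage crosses a budget
def check_token_budget(total_tokens, budget=10_000):
    if total_tokens > budget:
        warnings.warn(f"Token budget exceeded: {total_tokens} > {budget}")
        return False
    return True

assert check_token_budget(500) is True
assert check_token_budget(20_000) is False  # also emits a warning
```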

Install Dependencies

pip install -qU langchain-core langchain-community langchain-openai

Using get_openai_callback

get_openai_callback is a utility built into the LangChain framework. Used as a context manager, it provides:

  1. Total token consumption (including input and output)
  2. The number of successful API calls
  3. Estimated cost in USD
  4. Aggregated totals across every call made inside its with block

Usage Example

from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)

Sample Output

Tokens Used: 26
    Prompt Tokens: 12
    Completion Tokens: 14
Successful Requests: 1
Total Cost (USD): $4.6e-05
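The cost figure above can be reproduced by hand. Assuming the gpt-3.5-turbo pricing in effect at the time (here taken as $0.0015 per 1K prompt tokens and $0.002 per 1K completion tokens; current rates may differ):

```python
# Assumed pricing: $0.0015 / 1K prompt tokens, $0.002 / 1K completion tokens
prompt_cost = 12 / 1000 * 0.0015     # 12 prompt tokens
completion_cost = 14 / 1000 * 0.002  # 14 completion tokens
print(f"{prompt_cost + completion_cost:.1e}")  # 4.6e-05
```

which matches the Total Cost line printed by the callback.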