Concept Explanation

Routing by semantic similarity is a mechanism that uses natural language processing (NLP) to infer the intent of a user query and dispatch it accordingly. The system computes the semantic relevance between the user input and a set of predefined categories or targets, then routes the request to the most appropriate processing unit or service.

Technical Principles

  1. Semantic Embedding: Use pre-trained language models (such as BERT, GPT, or Sentence-BERT) to convert text into high-dimensional vector representations
  2. Similarity Calculation: Use metrics like cosine similarity or Euclidean distance to evaluate the semantic distance between queries and targets
  3. Routing Decision: Set similarity thresholds or use top-k selection strategies to determine the best matching target
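The three steps above can be sketched end to end with toy vectors. The embeddings, target names, and threshold below are hypothetical stand-ins for real model output, not any particular library's API:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for two routing targets.
# In practice these come from a model such as Sentence-BERT.
targets = {
    "billing": np.array([0.9, 0.1, 0.0, 0.2]),
    "tech_support": np.array([0.1, 0.8, 0.5, 0.0]),
}

def cosine_sim(a, b):
    # Cosine similarity: dot product of the vectors divided by their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_vec, threshold=0.5):
    # Score the query against every target, pick the best match,
    # and fall back to a default route below the threshold.
    scores = {name: cosine_sim(query_vec, vec) for name, vec in targets.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "fallback"

# A query vector close to the "billing" target routes there.
print(route(np.array([0.8, 0.2, 0.1, 0.1])))  # billing
```

The threshold turns the router from a pure top-1 selector into one that can refuse a match, which is the basis of the fallback strategies discussed later.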

Typical Application Scenarios

  • Customer Service Systems: Automatically classify customer issues and route them to specialists in the relevant domain
  • Search Engines: Adjust search strategies and result ranking based on query semantics
  • API Gateways: Understand API request intent and route to appropriate microservices
  • Content Recommendation: Recommend semantically related products or information based on user input content

Implementation Steps

  1. Predefine target categories or service endpoints
  2. Generate representative semantic embeddings for each target (can be aggregated from example texts)
  3. Process user queries in real-time and generate query vectors
  4. Calculate similarity scores between the query and all targets
  5. Make routing decisions based on similarity scores and business rules
  6. Optionally record routing results for subsequent model optimization
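Step 2 deserves a small illustration: a target's representative embedding can be the mean of the embeddings of several example texts. The vectors and target names here are hypothetical; a real system would obtain them from an embedding model:

```python
import numpy as np

# Hypothetical embeddings of example texts for each routing target.
example_embeddings = {
    "refund": [np.array([0.9, 0.1]), np.array([0.8, 0.3])],
    "shipping": [np.array([0.1, 0.9]), np.array([0.2, 0.7])],
}

# Aggregate each target's examples into one representative vector
# by averaging them element-wise.
target_embeddings = {
    name: np.mean(vecs, axis=0) for name, vecs in example_embeddings.items()
}

print(target_embeddings["refund"])  # [0.85 0.2 ]
```

Averaging is the simplest aggregation; alternatives such as keeping all example vectors and taking the maximum similarity trade memory for precision.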

Advantages and Challenges

Advantages:

  • Reduces reliance on exact keyword matching
  • Can understand synonyms and expression variants
  • Adapts to the diversity of natural language expressions

Challenges:

  • Requires high-quality semantic models and training data
  • Similarity threshold settings require domain knowledge
  • May produce misrouting in cases of semantic ambiguity
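One way to guard against the last challenge, misrouting under semantic ambiguity, is to reject a decision when the top score is too low or when the top two scores are too close. This is a sketch with hypothetical route names and thresholds, not a prescription:

```python
def decide(scores, min_score=0.6, min_margin=0.1):
    # scores: mapping of target name -> similarity score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, s1), (_, s2) = ranked[0], ranked[1]
    if s1 < min_score:
        return "fallback"   # nothing is similar enough
    if s1 - s2 < min_margin:
        return "ask_user"   # ambiguous: top two targets are too close
    return best

print(decide({"math": 0.82, "physics": 0.55}))  # math
print(decide({"math": 0.71, "physics": 0.68}))  # ask_user
print(decide({"math": 0.40, "physics": 0.35}))  # fallback
```

Both thresholds are the "domain knowledge" mentioned above: they typically have to be tuned per deployment against labeled routing examples.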

Performance Optimization Directions

  1. Domain Adaptation: Fine-tune semantic models on specific domain data
  2. Hybrid Strategy: Combine semantic similarity with traditional rule engines
  3. Dynamic Learning: Continuously optimize routing strategies based on user feedback
  4. Caching Mechanism: Establish fast matching channels for high-frequency queries
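As a sketch of item 4, embedding lookups for repeated queries can be memoized so that a high-frequency query only hits the model once. The `embed` function below is a hypothetical stand-in for an expensive model or API call:

```python
from functools import lru_cache

CALLS = 0

@lru_cache(maxsize=1024)
def embed(query: str):
    # Hypothetical embedding call; stands in for an expensive model request.
    global CALLS
    CALLS += 1
    return tuple(float(ord(c)) for c in query[:4])  # toy vector

embed("what is calculus?")
embed("what is calculus?")  # served from cache, no second model call
print(CALLS)  # 1
```

In production the cache key is usually a normalized form of the query (lowercased, whitespace-collapsed), and the cache sits in a shared store such as Redis rather than process memory.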

Practical Code Examples

Problem Background

When we design programs, we typically use if-else for exact matching, such as type.equals("eat"); if the value passed in is not one of the expected strings, the program cannot handle it. With the reasoning capability of large language models, we can instead interpret the user's question and infer the appropriate handler for it.

Installing Dependencies

pip install --upgrade --quiet langchain-core langchain langchain-community langchain-openai

Code Implementation

# In recent LangChain versions this helper lives in langchain_community
from langchain_community.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


# Define physics template
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

# Define math template
math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

# Initialize embedding model
embeddings = OpenAIEmbeddings()

# Convert templates to vectors
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)


# Define routing function (avoid shadowing the built-in `input`)
def prompt_router(inputs):
    # Embed user query
    query_embedding = embeddings.embed_query(inputs["query"])
    # Calculate similarity with all templates
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    # Select the most similar template
    most_similar = prompt_templates[similarity.argmax()]
    print("Using MATH" if most_similar == math_template else "Using PHYSICS")
    return PromptTemplate.from_template(most_similar)


# Build chain
chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | ChatOpenAI()
    | StrOutputParser()
)

# Test
message1 = chain.invoke("What is the speed of light?")
print(f"message1: {message1}")

message2 = chain.invoke("What is calculus?")
print(f"message2: {message2}")

Running Results

Using PHYSICS
message1: The speed of light is approximately 299,792,458 meters per second, which is the fastest speed at which any object can travel in the universe. It is a fundamental constant of nature and is denoted by the letter "c" in physics equations.
Using MATH
message2: Calculus is a branch of mathematics concerned primarily with rates of change and the integration of functions. It can be used to solve many practical problems in fields such as physics, engineering, and economics. Calculus helps us understand and describe phenomena such as motion, change, and growth. Common concepts in calculus include derivatives, limits, integrals, and differential equations.