LangChain-23 Vector: AI Semantic Retrieval System, Vector Database + Large Language Model

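Background

Vector storage, also called a vector database, is a database system optimized for storing and retrieving high-dimensional vector data. Its core design goal is to handle the vector embeddings produced by AI models efficiently. These embeddings are usually floating-point arrays with 128 to 2048 dimensions, sometimes more, and they encode where a piece of complex data sits relative to other data in an abstract semantic space.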

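In terms of implementation, vector databases rely on several key techniques:

Approximate nearest neighbor (ANN) algorithms: for example HNSW (Hierarchical Navigable Small World) graphs and IVF (Inverted File) indexes, which can search a high-dimensional space in sublinear time
Specialized storage formats: storage layouts tuned to the characteristics of vector data to improve memory access efficiency
Distributed architecture: horizontal scaling to handle very large vector collections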
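To make the idea behind an IVF index concrete, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not how any particular database implements IVF: the centroids are fixed by hand (real systems typically learn them with k-means), and the class name `TinyIVF` and the `nprobe` parameter name are chosen for illustration only.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyIVF:
    """Toy inverted-file index: each vector is stored in the bucket of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]  # one posting list per centroid

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vec):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((vec_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe closest buckets instead of the whole collection.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]

# Hand-picked centroids, purely for illustration
index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 0.2])
index.add("b", [9.5, 10.1])
index.add("c", [1.0, 1.5])
print(index.search([0.4, 0.4], k=2, nprobe=1))  # ['a', 'c']
```

With `nprobe=1` the query never looks at vector "b", which is where the speed-up comes from. The price is that a true neighbor sitting in an unprobed bucket can be missed, hence "approximate".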

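The main application scenarios are:

Semantic search: convert the query to a vector and search directly for the most relevant documents
Recommender systems: generate recommendations from the vector similarity between users and items
Anomaly detection: identify abnormal samples by their vector distance
Multimodal retrieval: handle the vector representations of text, images, audio, and other modalities in a uniform way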
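The semantic search and recommendation scenarios both reduce to ranking stored vectors by similarity to a query vector. The sketch below shows the exhaustive version of that ranking with cosine similarity. The three-dimensional "embeddings" are made up by hand for illustration; a real system would obtain them from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hand-made vectors, purely for illustration
doc_vecs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, doc_vecs, k=2))  # ['refund policy', 'return an item']
```

For anomaly detection the same distance is read the other way round: a sample whose best similarity to everything already stored is unusually low is a candidate outlier.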

Install dependencies

pip install langchain langchain-community langchain-openai chromadb
# pip install faiss-cpu is used in much the same way; both provide vector storage

Write the code

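# Import paths assume a LangChain 0.2/0.3-style installation
# (langchain-community, langchain-openai, langchain-text-splitters).
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the document, split it into chunks, embed each chunk and load it into the vector store.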
raw_documents = TextLoader('./state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

# similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

# similarity search by vector
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
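The `chunk_size` and `chunk_overlap` arguments used above are easier to reason about with a small model. The function below is a simplified stand-in, assuming a plain sliding window over characters. `CharacterTextSplitter` itself first splits on a separator and then merges pieces, so its chunk boundaries will differ. The name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is not fully contained in the previous one.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
```

A non-zero overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk.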

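A practical case

1. Document loading and preprocessing

Parses .docx files with the python-docx library, extracting paragraphs, tables, and image descriptions; python-docx reads only .docx, so legacy .doc files have to be converted first
Preprocessing pipeline: automatically filter headers and footers, merge tables that span pages, and standardize the document structure
Error handling: an encrypted document triggers the OCR fallback path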
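One way to approach the header and footer filtering step is a frequency heuristic: a line that recurs on most pages is probably page furniture. The sketch below assumes the document has already been turned into a list of pages, each a list of text lines (for example after converting to a paginated format). The function names and the 0.6 threshold are illustrative assumptions, not part of the original pipeline.

```python
import re
from collections import Counter

def _normalise(line):
    """Collapse digits so 'Page 1' and 'Page 2' count as the same line."""
    return re.sub(r"\d+", "#", line.strip())

def strip_repeated_lines(pages, min_ratio=0.6):
    """Drop lines that occur on at least min_ratio of the pages."""
    if len(pages) < 2:
        return [list(page) for page in pages]  # nothing to compare against
    counts = Counter()
    for page in pages:
        # Count each normalised line once per page.
        counts.update({_normalise(line) for line in page if line.strip()})
    threshold = min_ratio * len(pages)
    repeated = {key for key, n in counts.items() if n >= threshold}
    return [[line for line in page if _normalise(line) not in repeated]
            for page in pages]

pages = [
    ["ACME Requirements Spec", "1. Scope", "Page 1"],
    ["ACME Requirements Spec", "2. Interfaces", "Page 2"],
    ["ACME Requirements Spec", "3. Security", "Page 3"],
]
print(strip_repeated_lines(pages))
# [['1. Scope'], ['2. Interfaces'], ['3. Security']]
```

The heuristic can misfire on body text that legitimately repeats on every page, so the threshold is worth tuning per document set.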

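2. Embedding engine configuration

# Engine selection logic (security_level is assumed to be set by the caller)
if security_level > 3:  # high confidentiality requirements: embed locally
    # Text2VecPipeline is not a LangChain class; presumably a project-specific wrapper
    embeddings = Text2VecPipeline(
        model_name="text2vec-base-chinese",
        device="cuda:0"  # GPU acceleration
    )
else:  # ordinary scenarios
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-large",
        deployment="your-azure-endpoint"  # enterprise deployment
    )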

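Batching strategy: every 50 pages form one chunk, and chunks are processed in parallel
Vector dimensions are unified to 1024 using PCA dimensionality reduction for compatibility; PCA can only reduce dimensionality, so this applies only to models whose native output has at least 1024 dimensions
Caching: sections that have not been modified are skipped instead of being recomputed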
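The batching and caching strategies can be sketched with the standard library alone. This is a sketch under assumptions: `fake_embed` stands in for a real embedding model so the example runs offline, and the cache is an in-memory dict keyed by a SHA-256 hash of the section text, whereas a production system would likely persist it.

```python
import hashlib

def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

class EmbeddingCache:
    """Memoise embeddings by the SHA-256 hash of the section text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # any callable: str -> list of floats
        self.store = {}
        self.misses = 0           # how often the model was actually called

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]

def fake_embed(text):
    # Stand-in for a real embedding model (assumption for this sketch).
    return [float(len(text)), float(text.count(" "))]

cache = EmbeddingCache(fake_embed)
sections = ["chapter one text", "chapter two text", "chapter one text"]
vectors = [cache.embed(s) for s in sections]
print(cache.misses)  # 2: the repeated section was served from the cache

page_numbers = list(range(120))
print([len(b) for b in batches(page_numbers, size=50)])  # [50, 50, 20]
```

Hashing the content rather than the file name means a section is recomputed exactly when its text changes, which is the behaviour the caching rule describes.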

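3. Persistent storage design

Database	Main advantage	Typical use
Chroma	Lightweight	Development and testing
Milvus	High concurrency	Production clusters
FAISS	Fast retrieval	Local deployment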

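4. LangChain functional architecture

Requirement tracing: automatically annotate each answer with the original requirement clauses it is based on
Conflict detection: raise a warning when a question touches clauses that contradict each other
Version-difference hints: "The current description differs from v2.0 in the following ways…"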
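As a rough illustration of requirement tracing and version-difference hints, the sketch below works on already-retrieved clauses. Everything here is hypothetical: the `Clause` fields, the `REQ-n` id scheme, and both helper functions are invented for the example and are not LangChain APIs.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    clause_id: str  # hypothetical id scheme, e.g. "REQ-1"
    version: str
    text: str

def annotate_answer(answer, clauses):
    """Append the ids of the clauses the answer was grounded on."""
    refs = ", ".join(sorted({c.clause_id for c in clauses}))
    return f"{answer} [sources: {refs}]" if refs else answer

def version_differences(clauses):
    """Return ids of clauses whose text differs between versions."""
    texts_by_id = {}
    for c in clauses:
        texts_by_id.setdefault(c.clause_id, set()).add(c.text)
    return sorted(cid for cid, texts in texts_by_id.items() if len(texts) > 1)

clauses = [
    Clause("REQ-1", "v2.0", "Timeout is 30 s"),
    Clause("REQ-1", "v3.0", "Timeout is 60 s"),
    Clause("REQ-2", "v3.0", "Logs are kept 90 days"),
]
print(annotate_answer("Timeout is 60 s.", clauses[1:]))
# Timeout is 60 s. [sources: REQ-1, REQ-2]
print(version_differences(clauses))  # ['REQ-1']
```

In a retrieval chain the clause id and version would typically travel in each document's metadata, so the same logic can run on the retriever output.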

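5. LLM service configuration

Hardware requirements: at least 16 GB of RAM plus an RTX 3060; an A100 with 40 GB of GPU memory is recommended
Quantization: 4-bit quantization cuts GPU memory use by about 70% with an accuracy loss below 2%
Hybrid scheduling strategy: the first response is served by ChatGLM3 so that data does not leave the local domain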
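The memory figure for 4-bit quantization can be sanity-checked with simple arithmetic on the weights alone. The parameter count below is an assumed round figure for a model of roughly 6 billion parameters, and the calculation ignores activations, the KV cache, and quantization metadata.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 6e9  # assumed round figure for a ~6B-parameter model
fp16 = weight_memory_gb(params, 16)
int4 = weight_memory_gb(params, 4)
saving = 1 - int4 / fp16
print(f"fp16: {fp16:.1f} GB, 4-bit: {int4:.1f} GB, saving: {saving:.0%}")
# fp16: 12.0 GB, 4-bit: 3.0 GB, saving: 75%
```

The theoretical saving on weights is 75%. The parts that are not quantized add overhead on top, which plausibly explains why the end-to-end reduction quoted above is closer to 70%.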

6. API service extensions

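from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/api/v1/query', methods=['GET'])
def handle_query():
    # Authenticate before parsing parameters; verify_signature is the project's own helper
    if not verify_signature(request.headers):
        return jsonify({"error": "Invalid token"}), 403
    require_references = request.args.get('ref', 'true').lower() == 'true'  # string -> bool
    detail_level = request.args.get('detail', 1, type=int)  # falls back to 1 on bad input
    # ... query handling continues here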

Add Swagger UI documentation
Support OAuth 2.0 authentication
Response caching (Redis cluster)
Load balancing (Nginx + Gunicorn)
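The handler above calls `verify_signature` without showing it. One possible shape for such a helper is an HMAC check over a request header, sketched below with the standard library. The header names `X-Timestamp` and `X-Signature`, the shared secret, and the signing scheme are all assumptions made for illustration; the original project may do something different.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical shared secret; load it from configuration in practice

def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Hex-encoded HMAC-SHA256 of the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_signature(headers, key: bytes = SECRET_KEY) -> bool:
    """Check X-Signature against an HMAC of X-Timestamp (hypothetical header names)."""
    timestamp = headers.get("X-Timestamp", "")
    provided = headers.get("X-Signature", "")
    expected = sign(timestamp.encode("utf-8"), key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(provided, expected)

good = {"X-Timestamp": "1700000000", "X-Signature": sign(b"1700000000")}
bad = {"X-Timestamp": "1700000000", "X-Signature": "deadbeef"}
print(verify_signature(good), verify_signature(bad))  # True False
```

A real deployment would also reject stale timestamps to limit replay, and would usually sign the request path and parameters rather than the timestamp alone.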

Complete code example

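# Import paths assume a LangChain 0.2/0.3-style installation (langchain, langchain-community,
# langchain-openai); docx2txt, sentence-transformers and flask must also be installed.
from operator import itemgetter

from flask import Flask
from langchain.memory import ConversationBufferMemory
from langchain_community.document_loaders import Docx2txtLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate, format_document
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_text_splitters import CharacterTextSplitter

# Set to True on the first run to build the index; False reuses the persisted one
need_embedding = False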

persist_directory = 'chroma'
if need_embedding:
    loader = Docx2txtLoader("./short.docx")
    documents = loader.load()

    text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=500)
    texts = text_splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(model_name='./text2vec-base-chinese')
    db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)
    db.persist()
else:
    embeddings = HuggingFaceEmbeddings(model_name='./text2vec-base-chinese')
    db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

retriever = db.as_retriever()

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

template = """Answer the question based only on the following context, 请用中文回复:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

def llm():
    result = ChatOpenAI(temperature=0.8)
    return result

def _combine_documents(docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)

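# standalone_question, retrieved_documents and answer are assumed to follow the usual
# LCEL conversational-retrieval pattern; adjust them to your own chain if it differs.
from langchain_core.messages import get_buffer_string
from langchain_core.output_parsers import StrOutputParser

# Step 1: condense chat history + follow-up into a standalone question
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | llm()
    | StrOutputParser(),
}

# Step 2: retrieve documents for the standalone question
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}

# Step 3: build the final prompt inputs and generate the answer
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm(),
    "docs": itemgetter("docs"),
}

final_chain = loaded_memory | standalone_question | retrieved_documents | answer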

app = Flask(__name__)

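@app.route("/get/<question>")
def get(question):
    inputs = {"question": question}
    result = final_chain.invoke(inputs)
    message = result["answer"]
    # Chat models return a message object; fall back to str() for plain strings
    answer_text = getattr(message, "content", str(message))
    # Store the turn so later questions can refer back to it
    memory.save_context(inputs, {"answer": answer_text})
    return answer_text

if __name__ == "__main__":
    # debug=True on 0.0.0.0 is for local experiments only
    app.run(host='0.0.0.0', port=8888, debug=True)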
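Because the question travels in the URL path, a client has to percent-encode it before calling the service. The helper below is a small sketch of that step. The function name `build_query_url` is hypothetical, and the host and port defaults simply mirror the `app.run` call above.

```python
from urllib.parse import quote

def build_query_url(question, host="127.0.0.1", port=8888):
    """Percent-encode the question so it is safe as a single URL path segment."""
    return f"http://{host}:{port}/get/{quote(question, safe='')}"

url = build_query_url("What does clause 3.2 require?")
print(url)
# http://127.0.0.1:8888/get/What%20does%20clause%203.2%20require%3F
```

The resulting URL can be opened with any HTTP client. A question that contains "/" may still fail to match the `<question>` route even when encoded, so passing the question as a query-string parameter is often the more robust design.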