Background
Vector Storage, also known as a Vector Database, is a database system optimized specifically for storing and retrieving high-dimensional vector data. The core design goal of such databases is to efficiently handle the vector embeddings generated by AI models; these are typically floating-point arrays of 128 to 2048 dimensions (or more) that encode the relative positions of complex data in an abstract semantic space.
In terms of technical implementation, vector databases employ several key technologies:
- Approximate Nearest Neighbor (ANN) Algorithms: such as HNSW (Hierarchical Navigable Small World) graphs and IVF (Inverted File) indexes, which enable sub-linear search time in high-dimensional spaces (see the sketch after this list)
- Specialized Storage Formats: Optimized storage layouts for vector data characteristics, improving memory access efficiency
- Distributed Architecture: Supporting horizontal scaling to handle massive vector data
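To make the ANN idea concrete, here is a minimal sketch using FAISS (installed via faiss-cpu, mentioned in the install step below); the dimensionality, the random data, and the HNSW parameter 32 are illustrative placeholders, not values from a real system:

import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                               # embedding dimension (placeholder)
xb = np.random.random((10000, d)).astype('float32')  # "database" vectors
xq = np.random.random((5, d)).astype('float32')      # query vectors

# HNSW graph index: approximate search in sub-linear time instead of brute force
index = faiss.IndexHNSWFlat(d, 32)    # 32 = neighbors per node in the graph
index.add(xb)                         # build the graph over the database vectors
distances, ids = index.search(xq, 4)  # top-4 approximate nearest neighbors per query
print(ids)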
Main application scenarios include (all of them reduce to the vector-similarity computation sketched after this list):
- Semantic Search: Convert queries to vectors and directly search for the most relevant documents
- Recommendation Systems: Generate recommendations based on vector similarity between users and items
- Anomaly Detection: Identify anomalous samples through vector distance
- Multi-modal Retrieval: Unified processing of vector representations for text, images, audio, and other modalities
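All four scenarios ultimately compare embedding vectors by distance or similarity. A minimal numpy sketch with made-up vectors (the numbers carry no real meaning):

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), near 0 = unrelated, negative = opposed
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec   = np.array([0.1, 0.9, 0.2])   # placeholder embedding of a query
doc_vec     = np.array([0.2, 0.8, 0.1])   # placeholder embedding of a document
outlier_vec = np.array([0.9, -0.7, 0.3])  # an anomalous sample sits far from the rest

print(cosine_similarity(query_vec, doc_vec))      # high score -> relevant document
print(cosine_similarity(query_vec, outlier_vec))  # low score  -> candidate anomaly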
Install Dependencies
pip install chromadb
# Alternatively: pip install faiss-cpu (the FAISS-based code is similar; both are vector stores)
Write Code
# Load the document, split it into chunks, embed each chunk and load it into the vector store.
# Import paths below are for the classic `langchain` package; newer releases move these
# classes into langchain_community / langchain_openai.
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

raw_documents = TextLoader('./state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())
# similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
# similarity search by vector
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
Practical Cases
1. Document Loading and Preprocessing
- Supports .docx/.doc files, using the python-docx library to extract paragraphs, tables, and image descriptions (legacy .doc files must first be converted to .docx, since python-docx only reads .docx)
- Preprocessing flow: Automatically filters headers/footers, merges tables spanning pages, standardizes document structure
- Exception handling: Falls back to OCR when an encrypted document is encountered (a parsing sketch follows this list)
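A minimal sketch of this parsing step with python-docx; the file name is a placeholder, the OCR fallback is a hypothetical helper, and header/footer filtering and cross-page table merging are omitted:

from docx import Document  # pip install python-docx

def load_docx(path):
    """Extract paragraph and table text from a .docx file."""
    try:
        doc = Document(path)
    except Exception:
        # Encrypted or unreadable file: hand off to OCR (hypothetical helper, not shown)
        return run_ocr_fallback(path)
    chunks = [p.text for p in doc.paragraphs if p.text.strip()]
    for table in doc.tables:
        for row in table.rows:
            chunks.append(" | ".join(cell.text.strip() for cell in row.cells))
    return chunks

paragraphs = load_docx("./requirements_spec.docx")  # placeholder path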
2. Vectorization Engine Configuration
# Engine selection logic
if security_level > 3:  # High security requirements
    embeddings = Text2VecPipeline(
        model_name="text2vec-base-chinese",
        device="cuda:0"  # GPU acceleration
    )
else:  # Regular scenarios
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-large",
        deployment="your-azure-endpoint"  # Enterprise deployment
    )
- Batch processing strategy: documents are embedded in parallel, in batches of 50 pages each
- Dimension unification: all embeddings are reduced to 1024 dimensions with PCA for compatibility across models (see the sketch after this list)
- Caching mechanism: Skips redundant computation for unmodified chapters
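A minimal sketch of the dimension-unification step using scikit-learn's PCA; the input shape is illustrative, and note that PCA needs at least as many samples as target components (here 1024):

import numpy as np
from sklearn.decomposition import PCA

# Placeholder: 5000 embeddings of 1536 dimensions from some embedding model
vectors = np.random.random((5000, 1536)).astype("float32")

# Project every embedding source into a common 1024-dimensional space
pca = PCA(n_components=1024)
unified = pca.fit_transform(vectors)
print(unified.shape)  # (5000, 1024)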
3. Persistent Storage Design
| Database | Primary Advantage | Applicable Scenarios |
|---|---|---|
| Chroma | Lightweight | Development environment testing |
| Milvus | High concurrency | Production environment clusters |
| FAISS | Fast retrieval | Local deployment |
4. LangChain Functional Architecture
- Requirement tracing: Automatically annotate source requirement clauses corresponding to answers
- Conflict detection: Trigger warnings when questions involve contradictions across multiple clauses
- Version difference prompts: "The current description differs from version 2.0 as follows..."
5. Large Model Service Configuration
- Hardware requirements: Minimum 16GB RAM + RTX 3060, recommended A100 40GB VRAM
- Quantization scheme: 4-bit quantization cuts VRAM usage by roughly 70%, with accuracy loss below 2% (a loading sketch follows this list)
- Hybrid scheduling strategy: the first response is served by ChatGLM3 so that data stays within the internal domain
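The original text does not show the quantization code; one common way to realize a 4-bit scheme is bitsandbytes via Hugging Face transformers. A sketch, assuming the public THUDM/chatglm3-6b checkpoint:

import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "THUDM/chatglm3-6b"  # assumed model repository

# 4-bit NF4 quantization: weights stored in 4 bits, computation in float16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",       # place layers on the available GPU(s)
    trust_remote_code=True,  # ChatGLM3 ships custom modeling code
)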
6. API Service Extensions
@app.route('/api/v1/query', methods=['GET'])
def handle_query():
    require_references = request.args.get('ref', 'true')
    detail_level = int(request.args.get('detail', 1))
    if not verify_signature(request.headers):
        return jsonify({"error": "Invalid token"}), 403
- Added Swagger UI documentation
- OAuth2.0 authentication support
- Response caching (Redis cluster; a minimal caching sketch follows this list)
- Load balancing configuration (Nginx + Gunicorn)
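For the response-caching item, a minimal single-node Redis sketch (the original uses a Redis cluster; the key naming and the 10-minute TTL are assumptions):

import hashlib
import json
import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def cached_answer(question, compute):
    """Return a cached answer if present; otherwise compute it and cache it for 10 minutes."""
    key = "query:" + hashlib.sha256(question.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    answer = compute(question)
    cache.setex(key, 600, json.dumps(answer))
    return answer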
Complete Code Example
# Import paths below are for the classic `langchain` package; newer releases move these
# classes into langchain_core / langchain_community / langchain_openai.
from operator import itemgetter

from flask import Flask
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import Docx2txtLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document, get_buffer_string
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

need_embedding = False  # True: rebuild the vector store from the .docx; False: load the persisted one
persist_directory = 'chroma'
if need_embedding:
    loader = Docx2txtLoader("./short.docx")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=500)
    texts = text_splitter.split_documents(documents)
    embeddings = HuggingFaceEmbeddings(model_name='./text2vec-base-chinese')
    db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)
    db.persist()
else:
    embeddings = HuggingFaceEmbeddings(model_name='./text2vec-base-chinese')
    db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
retriever = db.as_retriever()
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
template = """Answer the question based only on the following context, and reply in Chinese:
{context}
Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")
def llm():
    result = ChatOpenAI(temperature=0.8)
    return result

def _combine_documents(docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)
memory = ConversationBufferMemory(
return_messages=True, output_key="answer", input_key="question"
)
loaded_memory = RunnablePassthrough.assign(
chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Intermediate chain stages, following the standard LCEL conversational-retrieval pattern.
# 1) Condense the follow-up question into a standalone question using the chat history
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | llm()
    | StrOutputParser(),
}
# 2) Retrieve the document chunks that match the standalone question
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}
# 3) Answer from the combined context of the retrieved chunks
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm(),
    "docs": itemgetter("docs"),
}
final_chain = loaded_memory | standalone_question | retrieved_documents | answer
app = Flask(__name__)
@app.route("/get/<question>")
def get(question):
    inputs = {"question": question}
    result = final_chain.invoke(inputs)
    # Persist this turn so that follow-up questions can use the chat history
    memory.save_context(inputs, {"answer": result["answer"].content})
    return str(result["answer"].content)
app.run(host='0.0.0.0', port=8888, debug=True)
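Once the service is running, a request might look like the following; the question text is only an example, and requests handles the URL encoding of spaces:

import requests

resp = requests.get("http://localhost:8888/get/What does the document say about data retention")
print(resp.text)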