## Version Matrix

Version compatibility is critical for a successful DeepSeek-OCR deployment. The following combination has been tested and validated:
- Python: 3.12
- PyTorch: 2.6.0
- Transformers: 4.46.3
- FlashAttention: 2.7.3
Other versions may work but can cause compatibility issues or performance degradation; pin these exact versions for production deployments.
## Environment Configuration

### System Requirements
- CUDA 12.1+ for GPU acceleration
- Minimum 8GB GPU memory for inference
- 16GB+ system RAM recommended
- SSD storage for model files
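The requirements above can be validated up front before pulling model weights. A minimal pure-Python sketch (the helper name and thresholds simply mirror the list above; at runtime the probe values would come from `torch`, as noted in the comment):

```python
def check_environment(cuda_version, vram_gb, ram_gb):
    """Return a list of warnings for requirements that are not met.

    cuda_version: version string such as "12.1", or None when no GPU is visible.
    """
    warnings = []
    if cuda_version is None:
        warnings.append("no CUDA-capable GPU detected")
    else:
        major, minor = (int(x) for x in cuda_version.split(".")[:2])
        if (major, minor) < (12, 1):
            warnings.append(f"CUDA {cuda_version} is older than 12.1")
    if vram_gb < 8:
        warnings.append(f"{vram_gb:.0f}GB VRAM is below the 8GB minimum")
    if ram_gb < 16:
        warnings.append(f"{ram_gb:.0f}GB RAM is below the recommended 16GB")
    return warnings

# With PyTorch installed, the probes would be:
#   torch.version.cuda
#   torch.cuda.get_device_properties(0).total_memory / 1024**3
print(check_environment("12.1", 24, 64))
```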
### Installation Steps

```bash
# Create and activate a virtual environment
python -m venv deepseek-ocr
source deepseek-ocr/bin/activate

# Install PyTorch with CUDA 12.1 support
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install Transformers
pip install transformers==4.46.3

# Install FlashAttention (requires the CUDA toolkit to build)
pip install flash-attn==2.7.3 --no-build-isolation

# Install DeepSeek-OCR
pip install deepseek-ocr
```
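Since the version matrix above is pinned, it is worth verifying the installed versions after setup. A small sketch using the standard library (the `check_versions` helper is illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

def check_versions(expected):
    """Map package name -> (installed version or None, matches expected?)."""
    report = {}
    for pkg, want in expected.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            got = None
        report[pkg] = (got, got == want)
    return report

tested = {"torch": "2.6.0", "transformers": "4.46.3", "flash-attn": "2.7.3"}
for pkg, (got, ok) in check_versions(tested).items():
    print(f"{pkg}: {got or 'not installed'} {'OK' if ok else 'check version'}")
```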
## Model Loading

### Basic Inference Example

```python
from deepseek_ocr import DeepSeekOCR

# Initialize the OCR engine
ocr = DeepSeekOCR(
    model_path="deepseek/ocr-3b",
    device="cuda",
    precision="bf16",
)

# Process a single image
result = ocr.process("document.png")

# Print the structured results
for item in result:
    print(f"Text: {item['text']}")
    print(f"Confidence: {item['confidence']}")
    print(f"Bounding Box: {item['bbox']}")
```
## Supported Data Formats

DeepSeek-OCR accepts a broad range of document inputs:
- Image Formats: PNG, JPG, JPEG, BMP, TIFF, WebP
- Document Formats: PDF (single/multi-page), DjVu
- Input Modes: Single image, batch processing, directory scan
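Directory-scan mode only needs to know which extensions to pick up; a small helper based on the lists above (the function name is illustrative, not part of the DeepSeek-OCR API):

```python
from pathlib import Path

SUPPORTED_IMAGES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".webp"}
SUPPORTED_DOCS = {".pdf", ".djvu"}

def filter_supported(paths):
    """Split path strings into (images, documents, skipped) by extension."""
    images, docs, skipped = [], [], []
    for p in paths:
        ext = Path(p).suffix.lower()
        if ext in SUPPORTED_IMAGES:
            images.append(p)
        elif ext in SUPPORTED_DOCS:
            docs.append(p)
        else:
            skipped.append(p)
    return images, docs, skipped
```

A directory scan is then `filter_supported(str(p) for p in Path(root).rglob("*"))`, with the image list fed to batch processing and documents handled page by page.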
## Training and Fine-tuning

For custom OCR scenarios, fine-tuning the base model may improve accuracy:
- Base Model: DeepSeek-OCR-3B
- Training Framework: PyTorch Lightning
- Recommended GPU: A100 (40GB) or H100
- Fine-tuning Approaches:
  - LoRA adapter training (recommended for limited resources)
  - Full parameter fine-tuning (requires significant GPU memory)
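The resource gap between the two approaches can be quantified: for a d_out × d_in weight matrix, a rank-r LoRA adapter trains only r × (d_in + d_out) parameters instead of d_out × d_in. A back-of-the-envelope sketch (the layer shapes are illustrative, not the actual DeepSeek-OCR-3B architecture):

```python
def lora_params(d_in, d_out, rank):
    # LoRA replaces the update to a (d_out x d_in) matrix with B @ A,
    # where A is (rank x d_in) and B is (d_out x rank).
    return rank * (d_in + d_out)

# Illustrative transformer block: 4 attention projections + 2 MLP projections.
hidden, mlp, rank = 2048, 8192, 16
layer_full = 4 * hidden * hidden + 2 * hidden * mlp
layer_lora = 4 * lora_params(hidden, hidden, rank) + 2 * lora_params(hidden, mlp, rank)

print(f"full fine-tune per layer: {layer_full / 1e6:.1f}M params")
print(f"rank-16 LoRA per layer:   {layer_lora / 1e6:.2f}M params")
print(f"reduction: {layer_full / layer_lora:.0f}x")
```

The roughly two-orders-of-magnitude reduction in trainable parameters (and their optimizer state) is why LoRA fits on far smaller GPUs than full fine-tuning.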
## Model Specifications
- Parameters: Approximately 3 billion
- Model Format: safetensors
- Model Size: Approximately 6.6GB (BF16 precision)
- Precision Options: BF16, FP16, INT8, INT4
| Precision | VRAM Required | Quality |
|---|---|---|
| BF16 | ~7GB | Best |
| FP16 | ~7GB | Best |
| INT8 | ~4GB | Good |
| INT4 | ~2GB | Acceptable |
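The VRAM column follows from parameter count × bytes per parameter plus runtime overhead. A rough estimator (the flat ~1GB overhead term is an assumption; real usage varies with image resolution and batch size):

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(n_params, precision, overhead_gb=1.0):
    """Weight memory plus a flat allowance for activations and caches."""
    weights_gb = n_params * BYTES_PER_PARAM[precision] / 1024**3
    return weights_gb + overhead_gb

for prec in ("bf16", "fp16", "int8", "int4"):
    print(f"{prec}: ~{estimate_vram_gb(3e9, prec):.1f} GB")
```

For the 3B model this reproduces the table to within rounding: ~6.6GB for BF16/FP16, ~3.8GB for INT8, ~2.4GB for INT4.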
## Deployment Options

### Local Inference Service

Deploy as a local API service for production environments:
```python
from deepseek_ocr import DeepSeekOCRServer

server = DeepSeekOCRServer(
    model_path="deepseek/ocr-3b",
    host="0.0.0.0",
    port=8000,
)
server.start()
```
### HuggingFace Spaces

Quick deployment using HuggingFace infrastructure:

1. Visit HuggingFace Spaces
2. Select the DeepSeek-OCR template
3. Configure hardware (CPU/GPU)
4. Deploy with one click
### vLLM Integration

For high-throughput production scenarios:

```python
from vllm import LLM, SamplingParams

# Load the model with vLLM
llm = LLM(model="deepseek/ocr-3b")

# OCR output should be deterministic, so disable sampling
sampling_params = SamplingParams(temperature=0.0, max_tokens=2048)

# `prompts` is the list of OCR prompts prepared for the model
outputs = llm.generate(prompts, sampling_params)
```
## Error Quick Reference
| Error | Cause | Solution |
|---|---|---|
| CUDA out of memory | Insufficient GPU VRAM | Use INT8/INT4 quantization |
| FlashAttention build failed | Missing CUDA toolkit | Install CUDA 12.1+ |
| Model not found | Incorrect path | Verify model_path parameter |
| Low accuracy | Domain mismatch | Fine-tune with domain data |
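The first row (OOM → quantize) lends itself to automation with a fallback loop. A generic sketch, where the `loader` callable is supplied by you; in practice you would catch `torch.cuda.OutOfMemoryError` rather than the plain `MemoryError` used here:

```python
def load_with_fallback(loader, precisions=("bf16", "int8", "int4")):
    """Try loading at each precision, highest quality first.

    `loader` is any callable taking a precision string that either
    returns a model or raises an out-of-memory error.
    """
    errors = {}
    for prec in precisions:
        try:
            return loader(prec), prec
        except MemoryError as exc:  # in practice: torch.cuda.OutOfMemoryError
            errors[prec] = exc
    raise RuntimeError(f"all precisions failed: {list(errors)}")

# Usage with the OCR engine would look like:
#   model, prec = load_with_fallback(
#       lambda p: DeepSeekOCR(model_path="deepseek/ocr-3b", precision=p))
```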
## Performance Optimization

### Batch Processing

For high-volume OCR workloads, batch processing significantly improves throughput:
```python
results = ocr.process_batch(
    image_paths=["doc1.png", "doc2.png", "doc3.png"],
    batch_size=8,
)
```
### Quantization

Reduce resource requirements with minimal accuracy loss:
```python
ocr = DeepSeekOCR(
    model_path="deepseek/ocr-3b",
    precision="int8",  # or "int4"
)
```
## Production Considerations
- Monitoring: Implement Prometheus metrics for inference latency
- Caching: Enable result caching for repeated documents
- Load Balancing: Use multiple GPU instances for horizontal scaling
- Health Checks: Implement regular model health verification
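Before wiring in a full Prometheus client, the monitoring point above can start as an in-process latency histogram. A minimal sketch (bucket boundaries are arbitrary; a real deployment would export these counts as Prometheus metrics):

```python
import time
from bisect import bisect_left

class LatencyHistogram:
    """Count observations into latency buckets (seconds)."""

    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.buckets = list(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot = +Inf
        self.total = 0.0

    def observe(self, seconds):
        self.counts[bisect_left(self.buckets, seconds)] += 1
        self.total += seconds

    def timed(self, fn, *args, **kwargs):
        """Run fn, record its wall-clock latency, and return its result."""
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.observe(time.perf_counter() - start)

hist = LatencyHistogram()
hist.timed(sum, range(1000))
print(hist.counts)
```

In an inference service, each `ocr.process(...)` call would be wrapped as `hist.timed(ocr.process, path)`, and the same wrapper doubles as a health check: a probe request that fails or lands in the slowest bucket flags an unhealthy instance.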