TL;DR
- Scenario: Offline and real-time dashboards, monitoring, and operations analysis that need high concurrency and sub-second aggregation
- Conclusion: When time partitioning, pre-aggregation, and unified stream and batch ingestion are hard requirements, Druid is the more convenient choice; it is less flexible than ClickHouse or Elasticsearch, so weigh the trade-off against your business data model
- Output: Summary, version matrix, and a quick-reference table of common faults with fix SOPs
Version Matrix
| Version | Release Date | Note |
|---|---|---|
| 34.0.0 | 2025-08-11 | Latest stable version as verified against the official docs; upgrade as needed and evaluate vectorization and concurrency parameters. |
| 33.0.0 | 2025-04-29 | Previous major version; the project still publishes its release information for reference. |
| 32.0.1 | 2025-03-19 | Maintenance release; Java 11 has been deprecated since 32.0, so plan a JDK upgrade. |
Druid Introduction
Data analysis infrastructure can be divided into categories based on different business needs and technical characteristics:
- Batch Processing Analysis Based on Hadoop/Spark
  - Typical case: Use Hadoop’s MapReduce or Spark Core for large-scale dataset processing
  - Application scenarios: Suitable for offline analysis, data mining on historical data
- Hybrid Architecture (Hadoop/Spark + RDBMS)
  - Implementation: Use Hadoop/Spark for preprocessing, import aggregated results to MySQL/Oracle
- NoSQL Storage Architecture
  - Solution: Save processed results to HBase, MongoDB and other NoSQL databases
- Stream Processing Architecture
  - Technology selection: Use Storm/Spark Streaming/Flink for real-time data stream processing
- Real-time Analysis Database Architecture
  - Representative product: Use Druid and other OLAP analytical databases
  - Core advantage: Supports sub-second query response, suitable for real-time analysis scenarios
Druid is a distributed real-time analytics system originally developed at Metamarkets to support fast, interactive queries and analysis on large-scale datasets. It is an open-source analytics engine designed for sub-second OLAP queries over both real-time and historical data, and it provides low-latency (real-time) data ingestion, flexible data exploration, and fast data aggregation.
Architecture Components
Ingestion Layer
- Supervisor: Manages data ingestion tasks
- Datasources: The basic unit of data in Druid; each one represents a set of data, roughly comparable to a table
Storage Layer
- Druid uses columnar storage to optimize query performance
- Segments: Data in Druid is divided into blocks called segments
Query Layer
- Druid supports multiple query types including real-time queries, batch queries and complex aggregation queries
- Query distribution: The Broker receives query requests, distributes them to the nodes that hold the relevant data, and merges the results
Service Layer
- Broker: Accepts query requests and routes them to appropriate Historical or real-time nodes
- Historical: Stores historical data and processes queries
- MiddleManager: Responsible for ingesting and processing real-time data streams
- Coordinator: Responsible for cluster management and data balancing
- Overlord: Responsible for task scheduling and management
- Router: Responsible for request routing
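The split between Broker and Historical can be illustrated with a toy scatter/gather model. This is only a sketch under simplifying assumptions, not Druid code: the class names `HistoricalNode` and `Broker`, the per-day segment layout, and the sample rows are all hypothetical, and the sketch contacts every node instead of only the ones that hold relevant segments.

```python
from collections import defaultdict


class HistoricalNode:
    """Hypothetical stand-in for a Historical: holds segments keyed by day."""

    def __init__(self, segments):
        # segments: {"2025-01-01": [{"page": "home", "clicks": 3}, ...]}
        self.segments = segments

    def partial_sum(self, days, dimension, metric):
        # Aggregate only the locally held segments that match the query days.
        totals = defaultdict(int)
        for day in days:
            for row in self.segments.get(day, []):
                totals[row[dimension]] += row[metric]
        return dict(totals)


class Broker:
    """Hypothetical stand-in for a Broker: fans out, then merges partial results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def query(self, days, dimension, metric):
        merged = defaultdict(int)
        for node in self.nodes:  # scatter
            partial = node.partial_sum(days, dimension, metric)
            for key, value in partial.items():  # gather and merge
                merged[key] += value
        return dict(merged)


node1 = HistoricalNode(
    {"2025-01-01": [{"page": "home", "clicks": 3}, {"page": "docs", "clicks": 1}]}
)
node2 = HistoricalNode({"2025-01-02": [{"page": "home", "clicks": 2}]})
broker = Broker([node1, node2])
result = broker.query(["2025-01-01", "2025-01-02"], "page", "clicks")
print(result)
```

Each node returns a small partial aggregate rather than raw rows, which is why the merge step on the Broker stays cheap even when the underlying data is large.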
Comparison with Other OLAP Systems
- SparkSQL / Impala / ClickHouse: Handle massive data with great flexibility, but offer no guarantee on response time
- Search-engine-based systems (such as Elasticsearch): Deliver sub-second responses for search-style queries
- Druid/Kylin: Pre-aggregate data at ingestion time to deliver responses within seconds on very large datasets
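Pre-aggregation at ingestion time can be sketched as follows. This is a minimal illustration of the idea, assuming hourly granularity and a single summed metric; the function name `rollup`, the field names, and the sample events are hypothetical and do not reflect Druid's actual ingestion code.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events by (hour, dimensions), keeping a count and a sum."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate the timestamp to the hour: this is the rollup granularity.
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        key = (hour,) + tuple(event[d] for d in dimensions)
        buckets[key]["count"] += 1
        buckets[key]["sum"] += event[metric]
    return dict(buckets)


events = [
    {"timestamp": "2025-01-01T10:05:00", "country": "US", "bytes": 100},
    {"timestamp": "2025-01-01T10:40:00", "country": "US", "bytes": 50},
    {"timestamp": "2025-01-01T10:59:00", "country": "CN", "bytes": 7},
    {"timestamp": "2025-01-01T11:01:00", "country": "US", "bytes": 1},
]
rolled = rollup(events, ["country"], "bytes")
for key, value in sorted(rolled.items()):
    print(key, value)
```

Four raw events collapse into three stored rows here. The trade-off is that individual events can no longer be recovered once they share a bucket, which is one reason this approach is less flexible than storing raw rows.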
Technical Features
Columnar Storage
- Druid stores and compresses each column independently, so a query reads only the columns it needs
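The benefit of column-oriented layout can be shown with a toy store that records which columns a query touches. This is a sketch only: `ColumnStore`, its methods, and the sample rows are hypothetical, compression is omitted, and every row is assumed to carry the same set of fields.

```python
class ColumnStore:
    """Toy column-oriented table that records which columns a query touches."""

    def __init__(self, rows):
        # Pivot row dicts into one list per column (rows must share the same keys).
        self.columns = {}
        for row in rows:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)
        self.columns_read = set()

    def read(self, name):
        self.columns_read.add(name)
        return self.columns[name]

    def sum_where(self, metric, filter_column, filter_value):
        # Only two columns are scanned, no matter how wide the table is.
        flags = self.read(filter_column)
        values = self.read(metric)
        return sum(v for f, v in zip(flags, values) if f == filter_value)


rows = [
    {"country": "US", "device": "ios", "browser": "safari", "bytes": 120},
    {"country": "CN", "device": "android", "browser": "chrome", "bytes": 80},
    {"country": "US", "device": "android", "browser": "chrome", "bytes": 200},
]
store = ColumnStore(rows)
total = store.sum_where("bytes", "country", "US")
print(total, sorted(store.columns_read))
```

The `device` and `browser` columns are never read for this query, which is the saving a row-oriented layout cannot offer.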
Streaming and Batch Ingestion
- Ships with ready-made connectors for Apache Kafka, HDFS, AWS S3, and other sources
Native Search Index
- Druid builds inverted indexes on string columns to support fast search and filtering
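An inverted index over string values can be sketched with one bitmap per distinct value, so that filters become bitwise operations. This is a minimal illustration using plain Python integers as uncompressed bitmaps; Druid itself uses compressed bitmap formats, and the helper names and sample columns below are hypothetical.

```python
def build_index(values):
    """Map each distinct string to an int bitmap; bit i set means row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_of(bitmap):
    """Expand a bitmap back into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


country = ["US", "CN", "US", "DE", "CN"]
device = ["ios", "ios", "android", "ios", "android"]
country_idx = build_index(country)
device_idx = build_index(device)

# WHERE (country = 'US' OR country = 'CN') AND device = 'ios'
match = (country_idx["US"] | country_idx["CN"]) & device_idx["ios"]
print(rows_of(match))
```

The filter is resolved without scanning either column row by row; only the rows in the final bitmap need to be read for aggregation.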
Flexible Schema
- Druid can handle changing schemas and nested data
Time-based Partition Optimization
- Druid partitions data by time first, so queries with a time filter scan only the relevant time chunks
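Time-based pruning can be sketched as an interval overlap check against segment metadata. This is an illustration under assumptions: daily segments, half-open `[start, end)` intervals, and hypothetical segment ids; it does not reproduce Druid's own segment selection logic.

```python
from datetime import datetime


def parse(ts):
    return datetime.fromisoformat(ts)


def prune_segments(segments, query_start, query_end):
    """Keep segments whose [start, end) interval overlaps the query interval."""
    q_start, q_end = parse(query_start), parse(query_end)
    return [
        seg
        for seg in segments
        if parse(seg["start"]) < q_end and q_start < parse(seg["end"])
    ]


# Hypothetical daily segments for a datasource named "events".
segments = [
    {"id": "events_2025-01-01", "start": "2025-01-01", "end": "2025-01-02"},
    {"id": "events_2025-01-02", "start": "2025-01-02", "end": "2025-01-03"},
    {"id": "events_2025-01-03", "start": "2025-01-03", "end": "2025-01-04"},
    {"id": "events_2025-01-04", "start": "2025-01-04", "end": "2025-01-05"},
]
hits = prune_segments(segments, "2025-01-02T12:00:00", "2025-01-03T06:00:00")
hit_ids = [seg["id"] for seg in hits]
print(hit_ids)
```

Only two of the four segments are scanned for this 18-hour window; the rest are skipped based on metadata alone.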
SQL Support
- Druid has a native JSON-based query language and also supports SQL over HTTP or JDBC
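SQL over HTTP can be sketched with the standard library alone. The example below builds a POST request for the `/druid/v2/sql` endpoint; the base URL `http://localhost:8888`, the datasource `events`, and the helper names are assumptions for illustration, so adjust them to your deployment. The request is only constructed here; `run_sql` would send it but is not called.

```python
import json
import urllib.request


def build_sql_request(base_url, sql, context=None):
    """Build (but do not send) a Druid SQL request as a JSON POST."""
    payload = {"query": sql}
    if context:
        payload["context"] = context  # optional query context parameters
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url, sql, context=None, timeout=30):
    """Send the request and decode the JSON response (needs a reachable cluster)."""
    request = build_sql_request(base_url, sql, context)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical datasource "events"; __time is Druid's primary timestamp column.
sql = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, COUNT(*) AS event_count
FROM "events"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
""".strip()

req = build_sql_request("http://localhost:8888", sql, {"timeout": 10000})
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the cluster.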
Horizontal Scalability
- Druid has been used in production to ingest millions of events per second, retain years of data, and serve sub-second queries
Application Scenarios
What Druid excels at:
- Sub-second response for most query scenarios
- Supports both real-time event stream ingestion and batch data import
- Pre-aggregation at ingestion time saves storage space and improves query efficiency
- Strong horizontal scaling capability
- Active community
Specific directions:
- Real-time Data Analysis: Fast analysis and visualization on real-time data streams (logs, sensor data, etc.)
- Business Intelligence: Supports business analysis with fast report generation and self-service data exploration
- Monitoring and Metrics Analysis: Suitable for monitoring application performance, user behavior, and operational metrics
- Network Analysis: Processes network traffic data for quick anomaly detection or trend analysis
- Complex Event Processing (CEP): Real-time processing and analysis of event streams
- Machine Learning Preprocessing: Prepares aggregated data as input for machine learning models
Error Quick Reference
| Symptom | Root Cause | Fix |
|---|---|---|
| Real-time ingestion lag keeps rising (Kafka lag increasing) | Insufficient task slots, improper supervisor configuration, or the Kafka extension not loaded | Check Overlord supervisor/{id}/stats and MiddleManager metrics; increase taskCount or concurrent task slots; verify that the druid-kafka-indexing-service extension is loaded |
| Query latency fluctuates between fast and slow, with obvious jitter | Too many small segments causing scheduling and scan overhead | Observe segment count, size, and time distribution; enable or strengthen Compaction to merge small segments |
| Broker/Historical memory pressure or OOM | Result set too large, high concurrency, or unreasonable query context parameters | Limit the size of returned results; enable or tune vectorization and query context parameters |
| Cannot consume from Kafka (error indicates the extension must be loaded) | Kafka indexing extension not loaded on Overlord/MiddleManager | Check startup logs and druid.extensions.loadList; load the extension on both services and restart |
| Java version/deprecation warning after upgrade | Running on Java 11, which is deprecated and scheduled for removal | Migrate to JDK 17/21; the project plans to end Java 11 support in 37.0.0 |
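
For the small-segment symptom in the table, a quick check is to measure what share of segments falls below a size threshold. The sketch below is an assumption-laden helper, not an official tool: the function name and the 100 MiB default threshold are hypothetical and should be tuned to your cluster, and the SQL string is one possible way to pull per-datasource statistics from the `sys.segments` system table.

```python
MIB = 1024 * 1024

# One possible diagnostic query against Druid's sys.segments system table.
SEGMENT_STATS_SQL = """
SELECT "datasource", COUNT(*) AS segment_count, AVG("size") AS avg_size_bytes
FROM sys.segments
WHERE is_published = 1 AND is_overshadowed = 0
GROUP BY 1
ORDER BY 2 DESC
""".strip()


def small_segment_ratio(sizes_bytes, threshold_bytes=100 * MIB):
    """Return the fraction of segments smaller than threshold_bytes.

    The default threshold is an illustrative assumption, not a Druid setting.
    """
    if not sizes_bytes:
        return 0.0
    small = sum(1 for size in sizes_bytes if size < threshold_bytes)
    return small / len(sizes_bytes)


# Hypothetical segment sizes for one datasource.
sizes = [10 * MIB, 20 * MIB, 500 * MIB, 600 * MIB]
ratio = small_segment_ratio(sizes)
print(f"{ratio:.0%} of segments are below the threshold")
```

A high ratio suggests that Compaction is worth enabling or tightening before tuning anything on the query side.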