Apache Druid Real-time OLAP Architecture & Selection Points

TL;DR

Scenario: Offline plus real-time dashboards, monitoring, and operations analysis that need high concurrency and sub-second aggregation
Conclusion: When “time partitioning + pre-aggregation + unified stream and batch ingestion” is a hard requirement, Druid is the more convenient choice. It is less flexible than ClickHouse or Elasticsearch, so weigh the trade-off against your business data model
Output: Summary, version matrix, and a quick reference of common faults with fix procedures (SOP)

Version Matrix

Version	Release Date	Note
34.0.0	2025-08-11	Latest stable release confirmed in the official docs; upgrade as needed and re-evaluate vectorization and concurrency parameters.
33.0.0	2025-04-29	Previous major release; the project still publishes its release information for reference.
32.0.1	2025-03-19	Maintenance release; Java 11 is deprecated as of 32.0, so plan a JDK upgrade.

Druid Introduction

Data analysis infrastructure can be divided into categories based on different business needs and technical characteristics:

Batch Processing Analysis Based on Hadoop/Spark
- Typical case: Use Hadoop’s MapReduce or Spark Core for large-scale dataset processing
- Application scenarios: Suitable for offline analysis, data mining on historical data
Hybrid Architecture (Hadoop/Spark + RDBMS)
- Implementation: Use Hadoop/Spark for preprocessing, then import the aggregated results into MySQL/Oracle
NoSQL Storage Architecture
- Solution: Save processed results to HBase, MongoDB and other NoSQL databases
Stream Processing Architecture
- Technology selection: Use Storm/Spark Streaming/Flink for real-time data stream processing
Real-time Analysis Database Architecture
- Representative products: Druid and other OLAP analytical databases
- Core advantage: Supports sub-second query response, suitable for real-time analysis scenarios

Druid is a distributed real-time analytics system originally developed at Metamarkets and later open-sourced. It addresses the problem of running fast, interactive queries over large-scale datasets. Designed for sub-second queries across both real-time and historical data, it is used mainly for OLAP workloads and provides low-latency (real-time) data ingestion, flexible data exploration, and fast aggregation.

Architecture Components

Ingestion Layer

Supervisor: Manages streaming ingestion tasks (for example, Kafka indexing tasks)
Datasource: The basic data concept in Druid; a named collection of data, comparable to a table

Storage Layer

Druid uses columnar storage to optimize query performance
Segments: Data in Druid is divided into blocks called segments, each covering a time interval

Query Layer

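Druid supports several kinds of queries, including queries over real-time data, queries over batch-loaded historical data, and complex aggregation queries
Query coordination (Broker): Receives query requests, distributes them to the nodes that hold the relevant data, and merges the partial results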

Service Layer

Broker: Accepts query requests and routes them to the appropriate Historical or real-time nodes
Historical: Stores historical data and serves queries on it
MiddleManager: Runs the tasks that ingest and process real-time data streams
Coordinator: Handles cluster data management and segment balancing
Overlord: Handles task scheduling and management
Router: Optional gateway that routes requests to Brokers, Coordinators, and Overlords

Comparison with Other OLAP Systems

SparkSQL / Impala / ClickHouse: Handle massive data volumes with high flexibility, but give no guarantee on response time
Search-engine-style systems (such as Elasticsearch): Achieve sub-second response on search-type queries
Druid/Kylin: Pre-aggregate data at ingestion time, which yields responses within seconds on very large datasets
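
To make "pre-aggregate at ingestion time" concrete, here is a minimal sketch of the rollup idea in plain Python. It is not Druid code: the event fields (ts, country, page, bytes) and the one-minute granularity are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def truncate_to_minute(ts: str) -> str:
    """Floor an ISO-8601 UTC timestamp to the start of its minute."""
    parsed = datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.replace(second=0).strftime(TS_FORMAT)

def rollup(events):
    """Collapse raw events into one row per (minute, country, page)."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        key = (truncate_to_minute(event["ts"]), event["country"], event["page"])
        totals[key]["count"] += 1
        totals[key]["bytes"] += event["bytes"]
    return [
        {"ts": ts, "country": country, "page": page, **metrics}
        for (ts, country, page), metrics in sorted(totals.items(), key=lambda kv: kv[0])
    ]

# Hypothetical raw events, as they might arrive from a stream.
raw_events = [
    {"ts": "2025-01-01T00:00:05Z", "country": "US", "page": "/home", "bytes": 100},
    {"ts": "2025-01-01T00:00:40Z", "country": "US", "page": "/home", "bytes": 250},
    {"ts": "2025-01-01T00:00:59Z", "country": "DE", "page": "/home", "bytes": 80},
    {"ts": "2025-01-01T00:01:10Z", "country": "US", "page": "/home", "bytes": 70},
]

rolled_up = rollup(raw_events)
for row in rolled_up:
    print(row)
```

Four raw events become three stored rows here. The saving grows with the number of events that share the same truncated timestamp and dimension values, and the price is that individual events can no longer be retrieved after rollup.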

Technical Features

Columnar Storage

Druid stores and compresses each column independently, so a query reads only the columns it actually needs
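
The effect of a column-oriented layout can be sketched in a few lines of Python. This is a simplified illustration rather than Druid's on-disk format: the sample rows are invented, and the dictionary encoding shown is only a toy version of how string columns can be stored compactly.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def dictionary_encode(values):
    """Replace repeated strings with small integer ids plus a lookup table."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]

# Hypothetical rows for illustration.
rows = [
    {"country": "US", "page": "/home", "bytes": 100},
    {"country": "DE", "page": "/home", "bytes": 80},
    {"country": "US", "page": "/cart", "bytes": 250},
    {"country": "FR", "page": "/home", "bytes": 40},
]

columns = to_columns(rows)

# SUM(bytes) touches only the "bytes" column; "country" and "page" are never read.
total_bytes = sum(columns["bytes"])

# Low-cardinality string columns compress well once values become integer ids.
dictionary, encoded_country = dictionary_encode(columns["country"])
print(total_bytes, dictionary, encoded_country)
```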

Streaming and Batch Ingestion

Ships with ready-made connectors for Apache Kafka, HDFS, AWS S3, and other sources
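
As a rough illustration of streaming ingestion, the sketch below builds a Kafka supervisor spec as a Python dict and serializes it to JSON. The field layout follows the Kafka ingestion spec as I recall it from the Druid documentation, so verify it against the docs for your version. The datasource, topic, broker address, and column names are all hypothetical.

```python
import json

def kafka_supervisor_spec(datasource, topic, bootstrap_servers, task_count=1):
    """Build a minimal Kafka supervisor spec (sketch; check against your Druid version)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Column names below are assumptions about the incoming JSON events.
                "timestampSpec": {"column": "ts", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["country", "page"]},
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
                ],
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "minute",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": task_count,
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }

spec = kafka_supervisor_spec("events", "events-topic", "localhost:9092", task_count=2)
payload = json.dumps(spec, indent=2)
print(payload)
```

The resulting JSON is what would be submitted to the Overlord's supervisor endpoint (POST /druid/indexer/v1/supervisor). The sketch stops at building the payload so that it runs without a cluster.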

Local Search Index

Druid builds inverted indexes on string columns to support fast search and filtering
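
The following sketch shows how an inverted index turns filters into bitmap operations. Druid uses compressed bitmaps (Roaring or CONCISE); this illustration swaps in plain Python integers as bitmaps, and the column values are made up.

```python
def build_index(values):
    """Map each distinct value to a bitmap (a Python int) of the rows containing it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Expand a bitmap back into a sorted list of row ids."""
    return [row_id for row_id in range(bitmap.bit_length()) if (bitmap >> row_id) & 1]

# Two hypothetical string columns, one entry per row.
country = ["US", "DE", "US", "FR", "DE"]
device = ["mobile", "mobile", "desktop", "mobile", "desktop"]

country_index = build_index(country)
device_index = build_index(device)

# WHERE country IN ('US', 'DE') AND device = 'mobile'
hits = (country_index["US"] | country_index["DE"]) & device_index["mobile"]
print(matching_rows(hits))
```

The filter is resolved entirely with OR and AND on bitmaps, so only the matching rows need to be read from the other columns afterwards.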

Flexible Schema

Druid handles evolving schemas and nested data

Time-based Partition Optimization

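Druid partitions data by time first, so a time-bounded query only touches the segments that cover the requested interval; data can optionally be partitioned further on other fields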
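
Why time partitioning speeds up queries can be sketched as interval pruning: only segments whose interval overlaps the query interval need to be scanned. The segment names and hourly intervals below are hypothetical, and real Druid segment metadata carries more than a name and an interval.

```python
from datetime import datetime

def ts(text):
    """Parse an ISO-8601 timestamp without a timezone suffix."""
    return datetime.fromisoformat(text)

# Hypothetical hourly segments as (name, start, end) with half-open intervals [start, end).
SEGMENTS = [
    ("events_00", ts("2025-01-01T00:00:00"), ts("2025-01-01T01:00:00")),
    ("events_01", ts("2025-01-01T01:00:00"), ts("2025-01-01T02:00:00")),
    ("events_02", ts("2025-01-01T02:00:00"), ts("2025-01-01T03:00:00")),
    ("events_03", ts("2025-01-01T03:00:00"), ts("2025-01-01T04:00:00")),
]

def segments_to_scan(segments, query_start, query_end):
    """Keep only segments whose interval overlaps [query_start, query_end)."""
    return [
        name
        for name, start, end in segments
        if start < query_end and query_start < end
    ]

to_scan = segments_to_scan(SEGMENTS, ts("2025-01-01T01:30:00"), ts("2025-01-01T02:15:00"))
print(to_scan)
```

A 45-minute query window touches two of the four segments here; the other two are skipped without reading any of their data.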

SQL Support

Druid supports a native JSON-based query language as well as SQL over HTTP or JDBC
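
A minimal sketch of the SQL-over-HTTP path, using only the Python standard library. It builds the request for the SQL endpoint (/druid/v2/sql) without sending it, so it runs without a cluster. The base URL (a Router on its default port 8888), the "events" datasource, and the timeout value are assumptions.

```python
import json
import urllib.request

def build_sql_request(base_url, sql, timeout_ms=10000):
    """Build a POST request for Druid's SQL HTTP endpoint (not sent here)."""
    payload = {
        "query": sql,
        "resultFormat": "object",
        # Query context: cap how long the query may run, in milliseconds.
        "context": {"timeout": timeout_ms},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "events" is a hypothetical datasource; __time is Druid's primary timestamp column.
SQL = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, COUNT(*) AS event_count
FROM "events"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""

request = build_sql_request("http://localhost:8888", SQL)
print(request.get_method(), request.full_url)

# To run it against a live cluster (assumed reachable at the URL above):
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Filtering on __time in the WHERE clause matters: it is what lets Druid restrict the scan to the segments covering the last day.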

Horizontal Scalability

Druid has been used in production to ingest millions of events per second, retain years of data, and still serve sub-second queries

Application Scenarios

What Druid excels at:

Sub-second response for most query scenarios
Support for both real-time event stream writes and batch data imports
Pre-aggregation at ingestion time, which saves storage space and improves query efficiency
Strong horizontal scaling capability
An active community

Specific directions:

Real-time Data Analysis: Fast analysis and visualization of real-time data streams (logs, sensor data, etc.)
Business Intelligence: Business analysis with fast report generation and self-service data exploration
Monitoring and Metrics Analysis: Application performance monitoring, user behavior analysis, and operational metrics
Network Analysis: Processing network traffic data for quick anomaly detection or trend analysis
Complex Event Processing (CEP): Real-time processing and analysis of event streams
Machine Learning Preprocessing: Supplying aggregated data as input for machine learning models

Error Quick Reference

Symptom	Root Cause	Fix
Real-time ingestion delay rising (Kafka lag increasing)	Insufficient task slots, improper supervisor configuration, or extension not loaded	Check Overlord supervisor/{id}/stats and MiddleManager metrics; increase taskCount or concurrent task slots; verify that the druid-kafka-indexing-service extension is loaded
Query latency fluctuates noticeably (sometimes fast, sometimes slow)	Too many small segments, causing scheduling and scan overhead	Inspect segment count, size, and time distribution; enable or strengthen compaction to merge small segments
Broker/Historical memory pressure or OOM	Result sets too large, high concurrency, or unreasonable query context parameters	Limit the amount of data returned; enable or tune vectorization and the query context
Cannot consume from Kafka (message says the extension must be loaded)	Kafka indexing extension not loaded on Overlord/MiddleManager	Check startup logs and druid.extensions.loadList; load the extension on both services and restart
Java version or deprecation warning after upgrade	Running on Java 11, which is deprecated and scheduled for removal	Migrate to JDK 17/21; the project plans to end Java 11 support in 37.0.0
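
For the extension-related rows above, a quick check of druid.extensions.loadList can be scripted. The sketch below parses the text of a properties file and reports whether the Kafka indexing extension is listed. The sample content is invented; in practice you would read the common.runtime.properties file used by the Overlord and MiddleManager.

```python
import json

REQUIRED_EXTENSION = "druid-kafka-indexing-service"

def loaded_extensions(properties_text):
    """Return the extensions listed in druid.extensions.loadList, or [] if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "druid.extensions.loadList":
            # The property value is a JSON array of extension names.
            return json.loads(value)
    return []

# Sample file content for illustration only.
sample_properties = """
# common.runtime.properties
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service"]
druid.zk.service.host=localhost
"""

extensions = loaded_extensions(sample_properties)
if REQUIRED_EXTENSION in extensions:
    print(f"{REQUIRED_EXTENSION} is in loadList")
else:
    print(f"{REQUIRED_EXTENSION} is missing; add it to loadList and restart the service")
```

Run the same check on every host that runs an Overlord or MiddleManager, because the extension has to be loaded on both.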