TL;DR

  • Scenario: Offline and real-time dashboards, monitoring, and operations analysis that need high concurrency and sub-second aggregation
  • Conclusion: When “time partitioning + pre-aggregation + stream-batch integration” is a hard requirement, Druid is the more convenient choice; it is less flexible than ClickHouse/ES, so weigh the trade-off against your business data model
  • Output: Summary, version matrix, and a quick reference of common faults with a fix SOP

Version Matrix

| Version | Release Date | Note |
| --- | --- | --- |
| 34.0.0 | 2025-08-11 | Latest stable version confirmed in the official docs; upgrade as needed and re-evaluate vectorization and concurrency parameters. |
| 33.0.0 | 2025-04-29 | Previous major version; the official site still provides its release information for reference. |
| 32.0.1 | 2025-03-19 | Maintenance release; Java 11 is deprecated as of 32.0, so plan a JDK upgrade. |

Druid Introduction

Data analysis infrastructure can be divided into categories based on different business needs and technical characteristics:

  1. Batch Processing Analysis Based on Hadoop/Spark

    • Typical case: Use Hadoop’s MapReduce or Spark Core for large-scale dataset processing
    • Application scenarios: Suitable for offline analysis, data mining on historical data
  2. Hybrid Architecture (Hadoop/Spark + RDBMS)

    • Implementation: Use Hadoop/Spark for preprocessing, import aggregated results to MySQL/Oracle
  3. NoSQL Storage Architecture

    • Solution: Save processed results to HBase, MongoDB and other NoSQL databases
  4. Stream Processing Architecture

    • Technology selection: Use Storm/Spark Streaming/Flink for real-time data stream processing
  5. Real-time Analysis Database Architecture

    • Representative product: Use Druid and other OLAP analytical databases
    • Core advantage: Supports sub-second query response, suitable for real-time analysis scenarios

Druid is a distributed, column-oriented real-time analytics system originally developed at Metamarkets to answer one question: how to run fast, interactive queries over large-scale datasets. It is an open-source engine designed for sub-second OLAP queries on both real-time and historical data, and it provides low-latency (real-time) data ingestion, flexible data exploration, and fast aggregation.

Architecture Components

Ingestion Layer

  • Supervisor: Manages streaming ingestion tasks and keeps them running
  • Datasources: The basic unit of data in Druid, roughly analogous to a table in a relational database

Storage Layer

  • Druid uses columnar storage to optimize query performance
  • Segments: Data in Druid is divided into blocks called segments

Query Layer

  • Druid supports multiple query types including real-time queries, batch queries and complex aggregation queries
  • Query coordination: Handled by the Broker, which receives query requests, distributes them to the nodes holding the relevant data, and merges the partial results

Service Layer

  • Broker: Accepts query requests and routes them to appropriate Historical or real-time nodes
  • Historical: Stores historical data and processes queries
  • MiddleManager: Responsible for ingesting and processing real-time data streams
  • Coordinator: Responsible for cluster management and data balancing
  • Overlord: Responsible for task scheduling and management
  • Router: Responsible for request routing
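
To make the Broker's role more concrete, here is a toy scatter-gather sketch in plain Python. It is not Druid code: `Segment`, `HISTORICALS`, `historical_sum`, and `broker_sum` are hypothetical names, time is simplified to integer hours, and the toy Broker asks every node, whereas a real Broker only contacts nodes that serve relevant segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_hour: int  # inclusive start of the segment's time chunk (toy integer hours)
    end_hour: int    # exclusive end of the time chunk
    rows: tuple      # (hour, value) pairs stored in this segment


# Hypothetical assignment of segments to two Historical processes.
HISTORICALS = {
    "historical-1": [Segment(0, 12, ((1, 10), (5, 20)))],
    "historical-2": [Segment(12, 24, ((13, 5), (20, 7)))],
}


def historical_sum(segments, start, end):
    """A data node scans only its segments that overlap the query interval."""
    total = 0
    for seg in segments:
        if seg.start_hour < end and start < seg.end_hour:
            total += sum(v for h, v in seg.rows if start <= h < end)
    return total


def broker_sum(start, end):
    """Scatter the query to the data nodes, then merge the partial sums."""
    partials = [historical_sum(segs, start, end) for segs in HISTORICALS.values()]
    return sum(partials)


print(broker_sum(0, 24))  # sum over the whole day
print(broker_sum(4, 14))  # interval that spans both nodes
```

The point of the sketch is the division of labor: data nodes compute partial aggregates over their own segments, and the Broker only merges small intermediate results.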

Comparison with Other OLAP Systems

  • SparkSQL / Impala / ClickHouse: Handle massive data volumes with high query flexibility, but give no guarantee on response time
  • Search-engine-style systems (such as Elasticsearch): Achieve sub-second response on search queries
  • Druid / Kylin: Pre-aggregate data at ingestion time to achieve second-level response on ultra-large datasets

Technical Features

Columnar Storage

  • Druid stores and compresses each column independently, so a query only needs to read the columns it actually references
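
The benefit of a columnar layout can be illustrated with a small Python sketch. This is only a conceptual model, not Druid's segment format; the sample rows and the helper names `to_columns` and `sum_clicks_by_country` are made up for illustration.

```python
rows = [
    {"ts": "2025-01-01T00:00:00Z", "country": "US", "device": "ios", "clicks": 3},
    {"ts": "2025-01-01T00:01:00Z", "country": "DE", "device": "android", "clicks": 5},
    {"ts": "2025-01-01T00:02:00Z", "country": "US", "device": "android", "clicks": 2},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_clicks_by_country(columns):
    """Reads only the 'country' and 'clicks' columns; 'ts' and 'device' are never touched."""
    totals = {}
    for country, clicks in zip(columns["country"], columns["clicks"]):
        totals[country] = totals.get(country, 0) + clicks
    return totals


columns = to_columns(rows)
print(sum_clicks_by_country(columns))
```

Because each column is a separate array, the aggregation works even if the other columns are never loaded, which is the property the bullet above describes.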

Streaming and Batch Ingestion

  • Provides ready-made connectors for Apache Kafka, HDFS, AWS S3, and other sources
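
As a rough illustration of streaming ingestion, the sketch below builds a minimal Kafka supervisor spec as a Python dictionary. The field layout follows the Kafka supervisor spec as described in the Druid documentation, but treat it as an assumption and verify it against your version. The datasource name, topic, broker address, and column names are all hypothetical.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers, task_count=1):
    """Build a minimal Kafka supervisor spec (all concrete values are illustrative)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["country", "device"]},
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",  # one time chunk per hour
                    "queryGranularity": "MINUTE",  # timestamps truncated to the minute
                    "rollup": True,                # pre-aggregate at ingestion time
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": task_count,  # number of parallel reading tasks
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = kafka_supervisor_spec("clickstream", "clicks", "localhost:9092", task_count=2)
print(json.dumps(spec, indent=2))
```

A spec like this is normally submitted to the Overlord's supervisor API; `taskCount` is the knob to revisit first when Kafka lag keeps growing.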

Local Search Index

  • Druid builds inverted indexes on string columns to support fast filtering and search
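
A minimal sketch of the inverted-index idea, using Python sets in place of the compressed bitmaps a real engine would use. The column values and the helper names `build_inverted_index` and `rows_matching` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct value to the set of row ids where it occurs."""
    index = defaultdict(set)
    for row_id, value in enumerate(values):
        index[value].add(row_id)
    return index


country = ["US", "DE", "US", "FR", "DE"]
device = ["ios", "android", "android", "ios", "android"]

country_idx = build_inverted_index(country)
device_idx = build_inverted_index(device)


def rows_matching(country_value, device_value):
    """An AND filter becomes an intersection of two row-id sets."""
    matches = country_idx.get(country_value, set()) & device_idx.get(device_value, set())
    return sorted(matches)


print(rows_matching("DE", "android"))
print(rows_matching("US", "ios"))
```

The filter never scans the raw column values: it looks up two precomputed row-id sets and intersects them, which is why filtering on indexed string columns is cheap.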

Flexible Schema

  • Druid can handle changing schemas and nested data

Time-based Partition Optimization

  • Druid first partitions data by time into time chunks, so a time-bounded query only touches the segments that cover the requested interval
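
The effect of time partitioning can be sketched in a few lines of Python. This is a simplified model under stated assumptions: only HOUR and DAY granularities are handled, and `time_chunk`, `partition_by_time`, and `chunks_for_query` are hypothetical helper names, not Druid APIs.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def time_chunk(ts, granularity="HOUR"):
    """Return the (start, end) interval of the time chunk that contains ts."""
    if granularity == "HOUR":
        start = ts.replace(minute=0, second=0, microsecond=0)
        return start, start + timedelta(hours=1)
    if granularity == "DAY":
        start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)
    raise ValueError(f"unsupported granularity: {granularity}")


def partition_by_time(events, granularity="HOUR"):
    """Group (timestamp, payload) events by their time chunk."""
    chunks = defaultdict(list)
    for ts, payload in events:
        chunks[time_chunk(ts, granularity)].append(payload)
    return dict(chunks)


def chunks_for_query(chunks, q_start, q_end):
    """Keep only the chunks whose interval overlaps [q_start, q_end)."""
    return sorted(c for c in chunks if c[0] < q_end and q_start < c[1])


utc = timezone.utc
events = [
    (datetime(2025, 1, 1, 10, 5, tzinfo=utc), "a"),
    (datetime(2025, 1, 1, 10, 40, tzinfo=utc), "b"),
    (datetime(2025, 1, 1, 11, 15, tzinfo=utc), "c"),
]
chunks = partition_by_time(events)
hit = chunks_for_query(
    chunks,
    datetime(2025, 1, 1, 11, 0, tzinfo=utc),
    datetime(2025, 1, 1, 12, 0, tzinfo=utc),
)
print(len(chunks), len(hit))
```

A query for 11:00 to 12:00 only needs the one chunk that overlaps that hour; the 10:00 chunk is pruned without being read.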

SQL Support

  • Druid supports its native JSON-based query language and also SQL over HTTP or JDBC
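
The following sketch shows what a SQL query over HTTP might look like from Python, using only the standard library. It assumes a Router reachable at `localhost:8888` and a hypothetical datasource named `clickstream` with a `clicks` metric; the `/druid/v2/sql` path is Druid's SQL endpoint, but the host, port, and query are illustrative.

```python
import json
import urllib.request

# Assumption: a Router listening on localhost:8888; adjust to your deployment.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Hypothetical datasource and metric names.
HOURLY_CLICKS = """
SELECT TIME_FLOOR(__time, 'PT1H') AS "hour_start", SUM("clicks") AS "clicks"
FROM "clickstream"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Build (but do not send) a POST request carrying the SQL as JSON."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_sql(sql, url=DRUID_SQL_URL, timeout=10):
    """Send the query and decode the JSON response (requires a running cluster)."""
    with urllib.request.urlopen(build_sql_request(sql, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_sql_request(HOURLY_CLICKS)
print(request.get_method(), request.full_url)
```

Calling `run_sql(HOURLY_CLICKS)` against a live cluster would return one object per hour; it is left uncalled here so the example runs without a Druid instance.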

Horizontal Scalability

  • Druid has been used in production to ingest millions of events per second, retain years of data, and still serve sub-second queries

Application Scenarios

What Druid excels at:

  • Sub-second response for most query scenarios
  • Supports both real-time event stream ingestion and batch data import
  • Pre-aggregation (rollup) at ingestion time saves storage space and improves query efficiency
  • Strong horizontal scaling capability
  • Active community
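
Pre-aggregation at ingestion time can be sketched as follows. This is a conceptual model of rollup, not Druid's implementation: timestamps are truncated to the minute by string slicing, and the event fields and helper names `truncate_to_minute` and `rollup` are hypothetical.

```python
from collections import defaultdict


def truncate_to_minute(ts):
    """'2025-01-01T10:05:42Z' -> '2025-01-01T10:05:00Z' (assumes this exact ISO layout)."""
    return ts[:16] + ":00Z"


def rollup(events):
    """Collapse raw events that share (truncated time, country) into one aggregated row."""
    agg = defaultdict(lambda: {"count": 0, "clicks": 0})
    for e in events:
        key = (truncate_to_minute(e["ts"]), e["country"])
        agg[key]["count"] += 1
        agg[key]["clicks"] += e["clicks"]
    return [{"ts": ts, "country": c, **m} for (ts, c), m in sorted(agg.items())]


raw = [
    {"ts": "2025-01-01T10:05:01Z", "country": "US", "clicks": 1},
    {"ts": "2025-01-01T10:05:42Z", "country": "US", "clicks": 2},
    {"ts": "2025-01-01T10:05:59Z", "country": "US", "clicks": 4},
    {"ts": "2025-01-01T10:06:03Z", "country": "DE", "clicks": 3},
]
rolled = rollup(raw)
print(len(raw), "raw rows ->", len(rolled), "stored rows")
```

The trade-off is visible in the output: fewer rows are stored and scanned, but the individual raw events can no longer be recovered, which is the flexibility cost mentioned in the TL;DR.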

Specific directions:

  • Real-time Data Analysis: Fast analysis and visualization on real-time data streams (logs, sensor data, etc.)
  • Business Intelligence: Used for business analysis, supports fast report generation and self-service data exploration
  • Monitoring and Metrics Analysis: Suitable for monitoring application performance, user behavior analysis and operational metrics
  • Network Analysis: Process network traffic data for quick anomaly detection or trend analysis
  • Complex Event Processing (CEP): Real-time processing and analysis of event streams
  • Machine Learning Preprocessing: Aggregated data can be exported as input for machine learning models

Error Quick Reference

| Symptom | Root Cause | Check / Fix |
| --- | --- | --- |
| Real-time ingestion delay keeps rising (Kafka lag increasing) | Insufficient task slots, improper supervisor configuration, or extension not loaded | Check Overlord supervisor/{id}/stats and MiddleManager metrics; increase taskCount or the number of concurrent task slots; verify that the druid-kafka-indexing-service extension is loaded |
| Query latency is erratic, sometimes fast and sometimes slow | Too many small segments, causing scheduling and scan overhead | Inspect segment count, size, and time distribution; enable or strengthen compaction to merge small segments |
| Broker/Historical memory pressure or OOM | Result sets too large, high concurrency, or unreasonable query context parameters | Limit the size of returned results; enable or adjust vectorization and other query context settings |
| Cannot consume from Kafka (message says an extension must be loaded) | Kafka indexing extension not loaded on the Overlord/MiddleManager | Check the startup logs and druid.extensions.loadList; load the extension on both services and restart |
| Java version or deprecation warning after upgrade | Running on Java 11, which is deprecated and scheduled for removal | Migrate to JDK 17/21; the project plans to end Java 11 support in 37.0.0 |
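
For the extension-related rows, a small script can confirm whether the Kafka indexing extension appears in `druid.extensions.loadList`. This is a minimal sketch that assumes the property is written as a single-line JSON array, as in the stock configuration files; `load_list` and `has_kafka_extension` are hypothetical helper names, and the sample property text is made up.

```python
import json

REQUIRED_EXTENSION = "druid-kafka-indexing-service"


def load_list(properties_text):
    """Extract druid.extensions.loadList from the text of a runtime.properties file."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        key, sep, value = line.partition("=")
        if sep and key.strip() == "druid.extensions.loadList":
            return json.loads(value)
    return []


def has_kafka_extension(properties_text):
    """True if the Kafka indexing extension is in the load list."""
    return REQUIRED_EXTENSION in load_list(properties_text)


sample_ok = 'druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service"]'
sample_missing = 'druid.extensions.loadList=["druid-hdfs-storage"]'

print(has_kafka_extension(sample_ok))
print(has_kafka_extension(sample_missing))
```

Run the check against the configuration of both the Overlord and the MiddleManager, since the table above notes that the extension must be loaded on both.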