TL;DR
- Scenario: Quickly try Apache Druid 30.0.0 on a single machine, verifying real-time and historical queries plus console access.
- Conclusion: Following the single-server quickstart configuration, the service starts smoothly; ports and memory/JVM settings are the most common pitfalls.
- Output: download/extraction steps and environment variables, a startup command reference, the console access URL, and a quick-reference card for common errors.
Version Matrix
| Target | Status | Note |
|---|---|---|
| Druid 30.0.0 download and extraction | Verified | Completed per the commands and screenshots in this article |
| Environment variables (DRUID_HOME/PATH) | Verified | Written to /etc/profile and refreshed |
| Single-machine nano-quickstart startup | Verified | bin/start-nano-quickstart, for lowest-spec verification |
| Console access on 8888 | Verified | Example address provided; mind port opening/security group/firewall |
| micro-quickstart (4C/16G) | Unconfirmed | Also recommended: increase jvm.config heap and processing threads |
| small/medium/large/xlarge | Unconfirmed | Evaluate JVM and processing/cache parameters against machine specs and data volume |
| ZooKeeper 2181 conflict handling | Verified | Avoided by stopping the occupying process; changing the port or using an external ZK also works |
| Batch/stream ingestion (Kafka/HDFS/S3) | Unconfirmed | Can be skipped for single-machine verification; configure for cluster/production |
System Architecture
Apache Druid is an open-source, distributed, high-performance real-time analytics database designed for fast aggregation and querying over large datasets. It is particularly well suited to time-series, event, and log data, and is widely used in fields such as internet advertising analytics, online transaction monitoring, and network security log analysis.
Druid’s core architecture uses modular design, consisting of several key components:
- Coordinator Node
  - Responsible for data segment lifecycle management
  - Monitors data node status
  - Executes data balancing and replication strategies
- Historical Node
  - Stores and queries immutable data segments
  - Uses memory-mapped files for efficient queries
  - Supports multiple compression formats (like LZ4, Zstandard)
- Broker Node
  - Receives client query requests
  - Routes queries to relevant nodes
  - Aggregates and returns final results
- Ingestion Node
  - Handles real-time data ingestion
  - Supports batch and streaming modes
  - Supports multiple data formats (JSON, CSV, etc.)
- Deep Storage
  - Persistent storage layer
  - Supports HDFS, S3, and other distributed file systems
  - Ensures high data availability
These components work together, enabling Druid to achieve sub-second query response times and support data ingestion of millions of events per second.
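To make the Deep Storage layer concrete: pointing Druid at S3 instead of the default local directory is a few lines in `conf/druid/single-server/<profile>/_common/common.runtime.properties`. The property names below are Druid's standard S3 settings; the bucket name is a placeholder, and credentials are deliberately left out (they can come from instance profiles or the `druid.s3.*` properties).

```properties
# Load the S3 extension and use it as the segment deep-storage backend.
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
```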
Core Components
Ingestion Layer
- Data Sources: Druid ingests from sources such as Kafka, HDFS, and Amazon S3, in either batch or streaming mode.
- Task Management: A task coordinator manages ingestion tasks, keeping the data flow smooth and highly available.
Storage Layer
- Segment: Druid divides data into multiple chunks called “Segments”. Each segment typically contains data within a time period, optimized for fast queries.
- Time Partitioning: Druid partitions data by time to improve query performance. Data is indexed by timestamp, facilitating efficient time range queries.
Query Layer
- Broker: Responsible for receiving user query requests and routing them to corresponding data nodes (Historical and Real-time nodes).
- Query Execution: Druid supports multiple query types including aggregation queries, filter queries and group-by queries.
Historical Node
- Stores and manages long-term data segments, responsible for processing queries on historical data.
Real-time Node
- Used for real-time data ingestion, real-time processing and generating queryable segments. Suitable for applications requiring low-latency data access.
Coordinator Node
- Responsible for managing Druid cluster nodes, monitoring node health, data distribution and load balancing.
Data Flow
- Data Ingestion: Data flows into Druid from external sources (such as a Kafka message queue) and is ingested after task management and transformation.
- Data Storage: Data is segmented and stored on Historical and Real-time nodes, partitioned by time and compressed to optimize storage.
- Query Processing: Users send queries through a query interface (SQL or Druid's native query language); the Broker distributes the request to the relevant data nodes, aggregates the partial results, and returns the final answer.
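The query-processing step above can be sketched as an HTTP call: Druid accepts SQL at the standard `/druid/v2/sql` endpoint, reachable through the Router on port 8888. The `wikipedia` datasource is the quickstart tutorial example and an assumption here; substitute your own.

```shell
# Build a Druid SQL payload; /druid/v2/sql is Druid's SQL-over-HTTP endpoint.
# "wikipedia" is the quickstart tutorial datasource -- substitute your own.
cat > /tmp/druid-query.json <<'EOF'
{"query": "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5"}
EOF
cat /tmp/druid-query.json
# With the service running, submit it via the Router:
# curl -s -H 'Content-Type: application/json' -d @/tmp/druid-query.json http://localhost:8888/druid/v2/sql
```

The Broker answers with a JSON array of result rows, which is convenient to pipe into `jq` or any HTTP client library.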
Query Optimization
- Columnar Storage: Druid uses columnar storage format, improving compression ratio and query performance.
- Indexing: Druid builds bitmap indexes on dimension columns, accelerating filter and aggregation operations.
- Pre-aggregation: Pre-computes commonly used aggregation operations to reduce real-time query computation burden.
Download and Extraction
wget https://dlcdn.apache.org/druid/30.0.0/apache-druid-30.0.0-bin.tar.gz
tar -zxvf apache-druid-30.0.0-bin.tar.gz
mv apache-druid-30.0.0 /opt/servers/
cd /opt/servers/apache-druid-30.0.0
ls
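Before extracting, it is worth verifying the tarball against its published SHA-512 (Apache mirrors place a `.sha512` file next to each release artifact). A small sketch, reusing the filenames from this article:

```shell
# Verify download integrity; filenames follow the Druid 30.0.0 download above.
TARBALL=apache-druid-30.0.0-bin.tar.gz
if [ -f "$TARBALL" ]; then
  # Compare the printed digest against the mirror's ${TARBALL}.sha512 file.
  sha512sum "$TARBALL"
else
  echo "download $TARBALL first"
fi
```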
Configuration Files
Configuration files for single-server deployment are located at:
conf/druid/single-server/
├── large
├── medium
├── micro-quickstart
├── nano-quickstart
├── small
└── xlarge
Startup Requirements
| Config | CPU | Memory | Startup Command | Config Directory |
|---|---|---|---|---|
| Nano-Quickstart | 1 | 4GB | bin/start-nano-quickstart | conf/druid/single-server/nano-quickstart/* |
| Micro Quickstart | 4 | 16GB | bin/start-micro-quickstart | conf/druid/single-server/micro-quickstart/* |
| Small | 8 | 64GB | bin/start-small | conf/druid/single-server/small/* |
| Medium | 16 | 128GB | bin/start-medium | conf/druid/single-server/medium/* |
| Large | 32 | 256GB | bin/start-large | conf/druid/single-server/large/* |
| X-Large | 64 | 512GB | bin/start-xlarge | conf/druid/single-server/xlarge/* |
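The thresholds in this table can be turned into a quick sanity check before picking a startup script. A hypothetical helper (the profile names and sizing come from the table above; the GB rounding is deliberately coarse):

```shell
# Hypothetical helper: map this machine's cores/RAM onto the profile table.
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
if   [ "$cores" -ge 64 ] && [ "$mem_gb" -ge 512 ]; then profile=xlarge
elif [ "$cores" -ge 32 ] && [ "$mem_gb" -ge 256 ]; then profile=large
elif [ "$cores" -ge 16 ] && [ "$mem_gb" -ge 128 ]; then profile=medium
elif [ "$cores" -ge 8 ]  && [ "$mem_gb" -ge 64 ];  then profile=small
elif [ "$cores" -ge 4 ]  && [ "$mem_gb" -ge 16 ];  then profile=micro-quickstart
else profile=nano-quickstart
fi
echo "suggested profile: $profile (${cores} cores, ${mem_gb}GB RAM)"
```

Note that a machine with nominally 16GB of RAM often reports slightly less in /proc/meminfo, so this sketch may suggest one profile lower than the nameplate specs.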
Environment Variable Configuration
vim /etc/profile
Write the following:
# druid
export DRUID_HOME=/opt/servers/apache-druid-30.0.0
export PATH=$PATH:$DRUID_HOME/bin
Refresh the environment variables, then stop any other service occupying Druid's ports (such as a standalone ZooKeeper on 2181):
source /etc/profile
zkServer.sh stop
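To see whether (and by what) port 2181 is actually held before stopping anything, a quick check (assumes `ss` from iproute2; the error table's `lsof -i :2181` is an equivalent check):

```shell
# Show any listener on ZooKeeper's default port 2181.
# Process names in the output may require root (ss -p / lsof alike).
ss -lntp 2>/dev/null | grep ':2181 ' || echo "port 2181 is free"
```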
Then start Druid service:
bin/start-nano-quickstart
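Once started, a small probe can confirm the stack is answering before opening the browser. Druid services expose `GET /status`; the Router's copy on 8888 is a convenient "is the stack up" check. This is a sketch, not part of Druid's own scripts; three quick attempts here, where a real wait loop would poll for longer:

```shell
# Hedged readiness probe against the Router's /status endpoint on port 8888.
up=no
for i in 1 2 3; do
  if curl -fsS --max-time 2 http://localhost:8888/status >/dev/null 2>&1; then
    up=yes
    break
  fi
  sleep 1
done
echo "router up: $up"
```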
View Page
Access the console in a browser (the hostname below is the author's; substitute your own):
http://h121.wzk.icu:8888/
Error Quick Reference
| Symptom | Root Cause | How to Diagnose | Fix |
|---|---|---|---|
| Cannot access 8888 | Port not opened, or listening on localhost only | ss -lntp / firewall rules; no exceptions in logs | Open 8888 or change the bind address; check reverse proxy and security group |
| OOM / excessive GC at startup | JVM heap too small, too many processing threads | OutOfMemoryError in var/sv/*/logs | Adjust -Xms/-Xmx in conf/**/jvm.config and the processing threads |
| Port 2181 occupied | Local ZK or another process already running | lsof -i :2181 | Stop the occupying process, or change the Druid/ZK port and restart |
| Console shows "No datasource found" | Ingestion not finished or segments not loaded | Coordinator/Historical logs and the Segments tab | Wait for the task to complete; check Deep Storage and Historical availability |
| Slow or timed-out queries | Time partition/filter miss, insufficient off-heap memory | Broker/Historical logs, query plan | Optimize time filters and dimension indexes; increase processing threads and memory buffers |
| Kafka stream ingestion fails | Topic unreachable or authentication error | Connection/auth errors in Indexing Service logs | Verify bootstrap.servers and SASL/SSL config; test connectivity independently with kcat |
| Time ranges appear shifted | Inconsistent timezone/timestamp parsing | Compare timezone/format between ingestion and query | Unify the ingestion timestampSpec and query timezone; add a transform if needed |
| Permission/executable errors | Missing executable bit or directory permissions | ls -l bin/, system logs | chmod +x bin/*; ensure the running user can read/write the install directory |
| "Config not found" / settings not taking effect | Environment not refreshed after changes, or wrong path | echo $DRUID_HOME / which druid | Re-source /etc/profile; verify the config directory matches the startup script |
| Segments unassigned / queries missing data | Historical offline or load unbalanced | Segment assignment status in the Coordinator UI | Start/scale Historicals; check storage reachability and replica balancing |
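For the first row of the table, a portable local probe (bash's `/dev/tcp`) helps separate "Druid is down" from "the firewall or bind address is blocking remote access":

```shell
# If this succeeds locally but remote browsers still fail, suspect the firewall,
# security group, or a localhost-only bind rather than Druid itself.
if timeout 2 bash -c 'exec 3<>/dev/tcp/127.0.0.1/8888' 2>/dev/null; then
  echo "8888 reachable locally"
else
  echo "8888 not reachable locally"
fi
```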
Summary
For large systems, the official recommendation is a clustered deployment, for fault tolerance and reduced resource contention. Single-machine deployment suits development, testing, and quick verification; production environments should run a cluster to ensure high availability and performance.