TL;DR
- Scenario: 2C4G/2C2G three-node mixed deployment, Druid 30.0.0, Kafka/HDFS/MySQL collaboration.
- Conclusion: Druid runs fine on low-spec nodes; the keys are keeping MaxDirectMemorySize and the processing buffer in agreement, and making the supporting infrastructure (ZooKeeper, HDFS, MySQL) reachable.
- Output: step-by-step configuration points, a version matrix, and a quick reference for common faults with troubleshooting steps.
Version Matrix
| Component / Config | Version / Parameter | Verified | Note |
|---|---|---|---|
| Apache Druid | 30.0.0 ($DRUID_HOME) | Yes | Process division: master(coordinator+overlord), data(historical+middleManager), query(broker+router). Mixed deployment on h121/h122/h123. |
| Metadata Storage | MySQL (connector 8.0.19) | Yes | Connector placed in extensions/mysql-metadata-storage; druid.metadata.storage.* points to h122. |
| Deep Storage | HDFS (/druid/segments) | Yes | Relies on the default FS from core-site; for production, prefer an absolute hdfs://host:port/ path. |
| Indexing Logs | HDFS (/druid/indexing-logs) | Yes | Needs Hadoop config in _common to take effect. |
| ZooKeeper | h121,h122,h123:2181 | Yes | druid.zk.paths.base=/druid, ensure ACL/network reachable. |
| Kafka Real-time Ingestion | Version not recorded | Partial | MiddleManager can connect; stress test before increasing task slots. |
| JDK | Not recorded | Not verified | Druid 30 typically recommends Java 11/17; confirm against your production environment. |
| Coordinator/Overlord JVM | -Xms/-Xmx=512m | Yes | Low throughput usable; management plane prioritizes stability. |
| Historical JVM | -Xms/-Xmx=512m; MaxDirectMemory=1g | Yes | Sized to match druid.processing.buffer.sizeBytes=50,000,000. |
| MiddleManager JVM | -Xms/-Xmx=128m | Yes | Demo only; increase when task volume grows. |
| processing.buffer.sizeBytes | 50,000,000 | Yes | MaxDirectMemory must be at least buffer × (numMergeBuffers + numThreads + 1). |
Overall Introduction
Apache Druid is a high-performance, distributed, column-oriented database specialized for real-time analysis and querying of large datasets. It targets OLAP scenarios and excels at processing large-scale real-time data streams. Druid’s architecture spans several concerns: data ingestion, storage, query, and management.
For cluster configuration, Druid typically consists of:
- Data Ingestion Layer: Uses MiddleManager nodes to handle real-time data ingestion from different data sources (like Kafka, HDFS).
- Storage Layer: Data is stored on Historical nodes, which manage older data and support efficient queries. Data is stored in columnar format, optimizing query performance.
- Query Layer: Broker nodes receive user queries, fan them out to Historical or real-time nodes, then merge the partial results and return them to the user.
- Coordination Layer: Coordinator nodes manage cluster state and data allocation, ensuring even data distribution and automatic node failure handling.
Druid’s configuration files allow users to customize parameters like JVM settings, memory allocation and data sharding strategies for optimization based on different workloads and performance requirements. Additionally, Druid supports multiple query languages including SQL for flexible data analysis.
Cluster Planning
Cluster deployment distribution:
- Master Node: Deploy Coordinator and Overlord processes
- Data Node: Run Historical and MiddleManager processes
- Query Node: Deploy Broker and Router processes
Actual Deployment:
| Node | Config | Deployed Services |
|---|---|---|
| h121.wzk.icu | 2C4G | ZooKeeper, Kafka, Druid |
| h122.wzk.icu | 2C4G | ZooKeeper, Kafka, Druid, MySQL (built during Hive era) |
| h123.wzk.icu | 2C2G | ZooKeeper, Druid |
Environment Variables
vim /etc/profile
Write the following:
# druid
export DRUID_HOME=/opt/servers/apache-druid-30.0.0
export PATH=$PATH:$DRUID_HOME/bin
Configuration Files
Link Hadoop configuration files:
- core-site.xml
- hdfs-site.xml
- yarn-site.xml
- mapred-site.xml
Link the files above into conf/druid/cluster/_common.
Execute:
cd $DRUID_HOME/conf/druid/cluster/_common
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/yarn-site.xml yarn-site.xml
ln -s $HADOOP_HOME/etc/hadoop/mapred-site.xml mapred-site.xml
ls
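Since `ls` also lists broken symlinks, it helps to confirm each link actually resolves to a readable file. A small helper sketch; the function name `check_links` is just for illustration:

```shell
# verify that each linked Hadoop config file resolves to a readable file
check_links() {
  local dir=$1; shift
  local f rc=0
  for f in "$@"; do
    if [ -r "$dir/$f" ]; then echo "OK   $f"; else echo "MISS $f"; rc=1; fi
  done
  return $rc
}
# on the Druid node:
# check_links "$DRUID_HOME/conf/druid/cluster/_common" \
#   core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml
```

A non-zero exit code flags any missing or dangling link, so the check can also gate a deployment script.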
MySQL
Copy the MySQL driver into: $DRUID_HOME/extensions/mysql-metadata-storage
cd $DRUID_HOME/extensions/mysql-metadata-storage
cp $HIVE_HOME/lib/mysql-connector-java-8.0.19.jar mysql-connector-java-8.0.19.jar
ls
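Druid creates its metadata tables automatically on first start, but the database itself must already exist. A provisioning sketch, assuming MySQL runs on h122 and reusing the existing hive account referenced in the configuration below (adjust the user and host to your setup):

```sql
-- run once on h122's MySQL; use a UTF-8 character set for the metadata database
CREATE DATABASE IF NOT EXISTS druid DEFAULT CHARACTER SET utf8mb4;
GRANT ALL PRIVILEGES ON druid.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```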
Modify Configuration
vim $DRUID_HOME/conf/druid/cluster/_common/common.runtime.properties
Modify the following:
# Add "mysql-metadata-storage"
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-multi-stage-query"]
# Write each machine's own IP or hostname
# Here is h121 node
druid.host=h121.wzk.icu
# Fill in zk address
druid.zk.service.host=h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181
druid.zk.paths.base=/druid
# Comment out previous derby config
# Add mysql config
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://h122.wzk.icu:3306/druid
druid.metadata.storage.connector.user=hive
druid.metadata.storage.connector.password=hive@wzk.icu
# Comment out local config
# Add HDFS config, use HDFS as deep storage
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Comment out indexer.logs local disk config
# Add indexer.logs for HDFS config
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
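Before the first start, it can help to pre-create the deep-storage and indexing-log directories on HDFS, since a first task may otherwise fail on missing paths or permissions. A sketch, to be run from a node with a configured HDFS client and adapted to whichever user owns the Druid processes:

```shell
# pre-create Druid's HDFS directories referenced in common.runtime.properties
hdfs dfs -mkdir -p /druid/segments /druid/indexing-logs
hdfs dfs -ls /druid
```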
coordinator-overlord
Adjust the parameter sizes below to your actual resources:
vim $DRUID_HOME/conf/druid/cluster/master/coordinator-overlord/jvm.config
Modify as follows:
-server
-Xms512m
-Xmx512m
-XX:+ExitOnOutOfMemoryError
-XX:+UseG1GC
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
historical
vim $DRUID_HOME/conf/druid/cluster/data/historical/jvm.config
Modify as follows:
-server
-Xms512m
-Xmx512m
-XX:MaxDirectMemorySize=1g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
Also adjust one parameter:
vim $DRUID_HOME/conf/druid/cluster/data/historical/runtime.properties
Modify as follows:
# Equivalent to 50MiB
druid.processing.buffer.sizeBytes=50000000
Note:
- druid.processing.buffer.sizeBytes: Size of off-heap hash table for aggregation per query
- maxDirectMemory must be at least druid.processing.buffer.sizeBytes × (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
- If druid.processing.buffer.sizeBytes is too large, you must raise MaxDirectMemorySize accordingly, otherwise the Historical service cannot start
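With this deployment's numbers the constraint works out comfortably. A quick check, assuming the defaults numThreads = cores − 1 = 1 on a 2C node and numMergeBuffers = max(2, numThreads ÷ 4) = 2; verify your actual values in runtime.properties:

```shell
# required direct memory = buffer * (numMergeBuffers + numThreads + 1)
BUFFER=50000000    # druid.processing.buffer.sizeBytes
MERGE=2            # druid.processing.numMergeBuffers (assumed default)
THREADS=1          # druid.processing.numThreads (assumed default on 2C)
REQUIRED=$(( BUFFER * (MERGE + THREADS + 1) ))
LIMIT=$(( 1024 * 1024 * 1024 ))   # -XX:MaxDirectMemorySize=1g
echo "required=$REQUIRED limit=$LIMIT"
if [ "$REQUIRED" -le "$LIMIT" ]; then echo fits; else echo "raise MaxDirectMemorySize"; fi
```

Here 50,000,000 × 4 = 200,000,000 bytes (about 191 MiB), well under the 1g limit, which leaves headroom if numThreads or numMergeBuffers are raised later.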
middleManager
vim $DRUID_HOME/conf/druid/cluster/data/middleManager/jvm.config
Left at the shipped defaults:
-server
-Xms128m
-Xmx128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
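When ingestion volume grows, the first knobs to raise live in data/middleManager/runtime.properties rather than in jvm.config, since the MiddleManager itself is only a supervisor and the forked peon tasks do the work. A sketch for a 4G node; the values are illustrative, not tested in this deployment:

```properties
# number of ingestion tasks this MiddleManager may run concurrently
druid.worker.capacity=2
# heap and direct memory handed to each forked peon task
druid.indexer.runner.javaOpts=-server -Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
# keep the peons' processing buffer small to match their direct-memory budget
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=25000000
```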
Error Quick Reference
| Symptom | Root Cause | Location | Fix |
|---|---|---|---|
| Historical exits on start / direct buffer OOM | Processing buffer too large for MaxDirectMemorySize | Check historical logs for a "Cannot allocate memory / direct buffer" hint; compare runtime.properties with jvm.config | Lower druid.processing.buffer.sizeBytes or raise -XX:MaxDirectMemorySize; size per buffer × (numMergeBuffers + numThreads + 1). |
| "No suitable driver" / MySQL driver not found | Connector missing from the expected directory or misnamed | Check $DRUID_HOME/extensions/mysql-metadata-storage | Ensure mysql-connector-java-8.0.19.jar exists; restart for it to take effect. |
| Deep storage write failure / “No FileSystem for scheme: hdfs” | Hadoop client dependency/config not effective | Check middleManager/historical logs; verify *-site.xml under _common | Ensure druid-hdfs-storage loaded; soft-link Hadoop config, add Hadoop client dependencies if needed. |
| Broker query 500 / “No servers found” | No available Historical/Realtime nodes or segments not loaded | Check Coordinator Segments/Rules in Web console; Broker logs | Start Historical/Realtime; confirm segments loaded and routing rules valid. |
| ZK connection timeout / ConnectionLoss | zk address/port wrong or network unreachable | zookeeper client zkCli direct connection test | Fix druid.zk.service.host; open 2181; ensure /druid exists. |
| MiddleManager task stuck / task logs not written to HDFS | HDFS directory permission/quota issue | Check the directory with hdfs dfs -ls/-chmod; read the task log errors | Grant permission / pre-create the directory; adjust fs.permissions.umask-mode if needed. |
| Time column offset 8 hours | JVM timezone parameter non-standard | Check timezone info in query service and task logs | Set -Duser.timezone to Asia/Shanghai (or GMT+08:00), verify ingestion spec timestampSpec. |
| HDFS path resolution exception | Using relative path but defaultFS not configured | Check fs.defaultFS in core-site.xml | Change storage directory to hdfs://host:port/… or add defaultFS. |
| Hostname unreachable / DNS failure | Internal domain names not resolvable | Verify h121.wzk.icu etc. with ping/host | Add mappings to /etc/hosts or use IPs; update druid.host to match. |
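Several rows above begin with a plain reachability check. A tiny helper for that step, assuming bash (it relies on bash's /dev/tcp; the name `probe` is just for illustration):

```shell
# report whether a TCP port on a host accepts connections
probe() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "open   $1:$2"
  else
    echo "closed $1:$2"
  fi
}
# probe h121.wzk.icu 2181   # ZooKeeper
# probe h122.wzk.icu 3306   # MySQL metadata store
```

Opening the descriptor inside a subshell means it is closed automatically, so the probe leaves no dangling connections.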
(To be continued in Part 2)