TL;DR
- Scenario: 2C4G/2C2G three-node mixed deployment, Druid 30.0.0, Kafka/HDFS/MySQL collaboration.
- Conclusion: Druid runs fine on low-spec nodes; the keys are keeping MaxDirectMemorySize and the processing buffer in agreement, and making the supporting infrastructure (ZooKeeper, HDFS, MySQL) reachable.
- Output: step-by-step configuration points, a version matrix, and a quick reference for common faults with troubleshooting steps.
Version Matrix
| Component / Config | Version / Parameter | Verified | Note |
|---|---|---|---|
| Apache Druid | 30.0.0 ($DRUID_HOME) | Yes | Process division: master(coordinator+overlord), data(historical+middleManager), query(broker+router). Mixed deployment on h121/h122/h123. |
| Metadata Storage | MySQL (connector 8.0.19) | Yes | Connector placed in extensions/mysql-metadata-storage; druid.metadata.storage.* points to h122. |
| Deep Storage | HDFS (/druid/segments) | Yes | Relies on the default FS from core-site; for production, prefer an absolute hdfs://host:port/ path. |
| Indexing Logs | HDFS (/druid/indexing-logs) | Yes | Needs Hadoop config in _common to take effect. |
| ZooKeeper | h121,h122,h123:2181 | Yes | druid.zk.paths.base=/druid, ensure ACL/network reachable. |
| Kafka Real-time Ingestion | Version not recorded | Partial | MiddleManager can connect; stress test before increasing task slots. |
| JDK | Not recorded | Not verified | Druid 30 typically recommends Java 11/17; confirm against your production environment. |
| Coordinator/Overlord JVM | -Xms/-Xmx=512m | Yes | Low throughput usable; management plane prioritizes stability. |
| Historical JVM | -Xms/-Xmx=512m; MaxDirectMemory=1g | Yes | Sized to match druid.processing.buffer.sizeBytes=50,000,000. |
| MiddleManager JVM | -Xms/-Xmx=128m | Yes | Demo only; increase when task volume grows. |
| processing.buffer.sizeBytes | 50,000,000 | Yes | MaxDirectMemory must be at least buffer × (numMergeBuffers + numThreads + 1). |
Overall Introduction
Apache Druid is a high-performance, distributed, column-oriented database specialized for real-time analysis and querying of large datasets. It targets OLAP scenarios and excels at processing large-scale real-time data streams. Druid’s architecture spans several concerns: data ingestion, storage, query, and management.
For cluster configuration, Druid typically consists of:
- Data Ingestion Layer: Uses MiddleManager nodes to handle real-time data ingestion from different data sources (like Kafka, HDFS).
- Storage Layer: Data is stored on Historical nodes, which manage older data and support efficient queries. Data is stored in columnar format, optimizing query performance.
- Query Layer: Broker nodes receive user queries, fan them out to Historical or real-time nodes, then merge the partial results and return them to the user.
- Coordination Layer: Coordinator nodes manage cluster state and data allocation, ensuring even data distribution and automatic node failure handling.
Druid’s configuration files allow users to customize parameters like JVM settings, memory allocation and data sharding strategies for optimization based on different workloads and performance requirements. Additionally, Druid supports multiple query languages including SQL for flexible data analysis.
Cluster Planning
Cluster deployment distribution:
- Master Node: Deploy Coordinator and Overlord processes
- Data Node: Run Historical and MiddleManager processes
- Query Node: Deploy Broker and Router processes
Actual Deployment:
| Node | Config | Deployed Services |
|---|---|---|
| h121.wzk.icu | 2C4G | ZooKeeper, Kafka, Druid |
| h122.wzk.icu | 2C4G | ZooKeeper, Kafka, Druid, MySQL (built during Hive era) |
| h123.wzk.icu | 2C2G | ZooKeeper, Druid |
Environment Variables
vim /etc/profile
Write the following:
# druid
export DRUID_HOME=/opt/servers/apache-druid-30.0.0
export PATH=$PATH:$DRUID_HOME/bin
Configuration Files
Link Hadoop configuration files:
- core-site.xml
- hdfs-site.xml
- yarn-site.xml
- mapred-site.xml
Link the files above into conf/druid/cluster/_common.
Execute:
cd $DRUID_HOME/conf/druid/cluster/_common
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/yarn-site.xml yarn-site.xml
ln -s $HADOOP_HOME/etc/hadoop/mapred-site.xml mapred-site.xml
ls
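Since `ls` also lists broken symlinks, it helps to confirm each link actually resolves to a readable file. A small helper sketch; the function name `check_links` is just for illustration:

```shell
# verify that each linked Hadoop config file resolves to a readable file
check_links() {
  local dir=$1; shift
  local f rc=0
  for f in "$@"; do
    if [ -r "$dir/$f" ]; then echo "OK   $f"; else echo "MISS $f"; rc=1; fi
  done
  return $rc
}
# on the Druid node:
# check_links "$DRUID_HOME/conf/druid/cluster/_common" \
#   core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml
```

A non-zero exit code flags any missing or dangling link, so the check can also gate a deployment script.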
MySQL
Copy the MySQL driver into: $DRUID_HOME/extensions/mysql-metadata-storage
cd $DRUID_HOME/extensions/mysql-metadata-storage
cp $HIVE_HOME/lib/mysql-connector-java-8.0.19.jar mysql-connector-java-8.0.19.jar
ls
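Druid creates its metadata tables automatically on first start, but the database itself must already exist. A provisioning sketch, assuming MySQL runs on h122 and reusing the existing hive account referenced in the configuration below (adjust the user and host to your setup):

```sql
-- run once on h122's MySQL; use a UTF-8 character set for the metadata database
CREATE DATABASE IF NOT EXISTS druid DEFAULT CHARACTER SET utf8mb4;
GRANT ALL PRIVILEGES ON druid.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```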
Modify Configuration
vim $DRUID_HOME/conf/druid/cluster/_common/common.runtime.properties
Modify the following:
# Add "mysql-metadata-storage"
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-multi-stage-query"]
# Write each machine's own IP or hostname
# Here is h121 node
druid.host=h121.wzk.icu
# Fill in zk address
druid.zk.service.host=h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181
druid.zk.paths.base=/druid
# Comment out previous derby config
# Add mysql config
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://h122.wzk.icu:3306/druid
druid.metadata.storage.connector.user=hive
druid.metadata.storage.connector.password=hive@wzk.icu
# Comment out local config
# Add HDFS config, use HDFS as deep storage
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Comment out indexer.logs local disk config
# Add indexer.logs for HDFS config
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
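Before the first start, it can help to pre-create the deep-storage and indexing-log directories on HDFS, since a first task may otherwise fail on missing paths or permissions. A sketch, to be run from a node with a configured HDFS client and adapted to whichever user owns the Druid processes:

```shell
# pre-create Druid's HDFS directories referenced in common.runtime.properties
hdfs dfs -mkdir -p /druid/segments /druid/indexing-logs
hdfs dfs -ls /druid
```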
coordinator-overlord
Adjust the parameter sizes below to your actual resources:
vim $DRUID_HOME/conf/druid/cluster/master/coordinator-overlord/jvm.config
Modify as follows:
-server
-Xms512m
-Xmx512m
-XX:+ExitOnOutOfMemoryError
-XX:+UseG1GC
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
historical
vim $DRUID_HOME/conf/druid/cluster/data/historical/jvm.config
Modify as follows:
-server
-Xms512m
-Xmx512m
-XX:MaxDirectMemorySize=1g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
Also adjust one parameter:
vim $DRUID_HOME/conf/druid/cluster/data/historical/runtime.properties
Modify as follows:
# Equivalent to 50MiB
druid.processing.buffer.sizeBytes=50000000
Note:
- druid.processing.buffer.sizeBytes: Size of off-heap hash table for aggregation per query
- maxDirectMemory must be at least druid.processing.buffer.sizeBytes × (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
- If druid.processing.buffer.sizeBytes is too large, you must raise MaxDirectMemorySize accordingly, otherwise the Historical service cannot start
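With this deployment's numbers the constraint works out comfortably. A quick check, assuming the defaults numThreads = cores − 1 = 1 on a 2C node and numMergeBuffers = max(2, numThreads ÷ 4) = 2; verify your actual values in runtime.properties:

```shell
# required direct memory = buffer * (numMergeBuffers + numThreads + 1)
BUFFER=50000000    # druid.processing.buffer.sizeBytes
MERGE=2            # druid.processing.numMergeBuffers (assumed default)
THREADS=1          # druid.processing.numThreads (assumed default on 2C)
REQUIRED=$(( BUFFER * (MERGE + THREADS + 1) ))
LIMIT=$(( 1024 * 1024 * 1024 ))   # -XX:MaxDirectMemorySize=1g
echo "required=$REQUIRED limit=$LIMIT"
if [ "$REQUIRED" -le "$LIMIT" ]; then echo fits; else echo "raise MaxDirectMemorySize"; fi
```

Here 50,000,000 × 4 = 200,000,000 bytes (about 191 MiB), well under the 1g limit, which leaves headroom if numThreads or numMergeBuffers are raised later.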
middleManager
vim $DRUID_HOME/conf/druid/cluster/data/middleManager/jvm.config
Left at the shipped defaults:
-server
-Xms128m
-Xmx128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
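When ingestion volume grows, the first knobs to raise live in data/middleManager/runtime.properties rather than in jvm.config, since the MiddleManager itself is only a supervisor and the forked peon tasks do the work. A sketch for a 4G node; the values are illustrative, not tested in this deployment:

```properties
# number of ingestion tasks this MiddleManager may run concurrently
druid.worker.capacity=2
# heap and direct memory handed to each forked peon task
druid.indexer.runner.javaOpts=-server -Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
# keep the peons' processing buffer small to match their direct-memory budget
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=25000000
```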
Error Quick Reference
| Symptom | Root Cause | Location | Fix |
|---|---|---|---|
| Historical exits on start / direct buffer OOM | Processing buffer too large for MaxDirectMemorySize | Check historical logs for a "Cannot allocate memory / direct buffer" hint; compare runtime.properties with jvm.config | Lower druid.processing.buffer.sizeBytes or raise -XX:MaxDirectMemorySize; size per buffer × (numMergeBuffers + numThreads + 1). |
| "No suitable driver" / MySQL driver not found | Connector missing from the expected directory or misnamed | Check $DRUID_HOME/extensions/mysql-metadata-storage | Ensure mysql-connector-java-8.0.19.jar exists; restart for it to take effect. |
| Deep storage write failure / “No FileSystem for scheme: hdfs” | Hadoop client dependency/config not effective | Check middleManager/historical logs; verify *-site.xml under _common | Ensure druid-hdfs-storage loaded; soft-link Hadoop config, add Hadoop client dependencies if needed. |
| Broker query 500 / “No servers found” | No available Historical/Realtime nodes or segments not loaded | Check Coordinator Segments/Rules in Web console; Broker logs | Start Historical/Realtime; confirm segments loaded and routing rules valid. |
| ZK connection timeout / ConnectionLoss | zk address/port wrong or network unreachable | zookeeper client zkCli direct connection test | Fix druid.zk.service.host; open 2181; ensure /druid exists. |
| MiddleManager task stuck / task logs not written to HDFS | HDFS directory permission/quota issue | Check the directory with hdfs dfs -ls/-chmod; read the task log errors | Grant permission / pre-create the directory; adjust fs.permissions.umask-mode if needed. |
| Time column offset 8 hours | JVM timezone parameter non-standard | Check timezone info in query service and task logs | Set -Duser.timezone to Asia/Shanghai (or GMT+08:00), verify ingestion spec timestampSpec. |
| HDFS path resolution exception | Using relative path but defaultFS not configured | Check fs.defaultFS in core-site.xml | Change storage directory to hdfs://host:port/… or add defaultFS. |
| Hostname unreachable / DNS failure | Internal domain names not resolvable | Verify h121.wzk.icu etc. with ping/host | Add mappings to /etc/hosts or use IPs; update druid.host to match. |
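Several rows above begin with a plain reachability check. A tiny helper for that step, assuming bash (it relies on bash's /dev/tcp; the name `probe` is just for illustration):

```shell
# report whether a TCP port on a host accepts connections
probe() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "open   $1:$2"
  else
    echo "closed $1:$2"
  fi
}
# probe h121.wzk.icu 2181   # ZooKeeper
# probe h122.wzk.icu 3306   # MySQL metadata store
```

Opening the descriptor inside a subshell means it is closed automatically, so the probe leaves no dangling connections.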
(To be continued in Part 2)