Big Data 191 - Elasticsearch Cluster Planning & Tuning
TL;DR
- Scenario: From 2-4 node business search to ELK/OLAP large clusters, role division, capacity calculation and performance tuning
- Conclusion: First define roles and capacity boundaries, then shard/replica and write strategy, finally close the loop with search and ops metrics
- Output: Reusable cluster planning template + shard/replica rules + write/search tuning leverage + common troubleshooting
Version Matrix
| Version | Applicability Notes | Notes |
|---|---|---|
| Elasticsearch 7.x | Role planning, shard/replica and write/search tuning approach applicable | Node roles recommended via node.roles; mapping snippets need 7.x syntax adjustment |
| Elasticsearch 8.x | Planning and tuning methodology equally applicable | Security and certificates default to “stronger constraints”, cross-node and client access need synchronous validation; node roles primarily via node.roles |
Elasticsearch Cluster Mode
Elasticsearch is a distributed search engine. Cluster mode means multiple nodes form a single cluster and work together to provide higher performance, scalability, and fault tolerance. Each cluster has a unique name, and all nodes belonging to the same cluster identify each other by that cluster name.
1. Master Node
Core Responsibilities:
- Manage cluster-level metadata operations including index creation/deletion, shard allocation
- Monitor cluster status, track node joins or leaves
- Handle cluster-wide configuration changes
Characteristics:
- Typically doesn’t participate in actual data search and storage operations
- For high availability, configure 3 master-eligible nodes (an odd number) so that a majority remains available to elect a master
- Example: When creating a new index, client request is first handled by master node, which decides the new index’s shard allocation scheme
2. Data Node
Core Functions:
- Store actual shards of index data
- Handle data-related CRUD operations (Create, Read, Update, Delete)
- Execute computationally intensive tasks like search and aggregation
Important Characteristics:
- Store multiple copies of data ensuring high data availability
- Require strong CPU, memory and storage resources
- Application scenario: E-commerce website search service typically deploys multiple data nodes to handle large product data storage and retrieval
3. Coordinating Node
Workflow:
- Receive client requests
- Route requests to relevant data nodes
- Collect responses from each node
- Merge results and return to client
Characteristics:
- Doesn’t store data, doesn’t participate in master node election
- By default all nodes have coordinating functionality
- Dedicated coordinating nodes typically used in large clusters to reduce request processing burden on other nodes
- Example: In distributed log analysis system, coordinating node distributes search queries to data nodes storing logs
Note: In actual deployment, a node can assume multiple roles (e.g., both data node and coordinating node), but production environments usually recommend separating master nodes from data nodes to ensure cluster stability.
Cluster Planning
Importance of Planning
When deploying an Elasticsearch cluster, planning is a critical foundation step that directly affects cluster performance, future scalability, and daily operations. Reasonable planning avoids frequent architecture adjustments later, ensuring stable and efficient cluster operation.
Core Planning Elements
1. Master Node Planning
- Quantity: Recommend configuring at least three dedicated master nodes (master-only nodes) to form a quorum, ensuring high availability. When one node fails, remaining two nodes can still normally elect a new master node.
- Resource Allocation: Since master nodes don’t handle data storage or search requests, they can run on relatively modest resources (e.g., 2-4 CPU cores, 4-8GB memory).
- Deployment Location: Should distribute master nodes across different racks or Availability Zones (AZ) to prevent single point of failure. For example in AWS, can deploy three master nodes across us-east-1a, us-east-1b and us-east-1c.
2. Data Node Planning
- Resource Requirements:
- Memory: Recommend 32GB+ memory per node, half (16GB) for JVM heap, remaining for filesystem cache.
- Storage: Determine based on data volume and growth expectation, recommend SSD storage for better IO performance. For example, business expecting 10TB annual data growth can configure 3 nodes × 2TB SSD.
- Shard Strategy: Each node should carry no more than 20-25 shards, each shard size controlled at 30-50GB. For example, 10TB data can be divided into 200 × 50GB shards, needing 8-10 data nodes.
- Scalability: Use horizontal scaling strategy, initially configure per minimum requirements, then add nodes as data grows.
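The sizing rules above (30-50GB per shard, at most about 20-25 shards per data node) can be turned into a quick back-of-the-envelope calculation. A minimal sketch; the thresholds are this article's guidelines, not hard Elasticsearch limits:

```python
# Rough capacity sizing based on the article's rules of thumb:
# shards of 30-50GB each, at most ~20-25 shards per data node.

def estimate_data_nodes(total_gb, shard_size_gb=50, shards_per_node=25):
    """Return (shard_count, data_node_count) for a target data volume."""
    shards = -(-total_gb // shard_size_gb)      # ceiling division
    nodes = -(-shards // shards_per_node)
    return shards, nodes

# 10TB (10,000GB) -> 200 shards of 50GB, spread over 8 data nodes
print(estimate_data_nodes(10_000))              # (200, 8)
```

This matches the worked example in the text: 10TB divided into 200 × 50GB shards needs 8-10 data nodes, depending on how conservatively you cap shards per node.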
3. Coordinating Node Planning
- Function Separation: Offload computationally intensive work such as aggregation result merging and sorting from data nodes to coordinating nodes, reducing the data nodes’ burden.
- Resource Requirements: Need stronger CPU resources (8-16 cores) for request routing and result aggregation, moderate memory (16-32GB).
- Deployment Scenario: Especially suitable for scenarios with large search query volume (e.g., millions of queries daily) or many complex aggregation queries.
- Load Balancing: Use with load balancers (e.g., Nginx, HAProxy) for even request distribution and fault failover.
Planning Considerations
Questions the business needs to answer:
- What is the current data volume, and how fast is it growing?
- What are the machine specifications: CPU, memory, disk capacity?
Calculation Basis
- Elasticsearch JVM Heap can be set to maximum 32GB
- 30GB heap can handle approximately 10TB data
- If memory is very large, e.g., 128GB, consider deploying multiple Elasticsearch instances on one machine
Application Scenarios
- Building business search features, mostly vertical-domain search. Data volume ranges from tens of millions to billions of documents, generally at a 2-4 machine scale
- Large-scale real-time OLAP (Online Analytical Processing), classically the ELK Stack; data scale can reach hundreds of billions of documents or more, across dozens to hundreds of nodes
Role Assignment
Node roles:
- MasterNode: with node.master set to true, the node is master-eligible (legacy setting; 7.9+ uses node.roles instead)
- DataNode: with node.data set to true (the default), the node stores data
- CoordinateNode: a node whose only job is receiving requests, forwarding them to other nodes, and merging the data each node returns is called a coordinating node. To run a coordinating-only node, disable the master and data roles; otherwise a node can hold one role or several.
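The legacy node.master/node.data flags map onto the node.roles setting in 7.9+ (as noted in the version matrix). A minimal elasticsearch.yml sketch, one variant per node:

```yaml
# Dedicated master-eligible node
node.roles: [ master ]

# Data node
node.roles: [ data ]

# Coordinating-only node: an empty role list leaves just the
# request-routing and result-merging behavior every node has by default
node.roles: [ ]
```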
How to assign roles:
- Small-scale clusters, no need for strict separation
- Medium-large scale clusters (10+ nodes), should consider dedicated roles. Especially when concurrent queries are large and query merge volume is large, can add independent coordinating nodes. The benefit of separating roles is divided labor, no mutual interference.
Setting Shards
- Note: The primary shard count is fixed once the index is created; changing it requires reindexing (or the _shrink/_split APIs, within their constraints).
- Shard setting reference principles: Elasticsearch recommends a JVM heap of at most 30-32GB, so cap each shard at roughly 30GB and estimate the shard count from there. A good starting point is 1.5-3x the node count; for example, with three nodes, keep the shard count at or below 9 (3 × 3). When performance starts to degrade, add nodes and ES will rebalance shard placement. For date-based indexes that are rarely searched, you may end up with hundreds or thousands of indexes of only 1GB or less each; in that scenario, one shard per index is enough.
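Since the primary shard count is fixed at creation time, it has to be set up front. A sketch for the three-node example (index name follows the document's wzk_ placeholder naming):

```json
PUT /wzk_logs_index
{
  "settings": {
    "number_of_shards": 9,
    "number_of_replicas": 1
  }
}
```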
Setting Replicas
- For high availability, set the replica count to 2. This requires at least 3 nodes in the cluster so that the primary shard and both replicas can sit on different nodes.
- If concurrent load is high and query performance degrades, increase the replica count to improve concurrent query capacity.
- Note: when replicas are added, the cluster automatically copies data onto the new replica shards; unlike the primary shard count, replicas can be adjusted at any time.
PUT /wzk_temp_index/_settings
{
"number_of_replicas": 1
}
Cluster Tuning
Index Tuning
Replica 0
If the cluster is loading data for the first time, set replicas to 0 initially, then raise the count (e.g., to 2 for high availability) once the load completes, per business requirements. This saves replication overhead while the index is being built.
PUT /wzk_temp_index/_settings
{
"number_of_replicas": 0
}
Auto-generated ID
Auto-generate the DocID. Looking at the Elasticsearch write path: if an external ID is provided when writing a Doc, Elasticsearch first tries to read the existing Doc’s version number to decide whether the write is an update, which costs a disk lookup. Letting Elasticsearch auto-generate the DocID avoids this check.
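A sketch of the two write paths (7.x+ typeless _doc endpoint; the index name is a placeholder). The POST form lets ES generate the ID; the PUT form supplies an external ID and triggers the version lookup described above:

```json
POST /wzk_temp_index/_doc
{ "title": "auto-generated id, no version lookup needed" }

PUT /wzk_temp_index/_doc/1001
{ "title": "external id, ES must check for an existing version" }
```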
Mappings
Set mappings reasonably:
- Set fields that don’t need searching to be non-indexed. In 7.x+ this means "index": false (the not_analyzed/no values belong to pre-5.x syntax). Skipping analysis and indexing saves a lot of CPU, especially for binary-like content, where analysis is meaningless anyway.
- Reduce field content length. If large content in original data doesn’t need full indexing, try to reduce unnecessary content.
- Use different analyzers. Different tokenizers have varying computational complexity during indexing.
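A 7.x/8.x-style mapping sketch of the points above; field names are hypothetical. status is exact-match only (keyword, no analysis), trace_id is stored but not indexed, summary uses an explicitly chosen analyzer, and raw_blob is an object that is neither parsed nor indexed:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "properties": {
      "status":   { "type": "keyword" },
      "trace_id": { "type": "keyword", "index": false },
      "summary":  { "type": "text", "analyzer": "standard" },
      "raw_blob": { "type": "object", "enabled": false }
    }
  }
}
```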
Source Field
Adjust the _source field. _source stores the original document data. Fields that don’t need to be returned can be filtered out via the mapping’s _source excludes, or _source can be disabled entirely. Disabling is generally used when index and raw data are stored separately and reduces I/O pressure, but most business scenarios keep _source enabled.
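A sketch of excluding a field from _source at index creation (7.x+ typeless mapping; the field name is a placeholder). The field remains searchable if indexed, but is no longer stored in _source:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "_source": {
      "excludes": [ "raw_content" ]
    }
  }
}
```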
Disable norms
norms are used to calculate document scores during search. If scoring is not needed for a field, disable them. Shown in 7.x+ syntax (the legacy string type and norms.enabled form are pre-5.x):
"title": {
"type": "text",
"norms": false
}
Refresh Interval
Adjust index refresh interval. Default is 1 second, forcing ES to create a new Segment every second to ensure newly written data is near real-time visible and searchable. For example, if adjusted to 30s, reduces refresh frequency, freeing refresh operation resources for index operations.
PUT /wzk_temp_index/_settings
{
"index" : {
"refresh_interval": "30s"
}
}
This approach sacrifices visibility to improve index operation performance.
Batch Processing
Batch processing merges multiple index requests into a single bulk request, similar to MySQL JDBC batching. For example, 10,000 documents per batch can be a good size, but the right number depends on many factors (document size, hardware, concurrency) and should be found empirically.
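A minimal _bulk sketch (newline-delimited JSON; the index name is a placeholder). Empty index actions let ES auto-generate IDs, which combines with the auto-ID optimization above:

```json
POST /wzk_temp_index/_bulk
{ "index": {} }
{ "title": "doc 1" }
{ "index": {} }
{ "title": "doc 2" }
```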
Search Tuning
After storing over 1 billion documents, search tuning is needed.
Index Grouping by Time
ES is often used for log storage, and log indexes are generally managed by date: created per day, week, month, or year. Searching a single day’s data then touches only one index’s shards; searching multiple days touches several indexes’ shards. This approach is similar to database sharding/partitioning: search a small data range instead of looking for a needle in the whole haystack.
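With daily indexes, the search scope is controlled by the index name in the request path (the wzk_logs- naming is a placeholder). The first request hits a single day's shards; the wildcard fans out only to the matching days:

```json
GET /wzk_logs-2024.06.01/_search

GET /wzk_logs-2024.06.*/_search
```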
Filter
Use Filter instead of Query. During search, a Query scores each document’s relevance, while a Filter does no scoring and therefore less work, so in theory a Filter is faster. If a search doesn’t need scoring, use a Filter directly. If part of a search needs scoring, use a Bool query, which can combine scored and unscored clauses:
GET /_search
{
"query": {
"bool": {
"must": {
"term": {
"user": "kimchy"
}
},
"filter": {
"term": {
"tag": "tech"
}
}
}
}
}
ID Field
Defining ID field as Keyword type is a common optimization practice. This definition is especially suitable for:
-
Usage Scenarios:
- When ID is mainly used for exact match queries (like terms queries)
- When ID won’t be used for range queries or numeric calculations
- Typical application scenarios: user ID, order number, product SKU and other identifier fields
-
Performance Optimization Principle:
- Keyword type is internally optimized to inverted index structure in Elasticsearch, especially suitable for exact match queries
- Compared to numeric types (like Integer), Keyword type avoids overhead for range query optimization
- In terms query scenarios, Keyword type has higher query efficiency than numeric types
-
Actual Case Data:
- After an e-commerce platform changed product ID from Integer to Keyword type:
- Query response time reduced from average 120ms to 85ms
- System throughput improved by ~30%
- Index size reduced by 15% (because no need to store numeric type metadata)
- After an e-commerce platform changed product ID from Integer to Keyword type:
-
Implementation Suggestion:
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
}
}
}
}
- Notes:
- This optimization mainly applies to string-form IDs or numeric IDs that don’t need numeric operations
- If range query functionality is indeed needed, recommend keeping numeric type
- For fields with both exact match and range query needs, can consider using multi-fields to define both types
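A multi-fields sketch for the mixed case (field names are hypothetical): the numeric parent field serves range queries, while the order_no.kw sub-field serves exact terms lookups:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "properties": {
      "order_no": {
        "type": "long",
        "fields": {
          "kw": { "type": "keyword" }
        }
      }
    }
  }
}
```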
Error Quick Reference
| Symptom | Cause | Diagnosis | Fix |
|---|---|---|---|
| Cluster won’t start / repeated master election (master not discovered) | Insufficient master nodes, discovery config error, network/firewall, clock drift | master logs, /_cluster/health, /_cat/nodes?v, discovery config and port connectivity | Ensure ≥3 dedicated master nodes across fault domains; validate discovery/seed config and ports; fix network and time sync |
| Shards permanently UNASSIGNED | Insufficient nodes, disk watermark, allocation filter, version incompatibility, recovery restricted | /_cat/shards?v, /_cluster/allocation/explain, disk watermark and routing rules | Expand or free disk; correct allocation filter and routing limits; adjust watermark threshold (under controllable risk) and trigger reallocation |
| Low write throughput / ingest stuck | Too frequent refresh, high merge pressure, disk IO bottleneck, too many mapping fields | segment/merge metrics, node IO, write latency and thread pools | During write period increase refresh_interval; batch merge requests; reduce unnecessary field indexing and analysis; upgrade SSD/improve IO |
| CPU spikes (especially during write) | Complex analyzer, field explosion, uncontrolled dynamic mapping | hot threads, mapping field count, indexing rate and CPU | Constrain fields and dynamic templates; choose appropriate analyzer; for fields not needing search set index:false |
| Memory alert / circuit breaker | Heap pressure, heavy aggregation/sorting, oversized requests | breaker logs, GC, slow queries and large aggregation requests | Split queries, limit aggregation dimensions and size; introduce dedicated coordinating nodes; control concurrency and request size |
| Slow queries (multi-day logs/multi-index) | Search scope too large, no filtering, too many shards causing fan-out | query profile, slow logs, hit index count and shard count | Index by time and filter by time first; use filter to reduce scoring; control shard count and size |
| Search thread pool rejected / 429 | Too high concurrency, uneven pressure on coordinating/data nodes | thread pool metrics, reject counts, node load distribution | Add coordinating nodes and LB; reduce concurrency/rate limit; optimize queries and replicas to bear concurrency |
| Cluster unstable (jitter/frequent resharding) | Master and data mixed, roles not isolated, resource contention | master logs, node roles and resource utilization | Dedicated master nodes; add coordinating nodes for large clusters; isolate JVM/CPU/disk resources |
| "Set a large heap but got slower" | Heap exceeds the compressed-pointer threshold (~32GB), causing a performance regression | JVM parameters and GC, startup logs | Keep heap at 30-32GB; leave remaining memory for the OS cache; if needed, scale out with more instances/nodes |