Big Data 191 - Elasticsearch Cluster Planning & Tuning
TL;DR
- Scenario: From 2-4 node business search to ELK/OLAP large clusters, role division, capacity calculation and performance tuning
- Conclusion: First define roles and capacity boundaries, then shard/replica and write strategy, finally close the loop with search and ops metrics
- Output: Reusable cluster planning template + shard/replica rules + write/search tuning leverage + common troubleshooting
Version Matrix
| Version | Applicability Notes | Notes |
|---|---|---|
| Elasticsearch 7.x | Role planning, shard/replica and write/search tuning approach applicable | Node roles recommended via node.roles; mapping snippets need 7.x syntax adjustment |
| Elasticsearch 8.x | Planning and tuning methodology equally applicable | Security and certificates default to “stronger constraints”, cross-node and client access need synchronous validation; node roles primarily via node.roles |
Elasticsearch Cluster Mode
Elasticsearch is a distributed search engine. Cluster mode means multiple nodes form a single cluster and work together to provide higher performance, scalability, and fault tolerance. Each cluster has a unique name, and all nodes belonging to the same cluster identify each other by that cluster name.
1. Master Node
Core Responsibilities:
- Manage cluster-level metadata operations including index creation/deletion, shard allocation
- Monitor cluster status, track node joins or leaves
- Handle cluster-wide configuration changes
Characteristics:
- Typically doesn’t participate in actual data search and storage operations
- For high availability, configure 3 master-eligible nodes (an odd number) so that a majority remains available to elect a master
- Example: When creating a new index, client request is first handled by master node, which decides the new index’s shard allocation scheme
2. Data Node
Core Functions:
- Store actual shards of index data
- Handle data-related CRUD operations (Create, Read, Update, Delete)
- Execute computationally intensive tasks like search and aggregation
Important Characteristics:
- Store multiple copies of data ensuring high data availability
- Require strong CPU, memory and storage resources
- Application scenario: E-commerce website search service typically deploys multiple data nodes to handle large product data storage and retrieval
3. Coordinating Node
Workflow:
- Receive client requests
- Route requests to relevant data nodes
- Collect responses from each node
- Merge results and return to client
Characteristics:
- Doesn’t store data, doesn’t participate in master node election
- By default all nodes have coordinating functionality
- Dedicated coordinating nodes typically used in large clusters to reduce request processing burden on other nodes
- Example: In distributed log analysis system, coordinating node distributes search queries to data nodes storing logs
Note: In actual deployment, a node can assume multiple roles (e.g., both data node and coordinating node), but production environments usually recommend separating master nodes from data nodes to ensure cluster stability.
Cluster Planning
Importance of Planning
When deploying an Elasticsearch cluster, planning is a critical foundation step that directly affects cluster performance, future scalability, and daily operations. Reasonable planning avoids frequent architecture adjustments later, ensuring stable and efficient cluster operation.
Core Planning Elements
1. Master Node Planning
- Quantity: Recommend configuring at least three dedicated master nodes (master-only nodes) to form a quorum, ensuring high availability. When one node fails, remaining two nodes can still normally elect a new master node.
- Resource Allocation: Since master nodes don’t handle data storage or search requests, they can run on relatively modest resources (e.g., 2-4 CPU cores, 4-8GB memory).
- Deployment Location: Should distribute master nodes across different racks or Availability Zones (AZ) to prevent single point of failure. For example in AWS, can deploy three master nodes across us-east-1a, us-east-1b and us-east-1c.
2. Data Node Planning
- Resource Requirements:
- Memory: Recommend 32GB+ memory per node, half (16GB) for JVM heap, remaining for filesystem cache.
- Storage: Determine based on data volume and growth expectation, recommend SSD storage for better IO performance. For example, business expecting 10TB annual data growth can configure 3 nodes × 2TB SSD.
- Shard Strategy: Each node should carry no more than 20-25 shards, each shard size controlled at 30-50GB. For example, 10TB data can be divided into 200 × 50GB shards, needing 8-10 data nodes.
- Scalability: Use horizontal scaling strategy, initially configure per minimum requirements, then add nodes as data grows.
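The sizing rules above (30-50GB per shard, at most about 20-25 shards per data node) can be turned into a quick back-of-the-envelope calculation. A minimal sketch; the thresholds are this article's guidelines, not hard Elasticsearch limits:

```python
# Rough capacity sizing based on the article's rules of thumb:
# shards of 30-50GB each, at most ~20-25 shards per data node.

def estimate_data_nodes(total_gb, shard_size_gb=50, shards_per_node=25):
    """Return (shard_count, data_node_count) for a target data volume."""
    shards = -(-total_gb // shard_size_gb)      # ceiling division
    nodes = -(-shards // shards_per_node)
    return shards, nodes

# 10TB (10,000GB) -> 200 shards of 50GB, spread over 8 data nodes
print(estimate_data_nodes(10_000))              # (200, 8)
```

This matches the worked example in the text: 10TB divided into 200 × 50GB shards needs 8-10 data nodes, depending on how conservatively you cap shards per node.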
3. Coordinating Node Planning
- Function Separation: Offload computationally intensive work such as aggregation result merging and sorting from data nodes to coordinating nodes, reducing the data nodes’ burden.
- Resource Requirements: Need stronger CPU resources (8-16 cores) for request routing and result aggregation, moderate memory (16-32GB).
- Deployment Scenario: Especially suitable for scenarios with large search query volume (e.g., millions of queries daily) or many complex aggregation queries.
- Load Balancing: Use with load balancers (e.g., Nginx, HAProxy) for even request distribution and fault failover.
Planning Considerations
Questions the business needs to answer:
- What is the current data volume, and how fast is it growing?
- What are the machine specifications: CPU, memory, disk capacity?
Calculation Basis
- Elasticsearch JVM Heap can be set to maximum 32GB
- 30GB heap can handle approximately 10TB data
- If memory is very large, e.g., 128GB, consider deploying multiple Elasticsearch instances on one machine
Application Scenarios
- Building business search features, mostly vertical-domain search. Data volume ranges from tens of millions to billions of documents, generally at a 2-4 machine scale
- Large-scale real-time OLAP (Online Analytical Processing), classically the ELK Stack; data scale can reach hundreds of billions of documents or more, across dozens to hundreds of nodes
Role Assignment
Node roles:
- MasterNode: with node.master set to true, the node is master-eligible (legacy setting; 7.9+ uses node.roles instead)
- DataNode: with node.data set to true (the default), the node stores data
- CoordinateNode: a node whose only job is receiving requests, forwarding them to other nodes, and merging the data each node returns is called a coordinating node. To run a coordinating-only node, disable the master and data roles; otherwise a node can hold one role or several.
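The legacy node.master/node.data flags map onto the node.roles setting in 7.9+ (as noted in the version matrix). A minimal elasticsearch.yml sketch, one variant per node:

```yaml
# Dedicated master-eligible node
node.roles: [ master ]

# Data node
node.roles: [ data ]

# Coordinating-only node: an empty role list leaves just the
# request-routing and result-merging behavior every node has by default
node.roles: [ ]
```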
How to assign roles:
- Small-scale clusters, no need for strict separation
- Medium-large scale clusters (10+ nodes), should consider dedicated roles. Especially when concurrent queries are large and query merge volume is large, can add independent coordinating nodes. The benefit of separating roles is divided labor, no mutual interference.
Setting Shards
- Note: The primary shard count is fixed once the index is created; changing it requires reindexing (or the _shrink/_split APIs, within their constraints).
- Shard setting reference principles: Elasticsearch recommends a JVM heap of at most 30-32GB, so cap each shard at roughly 30GB and estimate the shard count from there. A good starting point is 1.5-3x the node count; for example, with three nodes, keep the shard count at or below 9 (3 × 3). When performance starts to degrade, add nodes and ES will rebalance shard placement. For date-based indexes that are rarely searched, you may end up with hundreds or thousands of indexes of only 1GB or less each; in that scenario, one shard per index is enough.
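Since the primary shard count is fixed at creation time, it has to be set up front. A sketch for the three-node example (index name follows the document's wzk_ placeholder naming):

```json
PUT /wzk_logs_index
{
  "settings": {
    "number_of_shards": 9,
    "number_of_replicas": 1
  }
}
```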
Setting Replicas
- For high availability, set the replica count to 2. This requires at least 3 nodes in the cluster so that the primary shard and both replicas can sit on different nodes.
- If concurrent load is high and query performance degrades, increase the replica count to improve concurrent query capacity.
- Note: when replicas are added, the cluster automatically copies data onto the new replica shards; unlike the primary shard count, replicas can be adjusted at any time.
PUT /wzk_temp_index/_settings
{
"number_of_replicas": 1
}
Cluster Tuning
Index Tuning
Replica 0
If the cluster is loading data for the first time, set replicas to 0 initially, then raise the count (e.g., to 2 for high availability) once the load completes, per business requirements. This saves replication overhead while the index is being built.
PUT /wzk_temp_index/_settings
{
"number_of_replicas": 0
}
Auto-generated ID
Auto-generate the DocID. Looking at the Elasticsearch write path: if an external ID is provided when writing a Doc, Elasticsearch first tries to read the existing Doc’s version number to decide whether the write is an update, which costs a disk lookup. Letting Elasticsearch auto-generate the DocID avoids this check.
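A sketch of the two write paths (7.x+ typeless _doc endpoint; the index name is a placeholder). The POST form lets ES generate the ID; the PUT form supplies an external ID and triggers the version lookup described above:

```json
POST /wzk_temp_index/_doc
{ "title": "auto-generated id, no version lookup needed" }

PUT /wzk_temp_index/_doc/1001
{ "title": "external id, ES must check for an existing version" }
```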
Mappings
Set mappings reasonably:
- Set fields that don’t need searching to be non-indexed. In 7.x+ this means "index": false (the not_analyzed/no values belong to pre-5.x syntax). Skipping analysis and indexing saves a lot of CPU, especially for binary-like content, where analysis is meaningless anyway.
- Reduce field content length. If large content in original data doesn’t need full indexing, try to reduce unnecessary content.
- Use different analyzers. Different tokenizers have varying computational complexity during indexing.
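A 7.x/8.x-style mapping sketch of the points above; field names are hypothetical. status is exact-match only (keyword, no analysis), trace_id is stored but not indexed, summary uses an explicitly chosen analyzer, and raw_blob is an object that is neither parsed nor indexed:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "properties": {
      "status":   { "type": "keyword" },
      "trace_id": { "type": "keyword", "index": false },
      "summary":  { "type": "text", "analyzer": "standard" },
      "raw_blob": { "type": "object", "enabled": false }
    }
  }
}
```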
Source Field
Adjust the _source field. _source stores the original document data. Fields that don’t need to be returned can be filtered out via the mapping’s _source excludes, or _source can be disabled entirely. Disabling is generally used when index and raw data are stored separately and reduces I/O pressure, but most business scenarios keep _source enabled.
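A sketch of excluding a field from _source at index creation (7.x+ typeless mapping; the field name is a placeholder). The field remains searchable if indexed, but is no longer stored in _source:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "_source": {
      "excludes": [ "raw_content" ]
    }
  }
}
```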
Disable norms
norms are used to calculate document scores during search. If scoring is not needed for a field, disable them. Shown in 7.x+ syntax (the legacy string type and norms.enabled form are pre-5.x):
"title": {
"type": "text",
"norms": false
}
Refresh Interval
Adjust index refresh interval. Default is 1 second, forcing ES to create a new Segment every second to ensure newly written data is near real-time visible and searchable. For example, if adjusted to 30s, reduces refresh frequency, freeing refresh operation resources for index operations.
PUT /wzk_temp_index/_settings
{
"index" : {
"refresh_interval": "30s"
}
}
This approach sacrifices visibility to improve index operation performance.
Batch Processing
Batch processing merges multiple index requests into a single bulk request, similar to MySQL JDBC batching. For example, 10,000 documents per batch can be a good size, but the right number depends on many factors (document size, hardware, concurrency) and should be found empirically.
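A minimal _bulk sketch (newline-delimited JSON; the index name is a placeholder). Empty index actions let ES auto-generate IDs, which combines with the auto-ID optimization above:

```json
POST /wzk_temp_index/_bulk
{ "index": {} }
{ "title": "doc 1" }
{ "index": {} }
{ "title": "doc 2" }
```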
Search Tuning
After storing over 1 billion documents, search tuning is needed.
Index Grouping by Time
ES is often used for log storage, and log indexes are generally managed by date: created per day, week, month, or year. Searching a single day’s data then touches only one index’s shards; searching multiple days touches several indexes’ shards. This approach is similar to database sharding/partitioning: search a small data range instead of looking for a needle in the whole haystack.
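With daily indexes, the search scope is controlled by the index name in the request path (the wzk_logs- naming is a placeholder). The first request hits a single day's shards; the wildcard fans out only to the matching days:

```json
GET /wzk_logs-2024.06.01/_search

GET /wzk_logs-2024.06.*/_search
```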
Filter
Use Filter instead of Query. During search, a Query scores each document’s relevance, while a Filter does no scoring and therefore less work, so in theory a Filter is faster. If a search doesn’t need scoring, use a Filter directly. If part of a search needs scoring, use a Bool query, which can combine scored and unscored clauses:
GET /_search
{
"query": {
"bool": {
"must": {
"term": {
"user": "kimchy"
}
},
"filter": {
"term": {
"tag": "tech"
}
}
}
}
}
ID Field
Defining ID field as Keyword type is a common optimization practice. This definition is especially suitable for:
-
Usage Scenarios:
- When ID is mainly used for exact match queries (like terms queries)
- When ID won’t be used for range queries or numeric calculations
- Typical application scenarios: user ID, order number, product SKU and other identifier fields
-
Performance Optimization Principle:
- Keyword type is internally optimized to inverted index structure in Elasticsearch, especially suitable for exact match queries
- Compared to numeric types (like Integer), Keyword type avoids overhead for range query optimization
- In terms query scenarios, Keyword type has higher query efficiency than numeric types
-
Actual Case Data:
- After an e-commerce platform changed product ID from Integer to Keyword type:
- Query response time reduced from average 120ms to 85ms
- System throughput improved by ~30%
- Index size reduced by 15% (because no need to store numeric type metadata)
- After an e-commerce platform changed product ID from Integer to Keyword type:
-
Implementation Suggestion:
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
}
}
}
}
- Notes:
- This optimization mainly applies to string-form IDs or numeric IDs that don’t need numeric operations
- If range query functionality is indeed needed, recommend keeping numeric type
- For fields with both exact match and range query needs, can consider using multi-fields to define both types
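A multi-fields sketch for the mixed case (field names are hypothetical): the numeric parent field serves range queries, while the order_no.kw sub-field serves exact terms lookups:

```json
PUT /wzk_temp_index
{
  "mappings": {
    "properties": {
      "order_no": {
        "type": "long",
        "fields": {
          "kw": { "type": "keyword" }
        }
      }
    }
  }
}
```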
Error Quick Reference
| Symptom | Cause | Diagnosis | Fix |
|---|---|---|---|
| Cluster won’t start / repeated master election (master not discovered) | Insufficient master nodes, discovery config error, network/firewall, clock drift | master logs, /_cluster/health, /_cat/nodes?v, discovery config and port connectivity | Ensure ≥3 dedicated master nodes across fault domains; validate discovery/seed config and ports; fix network and time sync |
| Shards permanently UNASSIGNED | Insufficient nodes, disk watermark, allocation filter, version incompatibility, recovery restricted | /_cat/shards?v, /_cluster/allocation/explain, disk watermark and routing rules | Expand or free disk; correct allocation filter and routing limits; adjust watermark threshold (under controllable risk) and trigger reallocation |
| Low write throughput / ingest stuck | Too frequent refresh, high merge pressure, disk IO bottleneck, too many mapping fields | segment/merge metrics, node IO, write latency and thread pools | During write period increase refresh_interval; batch merge requests; reduce unnecessary field indexing and analysis; upgrade SSD/improve IO |
| CPU spikes (especially during write) | Complex analyzer, field explosion, uncontrolled dynamic mapping | hot threads, mapping field count, indexing rate and CPU | Constrain fields and dynamic templates; choose appropriate analyzer; for fields not needing search set index:false |
| Memory alert / circuit breaker | Heap pressure, heavy aggregation/sorting, oversized requests | breaker logs, GC, slow queries and large aggregation requests | Split queries, limit aggregation dimensions and size; introduce dedicated coordinating nodes; control concurrency and request size |
| Slow queries (multi-day logs/multi-index) | Search scope too large, no filtering, too many shards causing fan-out | query profile, slow logs, hit index count and shard count | Index by time and filter by time first; use filter to reduce scoring; control shard count and size |
| Search thread pool rejected / 429 | Too high concurrency, uneven pressure on coordinating/data nodes | thread pool metrics, reject counts, node load distribution | Add coordinating nodes and LB; reduce concurrency/rate limit; optimize queries and replicas to bear concurrency |
| Cluster unstable (jitter/frequent resharding) | Master and data mixed, roles not isolated, resource contention | master logs, node roles and resource utilization | Dedicated master nodes; add coordinating nodes for large clusters; isolate JVM/CPU/disk resources |
| "Set a large heap but got slower" | Heap exceeds the compressed-pointer threshold (~32GB), causing a performance regression | JVM parameters and GC, startup logs | Keep heap at 30-32GB; leave remaining memory for the OS cache; if needed, scale out with more instances/nodes |