Big Data 181 - Elasticsearch Segment Merge & Disk Directory Breakdown
1. Background: Why Segment Merging Is Needed
Elasticsearch writes are append-only. Indexing a document never modifies existing data in place:
- New documents are first written to an in-memory buffer
- On refresh, the buffer contents are written to a new segment
- Segments are immutable, so an update is a delete of the old version plus an index of the new one
The default refresh interval (index.refresh_interval) is 1 second, so while documents are being indexed a new segment can appear every second. The more segments there are, the more problems they cause:
- File handles: each segment consists of several files that must be kept open
- Memory: each segment carries its own in-memory data structures
- CPU: every search must visit each segment and combine the per-segment results
- Disk space: deleted documents are only marked as deleted, so their space is not reclaimed
Segment merging combines small segments into larger ones in the background and physically removes deleted documents in the process.
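To make the write path concrete, here is a toy Python model of the buffer, refresh and immutable segment cycle. It is not Elasticsearch or Lucene code, and `ToyShard` and `Segment` are hypothetical names. A delete only records a marker, and a merge copies the live documents into a new segment, which is where the space is reclaimed. For simplicity the model merges every segment at once, which real background merging does not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """An immutable segment: stored doc IDs plus a set of deletion markers."""
    docs: tuple
    deleted: frozenset = frozenset()

    def live_docs(self):
        return [d for d in self.docs if d not in self.deleted]


class ToyShard:
    """Toy model: buffer -> refresh -> new immutable segment; merge drops deletes."""

    def __init__(self):
        self.buffer = []
        self.segments = []

    def index(self, doc_id):
        self.buffer.append(doc_id)

    def refresh(self):
        # A refresh with pending documents produces exactly one new segment.
        if self.buffer:
            self.segments.append(Segment(tuple(self.buffer)))
            self.buffer = []

    def delete(self, doc_id):
        # Segment contents stay untouched; only the deletion markers change.
        self.segments = [
            Segment(s.docs, s.deleted | {doc_id}) if doc_id in s.docs else s
            for s in self.segments
        ]

    def merge_all(self):
        # Copy only live documents into one new segment, reclaiming deleted space.
        live = [d for s in self.segments for d in s.live_docs()]
        self.segments = [Segment(tuple(live))]


shard = ToyShard()
for second in range(3):  # three refresh cycles, one document each
    shard.index(f"doc-{second}")
    shard.refresh()
shard.delete("doc-1")
print(len(shard.segments))  # 3
shard.merge_all()
print(len(shard.segments), shard.segments[0].docs)  # 1 ('doc-0', 'doc-2')
```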
2. Merge Policy Config
2.1 Core Parameters
| Parameter | Default | Meaning |
|---|---|---|
| index.merge.policy.floor_segment | 2MB | Segments smaller than this are treated as if they were this size when merges are selected, so tiny segments get merged aggressively instead of piling up |
| index.merge.policy.max_merge_at_once | 10 | Maximum number of segments merged at once during normal merging |
| index.merge.policy.max_merged_segment | 5GB | Maximum size of a segment produced by normal merging |
| index.merge.policy.segments_per_tier | 10 | Allowed number of segments per tier; smaller values mean more merging and fewer segments. Should be ≥ max_merge_at_once |
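The parameters above interact through a "tier budget". The sketch below is a simplified Python model loosely based on the idea behind Lucene's TieredMergePolicy. It is not Lucene's code: it skips the scoring of candidate merges (skew, deletes) and most edge cases. `allowed_segment_count` and `pick_merge` are hypothetical names, and all sizes are in MB. The function estimates how many segments a shard may hold. When the shard holds more than that, a merge of up to max_merge_at_once small segments is proposed.

```python
import math


def allowed_segment_count(sizes_mb, floor_mb=2.0, segments_per_tier=10,
                          max_merge_at_once=10, max_merged_mb=5 * 1024.0):
    """Rough tier budget: how many segments the shard may hold before merging.

    Loosely modeled on Lucene's TieredMergePolicy; sizes are in MB.
    """
    # Segments that already reached the maximum size are left out of normal merging.
    eligible = [s for s in sizes_mb if s < max_merged_mb]
    if not eligible:
        return 0
    # floor_segment: the smallest tier is never sized below floor_mb,
    # so tiny segments are budgeted as if they were floor_mb in size.
    level_size = max(min(eligible), floor_mb)
    merge_factor = min(max_merge_at_once, segments_per_tier)
    bytes_left = sum(eligible)
    allowed = 0
    while True:
        count_at_level = bytes_left / level_size
        if count_at_level < segments_per_tier or level_size == max_merged_mb:
            return allowed + math.ceil(count_at_level)
        # This tier is full: budget segments_per_tier segments, then move up a tier.
        allowed += segments_per_tier
        bytes_left -= segments_per_tier * level_size
        level_size = min(max_merged_mb, level_size * merge_factor)


def pick_merge(sizes_mb, max_merge_at_once=10, max_merged_mb=5 * 1024.0):
    """Naive candidate: smallest segments first, capped by count and merged size."""
    chosen, total = [], 0.0
    for size in sorted(sizes_mb):
        if len(chosen) == max_merge_at_once or total + size > max_merged_mb:
            break
        chosen.append(size)
        total += size
    return chosen


sizes = [1.0] * 30  # e.g. thirty tiny segments left behind by thirty refreshes
budget = allowed_segment_count(sizes)
print("budget:", budget)  # budget: 11
if len(sizes) > budget:
    print("merge", len(pick_merge(sizes)), "segments")  # merge 10 segments
```

The example shows why floor_segment matters. Thirty 1 MB segments are budgeted as if they were 2 MB each, so the shard quickly exceeds its budget and the tiny segments get merged.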
2.2 Config Example
```json
{
  "settings": {
    "index.merge.policy.max_merge_at_once": 20,
    "index.merge.policy.max_merged_segment": "10gb"
  }
}
```
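Merge policy settings are dynamic, so a body like the one above can also be sent to an existing index through the update index settings API (`PUT /<index>/_settings`). Older Elasticsearch documentation advised keeping segments_per_tier ≥ max_merge_at_once, which the example above does not do. The sketch below builds such a request with Python's standard library but does not send it. It raises segments_per_tier together with max_merge_at_once. The function name, the host `http://localhost:9200` and the lack of authentication are all assumptions.

```python
import json
import urllib.request


def build_merge_settings_request(index, max_merge_at_once=20,
                                 segments_per_tier=20,
                                 max_merged_segment="10gb",
                                 host="http://localhost:9200"):
    """Build (but do not send) a PUT /<index>/_settings request for merge tuning."""
    # Follow the older ES guidance: segments_per_tier >= max_merge_at_once.
    if segments_per_tier < max_merge_at_once:
        raise ValueError("segments_per_tier should be >= max_merge_at_once")
    body = {
        "settings": {
            "index.merge.policy.max_merge_at_once": max_merge_at_once,
            "index.merge.policy.segments_per_tier": segments_per_tier,
            "index.merge.policy.max_merged_segment": max_merged_segment,
        }
    }
    return urllib.request.Request(
        f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_merge_settings_request("my-index")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it (assumed local, unsecured cluster).
```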
3. Force Merge
Force merge is meant for read-only or archived indices. It manually reduces each shard to a given number of segments:
```
POST /my-index/_forcemerge?max_num_segments=1
```
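Notes:
- It generates heavy I/O, so run it during off-peak hours
- It cannot be rolled back once it completes
- Only run it on indices that no longer receive writes. Force merge can produce segments larger than 5GB, and if writes continue, the automatic merge policy will not consider those segments for future merges until they consist mostly of deleted documents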
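A possible archiving sequence, written as a plain list of REST calls, is shown below. The write block in step 1 is a suggested precaution that is not part of the original text. The function name `force_merge_plan` is hypothetical, but `index.blocks.write`, `_forcemerge` and `_segments` are standard Elasticsearch settings and endpoints.

```python
def force_merge_plan(index, max_num_segments=1):
    """Return the REST calls for archiving an index, as (method, path, body) tuples."""
    return [
        # Block writes first so no new segments are created after the merge.
        ("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": True}}),
        # Merge every shard of the index down to max_num_segments segments.
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None),
        # Check the per-shard segment count and sizes afterwards.
        ("GET", f"/{index}/_segments", None),
    ]


for method, path, body in force_merge_plan("my-index"):
    print(method, path, body or "")
```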
4. ES Data Directory Structure
```
nodes/
└── 0/
    └── indices/
        └── {index-uuid}/
            └── 0/
                ├── _state/
                │   └── state-1.st
                ├── index/
                │   ├── _0.cfe
                │   ├── _0.cfs
                │   └── _0.si
                └── translog/
                    └── translog.ckp
```
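Every segment has exactly one `.si` file, even when its other files are packed into a compound `.cfs`. Counting `.si` files per shard is therefore a quick offline way to see how fragmented a shard is. The sketch below does this with `pathlib`. It looks for the `nodes/0/...` layout shown above and also for a flatter `indices/...` layout, which is assumed to match newer (8.x) versions that dropped the `nodes/0` level. The function name is hypothetical. It only reads the directory tree, and `GET /_cat/segments` remains the supported way to inspect segments on a live cluster.

```python
from pathlib import Path


def segments_per_shard(data_path):
    """Count segments in each shard's index/ directory (one .si file per segment)."""
    root = Path(data_path)
    # Layout shown above (nodes/0/...) plus the assumed flatter 8.x layout.
    patterns = ("nodes/*/indices/*/*/index", "indices/*/*/index")
    result = {}
    for pattern in patterns:
        for index_dir in root.glob(pattern):
            if index_dir.is_dir():
                key = index_dir.relative_to(root).as_posix()
                result[key] = sum(1 for _ in index_dir.glob("*.si"))
    return result


# Hypothetical path: point it at your own path.data, read-only.
# print(segments_per_shard("/var/lib/elasticsearch"))
```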
4.1 Core File Types
| File Suffix | Meaning |
|---|---|
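| .cfe | Compound file entries (table of the sub-files packed in the .cfs) |
| .cfs | Compound file data (a small segment's files packed into one) |
| .si | Segment info |
| .doc | Postings: document IDs and term frequencies |
| .dvd / .dvm | DocValues data / metadata |
| .dim | Points (BKD tree) for numeric, date, IP and geo fields |
| .pos | Term positions |
| .tip | Term index (index into the .tim term dictionary) |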
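When troubleshooting disk usage, it helps to group a shard's files by suffix. The sketch below counts files and bytes per suffix in one `index/` directory and labels common Lucene suffixes. The labels follow Lucene's file-format documentation as I understand it, and the exact set of files varies with the Lucene codec version (newer codecs, for example, store points in `.kdd`/`.kdi`/`.kdm` instead of `.dim`). `SUFFIX_MEANING`, `summarize_index_dir` and `describe` are hypothetical names.

```python
from collections import Counter
from pathlib import Path

# Common Lucene file suffixes; the exact set depends on the codec version.
SUFFIX_MEANING = {
    ".cfe": "compound file entries",
    ".cfs": "compound file data",
    ".si": "segment info",
    ".doc": "postings: doc IDs and term frequencies",
    ".pos": "term positions",
    ".tim": "term dictionary",
    ".tip": "term index",
    ".dvd": "doc values data",
    ".dvm": "doc values metadata",
    ".fdt": "stored fields data",
    ".fdx": "stored fields index",
    ".liv": "live docs (deletion markers)",
    ".dim": "points / BKD tree (older codecs)",
    ".kdd": "points / BKD tree data (newer codecs)",
}


def summarize_index_dir(index_dir):
    """Count files and total bytes per suffix in one shard's index/ directory."""
    counts, sizes = Counter(), Counter()
    for f in Path(index_dir).iterdir():
        if f.is_file():
            # segments_N has no suffix, so fall back to the full file name.
            key = f.suffix or f.name
            counts[key] += 1
            sizes[key] += f.stat().st_size
    return counts, sizes


def describe(index_dir):
    """Print one line per suffix: file count, total bytes and meaning."""
    counts, sizes = summarize_index_dir(index_dir)
    for key in sorted(counts):
        meaning = SUFFIX_MEANING.get(key, "other")
        print(f"{key:12} {counts[key]:4} files {sizes[key]:>12} bytes  {meaning}")
```

For example, `describe(".../indices/{index-uuid}/0/index")` with your own data path prints one line per suffix. A large `.liv` count or many small `.si` files hints at pending deletes or a backlog of unmerged segments.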
5. Summary
- Segment merging runs automatically in background threads
- A well-tuned merge policy balances indexing throughput against search efficiency
- Force merge is only appropriate for indices that are no longer written to
- Understanding the on-disk directory layout helps when troubleshooting