Big Data 182 - Elasticsearch Inverted Index Underlying Breakdown

1. Inverted Index Core Concepts

Inverted index consists of two parts:

Terms Dictionary: Collection of all Terms, sorted in dictionary order
Posting List: Document list corresponding to each Term

Forward: DocId → Document
Inverted: Term → [DocId1, DocId2, DocId3, ...]

2. Lucene Index File Structure

File Suffix	Meaning
.tim	Terms Dictionary, stores Term and Postings pointers
.tip	Terms Index, FST structure, stores Term prefix and Block positions
.doc	Stores DocId and term frequency

3. FST (Finite State Transducer)

FST is implementation structure of Term Index:

Pros: Can be fully loaded into memory, millisecond-level positioning
Principle: Uses directed acyclic graph to store Term prefix sharing

4. SkipList (Skip List)

Used to accelerate intersection calculation of multiple Posting Lists:

Time complexity reduced from O(N) to O(logN)
Reduces invalid comparisons

5. Query Flow

Quickly locate Term through Terms Index (FST)
Get Postings pointer from Terms Dictionary
Read Postings
Merge multiple Posting Lists using SkipList

6. Summary

Lucene segment immutable + flush/merge write model determines ES near real-time (NRT) characteristic: fast write, but queries need to merge results from multiple segments.