This article describes the three Flink StateBackend implementations and how Operator State is managed:

Core Content Includes:

  • Three StateBackends: MemoryStateBackend (working state on the TaskManager heap, checkpoint snapshots held in JobManager memory), FsStateBackend (working state in TaskManager memory, checkpoints written to a file system), RocksDBStateBackend (working state in a local RocksDB instance on each TaskManager, suited to very large state)
  • Using Managed Operator State: access non-keyed state by implementing the CheckpointedFunction or ListCheckpointed interface
  • State Redistribution Modes: Even-split redistribution (state elements are divided evenly among the parallel operator instances), Union redistribution (every parallel operator instance receives the complete list of state elements)
  • Memory Data Structures: ListState corresponds to PartitionableListState (ArrayList), BroadcastState corresponds to HeapBroadcastState (HashMap)
  • Configuration Method: set per job through StreamExecutionEnvironment.setStateBackend(), or set the cluster-wide default with the state.backend key in flink-conf.yaml
  • Checkpoint Configuration: enableCheckpointing(), setCheckpointingMode(), setMinPauseBetweenCheckpoints() and other key parameters
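
The difference between the two redistribution modes is easiest to see on a rescale. The following is a minimal plain-Java model, not Flink code: the class RedistributionSketch and its methods are hypothetical, and the round-robin assignment used for the even split is an assumption chosen for illustration (Flink's internal assignment order may differ). What it shows is the contract of each mode: with even-split every element ends up on exactly one subtask, with union every subtask sees every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the two operator-state redistribution modes (illustrative only). */
class RedistributionSketch {

    /** Even-split: pool all checkpointed elements, then hand each one to exactly one new subtask. */
    static <T> List<List<T>> evenSplit(List<List<T>> oldSubtasks, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int s = 0; s < newParallelism; s++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> part : oldSubtasks) {
            for (T element : part) {
                // Round-robin is an assumption made for this sketch.
                result.get(next % newParallelism).add(element);
                next++;
            }
        }
        return result;
    }

    /** Union: every new subtask receives the complete list of all checkpointed elements. */
    static <T> List<List<T>> union(List<List<T>> oldSubtasks, int newParallelism) {
        List<T> all = new ArrayList<>();
        for (List<T> part : oldSubtasks) {
            all.addAll(part);
        }
        List<List<T>> result = new ArrayList<>();
        for (int s = 0; s < newParallelism; s++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // State checkpointed by two subtasks, restored with parallelism 3.
        List<List<Integer>> old = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        System.out.println("even-split: " + evenSplit(old, 3));
        System.out.println("union:      " + union(old, 3));
    }
}
```

With union redistribution each subtask has to decide for itself which of the restored elements it keeps, which is why that mode is typically paired with filtering logic in the restore path.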

Using Managed Operator State:

  • CheckpointedFunction interface description
  • ListCheckpointed interface description
  • Even-split redistribution mode
  • Union redistribution mode
  • StateBackend persistence mechanism (MemoryStateBackend, FsStateBackend, RocksDBStateBackend)
  • Configure StateBackend
  • Enable Checkpoint
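
As a rough sketch of how the pieces in this outline fit together, the fragment below wires a CheckpointedFunction into a job and configures the backend and checkpointing. It assumes a Flink 1.x dependency on the classpath in which FsStateBackend and CheckpointingMode are still available at the packages shown. The class names OperatorStateSketch and BufferingSink, the state name "buffered", the checkpoint path, and the interval values are all hypothetical choices made for illustration.

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.util.ArrayList;
import java.util.List;

public class OperatorStateSketch {

    /** Hypothetical sink that buffers records and keeps the buffer in operator (non-keyed) state. */
    static class BufferingSink implements SinkFunction<String>, CheckpointedFunction {
        private final List<String> buffer = new ArrayList<>();
        private transient ListState<String> checkpointed;

        @Override
        public void invoke(String value, Context context) {
            buffer.add(value);
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            // Copy the in-memory buffer into managed operator state on every checkpoint.
            checkpointed.clear();
            for (String element : buffer) {
                checkpointed.add(element);
            }
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            ListStateDescriptor<String> descriptor =
                    new ListStateDescriptor<>("buffered", String.class);
            // getListState -> even-split redistribution; getUnionListState -> union redistribution.
            checkpointed = context.getOperatorStateStore().getListState(descriptor);
            if (context.isRestored()) {
                for (String element : checkpointed.get()) {
                    buffer.add(element);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Per-job backend; the path is a placeholder for this sketch.
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));

        // Checkpoint every 10 s, exactly-once, at least 500 ms between checkpoints (illustrative values).
        env.enableCheckpointing(10_000L);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500L);

        env.fromElements("a", "b", "c").addSink(new BufferingSink());
        env.execute("operator-state-sketch");
    }
}
```

Swapping getListState for getUnionListState in initializeState is the only change needed to switch this sink from even-split to union redistribution.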