Flink Time Semantics In Detail
In Flink’s stream processing, different concepts of time are involved:
EventTime: Time when events occur, for example, time when user clicks a link on a website, each log records its own generation time. If EventTime is used as the baseline to define time windows, EventTimeWindow will be formed, requiring messages to carry EventTime themselves.
IngestionTime: Time when data enters Flink, for example, when Source consumes data from Kafka. If IngestionTime is used as the baseline to define time windows, IngestingTimeWindow will be formed, based on Source’s SystemTime.
ProcessingTime: Time when a certain Flink node executes a certain Operation, for example, system time when TimeWindow processes data. Default time attribute is ProcessingTime. If ProcessingTime is used as the baseline to define time windows, ProcessingTimeWindow will be formed, based on Operator’s SystemTime.
In Flink’s stream processing, most business scenarios use EventTime. Only when EventTime cannot be used, would they be forced to use ProcessingTime or IngestionTime.
Watermark Mechanism In Detail
What is Watermark?
Watermark is a special timestamp used to tell Flink the progress of events in the data stream. It represents Flink’s estimated “current time”, meaning all events earlier than this timestamp have already arrived.
Core Principle:
- Watermark = Event Time - Allowed Late Time
- When Watermark time >= window end time and window has data, trigger computation
- Data beyond Watermark is considered late events
Window Trigger Conditions:
- There is data in [window_start_time, window_end_time) window
- Watermark time >= window_end_time
Example: Assume window is 10:00:00 ~ 10:10:00, allowed lateness is 3 seconds
- Watermark time = Event time - 3 seconds
- Need to receive data with event time 10:10:03 (watermark time is 10:10:00) to trigger window computation
Solving Out-of-Order Problem
Flink solves data out-of-order problem through Watermark mechanism:
- Watermark Mechanism: Track event time progress, indicating “time has advanced to which point”
- Window Trigger Strategy: Allow late data to join computation within a certain time range
- Side Output: Collect data that is too late for separate processing
Configuration Method
// Set to use event time
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
Application Scenarios
EventTime Applicable Scenarios:
- Calculate session windows in user behavior analysis
- Generate precise daily/hourly reports
- Detect time series patterns of anomalous events
ProcessingTime Applicable Scenarios:
- Low-latency monitoring alerts needed
- Approximate real-time statistics
- Scenarios not requiring strict time precision