Tag: Stream Processing

16 articles

Flink CEP: Complex Event Processing & Pattern Matching

A detailed look at Flink CEP: pattern sequences, individual patterns, combined patterns, after-match skip strategies, and practical examples.

Flink Memory Management: Network Buffer, State Backend & ...

A detailed look at the Flink memory model: Network Buffer Pool, Task Heap, State Backend memory allocation, GC tuning, and backpressure handling.

Flink Parallelism: Operator Chaining, Slot & Resource Sch...

A detailed look at Flink parallelism: Operator Chaining, Slot allocation strategies, parallelism settings, and resource scheduling principles.

Flink Broadcast State: BroadcastState Practice & Rule Upd...

An explanation of Flink Broadcast State: how BroadcastState works, dynamic rule updates, state partitioning, and memory management, with examples of joining a broadcast stream with a non-broadcast stream.

Flink State Backend: State Storage & Performance Optimiza...

A detailed look at Flink State Backends: choosing between HashMapStateBackend and EmbeddedRocksDBStateBackend, memory configuration, and performance tuning.

Flink State and Checkpoint: State Management, Fault Toler...

An explanation of stateful computation in Flink: Keyed State, Operator State, Checkpoint configuration, Savepoint backup and recovery, and production practices.

Flink Streaming Introduction: DataStream API & Program St...

A getting-started guide to the Flink DataStream API: program execution flow, obtaining the execution environment, defining data sources, operator chaining, and execution modes, demonstrating stream processing pr...

Flink Window and Watermark: Time Windows, Tumbling/Slidin...

A comprehensive analysis of the Flink window mechanism: tumbling windows, sliding windows, session windows, how Watermarks work and the strategies for generating them, and late-data handling.

Apache Flink Introduction: Unified Stream-Batch Real-Time...

A systematic introduction to Apache Flink's origins, core features, and architecture components: the responsibilities of the JobManager, TaskManager, and Dispatcher, the unified stream-batch processing model, and compar...

Spark Streaming Integration with Kafka: Receiver and Dire...

A detailed explanation of the two modes for integrating Spark Streaming with Kafka: architectural differences between the Receiver-based high-level API and Direct mode, offset management, Exactly-Once semantics guarantee,...

Spark DStream Transformation Operators: map, reduceByKey,...

A systematic review of Spark Streaming's stateless DStream transformation operators and the advanced transform operation, demonstrating three approaches to blacklist filtering: leftOuterJ...

Spark Streaming Window Operations & State Tracking: updat...

An in-depth explanation of stateful computation in Spark Streaming: configuring window operation parameters, hot-word counting with reduceByKeyAndWindow, full-state maintenance with updateStateByKey, and mapWithSt...

Spark Streaming Introduction: From DStream to Structured ...

An introduction to Spark's two generations of real-time computing frameworks: the architecture and limitations of the DStream micro-batch model, and how Structured Streaming solves EventTime process...

Spark Streaming Data Sources: File Stream, Socket, RDD Qu...

A comprehensive explanation of three basic Spark Streaming data sources: file streams (directory monitoring), Socket (TCP ingestion), and RDD queue streams (for simulated testing), with complete Scala code exam...

Spark RDD Deep Dive: Five Key Features

A comprehensive analysis of RDD, Spark's core data abstraction: its five key features (partitions, compute function, dependencies, partitioner, preferred locations), lazy evaluation, fault tolerance, and n...

From MapReduce to Spark: Big Data Computing Evolution

A systematic overview of how big data processing engines evolved from MapReduce to Spark to Flink, analyzing Spark's in-memory computing model, unified ecosystem, and core components.