Tag: distributed-system

33 articles

Flink on YARN Deployment: Environment Preparation, Resour...

Detailed explanation of the three Flink deployment modes on a YARN cluster (Session, Application, and Per-Job), along with Hadoop dependency configuration, YARN resource requests, and the job submission process.

11/27/2024

Apache Flink Introduction: Unified Stream-Batch Real-Time...

Systematic introduction to Apache Flink's origin, core features, and architecture components (JobManager, TaskManager, and Dispatcher responsibilities), the unified stream-batch processing model, and compar...

11/23/2024

SparkSQL Kernel: Five Join Strategies & Catalyst Optimize...

Deep dive into the selection conditions and use cases of SparkSQL's five Join execution strategies (BHJ, SHJ, SMJ, Cartesian, BNLJ), along with the complete processing flow of the Catalyst optimizer from SQL ...

11/9/2024

Spark Standalone Mode: Architecture & Performance Tuning

Comprehensive explanation of the four core components of a Spark Standalone cluster, the application submission flow, SparkContext internal architecture, the evolution of Shuffle, and RDD optimization strategies.

11/2/2024

Spark Serialization & RDD Execution Principle

Deep dive into Spark Driver-Executor process communication, choosing between Java and Kryo serialization, troubleshooting closure serialization problems, and RDD dependencies, Stage division and persistence st...

10/26/2024

Spark Cluster Architecture & Deployment Modes

Deep dive into the responsibilities of Spark's core cluster components (Driver, Cluster Manager, Executor), a comparison of the Standalone, YARN, and Kubernetes deployment modes, and static vs dynamic resource allocati...

10/16/2024

From MapReduce to Spark: Big Data Computing Evolution

Systematic overview of the evolution of big data processing engines from MapReduce to Spark to Flink, analyzing Spark's in-memory computing model, unified ecosystem, and core components.

10/9/2024

Kafka High Performance: Zero-Copy, mmap & Sequential Write

Deep dive into the three I/O techniques behind Kafka's high throughput: sendfile zero-copy, mmap memory mapping, and sequential writes through the page cache, revealing kernel-level optimization behind million ...

10/5/2024

Kafka Replica Mechanism: ISR & Leader Election

Deep dive into the Kafka replica mechanism, including maintenance of the in-sync replica (ISR) set, the Leader election process, and the consistency vs availability trade-off of unclean leader election.

10/2/2024

Kafka Exactly-Once: Idempotence & Transactions

Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering the PID/sequence number principle, cross-partition transaction configuration ...

10/2/2024

Kafka Topic, Partition & Consumer: Rebalance Optimization

Deep dive into the core mechanisms of Kafka Topics, Partitions, and Consumer Groups, covering custom deserialization, offset management, and rebalance optimization configuration.

9/28/2024

Kafka Components: Producer, Broker, Consumer Full Flow

Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment and offset management.

9/14/2024

Redis High Availability: Master-Slave Replication & Sentinel

Deep dive into Redis high availability: master-slave replication, Sentinel automatic failover, and distributed lock design with Docker deployment examples.

9/11/2024

Kafka Architecture: High-Throughput Distributed Messaging

Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.

9/11/2024

Redis Cache Problems: Penetration, Breakdown, Avalanche, ...

Systematic overview of the five most common Redis cache problems in high-concurrency scenarios: cache penetration, cache breakdown, cache avalanche, hot key, and big key. Analyzes the root cause of...

9/7/2024

Redis Distributed Lock: Optimistic Lock, WATCH and SETNX ...

Redis optimistic lock in practice: WATCH/MULTI/EXEC mechanism explained, Lua scripts for atomic operations, SETNX+EXPIRE distributed lock from basics to Redisson, with complete Java code examples.

9/7/2024

Redis Communication Internals: RESP Protocol and Reactor ...

Deep dive into Redis communication internals: the five data types of the RESP serialization protocol, Pipeline batch processing, and how the epoll-based Reactor single-threaded event-driven architecture ...

9/4/2024

Redis Pub/Sub: Mechanism, Weak Transaction and Risks

Detailed explanation of how Redis Pub/Sub works, its three weak transaction flaws (no persistence, no acknowledgment, no retry), and alternative solutions for production.

8/24/2024

HBase Cluster Deployment and High Availability Configuration

Complete HBase distributed cluster deployment: configure RegionServers on multiple nodes, set up HMaster high availability, and integrate with ZooKeeper for coordination, with start/stop scripts and verificati...

8/14/2024

HBase Overall Architecture: HMaster, HRegionServer and Da...

Comprehensive analysis of HBase distributed database overall architecture, including ZooKeeper coordination, HMaster management node, HRegionServer data node, Region storage unit, and four-dimensio...

8/10/2024

ZooKeeper Leader Election and ZAB Protocol Principles

Deep dive into ZooKeeper's Leader election mechanism and the ZAB (ZooKeeper Atomic Broadcast) protocol, covering the initial election process, the three phases of message broadcast, the fault recovery strategy, and p...

8/7/2024

ZooKeeper Distributed Lock Java Implementation Details

Implements a distributed lock based on ZooKeeper ephemeral sequential nodes, with complete Java code, covering lock competition, predecessor node monitoring, CountDownLatch synchronization, and recurs...

8/7/2024

ZooKeeper Watcher Principle and Command Line Practice Guide

Complete analysis of the Watcher registration-trigger-notification flow across the client, WatchManager, and ZooKeeper server, plus zkCli command-line practice demonstrating node CRUD and monitoring.

8/3/2024

ZooKeeper Java API Practice: Node CRUD and Monitoring

Uses the ZkClient library to operate ZooKeeper from Java code, with complete practical examples of session establishment, persistent node CRUD, child node change monitoring, and data change monitoring.

8/3/2024

ZooKeeper Cluster Configuration Details and Startup Verif...

Deep dive into the meaning of the core zoo.cfg parameters and the configuration rules for the myid file, with a demonstration of the 3-node cluster startup process and verification of the Leader election result.

7/31/2024

ZooKeeper ZNode Data Structure and Watcher Mechanism Details

Deep dive into ZooKeeper's four ZNode types, the ZXID transaction ID structure, and the principles and practice of the one-time-trigger Watcher mechanism.

7/31/2024

ZooKeeper Distributed Coordination Framework Introduction...

Introduction to ZooKeeper core concepts, the Leader/Follower/Observer role division, and ZAB protocol principles, with a demonstration of the 3-node cluster installation and configuration process.

7/27/2024

Apache Flume Architecture and Core Concepts

Introduction to Apache Flume's role, core components (Source, Channel, Sink), event model, common data flow topologies, and installation and configuration.

7/13/2024

HDFS Distributed File System Read/Write Principle

Deep dive into HDFS architecture: the NameNode, DataNode, and Client roles, the Block storage mechanism, the file read/write process (pipeline write and nearest-node read), and basic HDFS commands.

7/2/2024

Hadoop Cluster SSH Passwordless Login Configuration and D...

Complete guide to SSH passwordless login for a three-node Hadoop cluster: generate RSA keys, distribute public keys, and write an rsync cluster distribution script, including pitfall notes and /etc/hosts confi...

6/30/2024

Hadoop Cluster Startup and Web UI Verification

Complete startup process for a three-node Hadoop cluster: format the NameNode, start HDFS and YARN, and verify cluster status via the Web UI, including start-dfs.sh and start-yarn.sh usage.

6/30/2024

Basic Environment Setup: Hadoop Cluster

Detailed tutorial on setting up a Hadoop cluster environment on 3 cloud servers (2C4G configuration), including an introduction to the HDFS, MapReduce, and YARN components, Java and Hadoop environment configuratio...

6/28/2024

Hadoop Cluster XML Configuration Details

Detailed explanation of the XML configuration files for a three-node Hadoop cluster: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, including NameNode, DataNode, ResourceManager configuration ...

6/28/2024