Tag: distributed-system
33 articles
Flink on YARN Deployment: Environment Preparation, Resour...
Detailed explanation of the three Flink deployment modes on a YARN cluster (Session, Application, and Per-Job), plus Hadoop dependency configuration, YARN resource requests, and the job submission process.
Apache Flink Introduction: Unified Stream-Batch Real-Time...
Systematic introduction to Apache Flink's origin, core features, and architecture components: JobManager, TaskManager, Dispatcher responsibilities, unified stream-batch processing model, and compar...
SparkSQL Kernel: Five Join Strategies & Catalyst Optimize...
Deep dive into the selection conditions and use cases of SparkSQL's five Join execution strategies (BHJ, SHJ, SMJ, Cartesian, BNLJ), along with the complete processing flow of the Catalyst optimizer from SQL ...
Spark Standalone Mode: Architecture & Performance Tuning
Comprehensive explanation of the four core components of a Spark Standalone cluster, the application submission flow, SparkContext internal architecture, the evolution of Shuffle, and RDD optimization strategies.
Spark Serialization & RDD Execution Principle
Deep dive into Spark Driver-Executor process communication, choosing between Java and Kryo serialization, troubleshooting closure serialization problems, and RDD dependencies, Stage division and persistence st...
Spark Cluster Architecture & Deployment Modes
Deep dive into the responsibilities of the core Spark cluster components (Driver, Cluster Manager, Executor), a comparison of the Standalone, YARN, and Kubernetes deployment modes, and static vs dynamic resource allocati...
From MapReduce to Spark: Big Data Computing Evolution
Systematic overview of how big data processing engines evolved from MapReduce to Spark to Flink, analyzing Spark's in-memory computing model, unified ecosystem, and core components.
Kafka High Performance: Zero-Copy, mmap & Sequential Write
Deep dive into the three I/O techniques behind Kafka's high throughput: sendfile zero-copy, mmap memory mapping, and sequential writes through the page cache, revealing kernel-level optimization behind million ...
Kafka Replica Mechanism: ISR & Leader Election
Deep dive into the Kafka replica mechanism, including maintenance of the in-sync replica (ISR) set, the Leader election process, and the trade-off between consistency and availability in unclean leader election.
Kafka Exactly-Once: Idempotence & Transactions
Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering PID/sequence number principle, cross-partition transaction configuration ...
Kafka Topic, Partition & Consumer: Rebalance Optimization
Deep dive into the core mechanisms of Kafka Topics, Partitions, and Consumer Groups, covering custom deserialization, offset management, and rebalance optimization configuration.
Kafka Components: Producer, Broker, Consumer Full Flow
Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment and offset management.
Redis High Availability: Master-Slave Replication & Sentinel
Deep dive into Redis high availability: master-slave replication, Sentinel automatic failover, and distributed lock design with Docker deployment examples.
Kafka Architecture: High-Throughput Distributed Messaging
Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.
Redis Cache Problems: Penetration, Breakdown, Avalanche, ...
Systematic overview of the five most common Redis cache problems in high-concurrency scenarios: cache penetration, cache breakdown, cache avalanche, hot key, and big key. Analyzes the root cause of...
Redis Distributed Lock: Optimistic Lock, WATCH and SETNX ...
Redis optimistic lock in practice: WATCH/MULTI/EXEC mechanism explained, Lua scripts for atomic operations, SETNX+EXPIRE distributed lock from basics to Redisson, with complete Java code examples.
Redis Communication Internals: RESP Protocol and Reactor ...
Deep dive into Redis communication internals: the five data types of the RESP serialization protocol, Pipeline batch processing mode, and how the epoll-based Reactor single-threaded event-driven architecture ...
Redis Pub/Sub: Mechanism, Weak Transaction and Risks
Detailed explanation of Redis Pub/Sub working mechanism, three weak transaction flaws (no persistence, no acknowledgment, no retry), and alternative solutions in production.
HBase Cluster Deployment and High Availability Configuration
Complete HBase distributed cluster deployment: configure RegionServer on multiple nodes, HMaster high availability, integrate with ZooKeeper for coordination, with start/stop scripts and verificati...
HBase Overall Architecture: HMaster, HRegionServer and Da...
Comprehensive analysis of the overall architecture of the HBase distributed database, including ZooKeeper coordination, HMaster management node, HRegionServer data node, Region storage unit, and four-dimensio...
ZooKeeper Leader Election and ZAB Protocol Principles
Deep dive into ZooKeeper's Leader election mechanism and the ZAB (ZooKeeper Atomic Broadcast) protocol, covering the initial election process, the three phases of message broadcast, the fault recovery strategy, and p...
ZooKeeper Distributed Lock Java Implementation Details
Implement a distributed lock based on ZooKeeper ephemeral sequential nodes, with complete Java code, covering lock competition, predecessor node monitoring, CountDownLatch synchronization, and recurs...
ZooKeeper Watcher Principle and Command Line Practice Guide
Complete analysis of the Watcher registration, trigger, and notification flow across the client, WatchManager, and ZooKeeper server, plus zkCli command-line practice demonstrating node CRUD and monitoring.
ZooKeeper Java API Practice: Node CRUD and Monitoring
Use the ZkClient library to operate ZooKeeper from Java code, with complete practical examples of session establishment, persistent node CRUD, child node change monitoring, and data change monitoring.
ZooKeeper Cluster Configuration Details and Startup Verif...
Deep dive into the meaning of the core zoo.cfg parameters and the myid file configuration rules, with a demonstration of the 3-node cluster startup process and verification of the Leader election result.
ZooKeeper ZNode Data Structure and Watcher Mechanism Details
Deep dive into ZooKeeper's four ZNode types, the ZXID transaction ID structure, and the principles and practice of the one-time-trigger Watcher mechanism.
ZooKeeper Distributed Coordination Framework Introduction...
Introduction to ZooKeeper core concepts, Leader/Follower/Observer role division, ZAB protocol principles, and demonstration of 3-node cluster installation and configuration process.
Apache Flume Architecture and Core Concepts
Introduction to Apache Flume's positioning, core components (Source, Channel, Sink), event model, common data flow topologies, and installation and configuration.
HDFS Distributed File System Read/Write Principle
Deep dive into HDFS architecture: NameNode, DataNode, Client roles, Block storage mechanism, file read/write process (Pipeline write and nearest read), and HDFS basic commands.
Hadoop Cluster SSH Passwordless Login Configuration and D...
Complete guide for Hadoop three-node cluster SSH passwordless login: generate RSA keys, distribute public keys, write rsync cluster distribution script, including pitfall notes and /etc/hosts confi...
Hadoop Cluster Startup and Web UI Verification
Complete startup process for Hadoop three-node cluster: format NameNode, start HDFS and YARN, verify cluster status via Web UI, including start-dfs.sh and start-yarn.sh usage.
Basic Environment Setup: Hadoop Cluster
Detailed tutorial on setting up Hadoop cluster environment on 3 cloud servers (2C4G configuration), including HDFS, MapReduce, YARN components introduction, Java and Hadoop environment configuratio...
Hadoop Cluster XML Configuration Details
Detailed explanation of the XML configuration files for a three-node Hadoop cluster: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, including NameNode, DataNode, ResourceManager configuration ...