Tag: Distributed System
33 articles
Flink on YARN Deployment: Environment Preparation, Resource Manager
Detailed explanation of the three Flink deployment modes on a YARN cluster (Session, Application, and Per-Job), Hadoop dependency configuration, YARN resource application and...
Big Data 90 - Apache Flink Introduction: Unified Stream-Batch Real-Time Computing
Systematic introduction to Apache Flink's origin, core features, and architecture components: JobManager, TaskManager, Dispatcher responsibilities, unified stream-batch p...
Big Data 84 - SparkSQL Internals: Five Join Strategies & Catalyst Optimizer
This is article 84 in the Big Data series, an in-depth analysis of how the SparkSQL kernel automatically selects a Join strategy and of its SQL parsing and optimization flow.
Spark Standalone Mode: Architecture & Performance Tuning
Comprehensive explanation of the four core components of a Spark Standalone cluster, the application submission flow, SparkContext internal architecture, Shuffle evolution history and...
Spark Serialization & RDD Execution Principle
This is article 76 in the Big Data series, systematically reviewing Spark's inter-process communication mechanism, serialization strategy, and RDD execution principles.
Spark Cluster Architecture & Deployment Modes
This is article 71 in the Big Data series, introducing Spark's core cluster architecture, a comparison of its deployment modes, and static/dynamic resource management strategies.
From MapReduce to Spark: Big Data Computing Evolution
Systematic overview of how big data processing engines evolved from MapReduce to Spark to Flink, analyzing Spark's in-memory computing model, unified ecosystem and core compon...
Kafka High Performance: Zero-Copy, mmap & Sequential Write
This is article 66 in the Big Data series, an in-depth analysis of the underlying I/O optimizations that give Kafka its very high throughput.
Kafka Replica Mechanism: ISR & Leader Election
Deep dive into the Kafka replica mechanism, including maintenance of the ISR (in-sync replica) set, the Leader election process, and unclean election trade-offs between consistency and availabi...
Kafka Exactly-Once: Idempotence & Transactions
Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering the PID/sequence number principle...
Kafka Topic, Partition & Consumer: Rebalance Optimization
Deep dive into the core mechanisms of Kafka Topics, Partitions, and Consumer Groups, covering custom deserialization, offset management, and rebalance optimization configuration.
Kafka Components: Producer, Broker, Consumer Full Flow
Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment a...
Redis High Availability: Master-Slave Replication & Sentinel
This is article 51 in the Big Data series, covering Redis high availability architecture: master-slave replication, Sentinel mode, and distributed lock design.
Kafka Architecture: High-Throughput Distributed Messaging
Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.
Redis Cache Problems: Penetration, Breakdown, Avalanche and Solutions
Systematic overview of the five most common Redis cache problems in high-concurrency scenarios: cache penetration, cache breakdown, cache avalanche, hot key, and big key.
Big Data 50 - Redis Distributed Lock: Optimistic Lock, WATCH and SETNX
Redis optimistic lock in practice: WATCH/MULTI/EXEC mechanism explained, Lua scripts for atomic operations, SETNX+EXPIRE distributed lock from basics to Redisson...
Big Data 48 - Redis Communication Internals: RESP Protocol and Reactor Model
This is article 48 in the Big Data series. It provides an in-depth analysis of the Redis communication protocol (RESP) and Redis's Reactor-based, event-driven architecture.
Redis Pub/Sub: Mechanism, Weak Transaction and Risks
Detailed explanation of the Redis Pub/Sub working mechanism, its three weak-transaction flaws (no persistence, no acknowledgment, no retry), and alternative solutions in producti...
HBase Cluster Deployment and High Availability Configuration
This is article 35 in the Big Data series. Complete deployment of a distributed HBase cluster on a three-node Hadoop + ZooKeeper cluster.
Big Data 33 - HBase Overall Architecture: HMaster, HRegionServer and Data Model
Comprehensive analysis of the overall architecture of the HBase distributed database, including ZooKeeper coordination, the HMaster management node, the HRegionServer data node...
ZooKeeper Leader Election and ZAB Protocol Principles
This is article 31 in the Big Data series. An in-depth analysis of ZooKeeper's Leader election mechanism and the implementation principles of the ZAB (ZooKeeper Atomic Broadcast) protocol.
ZooKeeper Distributed Lock Java Implementation Details
This is article 32 in the Big Data series. Demonstrates how to implement a fair distributed lock using ZooKeeper ephemeral sequential nodes, with complete Java code.
ZooKeeper Watcher Principle and Command Line Practice Guide
Complete analysis of the Watcher registration-trigger-notification flow across the client, WatchManager, and ZooKeeper server, plus zkCli command-line practice demonstrating node CRUD...
ZooKeeper Java API Practice: Node CRUD and Monitoring
Using the ZkClient library to operate ZooKeeper from Java, with complete practical examples of session establishment, persistent node CRUD, child node change monitoring...
ZooKeeper Cluster Configuration Details and Startup Verification
Deep dive into the meaning of core zoo.cfg parameters and the myid file configuration specification, with a demonstration of the 3-node cluster startup process and Leader election result veri...
ZooKeeper ZNode Data Structure and Watcher Mechanism Details
Deep dive into ZooKeeper's four ZNode types, the ZXID transaction ID structure, and the principles and practice of the one-time-trigger Watcher mechanism.
ZooKeeper Distributed Coordination Framework Introduction and ZAB Protocol
Introduction to ZooKeeper core concepts, the Leader/Follower/Observer role division, ZAB protocol principles, and a demonstration of 3-node cluster installation and configurati...
Apache Flume Architecture and Core Concepts
Introduction to Apache Flume's positioning, core components (Source, Channel, Sink), event model, common data flow topologies, and installation and configuration.
HDFS Distributed File System Read/Write Principle
Deep dive into HDFS architecture: NameNode, DataNode, and Client roles, the Block storage mechanism, the file read/write process (Pipeline write and nearest read), and HDFS basic com...
Hadoop Cluster SSH Passwordless Login Configuration and Distribution Script
Complete guide to SSH passwordless login for a three-node Hadoop cluster: generating RSA keys, distributing public keys, and writing an rsync-based cluster distribution script.
Hadoop Cluster Startup and Web UI Verification
Complete startup process for a three-node Hadoop cluster: formatting the NameNode, starting HDFS and YARN, and verifying cluster status via the Web UI, including start-dfs.sh and start-yarn.
Basic Environment Setup: Hadoop Cluster
This article was migrated from Juejin. Original link: Big Data 01 - Basic Environment Setup
Hadoop Cluster XML Configuration Details
Detailed explanation of the XML configuration files for a three-node Hadoop cluster: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.