Gleam Lab · Tag Archive

Tag: distributed-system

33 articles collected by topic for tutorials, cases, engineering practice, and research notes.

Flink on YARN Deployment: Environment Preparation, Resource Manager

Detailed explanation of the three Flink deployment modes on a YARN cluster (Session, Application, and Per-Job), Hadoop dependency configuration, YARN resource application and...

11/27/2024

Big Data 90 - Apache Flink Introduction: Unified Stream-Batch Real-Time Computing

Systematic introduction to Apache Flink's origin, core features, and architecture components: JobManager, TaskManager, Dispatcher responsibilities, unified stream-batch p...

11/23/2024

Big Data 84 - SparkSQL Internals: Five Join Strategies & Catalyst Optimizer

This is article 84 in the Big Data series, analyzing in depth how the SparkSQL kernel automatically selects a Join strategy and how SQL is parsed and optimized.

11/9/2024

Spark Standalone Mode: Architecture & Performance Tuning

Comprehensive explanation of the four core components of a Spark Standalone cluster, application submission flow, SparkContext internal architecture, Shuffle evolution history and...

11/2/2024

Spark Serialization & RDD Execution Principle

This is article 76 in the Big Data series, systematically reviewing Spark's process communication mechanism, serialization strategy, and RDD execution principle.

10/26/2024

Spark Cluster Architecture & Deployment Modes

This is article 71 in the Big Data series, introducing Spark cluster core architecture, deployment mode comparisons, and static/dynamic resource management strategies.

10/16/2024

From MapReduce to Spark: Big Data Computing Evolution

Systematic overview of big data processing engine evolution from MapReduce to Spark to Flink, analyzing Spark in-memory computing model, unified ecosystem and core compon...

10/9/2024

Kafka High Performance: Zero-Copy, mmap & Sequential Write

This is article 66 in the Big Data series, analyzing in depth the underlying I/O optimizations that give Kafka its very high throughput.

10/5/2024

Kafka Replica Mechanism: ISR & Leader Election

Deep dive into Kafka replica mechanism, including ISR sync node set maintenance, Leader election process, and unclean election trade-offs between consistency and availabi...

10/2/2024

Kafka Exactly-Once: Idempotence & Transactions

Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering PID/sequence number principle...

10/2/2024

Kafka Topic, Partition & Consumer: Rebalance Optimization

Deep dive into Kafka Topic, Partition, Consumer Group core mechanisms, covering custom deserialization, offset management and rebalance optimization configuration.

9/28/2024

Kafka Components: Producer, Broker, Consumer Full Flow

Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment a...

9/14/2024

Redis High Availability: Master-Slave Replication & Sentinel

This is article 51 in the Big Data series, covering Redis high availability architecture: master-slave replication, Sentinel mode, and distributed lock design.

9/11/2024

Kafka Architecture: High-Throughput Distributed Messaging

Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.

9/11/2024

Redis Cache Problems: Penetration, Breakdown, Avalanche and Solutions

Systematic overview of the five most common Redis cache problems in high-concurrency scenarios: cache penetration, cache breakdown, cache avalanche, hot key, and big key.

9/7/2024

Big Data 50 - Redis Distributed Lock: Optimistic Lock, WATCH and SETNX

Redis optimistic lock in practice: WATCH/MULTI/EXEC mechanism explained, Lua scripts for atomic operations, SETNX+EXPIRE distributed lock from basics to Redisson...

9/7/2024

Big Data 48 - Redis Communication Internals: RESP Protocol and Reactor Model

This is article 48 in the Big Data series, providing an in-depth analysis of the Redis communication protocol RESP and the Reactor-based event-driven architecture.

9/4/2024

Redis Pub/Sub: Mechanism, Weak Transaction and Risks

Detailed explanation of Redis Pub/Sub working mechanism, three weak transaction flaws (no persistence, no acknowledgment, no retry), and alternative solutions in producti...

8/24/2024

HBase Cluster Deployment and High Availability Configuration

This is article 35 in the Big Data series. Completes an HBase distributed cluster deployment on a three-node Hadoop + ZooKeeper cluster.

8/14/2024

Big Data 33 - HBase Overall Architecture: HMaster, HRegionServer and Data Model

Comprehensive analysis of HBase distributed database overall architecture, including ZooKeeper coordination, HMaster management node, HRegionServer data node...

8/10/2024

ZooKeeper Leader Election and ZAB Protocol Principles

This is article 31 in the Big Data series. Deep analysis of ZooKeeper Leader election mechanism and ZAB (ZooKeeper Atomic Broadcast) protocol implementation principles.

8/7/2024

ZooKeeper Distributed Lock Java Implementation Details

This is article 32 in the Big Data series. Demonstrates how to implement a fair distributed lock using ZooKeeper ephemeral sequential nodes, with complete Java code.

8/7/2024

ZooKeeper Watcher Principle and Command Line Practice Guide

Complete analysis of Watcher registration-trigger-notification flow from client, WatchManager to ZooKeeper server, and zkCli command line practice demonstrating node CRUD...

8/3/2024

ZooKeeper Java API Practice: Node CRUD and Monitoring

Use ZkClient library to operate ZooKeeper via Java code, complete practical examples of session establishment, persistent node CRUD, child node change monitoring...

8/3/2024

ZooKeeper Cluster Configuration Details and Startup Verification

Deep dive into zoo.cfg core parameter meanings, explain myid file configuration specifications, demonstrate 3-node cluster startup process and Leader election result veri...

7/31/2024

ZooKeeper ZNode Data Structure and Watcher Mechanism Details

Deep dive into ZooKeeper's four ZNode node types, ZXID transaction ID structure, and one-time trigger Watcher monitoring mechanism principles and practice.

7/31/2024

ZooKeeper Distributed Coordination Framework Introduction and ZAB Protocol

Introduction to ZooKeeper core concepts, Leader/Follower/Observer role division, ZAB protocol principles, and demonstration of 3-node cluster installation and configurati...

7/27/2024

Apache Flume Architecture and Core Concepts

Introduction to Apache Flume positioning, core components (Source, Channel, Sink), event model and common data flow topologies, and installation configuration methods.

7/13/2024

HDFS Distributed File System Read/Write Principle

Deep dive into HDFS architecture: NameNode, DataNode, Client roles, Block storage mechanism, file read/write process (Pipeline write and nearest read), and HDFS basic com...

7/2/2024

Hadoop Cluster SSH Passwordless Login Configuration and Distribution Script

Complete guide for Hadoop three-node cluster SSH passwordless login: generate RSA keys, distribute public keys, write rsync cluster distribution script.

6/30/2024

Hadoop Cluster Startup and Web UI Verification

Complete startup process for Hadoop three-node cluster: format NameNode, start HDFS and YARN, verify cluster status via Web UI, including start-dfs.sh and start-yarn.

6/30/2024

Basic Environment Setup: Hadoop Cluster

This article is migrated from Juejin. Original link: Big Data 01 - Basic Environment Setup

6/28/2024

Hadoop Cluster XML Configuration Details

Detailed explanation of Hadoop cluster three-node XML configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.

6/28/2024