Gleam Lab · Blog Archive

Blog Page 31

Technical exploration and engineering notes, 655 articles in total.

All Articles · Java (243) · Backend (50) · Microservices (10) · AI Engineering (86) · LLM (35) · Big Data (271) · Data Engineering (57) · Kubernetes / Cloud Native (3) · Real-time Voice (1) · Robotics (40) · Personal Growth (29)

Tutorial Series · 4 min read · Big Data Engineering

Sqoop Incremental Import and CDC (Change Data Capture) Principles

Introduces Sqoop's --incremental append import mechanism and explains core CDC (Change Data Capture) concepts, a comparison of capture methods...

7/27/2024

big-data · sqoop · etl +1

Tutorial Series · 4 min read · Big Data Engineering

ZooKeeper Distributed Coordination Framework Introduction and ZAB Protocol

Introduces ZooKeeper's core concepts, the Leader/Follower/Observer roles, and the principles of the ZAB protocol, and demonstrates 3-node cluster installation and configurati...

7/27/2024

big-data · zookeeper · distributed-system

Tutorial Series · 3 min read · Big Data Engineering

Big Data 23 - Sqoop Partial Import: --query, --columns and --where

Explains three ways Sqoop can import a subset of MySQL data into HDFS: a custom query, selected columns, and WHERE-condition filtering, with applicable s...

7/24/2024

big-data · sqoop · etl +1

Tutorial Series · 3 min read · Big Data Engineering

Sqoop and Hive Integration: MySQL ↔ Hive Bidirectional Data Transfer

Demonstrates using Sqoop to import MySQL data directly into a Hive table and to export Hive data back to MySQL, covering key parameters like --hive-import, --create-hive-table usa...

7/24/2024

big-data · sqoop · hive +2

Tutorial Series · 3 min read · Big Data Engineering

Sqoop Data Migration ETL Tool Introduction and Installation

Introduces Apache Sqoop's core principles, use cases, and installation and configuration steps on a Hadoop cluster, to help you get started quickly with batch data migration bet...

7/20/2024

big-data · sqoop · etl +1

Tutorial Series · 3 min read · Big Data Engineering

Sqoop Practice: MySQL Full Data Import to HDFS

A complete example of importing MySQL table data into HDFS with Sqoop, covering the core parameters, the MapReduce parallelism mechanism, and execution result verifica...

7/20/2024

big-data · sqoop · hadoop +2

Tutorial Series · 3 min read · Big Data Engineering

Collecting Hive Logs to HDFS with Flume

Uses a Flume exec source to track Hive log files in real time, buffers events in a memory channel, and configures an HDFS sink that writes with time-based partitioning, implementing automatic log dat...

7/17/2024

big-data · flume · hdfs +1

Tutorial Series · 4 min read · Big Data Engineering

Flume Dual Sink: Write Logs to Both HDFS and Local File

This is article 20 in the Big Data series. It demonstrates Flume's replication mode with a dual-sink architecture, in which the same data is written to both HDFS and the local filesystem.

7/17/2024

big-data · flume · hdfs +1

Tutorial Series · 3 min read · Big Data Engineering

Apache Flume Architecture and Core Concepts

Introduces Apache Flume's role, its core components (Source, Channel, Sink), its event model and common data flow topologies, and how to install and configure it.

7/13/2024

big-data · flume · data-engineering +1

Tutorial Series · 3 min read · Big Data Engineering

Flume Hello World: NetCat Source + Memory Channel + Logger Sink

Flume's simplest Hello World case: a netcat source listens on a port, a memory channel buffers events, and a logger sink prints them to the console, demonstrating the complete Source→...

7/13/2024

big-data · flume · data-engineering

Tutorial Series · 4 min read · Big Data Engineering

Hive Metastore Three Modes and Remote Deployment

Explains Hive Metastore's embedded, local, and remote deployment modes, with complete steps to configure a high-availability remote Metastore on a three-node cl...

7/10/2024

big-data · hive · data-engineering

Tutorial Series · 2 min read · Big Data Engineering

HiveServer2 Configuration and Beeline Remote Connection

Introduces HiveServer2's architecture and role, configures the Hadoop proxy user and WebHDFS, and sets up cross-node remote JDBC access to Hive through the Beeline client.

7/10/2024

big-data · hive · data-engineering

Tutorial Series · 3 min read · Big Data Engineering

Hive DDL and DML Operations

Systematic explanation of Hive DDL (database/table creation, internal and external tables) and DML (data loading, insertion, query) operations.

7/8/2024

big-data · hive · sql +1

Tutorial Series · 4 min read · Big Data Engineering

Hive HQL Advanced: Data Import/Export and Query Practice

A deep dive into Hive's data import methods (LOAD, INSERT, external tables, Sqoop), its data export methods, and practical use of HQL query operations like aggregation...

7/8/2024

big-data · hive · sql +1

Tutorial Series · 4 min read · Big Data Engineering

Four Implementation Strategies for MapReduce JOIN

This is article 11 in the Big Data series. It introduces four classic strategies for implementing multi-table JOINs in the MapReduce framework, along with their Java implementations.

7/4/2024

big-data · hadoop · mapreduce +1

Tutorial Series · 3 min read · Big Data Engineering

Hive Introduction: Architecture and Cluster Installation

Introduces the core concepts, architecture components, and pros and cons of the Hive data warehouse, with detailed steps to install and configure Hive 2.3.9 on a three-node Hadoop clu...

7/4/2024

big-data · hive · hadoop +1

Tutorial Series · 2 min read · Big Data Engineering

HDFS Java Client Practice: Upload/Download Files, Directory Operations and API Usage

This is article 9 in the Big Data series. Learn to operate HDFS from Java code and master Hadoop's Java Client API.

7/3/2024

big-data · hdfs · hadoop +1

Tutorial Series · 3 min read · Big Data Engineering

MapReduce WordCount in Java: Complete Code

Implements Hadoop MapReduce WordCount from scratch: a detailed explanation of Hadoop's serialization mechanism, then writing the three components (Mapper, Reducer, Driver)...

7/3/2024

big-data · hadoop · mapreduce +1

Tutorial Series · 3 min read · Big Data Engineering

HDFS Distributed File System Read/Write Principle

A deep dive into HDFS architecture: the NameNode, DataNode, and Client roles, the block storage mechanism, the file read/write process (pipeline write and nearest-replica read), and HDFS basic com...

7/2/2024

big-data · hdfs · hadoop +1

Tutorial Series · 2 min read · Big Data Engineering

HDFS CLI Practice: Complete Command Guide

Hands-on HDFS CLI practice: common hadoop fs commands for directory operations, file upload/download, and permission management, with a live demo on a three-node cluster.

7/2/2024

big-data · hdfs · hadoop +1