Tag: Hadoop
17 articles
AI Investigation #51: Big Data Technology Evolution - Obsolete Frameworks, Architectures and the Reasons Behind Them
How big data technology evolved: Spark replaced MapReduce, Flink replaced Storm, and Pig and Hive were gradually phased out.
AI Investigation #50: Big Data Evolution - Two Decades of Architectural Change from Hadoop to Flink
Two decades of big data evolution: from MapReduce batch processing in 2006, to Spark in-memory computing in 2013, to Flink real-time computing in 2019.
AI Research 49 - Big Data Survey Report: Development History from 1997 to 2025
Big data development began in 1997, when NASA proposed the concept; in 2003-2006 Google published its three major papers (GFS, MapReduce, and Bigtable), leading distributed computing re...
Big Data 140 - ClickHouse CollapsingMergeTree & External Data Sources
ClickHouse external data source engine guide: DDL templates, key parameters and read/write pipelines for ENGINE=HDFS, ENGINE=MySQL, ENGINE=Kafka, and distributed table co...
Sqoop Practice: MySQL Full Data Import to HDFS
Complete example demonstrating Sqoop importing MySQL table data to HDFS, covering core parameter explanations, MapReduce parallel mechanism, and execution result verifica...
MapReduce JOIN Four Implementation Strategies
This is article 11 in the Big Data series. It introduces four classic strategies for implementing multi-table JOINs in the MapReduce framework, along with their Java implementations.
Hive Introduction: Architecture and Cluster Installation
Introduction to the core concepts, architecture components, and pros and cons of the Hive data warehouse, with detailed steps to install and configure Hive 2.3.9 on a three-node Hadoop clu...
HDFS Java Client Practice: Upload/Download Files, Directory Operations and API Usage
This is article 9 in the Big Data series. Learn to operate HDFS through Java code and master Hadoop's Java Client API.
Java Implementation MapReduce WordCount Complete Code
Implement Hadoop MapReduce WordCount from scratch: a detailed explanation of the Hadoop serialization mechanism, and writing the three components (Mapper, Reducer, and Driver)...
HDFS Distributed File System Read/Write Principle
Deep dive into HDFS architecture: NameNode, DataNode, Client roles, Block storage mechanism, file read/write process (Pipeline write and nearest read), and HDFS basic com...
HDFS CLI Practice Complete Command Guide
Complete HDFS CLI practice: common hadoop fs commands for directory operations, file upload/download, and permission management, with a live demo on a three-node cluster.
Hadoop Cluster WordCount Distributed Computing Practice
Complete WordCount execution on a Hadoop cluster: upload files to HDFS, submit a MapReduce job, view its running status through the YARN UI, and verify true distributed computing.
Hadoop JobHistoryServer Configuration and Log Aggregation
Configure the Hadoop JobHistoryServer to record MapReduce job execution history, enable YARN log aggregation, and view job details and logs via the Web UI.
Hadoop Cluster SSH Passwordless Login Configuration and Distribution Script
Complete guide to SSH passwordless login for a three-node Hadoop cluster: generate RSA keys, distribute public keys, and write an rsync-based cluster distribution script.
Hadoop Cluster Startup and Web UI Verification
Complete startup process for Hadoop three-node cluster: format NameNode, start HDFS and YARN, verify cluster status via Web UI, including start-dfs.sh and start-yarn.
Basic Environment Setup: Hadoop Cluster
This article is migrated from Juejin. Original link: Big Data 01 - Basic Environment Setup
Hadoop Cluster XML Configuration Details
Detailed explanation of the XML configuration files for a three-node Hadoop cluster: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.