Tag: Hadoop
17 articles
AI Investigation #51: Big Data Technology Evolution - Obs...
Big data technology evolution: MapReduce replaced by Spark, Storm replaced by Flink, Pig/Hive gradually phased out. This article analyzes why these technologies were eliminated and the technical re...
AI Investigation #50: Big Data Evolution - Two Decades of...
Two decades of big data evolution: from 2006 MapReduce batch processing to 2013 Spark in-memory computing, to 2019 Flink real-time computing. Architecture evolved from monolithic Hadoop to YARN mul...
AI Research 49 - Big Data Survey Report: Development Hist...
Big data development began in 1997 when NASA proposed the concept, 2003-2006 Google published GFS, MapReduce, Bigtable three major papers leading distributed computing revolution. 2005 saw Hadoop b...
ClickHouse CollapsingMergeTree & External Data Sources: H...
ClickHouse external data source engine guide: DDL templates, key parameters and read/write pipelines for ENGINE=HDFS, ENGINE=MySQL, ENGINE=Kafka, and distributed table configurations.
Sqoop Practice: MySQL Full Data Import to HDFS
Complete example demonstrating Sqoop importing MySQL table data to HDFS, covering core parameter explanations, MapReduce parallel mechanism, and execution result verification.
MapReduce JOIN Four Implementation Strategies
Deep dive into four JOIN strategies in MapReduce: Reduce-Side Join, Map-Side Join, Semi-Join, and Bloom Join principles and Java implementations, with analysis of applicable scenarios and performan...
Hive Introduction: Architecture and Cluster Installation
Introduction to Hive data warehouse core concepts, architecture components and pros/cons, with detailed steps to install and configure Hive 2.3.9 on three-node Hadoop cluster.
HDFS Java Client Practice: Upload/Download Files, Directo...
Using Hadoop HDFS Java Client API for file operations: Maven dependency configuration, FileSystem/Path/Configuration core classes, implement file upload, download, delete, list scan and progress ba...
Java Implementation MapReduce WordCount Complete Code
Implement Hadoop MapReduce WordCount from scratch: Hadoop serialization mechanism detailed explanation, writing Mapper, Reducer, Driver three components, Maven project configuration, local and clus...
HDFS Distributed File System Read/Write Principle
Deep dive into HDFS architecture: NameNode, DataNode, Client roles, Block storage mechanism, file read/write process (Pipeline write and nearest read), and HDFS basic commands.
HDFS CLI Practice Complete Command Guide
Complete HDFS CLI practice: hadoop fs common commands including directory operations, file upload/download, permission management, with three-node cluster live demo.
Hadoop Cluster WordCount Distributed Computing Practice
Complete WordCount execution on Hadoop cluster: upload files to HDFS, submit MapReduce job, view running status through YARN UI, verify true distributed computing.
Hadoop JobHistoryServer Configuration and Log Aggregation
Configure Hadoop JobHistoryServer to record MapReduce job execution history, enable YARN log aggregation, view job details and logs via Web UI.
Hadoop Cluster SSH Passwordless Login Configuration and D...
Complete guide for Hadoop three-node cluster SSH passwordless login: generate RSA keys, distribute public keys, write rsync cluster distribution script, including pitfall notes and /etc/hosts confi...
Hadoop Cluster Startup and Web UI Verification
Complete startup process for Hadoop three-node cluster: format NameNode, start HDFS and YARN, verify cluster status via Web UI, including start-dfs.sh and start-yarn.sh usage.
Basic Environment Setup: Hadoop Cluster
Detailed tutorial on setting up Hadoop cluster environment on 3 cloud servers (2C4G configuration), including HDFS, MapReduce, YARN components introduction, Java and Hadoop environment configuratio...
Hadoop Cluster XML Configuration Details
Detailed explanation of Hadoop cluster three-node XML configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, including NameNode, DataNode, ResourceManager configuration ...