This is article 7 in the Big Data series. Understanding HDFS architecture design and read/write principles is the theoretical foundation for subsequent HDFS practice.


HDFS Architecture

HDFS (Hadoop Distributed File System) uses Master/Slave architecture:

| Role | Description |
| --- | --- |
| NameNode | Manages the namespace, maintains the Block-to-DataNode mapping, handles Client metadata requests |
| DataNode | Stores the actual data Blocks; periodically sends heartbeats and block reports to the NameNode |
| Secondary NameNode | Periodically merges the EditLog and FsImage to relieve NameNode pressure (not a hot backup) |
| Client | Splits files on upload, gets Block locations from the NameNode, then interacts directly with DataNodes |
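The NameNode's metadata can be pictured as a small lookup table: file paths map to Block lists, and Blocks map to replica locations. A toy sketch (all names and paths are hypothetical, not real HDFS structures):

```python
# Toy model of the NameNode's metadata: which Blocks make up a file,
# and which DataNodes hold each Block. Purely illustrative.
namespace = {
    "/user/test.txt": ["blk_1", "blk_2"],  # file -> ordered Block list
}
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],        # Block -> replica DataNodes
    "blk_2": ["dn2", "dn4", "dn5"],
}

def locate(path):
    """What a Client's metadata request returns: Blocks plus their locations."""
    return [(blk, block_locations[blk]) for blk in namespace[path]]

print(locate("/user/test.txt"))
```

With this picture, a Client never streams file data through the NameNode: it asks for the locations once, then talks to the DataNodes directly.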

Block Mechanism

  • Hadoop 2.x default Block size: 128MB (Hadoop 1.x was 64MB)
  • Each Block has 3 replicas by default, distributed across different DataNodes
  • Files larger than 128MB are split into multiple Blocks and stored across the cluster
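The split is a simple size calculation; only the last Block may be smaller than 128MB. A toy illustration (not actual HDFS code):

```python
# Toy illustration of HDFS block splitting (not actual HDFS code).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the Blocks a file splits into."""
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # the last Block may be smaller than 128 MB
    return sizes

# A 300 MB file becomes three Blocks: 128 MB + 128 MB + 44 MB.
print([s // (1024 * 1024) for s in split_into_blocks(300 * 1024 * 1024)])
```

Note that a Block smaller than 128MB only occupies its actual size on disk; the Block size is a split threshold, not an allocation unit.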

HDFS Features

| Feature | Description |
| --- | --- |
| High fault tolerance | 3-replica redundancy; automatic recovery from DataNode failures |
| High throughput | Sequential read/write, suited to batch processing of large files |
| Large-file friendly | Suitable for TB/PB-scale data |
| Not suitable for | Low-latency random read/write; large numbers of small files |
| Write model | Write-once-read-many; supports append, does not support random modification |

Read Process

  1. Client sends a read request (the file path) to the NameNode
  2. NameNode returns the file's Block list, along with the DataNode locations of each Block
  3. For each Block, Client selects the nearest DataNode (local → same rack → cross-rack)
  4. Client receives the data in Packet units, caches it locally, then writes it to the target file
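The "nearest DataNode" preference in step 3 can be sketched as a tiny ranking by topology distance (node and rack names are illustrative, and the distance values are a simplification of Hadoop's rack-awareness logic):

```python
# Toy sketch of HDFS's replica-selection preference when reading:
# local node -> same rack -> cross-rack. Names are illustrative.

def pick_closest(replicas, client_node, client_rack):
    """Pick the replica 'closest' to the client by network topology."""
    def distance(replica):
        node, rack = replica
        if node == client_node:
            return 0   # Block is on the client's own machine
        if rack == client_rack:
            return 1   # same rack, fewer switch hops
        return 2       # cross-rack, most expensive
    return min(replicas, key=distance)

replicas = [("dn3", "rack2"), ("dn2", "rack1"), ("dn1", "rack1")]
# No local copy exists, so this falls back to a same-rack replica.
print(pick_closest(replicas, client_node="dn7", client_rack="rack1"))
```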

Write Process

  1. Client requests a write from the NameNode (providing the file path and Block size)
  2. NameNode allocates a DataNode list (dn1, dn2, dn3) and returns it to the Client
  3. Client connects to dn1 and establishes a Pipeline (dn1 → dn2 → dn3)
  4. Client splits the Block into Packets and sends them to dn1; dn1 forwards each Packet to dn2, and dn2 to dn3
  5. Once all replicas are written, dn3 acks to dn2, dn2 acks to dn1, and dn1 acks to the Client
  6. Client notifies the NameNode that the write is complete, and the NameNode updates its metadata
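The forward data flow and reverse ack chain in steps 3–5 can be simulated in a few lines. This is only a sketch of the event ordering; real HDFS streams many Packets through the Pipeline concurrently rather than one at a time:

```python
# Toy simulation of the HDFS write Pipeline and its ack chain.
# Data flows dn1 -> dn2 -> dn3; acks flow back dn3 -> dn2 -> dn1 -> Client.

def pipeline_write(packets, datanodes):
    """Return the ordered events for writing each Packet through the Pipeline."""
    events = []
    for packet in packets:
        for dn in datanodes:            # forward pass: each node stores a copy
            events.append(f"{dn} stores {packet}")
        for dn in reversed(datanodes):  # reverse pass: acks bubble back
            events.append(f"{dn} acks {packet}")
    return events

for event in pipeline_write(["packet-1"], ["dn1", "dn2", "dn3"]):
    print(event)
```

The Pipeline design means the Client only pushes each Packet once; the DataNodes replicate among themselves, spreading the network load across the cluster.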

Common Commands

hdfs dfs -ls /              # List root directory
hdfs dfs -mkdir -p /a/b     # Create directories recursively
hdfs dfs -put local.txt /   # Upload file
hdfs dfs -get /test.txt .   # Download file
hdfs dfs -cat /test.txt     # View file content
hdfs dfs -cp /a /b          # Copy
hdfs dfs -mv /a /b          # Move/rename
hdfs dfs -rm /test.txt      # Delete file
hdfs dfs -rmdir /dir        # Delete empty directory
hdfs dfs -rm -r /dir        # Delete directory recursively

Next article: Big Data 08 - HDFS CLI Practice