This is article 7 in the Big Data series. Understanding HDFS architecture design and read/write principles is the theoretical foundation for subsequent HDFS practice.
HDFS Architecture
HDFS (Hadoop Distributed File System) uses Master/Slave architecture:
| Role | Description |
|---|---|
| NameNode | Manages namespace, maintains Block to DataNode mapping, handles Client metadata requests |
| DataNode | Stores actual data Blocks, sends heartbeats and block reports to NameNode periodically |
| Secondary NameNode | Periodically merges EditLog and FsImage, relieves NameNode pressure (not hot backup) |
| Client | Splits files when uploading, gets location info from NameNode, interacts directly with DataNodes |
Block Mechanism
- Hadoop 2.x default Block size: 128MB (Hadoop 1.x was 64MB)
- Each Block has 3 replicas by default, distributed across different DataNodes
- Any file larger than the Block size is split into multiple Blocks and stored across the cluster
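The splitting rule above is simple arithmetic: a file is cut into fixed-size chunks, with the last chunk holding the remainder. A minimal sketch (the 300 MB example file is an assumption for illustration, not from the original):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default Block size: 128 MB

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the byte size of each Block a file of file_size bytes splits into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))  # last Block may be smaller
        remaining -= block_size
    return blocks

# Example: a 300 MB file becomes two full 128 MB Blocks plus a 44 MB tail.
sizes = split_into_blocks(300 * 1024 * 1024)
```

Note that the final Block only occupies as much disk space as it actually contains; a 44 MB tail Block does not consume a full 128 MB.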
HDFS Features
| Feature | Description |
|---|---|
| High Fault Tolerance | 3 replica redundancy, automatic recovery from DataNode failures |
| High Throughput | Sequential read/write, suitable for large file batch processing |
| Large File Friendly | Suitable for TB/PB level data |
| Not Suitable For | Low-latency random read/write, many small files |
| Write Model | Write-once-read-many, supports append, does not support random modification |
Read Process
- Client sends a read request (file path) to the NameNode
- NameNode returns the file's Block list, with the DataNode locations of each Block
- For each Block, the Client selects the nearest DataNode (local node → same rack → cross-rack)
- Client receives data in Packet units, caches it locally, then writes it to the target file
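The "nearest DataNode" choice in the steps above can be sketched as a simple preference order. This is a toy model, not the real Hadoop topology API; the `(host, rack)` pairs are hypothetical names:

```python
def pick_nearest(replicas, client_host, client_rack):
    """Pick the closest replica: same host first, then same rack, then any.

    replicas is a list of (host, rack) pairs -- hypothetical cluster names."""
    for host, rack in replicas:
        if host == client_host:   # replica on the Client's own node: read locally
            return (host, rack)
    for host, rack in replicas:
        if rack == client_rack:   # replica in the same rack: one switch hop
            return (host, rack)
    return replicas[0]            # otherwise fall back to a cross-rack replica

# Hypothetical Block with three replicas:
replicas = [("dn1", "rack1"), ("dn2", "rack1"), ("dn3", "rack2")]
```

Hadoop's actual implementation sorts replicas by network-topology distance, but the preference order (local → same rack → cross-rack) is the same idea.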
Write Process
- Client requests a write from the NameNode (providing the file path and Block size)
- NameNode allocates a DataNode list (dn1, dn2, dn3) and returns it to the Client
- Client connects to dn1 and establishes a Pipeline (dn1 → dn2 → dn3)
- Client splits the Block into Packets and sends them to dn1; dn1 forwards each Packet to dn2, and dn2 forwards it to dn3
- Once all replicas are written, acks flow back along the Pipeline: dn3 confirms to dn2, dn2 to dn1, and dn1 to the Client
- Client notifies the NameNode that the write is complete, and the NameNode updates its metadata
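The Pipeline steps above can be sketched as a toy model: data flows downstream node by node, and acknowledgements return upstream in reverse order. The node names and packet labels are hypothetical, and real HDFS streams Packets concurrently rather than one at a time:

```python
def write_block(packets, pipeline):
    """Toy model of pipelined replication.

    Each Packet is stored by every DataNode in pipeline order
    (dn1 -> dn2 -> dn3); acks travel back in reverse (dn3 -> dn2 -> dn1)."""
    stores = {dn: [] for dn in pipeline}   # what each DataNode has written
    ack_order = []
    for pkt in packets:
        for dn in pipeline:                # data flows downstream
            stores[dn].append(pkt)
        ack_order = list(reversed(pipeline))  # acks flow upstream to the Client
    return stores, ack_order

stores, ack_order = write_block(["p1", "p2"], ["dn1", "dn2", "dn3"])
```

After the run, every DataNode holds an identical copy of the Block's Packets, which is exactly the 3-replica guarantee the Pipeline exists to provide.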
Common Commands
```shell
hdfs dfs -ls /            # List root directory
hdfs dfs -mkdir -p /a/b   # Create directory recursively
hdfs dfs -put local.txt / # Upload file
hdfs dfs -get /test.txt . # Download file
hdfs dfs -cat /test.txt   # View file content
hdfs dfs -cp /a /b        # Copy
hdfs dfs -mv /a /b        # Move/rename
hdfs dfs -rm /test.txt    # Delete file
hdfs dfs -rmdir /dir      # Delete empty directory
hdfs dfs -rm -r /dir      # Delete directory recursively
```
Next article: Big Data 08 - HDFS CLI Practice