This is article 8 in the Big Data series. Having covered HDFS read/write principles, we now practice the hdfs dfs / hadoop fs command-line tools on the cluster.


Prerequisite: Start Cluster

start-dfs.sh
start-yarn.sh

Confirm all three DataNodes are online in the NameNode web UI: http://h121.wzk.icu:50070 (50070 is the Hadoop 2.x default port; Hadoop 3.x uses 9870)

Command Format

HDFS commands have two equivalent forms:

hdfs dfs -<command> [options] [args]
hadoop fs -<command> [options] [args]  # Generic form; works with any Hadoop-supported filesystem

Directory Operations

# List directory (-h shows human-readable sizes)
hdfs dfs -ls /
hdfs dfs -ls -h /user

# Create directory (-p creates recursively)
hdfs dfs -mkdir /test
hdfs dfs -mkdir -p /test/2024/07

# Delete directory (-r recursive, -skipTrash skip trash)
hdfs dfs -rm -r /test/2024

File Upload

# Upload (keep local file)
hdfs dfs -put /local/file.txt /hdfs/path/

# Move upload (delete local file after upload)
hdfs dfs -moveFromLocal /local/file.txt /hdfs/path/

# Write from stdin (suitable for small files)
echo "hello" | hdfs dfs -put - /test/hello.txt

File Download

# Download to local
hdfs dfs -get /hdfs/file.txt /local/path/

# Copy to local (same as get)
hdfs dfs -copyToLocal /hdfs/file.txt /local/path/

# View file content
hdfs dfs -cat /test/hello.txt
hdfs dfs -tail /test/bigfile.txt  # View the last 1 KB of the file

File Management

# Copy within HDFS
hdfs dfs -cp /src/file.txt /dst/

# Move/rename within HDFS
hdfs dfs -mv /old.txt /new.txt

# Delete file
hdfs dfs -rm /test/file.txt

# Delete and skip trash
hdfs dfs -rm -skipTrash /test/file.txt

View File Information

# Show space used under a path (-h for human-readable sizes)
hdfs dfs -du -h /test

# Count directories, files, and bytes under a path
hdfs dfs -count /test

# View Block information
hdfs fsck /test/file.txt -files -blocks -locations

Permission Management

# Change permissions (like chmod)
hdfs dfs -chmod 755 /test

# Change ownership
hdfs dfs -chown hadoop:hadoop /test

# Recursive change
hdfs dfs -chmod -R 755 /test

Merge and Download Small Files

Merge multiple HDFS files into a single local file (add -nl to insert a newline between files):

hdfs dfs -getmerge /test/input/ /local/merged.txt

Secondary NameNode Mechanism

The NameNode keeps the full filesystem metadata in memory and persists it on disk as an FsImage (metadata snapshot) plus an EditLog (journal of operations since the last snapshot). The Secondary NameNode's responsibilities:

  1. Periodically download FsImage and EditLog from NameNode
  2. Merge EditLog into FsImage, generate new FsImage
  3. Upload new FsImage to NameNode

This keeps the EditLog from growing without bound; otherwise a NameNode restart, which must replay the entire EditLog, would take a very long time.
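How often this checkpoint merge runs is controlled by two hdfs-site.xml properties. The values below are the common Hadoop defaults, shown here as a sketch — verify them against your own distribution's hdfs-default.xml:

```xml
<!-- hdfs-site.xml: Secondary NameNode checkpoint tuning (values are common defaults) -->
<property>
  <!-- Checkpoint at least once per hour (in seconds) -->
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <!-- ...or sooner, once this many uncheckpointed transactions accumulate -->
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```

Whichever threshold is hit first triggers the merge, so a write-heavy cluster checkpoints more often than once an hour.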

Next article: Big Data 09 - HDFS Java Client