This is article 8 in the Big Data series. Having covered HDFS read/write internals, this article practices the HDFS command-line tools on the cluster.
Prerequisite: Start Cluster
start-dfs.sh
start-yarn.sh
Confirm that all three DataNodes are online in the NameNode web UI: http://h121.wzk.icu:50070
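The same check can be made from the command line with dfsadmin (the exact report wording can vary slightly between Hadoop versions):

```shell
# Print a cluster report; look for the "Live datanodes" count
hdfs dfsadmin -report | grep "Live datanodes"
```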
Command Format
HDFS commands have two nearly equivalent forms:
hdfs dfs -<command> [options] [args]
hadoop fs -<command> [options] [args] # Works with any Hadoop-supported filesystem; identical to hdfs dfs when the target is HDFS
Directory Operations
# List directory (-h shows human-readable sizes)
hdfs dfs -ls /
hdfs dfs -ls -h /user
# Create directory (-p creates recursively)
hdfs dfs -mkdir /test
hdfs dfs -mkdir -p /test/2024/07
# Delete directory (-r recursive, -skipTrash skip trash)
hdfs dfs -rm -r /test/2024
File Upload
# Upload (keep local file)
hdfs dfs -put /local/file.txt /hdfs/path/
# Move upload (delete local file after upload)
hdfs dfs -moveFromLocal /local/file.txt /hdfs/path/
# Write from stdin (suitable for small files)
echo "hello" | hdfs dfs -put - /test/hello.txt
File Download
# Download to local
hdfs dfs -get /hdfs/file.txt /local/path/
# Copy to local (same as get)
hdfs dfs -copyToLocal /hdfs/file.txt /local/path/
# View file content
hdfs dfs -cat /test/hello.txt
hdfs dfs -tail /test/bigfile.txt # View last 1 KB of file
File Management
# Copy within HDFS
hdfs dfs -cp /src/file.txt /dst/
# Move/rename within HDFS
hdfs dfs -mv /old.txt /new.txt
# Delete file
hdfs dfs -rm /test/file.txt
# Delete and skip trash
hdfs dfs -rm -skipTrash /test/file.txt
View File Information
# Count directory size
hdfs dfs -du -h /test
# Count files in directory
hdfs dfs -count /test
# View Block information
hdfs fsck /test/file.txt -files -blocks -locations
Permission Management
# Change permissions (like chmod)
hdfs dfs -chmod 755 /test
# Change ownership
hdfs dfs -chown hadoop:hadoop /test
# Recursive change
hdfs dfs -chmod -R 755 /test
Merging Small Files on Download
Merge multiple small files into one and download the result to the local filesystem:
hdfs dfs -getmerge /test/input/ /local/merged.txt
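A quick end-to-end walkthrough (file names and contents are made up for illustration):

```shell
# Create two small local files
echo "part one" > /tmp/a.txt
echo "part two" > /tmp/b.txt

# Upload them into an HDFS directory
hdfs dfs -mkdir -p /test/input
hdfs dfs -put /tmp/a.txt /tmp/b.txt /test/input/

# Merge everything under /test/input/ into a single local file
hdfs dfs -getmerge /test/input/ /tmp/merged.txt

# merged.txt now contains the two files concatenated
cat /tmp/merged.txt
```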
Secondary NameNode Mechanism
The NameNode keeps the full filesystem namespace in memory and persists it on disk as an FsImage (metadata snapshot) plus an EditLog (journal of namespace changes). Secondary NameNode's responsibilities:
- Periodically download FsImage and EditLog from NameNode
- Merge EditLog into FsImage, generate new FsImage
- Upload new FsImage to NameNode
This prevents EditLog from growing infinitely and causing long NameNode restart times.
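You can inspect these on-disk files yourself with the Offline Image Viewer (oiv) and Offline Edits Viewer (oev). The storage directory and file names below are assumptions; check dfs.namenode.name.dir in your hdfs-site.xml and list the directory to find the actual transaction IDs:

```shell
# List the NameNode metadata directory (path is an assumption)
ls /opt/hadoop/dfs/name/current/   # fsimage_*, edits_*, edits_inprogress_*

# Dump an FsImage to readable XML
hdfs oiv -p XML -i /opt/hadoop/dfs/name/current/fsimage_0000000000000000123 -o /tmp/fsimage.xml

# Dump an EditLog segment to XML
hdfs oev -i /opt/hadoop/dfs/name/current/edits_0000000000000000001-0000000000000000050 -o /tmp/edits.xml
```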
Next article: Big Data 09 - HDFS Java Client