This is article 9 in the Big Data series. In it you will learn to operate HDFS from Java code and get comfortable with Hadoop's Java client API.

Complete illustrated version with code: CSDN Original | Juejin

Maven Dependencies

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.9.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.9.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.9.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.9.0</version>
  </dependency>
</dependencies>

Three Core Classes

Class          Description
FileSystem     Abstract base class for file system operations; the entry point for all HDFS file operations
Path           HDFS path wrapper (different from java.io.File)
Configuration  Loads Hadoop configuration (core-site.xml, etc.)

Initialize FileSystem

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://h121.wzk.icu:9000");
FileSystem fs = FileSystem.get(conf);
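FileSystem.get(conf) connects as the current OS user, which can trigger an AccessControlException when that user does not match the owner of the target HDFS directory. A minimal sketch of the three-argument overload that names the HDFS user explicitly — the cluster address is the one from this article, and the user name "hadoop" is a placeholder assumption:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnect {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Third argument: the user to act as on HDFS ("hadoop" is a placeholder)
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://h121.wzk.icu:9000"), conf, "hadoop");
        System.out.println(fs.getUri());
        fs.close();
    }
}
```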

Create Directory

fs.mkdirs(new Path("/test/java"));

Upload File (PUT)

// copyFromLocalFile(localSrc, hdfsDst)
fs.copyFromLocalFile(
    new Path("/local/test.txt"),
    new Path("/test/java/test.txt")
);

Download File (GET)

// copyToLocalFile(hdfsSrc, localDst)
fs.copyToLocalFile(
    new Path("/test/java/test.txt"),
    new Path("/local/download/test.txt")
);

Delete File

// delete(path, recursive)
fs.delete(new Path("/test/java/test.txt"), false);
// Recursive delete directory
fs.delete(new Path("/test/java"), true);

List Directory

FileStatus[] statuses = fs.listStatus(new Path("/test"));
for (FileStatus s : statuses) {
    System.out.println(s.getPath().getName() + " - " + s.getLen());
}
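listStatus returns only the immediate children of a directory. To walk a whole tree, the listFiles overload with recursive=true returns a RemoteIterator of LocatedFileStatus entries (files only, with block-location info included). A sketch against the same /test path, assuming the cluster address used above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class HdfsListRecursive {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://h121.wzk.icu:9000");
        FileSystem fs = FileSystem.get(conf);

        // recursive=true descends into subdirectories; only files are returned
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/test"), true);
        while (it.hasNext()) {
            LocatedFileStatus f = it.next();
            System.out.println(f.getPath() + " - " + f.getLen() + " bytes");
        }
        fs.close();
    }
}
```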

Stream Operations (With Progress Bar)

// PUT (with progress display)
try (FSDataOutputStream out = fs.create(new Path("/test/stream.txt"),
        progress -> System.out.print("."))) {
    out.write("hello hdfs".getBytes());
}

// GET (stream read)
try (FSDataInputStream in = fs.open(new Path("/test/stream.txt"))) {
    byte[] buffer = new byte[1024];
    int len;
    while ((len = in.read(buffer)) != -1) {
        System.out.write(buffer, 0, len);
    }
}
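The manual buffer loop above can be replaced by Hadoop's own IOUtils.copyBytes helper, which copies between streams with a given buffer size and can optionally close both ends when done. A sketch of the same stream read, assuming the cluster address used earlier:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsStreamCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://h121.wzk.icu:9000");
        FileSystem fs = FileSystem.get(conf);

        // copyBytes(in, out, bufferSize, closeStreams)
        // closeStreams=false: the input is closed by try-with-resources,
        // and System.out should stay open
        try (FSDataInputStream in = fs.open(new Path("/test/stream.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```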

Seek Positioning Read

// Use try-with-resources so the stream is closed even if reading fails
try (FSDataInputStream in = fs.open(new Path("/test/file.txt"))) {
    in.seek(100);  // Jump to byte offset 100
    // Continue reading from offset 100...
}

Close When Done

fs.close();

For the complete project code (including the log4j.properties configuration), see the CSDN original.

Next article: Big Data 10 - Java MapReduce WordCount