Hadoop JobHistoryServer Configuration and Log Aggregation

This is article 6 in the Big Data series. It configures the JobHistoryServer (JHS) so that MapReduce job history can be viewed, and enables YARN log aggregation.

Complete illustrated version: CSDN Original | Juejin

Role of JobHistoryServer

JHS records detailed information about completed MapReduce jobs, including:

Job status (success/failure)
Resource usage (CPU, memory)
Execution time and status of each task

With JHS you do not have to inspect a job while it is running: its details remain available for analysis at any time after it completes.
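To make these fields concrete, here is a minimal Python sketch that summarizes job records in the JSON shape served by the JHS REST endpoint /ws/v1/history/mapreduce/jobs. The SAMPLE payload is made up for illustration (times are epoch milliseconds), and summarize_jobs is a hypothetical helper, not part of Hadoop.

```python
import json

# Made-up sample in the shape of /ws/v1/history/mapreduce/jobs (times in epoch ms).
SAMPLE = """
{"jobs": {"job": [
  {"id": "job_1700000000000_0001", "name": "word count", "state": "SUCCEEDED",
   "startTime": 1700000010000, "finishTime": 1700000055000,
   "mapsTotal": 1, "mapsCompleted": 1, "reducesTotal": 1, "reducesCompleted": 1},
  {"id": "job_1700000000000_0002", "name": "word count", "state": "FAILED",
   "startTime": 1700000100000, "finishTime": 1700000112500,
   "mapsTotal": 1, "mapsCompleted": 0, "reducesTotal": 1, "reducesCompleted": 0}
]}}
"""


def summarize_jobs(payload: str) -> list:
    """Return (job id, final state, wall-clock seconds) for each job record."""
    data = json.loads(payload)
    # Guard against an empty history, which may come back as {"jobs": null}.
    jobs = (data.get("jobs") or {}).get("job", [])
    summary = []
    for job in jobs:
        elapsed_s = (job["finishTime"] - job["startTime"]) / 1000.0
        summary.append((job["id"], job["state"], elapsed_s))
    return summary


if __name__ == "__main__":
    for job_id, state, secs in summarize_jobs(SAMPLE):
        print(f"{job_id}  {state:<10} {secs:.1f}s")
```

Running it prints one line per job with its final state and duration.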

Configuration Steps

1. Modify mapred-site.xml

Add the following to $HADOOP_HOME/etc/hadoop/mapred-site.xml:

<configuration>
  <!-- MapReduce uses YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JobHistoryServer RPC Address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>h121.wzk.icu:10020</value>
  </property>
  <!-- JobHistoryServer Web UI Address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>h121.wzk.icu:19888</value>
  </property>
</configuration>
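Before distributing the file, it can help to confirm the three values actually landed. This is a minimal Python sketch using the standard xml.etree.ElementTree module; read_properties and check_mapred_site are hypothetical helper names, and the file path assumes the standard $HADOOP_HOME/etc/hadoop layout.

```python
from __future__ import annotations

import os
import xml.etree.ElementTree as ET

# Values set in mapred-site.xml in this article.
EXPECTED = {
    "mapreduce.framework.name": "yarn",
    "mapreduce.jobhistory.address": "h121.wzk.icu:10020",
    "mapreduce.jobhistory.webapp.address": "h121.wzk.icu:19888",
}


def read_properties(xml_text: str | bytes) -> dict[str, str]:
    """Collect <name>/<value> pairs from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)  # XML comments are skipped by the default parser
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_mapred_site(xml_text: str | bytes) -> list[str]:
    """Return a list of problems; an empty list means every expected value is set."""
    props = read_properties(xml_text)
    problems = []
    for name, expected in EXPECTED.items():
        actual = props.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    conf = os.path.join(os.environ.get("HADOOP_HOME", ""), "etc", "hadoop", "mapred-site.xml")
    if os.path.exists(conf):
        with open(conf, "rb") as fh:  # bytes let the parser honor the XML declaration
            issues = check_mapred_site(fh.read())
        print("\n".join(issues) or "mapred-site.xml matches the expected values")
    else:
        print(f"not found: {conf}")
```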

2. Modify yarn-site.xml (Enable Log Aggregation)

Add the following to $HADOOP_HOME/etc/hadoop/yarn-site.xml:

<configuration>
  <!-- ResourceManager Host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>h123.wzk.icu</value>
  </property>
  <!-- Shuffle Service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable Log Aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Retain logs for 7 days (unit: seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
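The retention value is plain arithmetic: 7 × 24 × 3600 = 604800 seconds. If you want a different period, a small Python helper (hypothetical, not part of Hadoop) can do the conversion. The only fact it relies on beyond this article is that the yarn-default.xml default for yarn.log-aggregation.retain-seconds is -1, which disables deletion.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def retain_seconds(days: float) -> int:
    """Value for yarn.log-aggregation.retain-seconds that keeps logs for `days` days."""
    if days <= 0:
        raise ValueError("days must be positive; use -1 in yarn-site.xml to never delete")
    return int(days * SECONDS_PER_DAY)


def describe(retain: int) -> str:
    """Human-readable meaning of a yarn.log-aggregation.retain-seconds value."""
    if retain < 0:
        return "aggregated logs are never deleted"
    return f"aggregated logs are kept for {retain / SECONDS_PER_DAY:g} days"


if __name__ == "__main__":
    print(retain_seconds(7))  # the value used in yarn-site.xml above
    print(describe(604800))
```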

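3. Sync Configuration to All Nodes

Copy the modified configuration directory to every node. xsync is a custom distribution script, not a Hadoop command: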

xsync $HADOOP_HOME/etc/hadoop/
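The article does not show xsync's implementation. As a hypothetical stand-in, this Python sketch builds one rsync command per host and, by default, only prints them. The host list is an assumption: h121 and h123 appear in this article, while h122.wzk.icu is guessed; build_sync_commands and sync are illustrative names.

```python
from __future__ import annotations

import os
import shlex
import subprocess

# Hypothetical node list: h122.wzk.icu is an assumption, not stated in the article.
HOSTS = ["h121.wzk.icu", "h122.wzk.icu", "h123.wzk.icu"]


def build_sync_commands(local_dir: str, hosts: list[str]) -> list[list[str]]:
    """One rsync invocation per host, mirroring local_dir to the same remote path."""
    src = local_dir.rstrip("/") + "/"  # trailing slash: copy the directory's contents
    return [["rsync", "-av", src, f"{host}:{src}"] for host in hosts]


def sync(local_dir: str, hosts: list[str], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in build_sync_commands(local_dir, hosts):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    conf_dir = os.path.join(os.environ.get("HADOOP_HOME", "/opt/hadoop"), "etc", "hadoop")
    sync(conf_dir, HOSTS)  # dry run; pass dry_run=False to actually copy
```

Syncing to the host you run it from is harmless, since rsync finds nothing to transfer.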

4. Start JobHistoryServer

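Changes to yarn-site.xml only take effect after YARN restarts, so first restart YARN on the ResourceManager host h123.wzk.icu (stop-yarn.sh, then start-yarn.sh). Then start JobHistoryServer on h121.wzk.icu, the host configured in mapreduce.jobhistory.address:

mr-jobhistory-daemon.sh start historyserver

(On Hadoop 3.x this script is deprecated; the equivalent is mapred --daemon start historyserver.)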

Verify that the process is running:

jps
# Should show JobHistoryServer
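If you prefer to script this check, a minimal Python sketch can parse jps's "PID MainClass" lines. running_daemons and jhs_running are hypothetical helper names; the code only assumes jps (shipped with the JDK) is on PATH.

```python
import shutil
import subprocess


def running_daemons(jps_output: str) -> dict:
    """Map main-class name -> PID from jps lines such as '4242 JobHistoryServer'."""
    daemons = {}
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit():
            daemons[parts[1].strip()] = int(parts[0])
    return daemons


def jhs_running(jps_output: str) -> bool:
    """True if a JobHistoryServer JVM appears in the jps output."""
    return "JobHistoryServer" in running_daemons(jps_output)


if __name__ == "__main__":
    if shutil.which("jps"):
        out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
        print("JobHistoryServer running" if jhs_running(out) else "JobHistoryServer NOT found")
    else:
        print("jps not found on PATH (it ships with the JDK)")
```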

Access Web UI

Once JHS is running, open http://h121.wzk.icu:19888/jobhistory in a browser.

The history page lists completed MapReduce jobs. Click a job to view:

Job Summary (resource statistics)
Map and Reduce task details
Logs of each task (requires log aggregation to be enabled)
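The same job list is exposed as JSON by the JHS REST API on the web UI port, under /ws/v1/history/mapreduce/jobs, which accepts filters such as state and limit. This Python sketch uses only the standard library; jobs_url and fetch_jobs are hypothetical helper names, and the host:port is taken from mapreduce.jobhistory.webapp.address.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Host:port from mapreduce.jobhistory.webapp.address in mapred-site.xml.
JHS_WEBAPP = "h121.wzk.icu:19888"


def jobs_url(webapp: str, params: dict | None = None) -> str:
    """Build the JHS REST URL that lists completed MapReduce jobs."""
    url = f"http://{webapp}/ws/v1/history/mapreduce/jobs"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_jobs(webapp: str = JHS_WEBAPP, params: dict | None = None,
               timeout: float = 10.0) -> list:
    """GET the job list and return the inner list of job records."""
    req = urllib.request.Request(jobs_url(webapp, params),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # An empty history may come back as {"jobs": null}, so guard both levels.
    return (data.get("jobs") or {}).get("job", [])
```

From a machine that can reach h121.wzk.icu, fetch_jobs(params={"state": "SUCCEEDED"}) would return the succeeded jobs as dictionaries with keys such as id, state, startTime and finishTime.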

Test Verification

Re-run the WordCount job, writing to a new output directory (MapReduce refuses to write into one that already exists):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar \
  wordcount /test/input /wcoutput2
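To check the result from the command line, fetch the reducer output (for example with hdfs dfs -cat /wcoutput2/part-r-00000) and parse it. WordCount writes one "word<TAB>count" line per word. The sketch below uses parse_wordcount, a hypothetical helper, to total those lines and print the most frequent words. The part-r-00000 file name assumes a single reducer.

```python
import sys
from collections import Counter


def parse_wordcount(text: str) -> Counter:
    """Parse 'word<TAB>count' lines from WordCount output into a Counter."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        word, _, n = line.rpartition("\t")
        counts[word] += int(n)
    return counts


if __name__ == "__main__":
    # Usage: hdfs dfs -get /wcoutput2/part-r-00000 . && python3 top_words.py part-r-00000
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for word, n in parse_wordcount(fh.read()).most_common(10):
                print(f"{n:>8}  {word}")
```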

After the job completes, its full execution history is available in the JHS Web UI.
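The task logs shown in the Web UI are the aggregated YARN container logs, which can also be retrieved with yarn logs -applicationId <appId>. A MapReduce job ID and its YARN application ID share the same cluster timestamp and sequence number (job_1700000000000_0001 corresponds to application_1700000000000_0001), so a short Python sketch can derive the command. application_id and yarn_logs_command are hypothetical helpers, and the IDs are made-up examples.

```python
import re
import shlex
import sys

_JOB_ID = re.compile(r"^job_(\d+)_(\d+)$")


def application_id(job_id: str) -> str:
    """job_<clusterTimestamp>_<seq> -> application_<clusterTimestamp>_<seq>."""
    m = _JOB_ID.match(job_id)
    if not m:
        raise ValueError(f"not a MapReduce job id: {job_id!r}")
    return f"application_{m.group(1)}_{m.group(2)}"


def yarn_logs_command(job_id: str) -> list:
    """Argument list for fetching a finished job's aggregated logs."""
    return ["yarn", "logs", "-applicationId", application_id(job_id)]


if __name__ == "__main__":
    for job_id in sys.argv[1:]:
        print(shlex.join(yarn_logs_command(job_id)))
```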

Next article: Big Data 07 - HDFS Read/Write Principle