This is article 6 in the Big Data series. It configures the JobHistoryServer (JHS) so that MapReduce job history can be viewed, and enables log aggregation.

Role of JobHistoryServer

JHS records detailed information about completed MapReduce jobs, including:

  • Job status (success/failure)
  • Resource usage (CPU, memory)
  • Execution time and status of each Task

With JHS, there is no need to watch a job while it runs; its execution can be analyzed at any time after the job completes.

Configuration Steps

1. Modify mapred-site.xml

Add the following to $HADOOP_HOME/etc/hadoop/mapred-site.xml:

<configuration>
  <!-- MapReduce uses YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JobHistoryServer RPC Address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>h121.wzk.icu:10020</value>
  </property>
  <!-- JobHistoryServer Web UI Address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>h121.wzk.icu:19888</value>
  </property>
</configuration>
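
One way to confirm the edit landed is to read the values back from the file. The sketch below is a plain-text lookup with awk, not a real XML parser: it assumes each <name> and <value> tag sits on its own line, as in the snippet above. The helper name get_prop is invented here for illustration.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Hypothetical helper; assumes one tag per line, as in the snippet above.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Only query the real file when it exists on this machine.
conf="${HADOOP_HOME:-/nonexistent}/etc/hadoop/mapred-site.xml"
if [ -f "$conf" ]; then
  get_prop "$conf" mapreduce.jobhistory.address
  get_prop "$conf" mapreduce.jobhistory.webapp.address
fi
```

With the configuration above in place, the two lookups should print the RPC address and the Web UI address set for h121.wzk.icu.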

2. Modify yarn-site.xml (Enable Log Aggregation)

<configuration>
  <!-- ResourceManager Host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>h123.wzk.icu</value>
  </property>
  <!-- Shuffle Service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable Log Aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Retain logs for 7 days (unit: seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
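
The retention value is given in seconds, so 7 days works out to 7 × 24 × 60 × 60. A small sketch of that arithmetic, useful when choosing a different retention period (days_to_seconds is a helper name made up for this example):

```shell
# Convert a retention period in days to the number of seconds expected by
# yarn.log-aggregation.retain-seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7   # prints 604800
```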

3. Sync Configuration to All Nodes

xsync $HADOOP_HOME/etc/hadoop/
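
xsync is not a Hadoop command; it is presumably a cluster-local distribution script. If no such script is available, a rough stand-in can be sketched with rsync. Everything below is an assumption: the function name sync_conf, the PEERS host list (only h121 and h123 appear in this article; h122 is a guess), the /opt/hadoop placeholder path, and password-less SSH between nodes.

```shell
# Hypothetical stand-in for xsync: copy the config directory to each peer.
# The host list is an assumption; adjust it to the real cluster.
PEERS="${PEERS:-h122.wzk.icu h123.wzk.icu}"

sync_conf() {
  dir="$1"
  for host in $PEERS; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      # Default: only print the command that would be run.
      echo rsync -a "$dir/" "$host:$dir/"
    else
      # -a preserves permissions and timestamps.
      rsync -a "$dir/" "$host:$dir/"
    fi
  done
}

# Dry run: shows the rsync commands without executing them.
sync_conf "${HADOOP_HOME:-/opt/hadoop}/etc/hadoop"
```

Setting DRY_RUN=0 before the call performs the actual copy.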

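4. Start JobHistoryServer

Run the following on h121.wzk.icu, the host configured in mapreduce.jobhistory.address. If YARN was already running, restart it first so that the NodeManagers pick up the log aggregation settings.

mr-jobhistory-daemon.sh start historyserver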

Verify the process:

jps
# The output should include JobHistoryServer
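
For scripted checks, the same verification can be automated. This is a minimal sketch that assumes jps prints one "PID ClassName" pair per line; has_jhs is a helper name chosen for this example.

```shell
# Succeeds if a jps-style listing on stdin contains a JobHistoryServer entry.
has_jhs() {
  awk '$2 == "JobHistoryServer" { ok = 1 } END { exit !ok }'
}

# Run the check only where jps is installed.
if command -v jps >/dev/null 2>&1; then
  if jps | has_jhs; then
    echo "JobHistoryServer is running"
  else
    echo "JobHistoryServer not found" >&2
  fi
fi
```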

Access Web UI

After startup, open http://h121.wzk.icu:19888/jobhistory in a browser.

The history page lists completed MapReduce jobs. Click a job to view:

  • Job Summary (resource statistics)
  • Map/Reduce Tasks details
  • Logs for each Task (requires log aggregation to be enabled)
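
The same information is also exposed as JSON through the History Server REST API under /ws/v1/history, which is handy for scripts. A minimal sketch, assuming curl is installed and the server is reachable at the Web UI address configured earlier; jhs_url and jhs_get are helper names invented here.

```shell
# Web UI address from mapred-site.xml; override JHS to point elsewhere.
JHS="${JHS:-h121.wzk.icu:19888}"

# Build a History Server REST URL for a path under /ws/v1/history.
jhs_url() {
  echo "http://$JHS/ws/v1/history/$1"
}

# Fetch a REST resource as JSON; returns non-zero on HTTP errors or timeout.
jhs_get() {
  curl -sf --max-time 5 -H 'Accept: application/json' "$(jhs_url "$1")"
}

# With the default address this prints:
# http://h121.wzk.icu:19888/ws/v1/history/mapreduce/jobs
jhs_url mapreduce/jobs
```

Calling jhs_get mapreduce/jobs should return the list of finished jobs, and jhs_get info returns basic information about the server itself.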

Test Verification

Re-run the WordCount job:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar \
  wordcount /test/input /wcoutput2

After the job completes, its complete execution history can be viewed in the JHS Web UI.
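
With log aggregation enabled, the logs of a finished job can also be pulled from the command line with yarn logs. That command takes a YARN application ID, which shares its numeric suffix with the MapReduce job ID shown in the JHS Web UI. A minimal sketch; the helper names and the sample job ID are made up for illustration.

```shell
# Map a MapReduce job ID to the matching YARN application ID.
job_to_app() {
  echo "$1" | sed 's/^job_/application_/'
}

# Print the aggregated container logs for a finished job
# (needs the yarn CLI and log aggregation enabled).
job_logs() {
  yarn logs -applicationId "$(job_to_app "$1")"
}

# The job ID below is a made-up example.
job_to_app job_1700000000000_0001   # prints application_1700000000000_0001
```

Running job_logs with the real job ID from the history page then prints that job's container logs.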

Next article: Big Data 07 - HDFS Read/Write Principle