This is article 6 in the Big Data series. It shows how to configure the JobHistoryServer (JHS) to view MapReduce job history and how to enable log aggregation.
Role of JobHistoryServer
JHS records detailed information about completed MapReduce jobs, including:
- Job status (success/failure)
- Resource usage (CPU, memory)
- Execution time and status of each Task
With JHS, you do not have to watch a job while it is running; you can analyze it at any time after it completes.
Configuration Steps
1. Modify mapred-site.xml
Add the following to $HADOOP_HOME/etc/hadoop/mapred-site.xml:
<configuration>
<!-- MapReduce uses YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistoryServer RPC Address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>h121.wzk.icu:10020</value>
</property>
<!-- JobHistoryServer Web UI Address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>h121.wzk.icu:19888</value>
</property>
</configuration>
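To confirm what was actually written to the file, a small shell helper can read a property value back out. This is only a sketch: `get_prop` is a hypothetical helper name, not a Hadoop command, and it assumes each `<name>` and `<value>` sits on its own line, as in the configuration above.

```shell
# get_prop FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop XML config file.
# Assumption: <name> and <value> are on separate lines (not a general XML parser).
get_prop() {
  awk -v name="$2" '
    # Literal match on the full <name> element, so dots are not treated as regex.
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, `get_prop "$HADOOP_HOME/etc/hadoop/mapred-site.xml" mapreduce.jobhistory.webapp.address` should print the Web UI address configured above.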
2. Modify yarn-site.xml (Enable Log Aggregation)
Add the following to $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<configuration>
<!-- ResourceManager Host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>h123.wzk.icu</value>
</property>
<!-- Shuffle Service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable Log Aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain logs for 7 days (unit: seconds) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
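The retention value is simply a number of days converted to seconds. As a quick check of the arithmetic (`days_to_seconds` is a hypothetical helper used only for illustration):

```shell
# days_to_seconds DAYS
# Convert a retention period in days to the seconds value that
# yarn.log-aggregation.retain-seconds expects.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7   # prints 604800
```

To keep logs for a different period, pass a different number of days and put the result in the `<value>` element.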
3. Sync Configuration to All Nodes
xsync $HADOOP_HOME/etc/hadoop/
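Note that xsync is not shipped with Hadoop; in tutorials like this one it is usually a small user-written wrapper around rsync. If no such script is available, a rough equivalent might look like the sketch below. The function name `sync_conf` and the `DRY_RUN` switch are hypothetical, and the host list in the usage note is an assumption (only h121 and h123 appear in this article).

```shell
# sync_conf DIR HOST...
# Copy DIR to the same path on each HOST using rsync over SSH.
# Set DRY_RUN=1 to print the rsync commands instead of running them.
sync_conf() {
  dir=${1%/}   # drop a trailing slash so the paths below are uniform
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "rsync -av $dir/ $host:$dir/"
    else
      rsync -av "$dir/" "$host:$dir/"
    fi
  done
}
```

A typical call, assuming the other nodes are named h122.wzk.icu and h123.wzk.icu and passwordless SSH is set up, would be `sync_conf "$HADOOP_HOME/etc/hadoop" h122.wzk.icu h123.wzk.icu`.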
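4. Start JobHistoryServer
If YARN was already running, restart it first so that the yarn-site.xml changes take effect. Then start the server on h121.wzk.icu, the host set in mapreduce.jobhistory.address: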
mr-jobhistory-daemon.sh start historyserver
Verify process:
jps
# Should show JobHistoryServer
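For scripted checks, the same verification can be wrapped in a small function. This is a minimal sketch; `check_jhs` is a hypothetical name, and it only looks for the process name in whatever `jps` output is piped into it.

```shell
# check_jhs
# Read jps output on stdin and report whether JobHistoryServer is listed.
# Returns 0 if found, 1 otherwise.
check_jhs() {
  if grep -q 'JobHistoryServer'; then
    echo "JobHistoryServer is running"
  else
    echo "JobHistoryServer not found"
    return 1
  fi
}
```

On the JHS host it would be used as `jps | check_jhs`.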
Access Web UI
Once the server is running, open http://h121.wzk.icu:19888/jobhistory in a browser.
The history page lists completed MapReduce jobs. Click a job to view:
- Job Summary (resource statistics)
- Map/Reduce Tasks details
- Logs for each task (requires log aggregation to be enabled)
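The same information is also exposed over the History Server REST API on the Web UI port, which is handy on machines without a browser. The sketch below only builds the URLs; `jhs_api` and the `JHS_WEB` variable are hypothetical names, with the default taken from the mapreduce.jobhistory.webapp.address value configured earlier.

```shell
# Host and port of the JHS Web UI; override by exporting JHS_WEB.
JHS_WEB=${JHS_WEB:-h121.wzk.icu:19888}

# jhs_api PATH
# Print the History Server REST URL for PATH under /ws/v1/history/.
jhs_api() {
  echo "http://$JHS_WEB/ws/v1/history/$1"
}

# Usage (requires a running JobHistoryServer and curl):
#   curl -s "$(jhs_api info)"             # server information
#   curl -s "$(jhs_api mapreduce/jobs)"   # list of completed jobs
```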
Test Verification
Re-run the WordCount job:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar \
wordcount /test/input /wcoutput2
After the job completes, its full execution history is available in the JHS Web UI.
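With log aggregation enabled, the aggregated container logs can also be fetched from the command line with `yarn logs`. That command takes an application ID, which differs from the MapReduce job ID only in its prefix. A minimal sketch of the conversion follows; `job_to_app_id` is a hypothetical helper, and the ID in the usage comment is made up for illustration.

```shell
# job_to_app_id JOB_ID
# Convert a MapReduce job ID (job_<timestamp>_<seq>) into the matching
# YARN application ID (application_<timestamp>_<seq>).
job_to_app_id() {
  echo "application_${1#job_}"
}

# Usage (the job ID below is illustrative; take the real one from the JHS job list):
#   yarn logs -applicationId "$(job_to_app_id job_1700000000000_0001)"
```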
Next article: Big Data 07 - HDFS Read/Write Principle