This is article 2 in the Big Data series, continuing from the environment setup in the previous article. It explains what each Hadoop cluster configuration file does and how to write it.
Hadoop Cluster Configuration = HDFS + MapReduce + YARN
Hadoop cluster configuration consists of three parts:
- HDFS cluster configuration: core-site.xml, hdfs-site.xml
- MapReduce cluster configuration: mapred-site.xml
- YARN cluster configuration: yarn-site.xml
Three-node plan: h121 (NameNode), h122 (DataNode), h123 (SecondaryNameNode + ResourceManager). Per the slaves file below, all three hosts also run DataNode and NodeManager processes.
1. hadoop-env.sh
Set the Java path explicitly so the cluster does not fail with a "JDK not found" error at runtime:
export JAVA_HOME=/opt/jdk/jdk1.8.0_202
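Before starting any daemons it is worth confirming that the configured path actually contains a JDK. A minimal sketch (the /opt/jdk path is the one assumed in this series; the helper function name is made up here):

```shell
# Sketch: verify a JAVA_HOME candidate before trusting it in hadoop-env.sh.
check_java_home() {
    # A usable JDK home must contain an executable bin/java.
    [ -x "$1/bin/java" ]
}

if check_java_home /opt/jdk/jdk1.8.0_202; then
    echo "JAVA_HOME looks valid"
else
    echo "WARN: no executable bin/java under /opt/jdk/jdk1.8.0_202" >&2
fi
```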
2. core-site.xml (NameNode Address)
<configuration>
    <!-- HDFS access address (NameNode RPC endpoint) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://h121.wzk.icu:9000</value>
    </property>
    <!-- Base directory for temporary data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/servers/hadoop-2.9.2/data/tmp</value>
    </property>
</configuration>
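On a running cluster, the resolved value can be read with `hdfs getconf -confKey fs.defaultFS`. Before the daemons are up, a small grep/sed sketch can pull it straight out of the XML — the config file is reproduced inline here so the snippet is self-contained, and it assumes the <name>/<value> pair sits on adjacent lines, as written above:

```shell
# Sketch: extract fs.defaultFS from core-site.xml with grep + sed.
# Assumes <name> and <value> are on adjacent lines.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://h121.wzk.icu:9000</value>
    </property>
</configuration>
EOF
fs_default=$(grep -A1 '<name>fs.defaultFS</name>' "$conf" \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
echo "$fs_default"   # hdfs://h121.wzk.icu:9000
```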
3. hdfs-site.xml (SecondaryNameNode + Replication Factor)
<configuration>
    <!-- SecondaryNameNode address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>h123.wzk.icu:50090</value>
    </property>
    <!-- Replication factor, set to 3 for the 3-node cluster -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
Also list all three nodes in the slaves file (in Hadoop 2.x this file names the DataNode/NodeManager hosts):
h121.wzk.icu
h122.wzk.icu
h123.wzk.icu
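Every entry in slaves must be resolvable on every node. A hedged sketch that flags unresolvable hostnames — the file is rebuilt inline to keep the snippet self-contained; on a real node you would read $HADOOP_HOME/etc/hadoop/slaves instead (path assumed):

```shell
# Sketch: check that every host listed in the slaves file resolves.
slaves=$(mktemp)
printf '%s\n' h121.wzk.icu h122.wzk.icu h123.wzk.icu > "$slaves"

while read -r host; do
    [ -n "$host" ] || continue
    if getent hosts "$host" > /dev/null 2>&1; then
        echo "$host: resolves"
    else
        echo "$host: does NOT resolve, check /etc/hosts" >&2
    fi
done < "$slaves"
```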
4. mapred-site.xml (MapReduce Runtime Framework)
First copy the template file:
cp mapred-site.xml.template mapred-site.xml
Configure MapReduce to use YARN as the resource scheduling framework:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
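The template copy is a one-shot step, so a small guard keeps a re-run of the setup from overwriting an already-edited mapred-site.xml. A sketch (the function name is made up; on the cluster the directory would be /opt/servers/hadoop-2.9.2/etc/hadoop):

```shell
# Sketch: copy mapred-site.xml from its template only if it does not
# exist yet, so repeated setup runs never clobber local edits.
ensure_mapred_site() {
    dir=$1
    if [ ! -f "$dir/mapred-site.xml" ]; then
        cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
        echo "created mapred-site.xml from template"
    else
        echo "mapred-site.xml already present, leaving it untouched"
    fi
}
```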
5. yarn-site.xml (ResourceManager Address)
<configuration>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>h123.wzk.icu</value>
    </property>
    <!-- MapReduce shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Notes
- When deploying on public cloud servers, the cluster domain names in /etc/hosts must map to 0.0.0.0 (not 127.x.x.x), otherwise the services bind only to loopback and cannot be reached from outside
- All configuration files must be synchronized to all three nodes (use the rsync distribution script)
- Unify ownership of the installation directory:
chown -R hadoop:hadoop /opt/servers/hadoop-2.9.2
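The loopback caveat above can be checked mechanically. A hedged sketch (the function name is made up; pass /etc/hosts on a real node) that warns when a cluster hostname is pinned to a 127.x.x.x line:

```shell
# Sketch: return nonzero if the given host is mapped to a loopback
# address in the given hosts file.
check_hosts_entry() {
    hosts_file=$1
    host=$2
    if grep -E "^127\.[0-9.]*[[:space:]].*$host" "$hosts_file" > /dev/null; then
        echo "WARN: $host maps to a loopback address in $hosts_file" >&2
        return 1
    fi
    return 0
}
```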
After configuration, continue to article 3: SSH Passwordless Login and Distribution Script