Article Overview
Deploying Flink in YARN mode requires a series of environment configuration and cluster management steps. First, set the environment variables HADOOP_CONF_DIR, YARN_CONF_DIR and HADOOP_CLASSPATH on each node, and add the Hadoop and Flink paths to /etc/profile. Then modify yarn-site.xml to specify the ResourceManager addresses and to disable the physical and virtual memory checks, so that YARN does not kill Flink containers for exceeding those limits. Every node in the cluster (h121, h122, h123) needs the same configuration, which can be synchronized using rsync.
1. Environment Variable Configuration
vim /etc/profile
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
Save and exit, then reload the environment variables (for example with source /etc/profile).
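After reloading the profile, it can help to confirm that all three variables are actually set before moving on. The following is a minimal sketch; the helper name check_flink_yarn_env is hypothetical and not part of Hadoop or Flink.

```shell
# Hypothetical helper: report which of the required variables are unset or empty.
check_flink_yarn_env() {
    missing=0
    for name in HADOOP_CONF_DIR YARN_CONF_DIR HADOOP_CLASSPATH; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Returns 0 when everything is set, 1 otherwise
    return "$missing"
}

# Typical use after editing /etc/profile:
#   source /etc/profile && check_flink_yarn_env
```

The function prints one line per missing variable, so an empty output together with a zero exit status means the environment is ready.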
2. yarn-site.xml Configuration
cd /opt/servers/hadoop-2.9.2/etc/hadoop
vim yarn-site.xml
Add the following configuration:
<!-- YARN Flink Related -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>h123.wzk.icu:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>h123.wzk.icu:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>h123.wzk.icu:8031</value>
</property>
3. Sync Configuration
Apply the same configuration on all three machines (h121, h122, h123). The files can be synchronized using rsync.
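A minimal sketch of the sync step, assuming the configuration was edited on h121 and that the short host names h122 and h123 are reachable over SSH. The helper name print_sync_commands is hypothetical; it only prints the rsync commands so they can be reviewed before anything is copied.

```shell
# Assumptions: config directory from step 2; h121 is the node that was edited.
CONF_DIR=/opt/servers/hadoop-2.9.2/etc/hadoop
TARGET_HOSTS="h122 h123"

# Print one rsync command per target host.
print_sync_commands() {
    for host in $TARGET_HOSTS; do
        # -a preserves permissions and timestamps, -v lists transferred files
        echo "rsync -av $CONF_DIR/ $host:$CONF_DIR/"
    done
}

print_sync_commands
# To run the printed commands for real:
#   print_sync_commands | sh
```

Printing first keeps the sketch safe to try. The environment variables in /etc/profile still need to be set on each node as well.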
4. Service Stop Process
Stop Hadoop cluster:
cd /opt/servers/hadoop-2.9.2/sbin
stop-all.sh
Stop YARN cluster (execute on h123 node):
stop-yarn.sh
Stop Flink (execute on h121 node):
./stop-cluster.sh
5. Service Start Process
Start Hadoop cluster:
start-all.sh
Start YARN cluster (execute on h123 node):
start-yarn.sh
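Once the start scripts have finished, the JDK tool jps can be used to confirm that the expected daemons are running on a node. The sketch below assumes jps is on the PATH; the helper name missing_daemons is hypothetical, and which daemons to expect depends on the role of each node.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (passed as arguments) that does not appear in it.
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        # -w matches whole words only, so "NodeManager" does not match "XNodeManager"
        echo "$listing" | grep -qw "$daemon" || echo "$daemon"
    done
}

# Typical use on h123, where the ResourceManager is configured:
#   jps | missing_daemons ResourceManager
```

An empty output means every daemon named on the command line was found.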
6. Apply for Resources
View help:
cd /opt/servers/flink-1.11.1/bin/
./yarn-session.sh -h
Command to request resources:
./yarn-session.sh -n 2 -tm 800 -s 1 -d
Parameter description:
-n: Apply for 2 containers (TaskManager count)
-s: Number of Slots per TaskManager
-tm: Memory size per TaskManager
-d: Run in background
Submit Flink job:
./flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 /opt/wzk/WordCount.jar
Parameter description:
-m: JobManager address (yarn-cluster here, which submits the job to YARN)
-yn: Number of TaskManagers
-yjm: JobManager memory
-ytm: TaskManager memory
Stop yarn-cluster:
yarn application -kill application_xxxxxxxxx
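The application ID needed for the kill command can be taken from the output of yarn application -list. A minimal sketch follows; the helper name extract_app_ids is hypothetical, and it assumes IDs have the usual application_<timestamp>_<sequence> shape.

```shell
# Hypothetical helper: pull application IDs out of `yarn application -list`
# output supplied on stdin, one ID per line.
extract_app_ids() {
    grep -o 'application_[0-9]*_[0-9]*'
}

# Typical use:
#   yarn application -list | extract_app_ids
#   yarn application -kill <one of the printed IDs>
```

Listing first avoids killing the wrong application when several are running on the same cluster.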
7. yarn-session Mode Description
Starting a yarn-session launches the two required services, JobManager and TaskManager. Clients then submit jobs with flink run. The session keeps running and continues to accept jobs from clients. A Flink cluster created this way holds its resources exclusively for as long as the session is alive. This mode suits workloads with many small, short-running jobs, because the resources are created once and the per-job startup time is avoided.