Flink on YARN Deployment: Environment Variables, Configur...

Article Overview

Deploying Flink on YARN requires some environment configuration and a few cluster management steps. First, set the environment variables HADOOP_CONF_DIR, YARN_CONF_DIR, and HADOOP_CLASSPATH on each node, and add the Hadoop and Flink paths to /etc/profile. Then edit yarn-site.xml to specify the ResourceManager addresses and disable the physical and virtual memory checks, so that YARN does not kill containers for exceeding those limits. All three nodes in the cluster (h121, h122, h123) need the same configuration, which can be synchronized with rsync.

1. Environment Variable Configuration

vim /etc/profile
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`

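Save and exit, then reload the environment variables. These exports assume that HADOOP_HOME is already set and that the hadoop command is on the PATH.

source /etc/profile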
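To confirm that the three variables are visible in the current shell, a small check along these lines may help. It is only a sketch for a POSIX shell; the function name check_flink_yarn_env is hypothetical and not part of Hadoop or Flink.

```shell
# check_flink_yarn_env: hypothetical helper that reports which of the three
# variables from this step are set. Returns 0 only if all of them are non-empty.
check_flink_yarn_env() {
    rc=0
    for name in HADOOP_CONF_DIR YARN_CONF_DIR HADOOP_CLASSPATH; do
        # Indirect lookup via eval; the names are a fixed list, so this is safe here.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $name"
            rc=1
        else
            echo "OK: $name"
        fi
    done
    return "$rc"
}

# Usage, after reloading the profile:
#   check_flink_yarn_env
```

Running the check on every node before starting YARN makes it easier to spot a machine where the profile was not updated.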

2. yarn-site.xml Configuration

cd /opt/servers/hadoop-2.9.2/etc/hadoop
vim yarn-site.xml

Add the following properties inside the <configuration> element:

<!-- YARN Flink Related -->
<property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
</property>
<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>
<property>
        <name>yarn.resourcemanager.address</name>
        <value>h123.wzk.icu:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>h123.wzk.icu:8030</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>h123.wzk.icu:8031</value>
</property>
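After editing, it can be useful to confirm that all five properties are present in the file. The following is a rough sketch that only looks for the property names with grep; it does not parse XML or validate the values. The function name missing_yarn_properties is hypothetical.

```shell
# missing_yarn_properties FILE: print each expected property name that has no
# <name>...</name> entry in FILE. Prints nothing if all five are present.
missing_yarn_properties() {
    for prop in \
        yarn.nodemanager.pmem-check-enabled \
        yarn.nodemanager.vmem-check-enabled \
        yarn.resourcemanager.address \
        yarn.resourcemanager.scheduler.address \
        yarn.resourcemanager.resource-tracker.address
    do
        # -F matches the string literally (dots are not wildcards); -q is quiet.
        if ! grep -qF "<name>$prop</name>" "$1"; then
            echo "$prop"
        fi
    done
}

# Usage:
#   missing_yarn_properties /opt/servers/hadoop-2.9.2/etc/hadoop/yarn-site.xml
```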

3. Sync Configuration

Apply the same configuration on all three machines (h121, h122, h123). rsync can be used to copy the files between them.
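One way to do the synchronization is sketched below. It prints the rsync commands rather than running them, so they can be reviewed first. The sketch assumes passwordless SSH between the nodes and that the short host names resolve; the function name print_sync_commands is hypothetical.

```shell
# Hadoop configuration directory used in this article.
CONF_DIR=/opt/servers/hadoop-2.9.2/etc/hadoop
# Short host names as written above; they may need a domain suffix in your network.
HOSTS="h121 h122 h123"

# print_sync_commands LOCAL_HOST: print one rsync command per remote host,
# skipping LOCAL_HOST (the machine the files are copied from).
print_sync_commands() {
    for host in $HOSTS; do
        if [ "$host" != "$1" ]; then
            # -a preserves permissions and timestamps, -v lists copied files.
            echo "rsync -av $CONF_DIR/ $host:$CONF_DIR/"
        fi
    done
}

# Example: commands for copying from h121 to the other two nodes.
print_sync_commands h121
```

The same pattern applies to /etc/profile, which also has to match on every node.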

4. Service Stop Process

Stop Hadoop cluster:

cd /opt/servers/hadoop-2.9.2/sbin
stop-all.sh

Stop YARN cluster (execute on h123 node):

stop-yarn.sh

Stop Flink (execute on h121 node):

cd /opt/servers/flink-1.11.1/bin/
./stop-cluster.sh

5. Service Start Process

Start Hadoop cluster:

start-all.sh

Start YARN cluster (execute on h123 node):

start-yarn.sh
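To check that the daemons actually came up, the output of jps (shipped with the JDK) can be inspected. The helper below is a sketch; the name missing_daemons is hypothetical, and which daemons belong on which node depends on your cluster layout.

```shell
# missing_daemons NAME...: read `jps` output on stdin and print every NAME
# that does not appear as a process name (the second column).
missing_daemons() {
    running=$(awk '{print $2}')
    for name in "$@"; do
        # -x requires a whole-line match, -F a literal one, -q suppresses output.
        if ! printf '%s\n' "$running" | grep -qxF "$name"; then
            echo "$name"
        fi
    done
}

# Usage on the ResourceManager node (h123 in this setup):
#   jps | missing_daemons ResourceManager NodeManager
```

An empty result means every listed daemon was found.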

6. Request Resources

View help:

cd /opt/servers/flink-1.11.1/bin/
./yarn-session.sh -h

Request resources by starting a yarn-session:

./yarn-session.sh -n 2 -tm 800 -s 1 -d

Parameter description:

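-n: Number of containers to request, i.e. the TaskManager count (2 here). Newer Flink releases allocate TaskManagers on demand and may ignore or reject this option, so check the -h output for your version.
-tm: Memory per TaskManager, in MB (800 here)
-s: Number of slots per TaskManager (1 here)
-d: Run detached, in the background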
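As a quick sanity check on what the command above asks for, the totals can be worked out from the three numeric parameters. This is only illustrative arithmetic: the JobManager container is extra, and YARN may round each container request up according to its scheduler settings. The function name session_totals is hypothetical.

```shell
# session_totals CONTAINERS TM_MEMORY_MB SLOTS_PER_TM
# Prints the total slot count and the total TaskManager memory requested.
session_totals() {
    echo "slots=$(( $1 * $3 )) taskmanager_memory_mb=$(( $1 * $2 ))"
}

# Values from the command above: -n 2 -tm 800 -s 1
session_totals 2 800 1   # prints: slots=2 taskmanager_memory_mb=1600
```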

Submit Flink job:

./flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 /opt/wzk/WordCount.jar

Parameter description:

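-m: JobManager address; the special value yarn-cluster starts a dedicated Flink cluster on YARN for this job
-yn: Number of TaskManagers
-yjm: JobManager memory, in MB
-ytm: TaskManager memory, in MB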

Stop the yarn-cluster application (replace application_xxxxxxxxx with the actual application ID):

yarn application -kill application_xxxxxxxxx
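The application ID can be looked up with yarn application -list. The sketch below pulls the IDs out of that output so they can be passed to the kill command; it assumes IDs follow the usual application_<timestamp>_<sequence> pattern and that grep supports -o. The function name extract_app_ids is hypothetical.

```shell
# extract_app_ids: read `yarn application -list` output on stdin and print
# every token that looks like a YARN application ID, one per line.
extract_app_ids() {
    grep -oE 'application_[0-9]+_[0-9]+'
}

# Usage:
#   yarn application -list | extract_app_ids
#   yarn application -kill <one of the IDs printed above>
```

Reviewing the list before killing anything is safer than piping the IDs straight into the kill command, since other applications may be running on the same cluster.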

7. yarn-session Mode Description

Starting a yarn-session launches the two required services, JobManager and TaskManager. Clients then submit jobs with flink run. The yarn-session keeps running and continues to accept jobs from clients. A Flink cluster created this way holds its resources exclusively for as long as the session runs. This mode suits workloads with many small, short-running jobs, because it avoids the cost of requesting resources for each job.