Article Overview

This article walks through deploying Flink in YARN mode: environment variable configuration, yarn-site.xml settings, service start and stop, requesting resources, and job submission.

Environment Variable Configuration

Add the following to /etc/profile, then reload it (for example with source /etc/profile) so the variables take effect:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
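
Before going further, it can help to confirm that all three variables are actually visible in the current shell. The following is a minimal sketch in POSIX shell; check_flink_yarn_env is a hypothetical helper name introduced here for illustration, not part of Hadoop or Flink.

```shell
# Report which of the variables set in /etc/profile are unset or empty.
# check_flink_yarn_env is a hypothetical helper, not a Hadoop or Flink command.
check_flink_yarn_env() {
    missing=0
    for name in HADOOP_CONF_DIR YARN_CONF_DIR HADOOP_CLASSPATH; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        else
            echo "ok: $name"
        fi
    done
    # Exit status is 0 only when all three variables are set.
    return "$missing"
}
```

Calling check_flink_yarn_env after reloading the profile prints one line per variable and returns a non-zero status if any of them is missing.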

yarn-site.xml Configuration

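Add the following properties to yarn-site.xml. The first two disable the NodeManager's physical and virtual memory checks, so YARN does not kill Flink containers that exceed those limits; the rest point to the ResourceManager on h123: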

<!-- YARN Flink related -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>h123.wzk.icu:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>h123.wzk.icu:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>h123.wzk.icu:8031</value>
</property>
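
To double-check what a given yarn-site.xml actually contains, a small lookup helper can be handy. This is only a sketch: yarn_property_value is a hypothetical name, and it relies on a plain-text match that assumes the name and value sit on consecutive lines, as in the listing above. It is not a real XML parser.

```shell
# Print the value of a property from a yarn-site.xml laid out as in this
# article (<name> and <value> on consecutive lines). Plain-text match only.
# yarn_property_value is a hypothetical helper name.
yarn_property_value() {
    # $1 = property name, $2 = path to yarn-site.xml
    grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, yarn_property_value yarn.nodemanager.vmem-check-enabled "$HADOOP_CONF_DIR/yarn-site.xml" should print false once the configuration above is in place.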

Sync Configuration

The configuration must be identical on all three nodes: h121, h122, and h123. rsync is a convenient way to copy the changed files from one node to the others.
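
One way to sketch the synchronization step is to generate the rsync commands first and review them before running anything. The example below assumes it is run on h121, that HADOOP_HOME is /opt/servers/hadoop-2.9.2 (inferred from the sbin path used in this article), and that SSH access to the other nodes is already set up; print_sync_commands is a hypothetical helper name.

```shell
# Config directory to distribute. The fallback path is an assumption based on
# the Hadoop install location used elsewhere in this article.
CONF_DIR="${HADOOP_HOME:-/opt/servers/hadoop-2.9.2}/etc/hadoop"
# Nodes that should receive the configuration from the current node.
TARGET_HOSTS="h122 h123"

# Print one rsync command per target host without executing it.
print_sync_commands() {
    for host in $TARGET_HOSTS; do
        echo "rsync -av $CONF_DIR/ $host:$CONF_DIR/"
    done
}
```

Running print_sync_commands shows the commands; once they look right, they can be executed by piping the output to sh.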

Service Management

Stop Services

# Stop Hadoop
cd /opt/servers/hadoop-2.9.2/sbin
stop-all.sh

# Stop YARN (execute on h123 node)
stop-yarn.sh

# Stop the standalone Flink cluster (execute on h121 node, from Flink's bin directory)
./stop-cluster.sh

Start Services

# Start Hadoop (h121 node)
start-all.sh

# Start YARN (h123 node)
start-yarn.sh

Apply for Resources

yarn-session.sh Usage

./yarn-session.sh -n 2 -tm 800 -s 1 -d

Parameter description:

  • -n: Number of TaskManager containers to request (2 here)
  • -s: Number of slots per TaskManager
  • -tm: Memory per TaskManager, in MB
  • -d: Run detached (in the background)

Note: Even with -n 2, YARN actually allocates 3 containers, because the ApplicationMaster, which hosts the JobManager, occupies one additional container.
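
The container arithmetic can be written out as a small sketch. The function names below are hypothetical and exist only to illustrate the calculation: one container per TaskManager plus one for the ApplicationMaster/JobManager, and total slots as TaskManagers times slots per TaskManager.

```shell
# Containers YARN allocates for a session: one per TaskManager (-n)
# plus one for the ApplicationMaster/JobManager.
session_containers() {
    echo $(( $1 + 1 ))
}

# Total task slots in the session: TaskManagers (-n) times slots each (-s).
session_slots() {
    echo $(( $1 * $2 ))
}

session_containers 2   # prints 3
session_slots 2 1      # prints 2
```

With the values from the command above (-n 2 -s 1), the session holds 3 containers and offers 2 slots in total.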

Submit for Execution

Method 1: Session Mode

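Start a YARN session first to reserve resources; jobs are then submitted to that running session with flink run (without -m yarn-cluster). The session is started with: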

./yarn-session.sh -n 2 -tm 800 -s 1 -d

Method 2: Direct Submission

./flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 /opt/wzk//WordCount.jar

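Parameter description:

  • -m: JobManager address; the value yarn-cluster starts a dedicated Flink cluster on YARN for this job
  • -yn: TaskManager count
  • -yjm: JobManager container memory, in MB
  • -ytm: Memory per TaskManager container, in MB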

Stop yarn-cluster

yarn application -kill application_xxxxxxxxx
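
The application ID passed to -kill can be taken from the output of yarn application -list. As a rough sketch, the hypothetical helper below filters any text on standard input down to the IDs it contains, assuming the usual application_<timestamp>_<sequence> form.

```shell
# Extract YARN application IDs (application_<timestamp>_<sequence>) from stdin.
# extract_app_ids is a hypothetical helper name.
extract_app_ids() {
    grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# Typical use on a node where the yarn CLI is available:
#   yarn application -list | extract_app_ids
#   yarn application -kill <one of the printed IDs>
```

Printing the IDs first and killing one explicitly avoids stopping an unrelated application by accident.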

Configuration Points Summary

Configuration                          Description
HADOOP_CONF_DIR                        Hadoop configuration directory
YARN_CONF_DIR                          YARN configuration directory
HADOOP_CLASSPATH                       Hadoop classpath
yarn.nodemanager.pmem-check-enabled    Disable physical memory check
yarn.nodemanager.vmem-check-enabled    Disable virtual memory check
yarn.resourcemanager.address           ResourceManager address

Together, these steps cover the full Flink on YARN deployment: environment preparation, configuration changes, service start and stop, resource requests, and job submission.