This article covers Flink parallelism: the priority order of the different settings, how each one is configured, and best practices for choosing a value.

Parallelism Setting Methods (in order of priority from high to low):

  • Operator level: call setParallelism() on an individual operator, e.g., .map(...).setParallelism(10)
  • Environment level: set a default for every operator in the job, e.g., env.setParallelism(4)
  • Client level: pass the -p option to the flink run command, e.g., flink run -p 10 your-job.jar
  • System level: set parallelism.default: 4 in flink-conf.yaml (config.yaml in newer Flink releases); applies to any job that sets nothing at the levels above
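
The priority order above can be modelled as a simple "first value that is set wins" lookup. The sketch below is a toy model in plain Java, not Flink's internal code: the class name ParallelismResolver and the method resolve are hypothetical, and the only Flink fact it relies on is that parallelism.default falls back to 1 when it is not configured.

```java
import java.util.OptionalInt;

/** Toy model of the priority list; the names here are illustrative, not Flink API. */
final class ParallelismResolver {

    /** Walks the four levels from highest to lowest priority and returns the first value that is set. */
    static int resolve(OptionalInt operatorLevel,
                       OptionalInt environmentLevel,
                       OptionalInt clientLevel,
                       OptionalInt systemLevel) {
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();      // .setParallelism(n) on the operator
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();   // env.setParallelism(n)
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();        // flink run -p n
        }
        // parallelism.default from the configuration file; 1 if it is not set
        return systemLevel.orElse(1);
    }

    public static void main(String[] args) {
        // The operator sets nothing, the environment says 4, the client says 10:
        // the environment-level value is used because it outranks -p.
        int effective = resolve(OptionalInt.empty(), OptionalInt.of(4),
                                OptionalInt.of(10), OptionalInt.of(2));
        System.out.println("effective parallelism = " + effective);
    }
}
```

In a real job the lookup is done by Flink itself; the model is only meant to make clear that a value hard-coded with env.setParallelism() cannot be overridden from the command line with -p.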

Parallelism Optimization Suggestions:

  • Keep parallelism within the number of available task slots: with default slot sharing, a job needs as many slots as its highest operator parallelism
  • Rough starting points: simple ETL jobs: 4-8; complex computation/window aggregation: 8-16; machine learning/graph computing: 16-32+
  • Start testing at a low parallelism and increase it gradually until resource utilization reaches 70-80%
  • For I/O-intensive operators, use moderate or lower parallelism; CPU-intensive operators can use higher parallelism
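
The slot-count suggestion can be checked with a little arithmetic before a job is submitted. The following is a minimal sketch under stated assumptions: all operators stay in the default slot sharing group (so the job needs as many slots as its highest operator parallelism), the cluster runs only this job, and every TaskManager offers the same number of slots (taskmanager.numberOfTaskSlots). The class SlotCheck and its methods are hypothetical helpers, not part of Flink.

```java
/** Back-of-the-envelope slot check; the names are illustrative, not Flink API. */
final class SlotCheck {

    /** Total slots in the cluster: TaskManagers times slots per TaskManager. */
    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** With default slot sharing, a job needs as many slots as its highest operator parallelism. */
    static int requiredSlots(int... operatorParallelisms) {
        int max = 0;
        for (int p : operatorParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** True if the job can be scheduled on the given cluster under the assumptions above. */
    static boolean fits(int taskManagers, int slotsPerTaskManager, int... operatorParallelisms) {
        return requiredSlots(operatorParallelisms) <= availableSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        // Example: source at 4, map at 10, sink at 2, on 3 TaskManagers with 4 slots each.
        System.out.println("required  = " + requiredSlots(4, 10, 2));
        System.out.println("available = " + availableSlots(3, 4));
        System.out.println("fits      = " + fits(3, 4, 4, 10, 2));
    }
}
```

If operators are placed in separate slot sharing groups, the required slot count grows beyond the maximum parallelism, so the check above is a lower bound in that case.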