This article explains how Flink resolves parallelism settings by priority, how to configure parallelism at each level, and best practices for tuning it.
Parallelism Setting Methods (in order of priority from high to low):
- Operator Level: specified through the setParallelism() method, e.g., .map(...).setParallelism(10)
- Environment Level: e.g., env.setParallelism(4)
- Client Level: specified through the -p parameter of the flink command, e.g., flink run -p 10 your-job.jar
- System Level: configured in flink-conf.yaml, e.g., parallelism.default: 4
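The override rule above works as a first-match lookup. The sketch below is a hypothetical plain-Java model, not Flink's API or its actual internals. The class ParallelismResolver and its resolve method are illustrative names, an unset level is represented by an empty OptionalInt, and the numeric values are made up. It only shows that the highest-priority level that is set determines an operator's parallelism.

```java
import java.util.OptionalInt;

/**
 * Hypothetical model (not Flink's API or internals) of how one operator's
 * effective parallelism is picked from the four levels, highest priority first.
 */
final class ParallelismResolver {

    static int resolve(OptionalInt operatorLevel,    // .setParallelism(n) on the operator
                       OptionalInt environmentLevel, // env.setParallelism(n)
                       OptionalInt clientLevel,      // flink run -p n
                       int systemDefault) {          // parallelism.default in flink-conf.yaml
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();
        }
        return systemDefault; // used only when no higher level is set
    }

    public static void main(String[] args) {
        // All levels set: the operator level wins.
        System.out.println(resolve(OptionalInt.of(10), OptionalInt.of(4), OptionalInt.of(8), 2));
        // No operator setting: the environment level applies.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(4), OptionalInt.of(8), 2));
        // Only the client's -p value and the system default are set.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.of(8), 2));
        // Nothing set explicitly: fall back to parallelism.default.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.empty(), 2));
    }
}
```

Running main prints 10, 4, 8 and 2, one per line. Each call leaves out the highest remaining level, so the next level down takes effect.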
Parallelism Optimization Suggestions:
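- Parallelism should match the available slot count: the job's highest operator parallelism should not exceed the total number of task slots (number of TaskManagers multiplied by taskmanager.numberOfTaskSlots), because with the default slot sharing a job needs as many slots as its highest parallelism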
- Typical starting ranges: simple ETL jobs 4-8; complex computation/window aggregation 8-16; machine learning/graph computing 16-32+
- Start testing with low parallelism and increase it gradually until resource utilization reaches 70-80%
- For I/O-intensive operations, moderate or lower parallelism is recommended; CPU-intensive operations can use higher parallelism
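To check the slot suggestion before submitting a job, compare the slots the job needs with the slots the cluster offers. The following minimal sketch makes two assumptions. First, all operators stay in the default slot sharing group, so the job needs as many slots as its highest operator parallelism. Second, the cluster's total slot count is the number of TaskManagers multiplied by taskmanager.numberOfTaskSlots. SlotCheck and its methods are hypothetical helpers, not part of Flink, and the operator names and numbers are illustrative.

```java
import java.util.Map;

/**
 * Hypothetical pre-flight check (not a Flink API). Assumes every operator is in
 * the default slot sharing group, so required slots = highest operator parallelism.
 */
final class SlotCheck {

    /** Total slots offered by the cluster: TaskManagers x slots per TaskManager. */
    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** Slots the job needs under default slot sharing: the maximum operator parallelism. */
    static int requiredSlots(Map<String, Integer> operatorParallelism) {
        return operatorParallelism.values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0); // an empty job needs no slots
    }

    /** True if the cluster has at least as many slots as the job requires. */
    static boolean fits(Map<String, Integer> operatorParallelism,
                        int taskManagers, int slotsPerTaskManager) {
        return requiredSlots(operatorParallelism)
                <= availableSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        // Illustrative job: operator names and parallelism values are made up.
        Map<String, Integer> job = Map.of("source", 4, "map", 8, "window", 8, "sink", 2);
        System.out.println("required = " + requiredSlots(job));       // prints: required = 8
        System.out.println("available = " + availableSlots(2, 4));    // prints: available = 8
        System.out.println("fits = " + fits(job, 2, 4));              // prints: fits = true
        System.out.println("fits on 1 TM = " + fits(job, 1, 4));      // prints: fits on 1 TM = false
    }
}
```

With two TaskManagers of four slots each, the example job (highest parallelism 8) fits exactly. On a single TaskManager it would not obtain enough slots to run at that parallelism.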