This is article 71 in the Big Data series. It introduces the core architecture of a Spark cluster, compares deployment modes, and covers static and dynamic resource management strategies.

Core Architecture Components

A Spark cluster consists of three key roles: the Driver Program, the Cluster Manager, and the Executors.

Driver Program

The Driver is the entry point and control center of a Spark application, responsible for three core tasks:

  • SparkContext management: creates and maintains the SparkContext, which provides the cluster connection, RDD operation interfaces, and job-scheduling capabilities
  • Task scheduling: converts user code into a DAG (Directed Acyclic Graph) execution plan, then splits it into Stages and Tasks that are distributed to Executors
  • Execution monitoring: tracks task success/failure, retries failed tasks, and cleans up resources

The Driver's complete lifecycle: initialization → execution → result processing → resource cleanup.
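The lifecycle above can be sketched in Scala (a minimal illustration; the app name and the local master URL are placeholders, not part of the original article):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LifecycleDemo {
  def main(args: Array[String]): Unit = {
    // Initialization: the Driver creates the SparkContext
    val conf = new SparkConf().setAppName("LifecycleDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Execution: the Driver builds the DAG and schedules Tasks onto Executors
    val result = sc.parallelize(1 to 100).map(_ * 2).reduce(_ + _)

    // Result processing: the reduced value is returned to the Driver
    println(s"sum = $result")

    // Resource cleanup: release Executors and cluster resources
    sc.stop()
  }
}
```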

Cluster Manager

The Cluster Manager handles resource allocation and task coordination for the entire cluster. Spark supports four mainstream resource managers:

| Manager    | Use Case                      | Core Features                                      |
|------------|-------------------------------|----------------------------------------------------|
| Standalone | Development/testing           | Simple deployment, no extra dependencies           |
| YARN       | Enterprise big data platforms | Deep Hadoop integration, supports multi-tenancy    |
| Mesos      | Mixed-workload clusters       | Fine-grained scheduling, strong horizontal scaling |
| Kubernetes | Cloud-native applications     | Container orchestration, supports elastic scaling  |

In production, clusters that coexist with Hadoop typically choose YARN, while cloud-native deployments favor Kubernetes.
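From the application's point of view, the four managers differ mainly in the master URL. The alternatives below (hosts and ports are placeholders) show the `spark.master` value for each; only one would be set in a given spark-defaults.conf:

```properties
# Standalone: point at the Standalone Master
spark.master=spark://master-host:7077
# YARN: resolved via HADOOP_CONF_DIR, no host needed
spark.master=yarn
# Mesos: point at the Mesos master
spark.master=mesos://master-host:5050
# Kubernetes: point at the API server
spark.master=k8s://https://api-server:6443
```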

Executor

An Executor is a JVM process running on a Worker node, responsible for:

  1. Executing Tasks assigned by Driver
  2. Caching intermediate computation results in memory to accelerate iterative computation
  3. Writing final results back to HDFS or returning to Driver
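Point 2 above is what `cache()` exploits: a cached RDD stays in Executor memory across actions, so iterative jobs avoid recomputation. A minimal sketch, assuming an existing `SparkContext` named `sc` and a hypothetical HDFS path:

```scala
// Cache the dataset in Executor memory (path is a placeholder)
val data = sc.textFile("hdfs:///input/logs").cache()

// First action computes the partitions and caches them on the Executors
val total = data.count()

// Later actions reuse the cached partitions instead of re-reading HDFS
val errors = data.filter(_.contains("ERROR")).count()
```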

Deployment Modes

Local Mode

Runs on a single machine with no cluster required; suitable for local development and debugging:

spark-shell --master local[*]   # Use all CPU cores
spark-shell --master local[4]   # Use 4 threads

Cluster Mode (Client vs Cluster)

When submitting to a real cluster, there are two sub-modes:

  • Client mode: the Driver runs on the client node that submits the job; logs are printed directly to the terminal, making it suitable for interactive debugging
  • Cluster mode: the Driver runs on a node inside the cluster; a client disconnect does not affect job execution, making it suitable for production

# Client mode submission
spark-submit --master yarn --deploy-mode client --class icu.wzk.App app.jar

# Cluster mode submission
spark-submit --master yarn --deploy-mode cluster --class icu.wzk.App app.jar

Cluster Startup Process

Start Hadoop Cluster

start-all.sh   # Hadoop's start-all.sh, assumed to be on PATH

Start Spark Standalone Cluster

cd /opt/servers/spark-2.4.5/sbin
./start-all.sh

After startup, access http://<master-ip>:8080 to view Spark Master Web UI.

Verify Cluster Status

# Run built-in Pi example to verify cluster is working
run-example SparkPi 10

Resource Management Strategies

Static Resource Allocation

Pre-specify fixed resources in the config file or on the submission command line:

spark-submit \
  --master yarn \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 10 \
  --class icu.wzk.App app.jar

Suitable for batch-processing scenarios with exclusive resources and stable load.

Dynamic Resource Allocation

Automatically scales the Executor count with the actual workload; enable it in spark-defaults.conf:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=60s
# Dynamic allocation in Spark 2.x also requires the external shuffle service
spark.shuffle.service.enabled=true

Suitable for interactive or stream-processing scenarios with shared resources and fluctuating load; it avoids wasting idle resources.

Monitoring and Tuning

  • Spark UI (port 4040): view Job, Stage, and Task execution status and spot data skew
  • History Server: persists execution logs of completed jobs for post-hoc analysis
  • Ganglia/Prometheus: cluster-level CPU, memory, and network monitoring

Parallelism (spark.default.parallelism) and serialization (Kryo vs. Java) are two common tuning entry points.
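Both knobs can be set in spark-defaults.conf. The values below are illustrative starting points, not recommendations for every workload:

```properties
# Default shuffle partition count; a common rule of thumb is 2-3x total executor cores
spark.default.parallelism=200
# Kryo is faster and more compact than Java serialization for most workloads
spark.serializer=org.apache.spark.serializer.KryoSerializer
```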