This is article 71 in the Big Data series. It introduces the core architecture of a Spark cluster, compares deployment modes, and covers static and dynamic resource management strategies.

Core Architecture Components

A Spark cluster consists of three key roles: the Driver Program, the Cluster Manager, and the Executors.

Driver Program

The Driver is the entry point and control center of a Spark application, responsible for three core tasks:

  • SparkContext management: creates and maintains the SparkContext, which provides the cluster connection, RDD operation interfaces, and job-scheduling capabilities
  • Task scheduling: converts user code into a DAG (Directed Acyclic Graph) execution plan, then splits it into Stages and Tasks that are distributed to Executors
  • Execution monitoring: tracks task success/failure, retries failed tasks, and cleans up resources

The Driver's complete lifecycle: initialization → execution → result processing → resource cleanup.
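The lifecycle above can be sketched in Scala (a minimal illustration; the app name and the local master URL are placeholders, not part of the original article):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LifecycleDemo {
  def main(args: Array[String]): Unit = {
    // Initialization: the Driver creates the SparkContext
    val conf = new SparkConf().setAppName("LifecycleDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Execution: the Driver builds the DAG and schedules Tasks onto Executors
    val result = sc.parallelize(1 to 100).map(_ * 2).reduce(_ + _)

    // Result processing: the reduced value is returned to the Driver
    println(s"sum = $result")

    // Resource cleanup: release Executors and cluster resources
    sc.stop()
  }
}
```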

Cluster Manager

The Cluster Manager handles resource allocation and task coordination for the entire cluster. Spark supports four mainstream resource managers:

| Manager    | Use Case                      | Core Features                                      |
|------------|-------------------------------|----------------------------------------------------|
| Standalone | Development/testing           | Simple deployment, no extra dependencies           |
| YARN       | Enterprise big data platforms | Deep Hadoop integration, supports multi-tenancy    |
| Mesos      | Mixed-workload clusters       | Fine-grained scheduling, strong horizontal scaling |
| Kubernetes | Cloud-native applications     | Container orchestration, supports elastic scaling  |

In production, clusters that coexist with Hadoop typically choose YARN, while cloud-native deployments favor Kubernetes.
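From the application's point of view, the four managers differ mainly in the master URL. The alternatives below (hosts and ports are placeholders) show the `spark.master` value for each; only one would be set in a given spark-defaults.conf:

```properties
# Standalone: point at the Standalone Master
spark.master=spark://master-host:7077
# YARN: resolved via HADOOP_CONF_DIR, no host needed
spark.master=yarn
# Mesos: point at the Mesos master
spark.master=mesos://master-host:5050
# Kubernetes: point at the API server
spark.master=k8s://https://api-server:6443
```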

Executor

An Executor is a JVM process running on a Worker node, responsible for:

  1. Executing Tasks assigned by Driver
  2. Caching intermediate computation results in memory to accelerate iterative computation
  3. Writing final results back to HDFS or returning to Driver
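Point 2 above is what `cache()` exploits: a cached RDD stays in Executor memory across actions, so iterative jobs avoid recomputation. A minimal sketch, assuming an existing `SparkContext` named `sc` and a hypothetical HDFS path:

```scala
// Cache the dataset in Executor memory (path is a placeholder)
val data = sc.textFile("hdfs:///input/logs").cache()

// First action computes the partitions and caches them on the Executors
val total = data.count()

// Later actions reuse the cached partitions instead of re-reading HDFS
val errors = data.filter(_.contains("ERROR")).count()
```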

Deployment Modes

Local Mode

Runs on a single machine with no cluster required; suitable for local development and debugging:

spark-shell --master local[*]   # Use all CPU cores
spark-shell --master local[4]   # Use 4 threads

Cluster Mode (Client vs Cluster)

When submitting to a real cluster, there are two sub-modes:

  • Client mode: the Driver runs on the client node that submits the job; logs are printed directly to the terminal, making it suitable for interactive debugging
  • Cluster mode: the Driver runs on a node inside the cluster; a client disconnect does not affect job execution, making it suitable for production

# Client mode submission
spark-submit --master yarn --deploy-mode client --class icu.wzk.App app.jar

# Cluster mode submission
spark-submit --master yarn --deploy-mode cluster --class icu.wzk.App app.jar

Cluster Startup Process

Start Hadoop Cluster

start-all.sh   # Hadoop's start-all.sh, assumed to be on PATH

Start Spark Standalone Cluster

cd /opt/servers/spark-2.4.5/sbin
./start-all.sh

After startup, access http://<master-ip>:8080 to view Spark Master Web UI.

Verify Cluster Status

# Run built-in Pi example to verify cluster is working
run-example SparkPi 10

Resource Management Strategies

Static Resource Allocation

Pre-specify fixed resources in the config file or on the submission command line:

spark-submit \
  --master yarn \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 10 \
  --class icu.wzk.App app.jar

Suitable for batch-processing scenarios with exclusive resources and stable load.

Dynamic Resource Allocation

Automatically scales the Executor count with the actual workload; enable it in spark-defaults.conf:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=60s
# Dynamic allocation in Spark 2.x also requires the external shuffle service
spark.shuffle.service.enabled=true

Suitable for interactive or stream-processing scenarios with shared resources and fluctuating load; it avoids wasting idle resources.

Monitoring and Tuning

  • Spark UI (port 4040): view Job, Stage, and Task execution status and spot data skew
  • History Server: persists execution logs of completed jobs for post-hoc analysis
  • Ganglia/Prometheus: cluster-level CPU, memory, and network monitoring

Parallelism (spark.default.parallelism) and serialization (Kryo vs. Java) are two common tuning entry points.
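Both knobs can be set in spark-defaults.conf. The values below are illustrative starting points, not recommendations for every workload:

```properties
# Default shuffle partition count; a common rule of thumb is 2-3x total executor cores
spark.default.parallelism=200
# Kryo is faster and more compact than Java serialization for most workloads
spark.serializer=org.apache.spark.serializer.KryoSerializer
```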