Big Data 218 - Prometheus Node Exporter & Pushgateway

TL;DR

Scenario: Add host metrics and short-task metrics collection to Prometheus on Rocky Linux (CentOS-like).

Conclusion: Long-running services are monitored with node_exporter (pull); short tasks and batch jobs push to Pushgateway, which Prometheus then scrapes (push→pull). When using Pushgateway, plan for stale data and for the gateway being a single point of failure.

Output: Installation and startup steps for node_exporter-1.8.2 and pushgateway-1.10.0, the Prometheus job configuration, and a quick-reference table of common faults and fixes.

Node Exporter

Download Configuration

cd /opt/software
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz

Extract and Configure

cd /opt/software
tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz
mv node_exporter-1.8.2.linux-amd64 ../servers/

Start Service

cd /opt/servers/node_exporter-1.8.2.linux-amd64
./node_exporter

Common Metrics

node_exporter exposes 200+ system-level metrics:

| Metric | Description |
| --- | --- |
| node_cpu_seconds_total | CPU time spent in each mode |
| node_memory_MemTotal_bytes | Total physical memory |
| node_memory_MemAvailable_bytes | Available memory |
| node_disk_read_bytes_total | Total bytes read from disk |
| node_disk_written_bytes_total | Total bytes written to disk |
| node_network_receive_bytes_total | Bytes received per network interface |
| node_network_transmit_bytes_total | Bytes transmitted per network interface |
| node_filesystem_avail_bytes | Available filesystem space |
| node_load1, node_load5, node_load15 | System load averages over 1, 5, and 15 minutes |
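To make the table concrete, here is a minimal sketch of deriving memory utilization from two of the gauges above. It uses only the Python standard library and a deliberately simple parser that ignores labelled series. The sample text and its values are invented for illustration; in practice the text would come from the node_exporter /metrics endpoint.

```python
def parse_samples(text):
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and labelled series such as name{...}.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def memory_used_percent(samples):
    """Memory in use as a percentage: (total - available) / total."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 100.0 * (total - available) / total


# Hypothetical scrape excerpt; the values are made up for this example.
SAMPLE = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8e+09
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2e+09
node_load1 0.42
"""

if __name__ == "__main__":
    print(memory_used_percent(parse_samples(SAMPLE)))
```

In a real setup the same ratio would normally be computed in PromQL on the Prometheus side; the Python version is only meant to show how the two gauges relate.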

Prometheus Configuration

Add to prometheus.yml:

  - job_name: "node_exporter"
    static_configs:
      - targets: ["<host>:9100"]
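After reloading Prometheus, one way to confirm the new job is healthy is the HTTP API endpoint /api/v1/targets. The sketch below assumes the JSON response has already been fetched and only shows how to pick out targets that are not up. The host name and error text in the sample are hypothetical.

```python
import json


def unhealthy_targets(payload):
    """Return (job, instance, lastError) for every active target that is not up."""
    result = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            labels = target.get("labels", {})
            result.append(
                (labels.get("job"), labels.get("instance"), target.get("lastError", ""))
            )
    return result


# Hypothetical, trimmed response from GET /api/v1/targets.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "node_exporter", "instance": "node1.example:9100"},
       "health": "down", "lastError": "connection refused"},
      {"labels": {"job": "prometheus", "instance": "localhost:9090"},
       "health": "up", "lastError": ""}
    ]
  }
}
""")

if __name__ == "__main__":
    for job, instance, error in unhealthy_targets(SAMPLE_RESPONSE):
        print(job, instance, error)
```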

PushGateway

Basic Introduction

Prometheus Pushgateway is a specially designed middleware component to help Prometheus monitor short-lived tasks and batch jobs.

In a standard Prometheus setup, the Prometheus server uses a pull model: it periodically fetches metrics from an HTTP endpoint (usually /metrics) exposed by each monitored service. This pattern works well for long-running daemons and services.

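However, the pull model fits poorly when a process may exit before Prometheus gets a chance to scrape it, or when it cannot be scraped at all: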

  1. Short-lived tasks: One-time scripts, scheduled tasks (cron jobs)
  2. Batch jobs: ETL processes, data analysis tasks, etc.
  3. Services that cannot directly expose metrics: Tasks running in restricted environments

How Pushgateway Works

  • Tasks push metric data to Pushgateway during execution or, more commonly, just before they exit
  • Pushgateway caches the most recently pushed value of each metric (in memory by default)
  • Prometheus server pulls these metrics like monitoring regular targets
  • Metrics remain in Pushgateway until overwritten by new data or manually deleted
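The push step in the flow above is a plain HTTP request. As a rough sketch using only the standard library, the function below builds (but does not send) a request against the Pushgateway path /metrics/job/<job>/<label>/<value>. The function name, metric name, and grouping values are hypothetical, every metric is treated as a gauge, and label values are assumed not to contain slashes.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, metrics, grouping=None, replace=True):
    """Build a push request for a Pushgateway at `gateway` (host:port).

    PUT replaces every metric in the group; POST replaces only metrics
    with the same names as the ones being pushed.
    """
    path = "/metrics/job/" + quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += "/" + quote(name, safe="") + "/" + quote(value, safe="")

    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text exposition body must end with a newline.
    body = ("\n".join(lines) + "\n").encode("utf-8")

    return urllib.request.Request(
        url=f"http://{gateway}{path}",
        data=body,
        method="PUT" if replace else "POST",
    )


if __name__ == "__main__":
    req = build_push_request(
        "localhost:9091",
        "my_batch_job",
        {"job_duration_seconds": 12.5},
        grouping={"instance": "etl-01"},
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```

Keeping request construction separate from sending makes the URL and body easy to inspect before anything touches a running gateway.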

Use Cases

  • Short-lived jobs: Batch scripts, one-time tasks
  • Cron jobs: Scheduled tasks that run periodically
  • CI/CD pipelines: Build, test, deployment status metrics
  • Batch processing: ETL jobs, data import/export tasks

Important Notes

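  • Persistence: Pushgateway keeps metrics in memory by default, so they are lost on restart; start it with the --persistence.file flag to write them to disk
  • Stale data: Pushed metrics stay until they are overwritten or deleted, so a job that stopped running still shows its last values. Pushgateway exposes a push_time_seconds metric for each group; use it to track when the last push happened
  • Avoid overuse: Pushgateway is intended for short-lived tasks; long-running services should expose /metrics and be scraped directly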
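The stale-data note can be turned into a simple check. The sketch below scans the text served by the Pushgateway /metrics endpoint for push_time_seconds series and reports jobs whose last push is older than a threshold. The job names, timestamps, and threshold are invented for illustration, and the parser handles only this one metric.

```python
import re

# Matches lines such as: push_time_seconds{instance="",job="nightly_etl"} 1.7e+09
_PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def stale_groups(metrics_text, now, max_age_seconds):
    """Return (job, age_in_seconds) for groups whose last push is too old."""
    stale = []
    for line in metrics_text.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if not match:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        age = now - float(match.group("value"))
        if age > max_age_seconds:
            stale.append((labels.get("job", ""), age))
    return stale


# Hypothetical excerpt of Pushgateway /metrics output.
SAMPLE = """\
# TYPE push_time_seconds gauge
push_time_seconds{instance="",job="nightly_etl"} 1.7e+09
push_time_seconds{instance="",job="hourly_sync"} 1.700086e+09
"""

if __name__ == "__main__":
    # `now` is fixed here so the example is reproducible; normally use time.time().
    print(stale_groups(SAMPLE, now=1_700_086_400, max_age_seconds=3600))
```

The same idea is usually expressed as an alert rule comparing time() with push_time_seconds; the script form is just easier to run ad hoc.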

Pushgateway Download Configuration

cd /opt/software
wget https://github.com/prometheus/pushgateway/releases/download/v1.10.0/pushgateway-1.10.0.linux-amd64.tar.gz

tar -zxvf pushgateway-1.10.0.linux-amd64.tar.gz
mv pushgateway-1.10.0.linux-amd64 ../servers/

Configure Service

cd /opt/servers/pushgateway-1.10.0.linux-amd64
chmod +x pushgateway
cp pushgateway ../prometheus-2.53.2.linux-amd64/
./pushgateway   # listens on port 9091 by default

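Add a Pushgateway scrape job to prometheus.yml:

  - job_name: 'pushgateway'
    # Keep the job/instance labels set by the pushing task instead of
    # overwriting them with the labels of the Pushgateway target itself.
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']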

Push Metrics Example

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last successful job run', registry=registry)
g.set_to_current_time()

push_to_gateway('localhost:9091', job='my_batch_job', registry=registry)

Pushgateway Limitations

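  1. Single point of failure: Pushgateway has no built-in high availability; if it goes down, pushes fail and the cached metrics become unavailable
  2. No automatic expiration: Metrics remain until they are overwritten or manually deleted
  3. Prometheus scrape semantics: The up metric only reflects whether Pushgateway itself can be scraped, not whether the actual job is healthy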
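Because nothing expires on its own, cleanup has to be explicit. Pushgateway accepts an HTTP DELETE on the same path that was used for pushing. The following sketch builds such a request with the standard library without sending it; the function name and the job and grouping values are hypothetical, and label values are assumed not to contain slashes.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(gateway, job, grouping=None):
    """Build a DELETE request for one metric group on a Pushgateway.

    Only the group matching exactly this job and grouping labels is
    removed; groups with additional labels are left untouched.
    """
    path = "/metrics/job/" + quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += "/" + quote(name, safe="") + "/" + quote(value, safe="")
    return urllib.request.Request(url=f"http://{gateway}{path}", method="DELETE")


if __name__ == "__main__":
    req = build_delete_request("localhost:9091", "my_batch_job", {"instance": "etl-01"})
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```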

Error Quick Reference

| Symptom | Root Cause | Fix |
| --- | --- | --- |
| Prometheus Targets page shows DOWN (node_exporter) | Process not running or has exited | Check the process and listening port on the machine; restart node_exporter; consider managing it with systemd |
| Target is DOWN but curl to /metrics works on the target host itself | Prometheus cannot reach the target over the network | Check connectivity from the Prometheus machine to the target ip:port |
| Target flaps between UP and DOWN | Port conflict or unstable process | Check the startup logs and system logs |
| node_exporter fails to start: permission denied | Binary has no execute permission, or the download is corrupted | Check permissions with ls -l and run chmod +x if needed; re-download the archive if it is corrupted |
| Expected job metrics are missing | No job has pushed metrics to Pushgateway | Open the Pushgateway /metrics endpoint and check whether the job's metrics are present |
| Dashboard panel keeps showing stale data (Pushgateway) | Pushgateway metrics do not expire automatically | Check the group's last push time (push_time_seconds) |
| Prometheus only shows Pushgateway as UP | The up metric covers only the Pushgateway service itself | Compare the Pushgateway target status with the job's own metrics |
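For the connectivity row in the table, a quick TCP check run from the Prometheus machine can tell whether the target port is reachable at all. This is a minimal sketch using the standard library; the host and port in the usage lines are placeholders, and a successful connect only proves the port is open, not that /metrics responds correctly.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder target; replace with the node_exporter host and port.
    print(port_open("127.0.0.1", 9100))
```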