Big Data 218 - Prometheus Node Exporter & Pushgateway

TL;DR

Scenario: Add host metrics and short-task metrics collection to Prometheus on Rocky Linux (CentOS-like).

Conclusion: Use node_exporter (pull) for long-running hosts and services, and Pushgateway (push, then pull) for short-lived tasks and batch jobs. With Pushgateway, plan for stale data and for the fact that it is a single point of failure.

Output: Installation and startup steps for node_exporter-1.8.2 and pushgateway-1.10.0, the Prometheus job configuration, and a quick-reference table of common faults and fixes.

Node Exporter

Download Configuration

cd /opt/software
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz

Extract and Configure

cd /opt/software
tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz
mv node_exporter-1.8.2.linux-amd64 ../servers/

Start Service

cd /opt/servers/node_exporter-1.8.2.linux-amd64
./node_exporter

Common Metrics

node_exporter exposes hundreds of system-level metrics; the exact set depends on the enabled collectors and the host. Commonly used ones:

Metric	Description
node_cpu_seconds_total	CPU time in different modes
node_memory_MemTotal_bytes	Total physical memory
node_memory_MemAvailable_bytes	Available memory
node_disk_read_bytes_total	Total bytes read from disk
node_disk_written_bytes_total	Total bytes written to disk
node_network_receive_bytes_total	Network interface received bytes
node_network_transmit_bytes_total	Network interface transmitted bytes
node_filesystem_avail_bytes	Filesystem available space
node_load1, node_load5, node_load15	System load averages over 1, 5, and 15 minutes
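
To show how these metrics are typically combined, here is a minimal sketch that parses the text served at /metrics and derives memory usage from two of the gauges above. The function names and the sample values are hypothetical, and the parser assumes sample lines carry no trailing timestamp.

```python
def parse_samples(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def memory_used_percent(samples):
    """Share of physical memory in use, based on MemTotal and MemAvailable."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 100.0 * (total - available) / total


# Made-up sample in the same format node_exporter serves.
SAMPLE = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.0e+09
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2.0e+09
node_load1 0.5
"""

print(memory_used_percent(parse_samples(SAMPLE)))
```

Against a live exporter, the text would come from an HTTP GET to http://<host>:9100/metrics. In practice the same ratio is usually computed in PromQL instead of client code.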

Prometheus Configuration

Add to prometheus.yml:

  - job_name: "node_exporter"
    static_configs:
      - targets: ["<host>:9100"]

Pushgateway

Basic Introduction

Prometheus Pushgateway is an intermediary service that lets Prometheus monitor short-lived tasks and batch jobs.

In a standard Prometheus setup, the server uses a pull model: it periodically scrapes metrics from an HTTP endpoint (usually /metrics) on each monitored service. This works well for long-running daemons and services.

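However, the pull model fits poorly when a process may exit before Prometheus gets a chance to scrape it, or when it cannot be scraped at all: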

Short-lived tasks: One-time scripts, scheduled tasks (cron jobs)
Batch jobs: ETL processes, data analysis tasks, etc.
Services that cannot directly expose metrics: Tasks running in restricted environments

How Pushgateway Works

Tasks push metric data to Pushgateway during execution or when they finish
Pushgateway keeps the most recently pushed metrics in memory (and on disk only if a persistence file is configured)
Prometheus scrapes Pushgateway's /metrics endpoint like any other target
Metrics remain in Pushgateway until overwritten by new data or manually deleted
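
The push step can be sketched with the standard library alone. Pushgateway accepts metrics in the text exposition format at /metrics/job/<job>, optionally followed by label name/value pairs that identify the group. The helper names, the instance value etl-01, and the timestamp below are hypothetical, and the sketch assumes label values contain no "/" characters.

```python
import urllib.parse
import urllib.request


def group_url(gateway, job, grouping=None):
    """Build the Pushgateway URL that identifies one metric group."""
    parts = ["metrics", "job", urllib.parse.quote(job, safe="")]
    for name, value in (grouping or {}).items():
        parts += [name, urllib.parse.quote(value, safe="")]
    return gateway.rstrip("/") + "/" + "/".join(parts)


def build_push(gateway, job, metrics, grouping=None):
    """Return a PUT request; PUT replaces every metric in the group."""
    # One "name value" line per metric, each terminated by a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        group_url(gateway, job, grouping),
        data=body.encode("utf-8"),
        method="PUT",
    )


req = build_push(
    "http://localhost:9091",
    "my_batch_job",
    {"job_last_success_unixtime": 1700000000},
    grouping={"instance": "etl-01"},
)
print(req.get_method(), req.full_url)
# To send it to a running Pushgateway:
# urllib.request.urlopen(req, timeout=5)
```

Using POST instead of PUT replaces only the metrics whose names appear in the body and leaves the rest of the group untouched.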

Use Cases

Short-lived jobs: Batch scripts, one-time tasks
Cron jobs: Scheduled tasks that run periodically
CI/CD pipelines: Build, test, deployment status metrics
Batch processing: ETL jobs, data import/export tasks

Important Notes

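Persistence: By default Pushgateway keeps metrics in memory only, so they are lost on restart; start it with the --persistence.file flag to keep them across restarts
Stale data: Pushgateway never expires pushed metrics, so a finished or removed job keeps reporting its last values. Use the push_time_seconds metric, which Pushgateway exposes for each group, to track when the last push happened
Avoid overuse: Pushgateway is meant for short-lived tasks; monitor long-running services by scraping them directly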
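
As an illustration of the stale-data note, the sketch below scans the text from Pushgateway's /metrics endpoint for push_time_seconds samples and reports groups whose last push is older than a threshold. The function name, job names, and timestamps are made up, and the regular expression assumes label values contain no "}" character.

```python
import re

# Matches lines such as: push_time_seconds{instance="",job="nightly_etl"} 1.7e+09
PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<ts>\S+)$')


def stale_groups(metrics_text, now, max_age_seconds):
    """Return {label_string: age_seconds} for groups pushed too long ago."""
    stale = {}
    for line in metrics_text.splitlines():
        match = PUSH_TIME.match(line.strip())
        if not match:
            continue
        age = now - float(match.group("ts"))
        if age > max_age_seconds:
            stale[match.group("labels")] = age
    return stale


# Hypothetical excerpt of Pushgateway output with two groups.
SAMPLE = (
    'push_time_seconds{instance="",job="nightly_etl"} 1000\n'
    'push_time_seconds{instance="",job="hourly_sync"} 4000\n'
)

print(stale_groups(SAMPLE, now=5000, max_age_seconds=3600))
```

Inside Prometheus, an expression such as time() - push_time_seconds > 3600 gives the same signal and is the more common basis for an alert.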

Pushgateway Download Configuration

cd /opt/software
wget https://github.com/prometheus/pushgateway/releases/download/v1.10.0/pushgateway-1.10.0.linux-amd64.tar.gz

tar -zxvf pushgateway-1.10.0.linux-amd64.tar.gz
mv pushgateway-1.10.0.linux-amd64 ../servers/

Configure Service

cd /opt/servers/pushgateway-1.10.0.linux-amd64
chmod +x pushgateway
cp pushgateway ../prometheus-2.53.2.linux-amd64/
./pushgateway

Add a Pushgateway scrape job to prometheus.yml:

  - job_name: 'pushgateway'
    # Keep the job and instance labels set by the pushing jobs.
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']

Push Metrics Example

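from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

# Use a dedicated registry so only this job's metrics are pushed.
registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last successful job run', registry=registry)
g.set_to_current_time()

# Push once the job has finished; this replaces all metrics in the job's group.
push_to_gateway('localhost:9091', job='my_batch_job', registry=registry)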

Pushgateway Limitations

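Single point of failure: Pushgateway has no built-in high availability; if it goes down, pushes fail and the metrics it holds become unavailable
No automatic expiration: Metrics remain until they are overwritten or deleted through the API or web UI
Prometheus scrape semantics: The up metric only reflects whether Pushgateway itself is reachable, not whether the jobs that pushed to it are healthy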
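
Because nothing expires on its own, cleanup has to be explicit. A minimal sketch, assuming a Pushgateway at localhost:9091 and a hypothetical helper name: an HTTP DELETE on a group's URL removes that group's metrics.

```python
import urllib.parse
import urllib.request


def build_delete(gateway, job, grouping=None):
    """Return a DELETE request for the group identified by job and labels."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += f"/{name}/" + urllib.parse.quote(value, safe="")
    return urllib.request.Request(gateway.rstrip("/") + path, method="DELETE")


req = build_delete("http://localhost:9091", "my_batch_job")
print(req.get_method(), req.full_url)
# To send it to a running Pushgateway:
# urllib.request.urlopen(req, timeout=5)
```

The request removes only the group that matches the URL exactly. A group pushed with extra grouping labels, such as an instance label, needs those labels in the DELETE URL as well.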

Error Quick Reference

Symptom	Root Cause	Fix
Prometheus Targets page shows node_exporter as DOWN	Process not running or has exited	Check the process and its listening port on the host; restart node_exporter; consider managing it with systemd
Target is DOWN, but curl to /metrics works on the target host itself	No network path from the Prometheus host to the target	Test connectivity from the Prometheus host to the target ip:port
Target flaps between UP and DOWN	Port conflict or unstable process	Check the startup logs and system logs
node_exporter fails to start: permission denied	Binary lacks execute permission, or the download is corrupted	Check permissions with ls -l
Expected job metrics are missing	No job has pushed metrics to Pushgateway	Open Pushgateway's /metrics endpoint and confirm the metrics are present
Dashboard keeps showing stale data (Pushgateway)	Pushgateway metrics do not expire automatically	Check the push time of the affected metrics
Prometheus only shows Pushgateway as UP	The up metric covers only the Pushgateway service itself	Compare the Pushgateway target status with the job's own metrics
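
For the connectivity rows above, a quick TCP check run from the Prometheus host can tell "process down" apart from "network blocked". This is a minimal sketch; the function name is hypothetical, and 9100 and 9091 are the default node_exporter and Pushgateway ports used in this article.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage, run on the Prometheus host (replace <host> with the target):
# print(can_connect("<host>", 9100))      # node_exporter
# print(can_connect("localhost", 9091))   # Pushgateway
```

If the check succeeds on the target host but fails from the Prometheus host, the likely cause is a firewall rule or routing problem, not the exporter itself.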