Big Data 217 - Prometheus Installation & Configuration

TL;DR

Scenario: A single-machine Prometheus 2.53.2 deployment that pulls node_exporter metrics from several hosts. Scrape health is verified on the Targets page.

Conclusion: What matters most is that every scrape_configs target is reachable from the Prometheus host and serves /metrics at the expected path. Alerting (Alertmanager) and visualization (e.g., Grafana) are separate components and must be deployed separately.

Output: A reusable installation, directory-layout, and configuration template; a Targets verification procedure; and a troubleshooting checklist.

Version Matrix

| Item | Description |
|---|---|
| Prometheus 2.53.2 (linux-amd64) | Official release tar.gz; extract it and run ./prometheus in the foreground |
| Static discovery (static_configs) | Targets are listed as targets: ["host:port"] |
| node_exporter port 9100 | All node targets in this article point to port 9100 |
| Prometheus Web UI /targets | Open http://<host>:9090/targets to see which scrapes succeed or fail, and why |

Prometheus Architecture Design

Prometheus has a modular architecture. Each component has a clearly scoped responsibility, and together they form a complete monitoring solution.

  1. Prometheus Server - The core component, a single binary responsible for:

    • Data collection: Scrapes (pulls) metrics from each target at the configured interval
    • Storage: A built-in local time-series database (TSDB) with efficient compressed storage
    • Query processing: Serves the PromQL query language for instant and range queries
  2. Exporters - Adapters that expose a system's metrics in the Prometheus format:

    • System-level: node_exporter (hundreds of host metrics covering CPU, memory, disk, network, and more)
    • Service-level: mysqld_exporter, redis_exporter
    • Network probing: blackbox_exporter probes endpoints over HTTP, TCP, and ICMP
  3. Alertmanager - The alert-handling subsystem (Prometheus evaluates alerting rules and sends firing alerts here):

    • Grouping: Merges related alerts into a single notification
    • Inhibition: Suppresses dependent alerts while a related, higher-level alert is firing, preventing alert storms
    • Routing: Dispatches alerts to different receivers based on their labels
  4. Pushgateway - For workloads that cannot be scraped directly:

    • Use case: Short-lived jobs such as cron or batch jobs that may exit before Prometheus scrapes them
    • Workflow: The job pushes its metrics to the gateway, and Prometheus scrapes the gateway on its normal schedule
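To make the Pushgateway workflow concrete, here is a minimal sketch of what a batch job might run when it finishes. It assumes a Pushgateway reachable at a hypothetical address (9091 is the Pushgateway's default port) and uses the gateway's /metrics/job/<job> push endpoint, which accepts the plain-text exposition format. The function names and the metric name are illustrative only.

```shell
# Render one gauge sample in Prometheus text exposition format.
# Usage: pushgw_payload <metric_name> <value> [help_text]
pushgw_payload() {
  local name="$1" value="$2" help="${3:-Pushed by a batch job}"
  printf '# HELP %s %s\n# TYPE %s gauge\n%s %s\n' "$name" "$help" "$name" "$name" "$value"
}

# POST the sample to a Pushgateway, grouped under the given job name.
# Usage: pushgw_push <gateway_url> <job> <metric_name> <value>
pushgw_push() {
  local gw="$1" job="$2"
  pushgw_payload "$3" "$4" | curl -s --fail --data-binary @- "$gw/metrics/job/$job"
}

# Example (hypothetical gateway address and metric name):
#   pushgw_push http://pushgateway.example:9091 nightly_etl \
#     batch_last_success_timestamp_seconds "$(date +%s)"
```

Prometheus then collects the value on its next scrape of the gateway. Scrape jobs that target a Pushgateway usually set honor_labels: true so that the job label pushed by the batch job is preserved.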

Data Model

Prometheus stores all data as time series. Each series is uniquely identified by its metric name plus a set of key-value label pairs, and holds a stream of timestamped sample values.

<metric name>{<label name>=<label value>, ...}

Example:

node_cpu_seconds_total{mode="idle", instance="h121.wzk.icu:9100"}
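Each line of a target's /metrics response has the same <metric name>{<labels>} <value> shape, so a single series can be looked up with plain text tools. A minimal sketch, assuming exposition text on stdin, no timestamp after the value, and no spaces inside label values (true for the node_exporter series used here). series_value is a hypothetical helper name.

```shell
# Print the value of the first sample whose metric name equals $1 and whose
# label set contains the substring $2 (for example 'mode="idle"').
# Reads text exposition format on stdin; HELP/TYPE comment lines are skipped.
series_value() {
  awk -v name="$1" -v pat="${2:-}" '
    /^#/ { next }
    {
      series = $1                      # e.g. node_cpu_seconds_total{cpu="0",mode="idle"}
      metric = series
      sub(/[{].*/, "", metric)         # drop the label set to get the bare name
      if (metric == name && (pat == "" || index(series, pat) > 0)) {
        print $2
        exit
      }
    }'
}

# Example:
#   curl -s http://h121.wzk.icu:9100/metrics | series_value node_cpu_seconds_total 'mode="idle"'
```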

Data Collection Method

Prometheus collects data with a pull model: at every scrape_interval it sends an HTTP GET request to each configured target's metrics endpoint (/metrics by default) and stores the samples in the response.
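Because a scrape is an ordinary HTTP GET, you can reproduce one by hand. The hypothetical helpers below fetch a target's /metrics once and report how many samples and distinct metric names it returned. This is a rough tally: histogram and summary series (_bucket, _sum, _count) are counted as separate names.

```shell
# Summarize one scrape: number of samples and distinct metric names.
# Reads exposition text on stdin so it works with curl or a saved file.
scrape_summary() {
  awk '
    /^#/ || NF == 0 { next }           # skip HELP/TYPE comments and blank lines
    {
      samples++
      name = $1
      sub(/[{].*/, "", name)           # strip the label set
      if (!(name in seen)) { seen[name] = 1; names++ }
    }
    END { printf "samples=%d metrics=%d\n", samples, names }'
}

# Fetch /metrics from host:port once, the same way a scrape does.
# Usage: scrape_once <host:port>
scrape_once() {
  curl -s --fail --max-time 10 "http://$1/metrics" | scrape_summary
}

# Example:
#   scrape_once h121.wzk.icu:9100
```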

Query Language (PromQL)

PromQL is Prometheus's query language for selecting, aggregating, and analyzing stored series:

rate(http_requests_total[5m])  # Per-second HTTP request rate, averaged over the last 5 minutes
avg_over_time(cpu_usage[1h])   # Average of a gauge (cpu_usage is an example metric name) over the last hour
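Outside the UI, the same expressions can be evaluated through Prometheus's HTTP API: /api/v1/query for an instant query and /api/v1/query_range for a range query (with start, end, and step parameters). A minimal sketch; the function names are hypothetical, and curl -G --data-urlencode is used so the brackets and braces in PromQL are escaped correctly.

```shell
# Instant query: evaluate <promql> at the current time and print the JSON reply.
# Usage: prom_query <base_url> <promql>
prom_query() {
  curl -s -G "$1/api/v1/query" --data-urlencode "query=$2"
}

# Range query over the last hour at 60-second resolution.
# Usage: prom_query_range <base_url> <promql>
prom_query_range() {
  local end
  end="$(date +%s)"
  curl -s -G "$1/api/v1/query_range" \
    --data-urlencode "query=$2" \
    --data-urlencode "start=$((end - 3600))" \
    --data-urlencode "end=$end" \
    --data-urlencode "step=60"
}

# Example:
#   prom_query http://h121.wzk.icu:9090 'rate(node_network_receive_bytes_total[5m])'
```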

Common PromQL Functions

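  • rate(): Per-second average rate of increase of a counter over a range window
  • sum(), avg(), min(), max(): Aggregation across series
  • irate(): Per-second rate computed from the last two samples in the window (reacts faster to changes, but is noisier)
  • topk(), bottomk(): The K series with the largest/smallest values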
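To see why irate() reacts faster than rate(), here is a simplified model over the (timestamp, value) samples of one counter. It is only a sketch: the real rate() also extrapolates to the edges of the range window, which this ignores, and counter_rates is a hypothetical name. Both calculations treat a drop in value as a counter reset (counting from zero again), as Prometheus does.

```shell
# Simplified rate()/irate() over "timestamp value" lines of one counter.
# rate:  total increase across the window / (last_ts - first_ts)
# irate: increase between the last two samples / their time gap
counter_rates() {
  awk '
    NR == 1 { first_t = $1; prev_t = $1; prev_v = $2; next }
    {
      delta = ($2 >= prev_v) ? $2 - prev_v : $2   # a drop means a reset: count from zero
      increase += delta
      last_delta = delta
      last_gap = $1 - prev_t
      prev_t = $1; prev_v = $2
    }
    END {
      if (NR < 2) { print "need at least two samples"; exit 1 }
      printf "rate=%g irate=%g\n", increase / (prev_t - first_t), last_delta / last_gap
    }'
}
```

With the samples 0 100, 15 130, 30 190 (seconds, counter value), it prints rate=3 irate=4: the faster growth in the last interval shows up fully in irate but is averaged over the whole window by rate.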

Download

mkdir -p /opt/software && cd /opt/software
wget https://github.com/prometheus/prometheus/releases/download/v2.53.2/prometheus-2.53.2.linux-amd64.tar.gz
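Before extracting, it is worth checking the tarball against the checksum file published with the release. This sketch assumes the release ships a sha256sums.txt next to the tarball (Prometheus releases currently do, but confirm on the release page) and relies on GNU coreutils sha256sum, whose --ignore-missing flag skips entries for files you did not download. verify_release is a hypothetical name.

```shell
# Verify downloaded release files against sha256sums.txt in the same directory.
# Usage: verify_release <dir>   (non-zero exit status on any mismatch)
verify_release() {
  (cd "$1" && sha256sum -c --ignore-missing sha256sums.txt)
}

# Example (in /opt/software, after downloading the tarball):
#   wget https://github.com/prometheus/prometheus/releases/download/v2.53.2/sha256sums.txt
#   verify_release /opt/software
```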

Extract and Install

tar -zxvf prometheus-2.53.2.linux-amd64.tar.gz
mkdir -p /opt/servers
mv prometheus-2.53.2.linux-amd64 /opt/servers/

Modify Configuration

cd /opt/servers/prometheus-2.53.2.linux-amd64
vim prometheus.yml

Configuration content:

# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "h121-wzk-icu"
    static_configs:
      - targets: ["h121.wzk.icu:9100"]

  - job_name: "h122-wzk-icu"
    static_configs:
      - targets: ["h122.wzk.icu:9100"]

  - job_name: "h123-wzk-icu"
    static_configs:
      - targets: ["h123.wzk.icu:9100"]

  - job_name: "wzk-icu-grafana"
    static_configs:
      - targets: ["h121.wzk.icu:9091"]
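The three node jobs differ only in the host name. An alternative layout is a single job that lists all three targets; Prometheus attaches an instance label (host:port) to every target, so the hosts stay distinguishable in queries. The hypothetical helper below prints such a job stanza for any host list. Whichever layout you keep, validate the file with promtool, which ships in the same tarball as the prometheus binary.

```shell
# Print a scrape_configs entry with one static target per host.
# Usage: node_job_yaml <job_name> <port> <host>...
node_job_yaml() {
  local job="$1" port="$2" host
  shift 2
  printf '  - job_name: "%s"\n    static_configs:\n      - targets:\n' "$job"
  for host in "$@"; do
    printf '          - "%s:%s"\n' "$host" "$port"
  done
}

# Example: paste the output in place of the three per-host node jobs, then validate.
#   node_job_yaml node 9100 h121.wzk.icu h122.wzk.icu h123.wzk.icu
#   ./promtool check config prometheus.yml
```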

Key Configuration Parameters

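| Parameter | Description | Built-in default | Value in this article |
|---|---|---|---|
| scrape_interval | How often to scrape each target | 1m | 15s |
| scrape_timeout | Timeout for a single scrape | 10s | 10s (default, not set) |
| evaluation_interval | How often to evaluate recording and alerting rules | 1m | 15s |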
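One constraint links the first two parameters: Prometheus rejects a configuration whose scrape_timeout is longer than its scrape_interval. The hypothetical helpers below convert simple durations to seconds and check that rule before you edit prometheus.yml. They handle only whole numbers with s, m, or h; Prometheus also accepts ms, d, w, y and combined forms such as 1m30s, which this sketch rejects as unsupported.

```shell
# Convert a simple Prometheus duration (e.g. 15s, 1m, 2h) to whole seconds.
duration_to_seconds() {
  local d="$1" n unit
  n="${d%[smh]}"          # numeric part (suffix letter removed)
  unit="${d#"$n"}"        # the suffix letter itself
  case "$n" in
    ''|*[!0-9]*) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$n" ;;
    m) echo $((n * 60)) ;;
    h) echo $((n * 3600)) ;;
    *) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
}

# Succeed only if scrape_timeout <= scrape_interval.
# Usage: check_scrape_timing <scrape_interval> <scrape_timeout>
check_scrape_timing() {
  local interval timeout
  interval="$(duration_to_seconds "$1")" || return 1
  timeout="$(duration_to_seconds "$2")" || return 1
  if [ "$timeout" -gt "$interval" ]; then
    echo "scrape_timeout $2 exceeds scrape_interval $1" >&2
    return 1
  fi
}
```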

Start Service

cd /opt/servers/prometheus-2.53.2.linux-amd64
./prometheus
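Started this way, Prometheus stops when the terminal session ends. Below is a sketch of a background start with the main flags spelled out, plus a readiness check against the server's /-/ready endpoint. The install path matches this article; the log file name and the 15d retention are example choices (15d is also the built-in default), and the function names are hypothetical.

```shell
PROM_HOME=/opt/servers/prometheus-2.53.2.linux-amd64   # install directory used in this article

# Start Prometheus in the background with explicit config, storage, and listen flags.
start_prometheus() {
  nohup "$PROM_HOME/prometheus" \
    --config.file="$PROM_HOME/prometheus.yml" \
    --storage.tsdb.path="$PROM_HOME/data" \
    --storage.tsdb.retention.time=15d \
    --web.listen-address=0.0.0.0:9090 \
    > "$PROM_HOME/prometheus.log" 2>&1 &
  echo "started prometheus, pid $!"
}

# Poll /-/ready once per second until it answers with HTTP 200 or attempts run out.
# Usage: wait_prometheus_ready [base_url] [attempts]
wait_prometheus_ready() {
  local base="${1:-http://localhost:9090}" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s --fail --max-time 2 "$base/-/ready" >/dev/null; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $tries attempts" >&2
  return 1
}

# Example:
#   start_prometheus && wait_prometheus_ready http://h121.wzk.icu:9090
```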

Web UI: http://h121.wzk.icu:9090 (Prometheus runs on h121 in this setup)

Verify Targets Status

  1. Open http://h121.wzk.icu:9090/targets (Status → Targets in the UI)
  2. Check the "State" column:
    • UP: The most recent scrape succeeded
    • DOWN: The target is unreachable or the scrape failed
    • UNKNOWN: The target has not been scraped yet
  3. For DOWN targets, read the "Error" column for the failure reason
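The same information is available from the HTTP API at /api/v1/targets, which is handy on a headless box. A rough sketch that counts active targets per health value (up, down, unknown) with grep and awk instead of a JSON parser; it assumes the compact JSON Prometheus returns, so for anything more involved a real JSON tool such as jq is the better choice. targets_health is a hypothetical name.

```shell
# Count active scrape targets by health via the /api/v1/targets endpoint.
# Usage: targets_health [base_url]
targets_health() {
  local base="${1:-http://localhost:9090}"
  curl -s --fail "$base/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | cut -d'"' -f4 \
    | awk '{ n[$1]++ } END { printf "up=%d down=%d unknown=%d\n", n["up"], n["down"], n["unknown"] }'
}

# Example:
#   targets_health http://h121.wzk.icu:9090
```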

Common Metrics

node_exporter Common Metrics

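| Metric | Type | Description |
|---|---|---|
| node_cpu_seconds_total | counter | Seconds each CPU spent in each mode (idle, user, system, ...) |
| node_memory_MemTotal_bytes | gauge | Total physical memory, in bytes |
| node_memory_MemAvailable_bytes | gauge | Memory available for new workloads, in bytes |
| node_disk_io_time_seconds_total | counter | Seconds each disk device spent doing I/O |
| node_network_receive_bytes_total | counter | Bytes received per network interface |
| node_network_transmit_bytes_total | counter | Bytes transmitted per network interface |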
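The two memory gauges combine into the usual "memory in use" percentage, which in PromQL reads 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes). The hypothetical helper below computes the same number directly from one host's /metrics text, which helps when cross-checking a dashboard against the raw exporter output.

```shell
# Percentage of memory in use, from node_exporter exposition text on stdin.
# Mirrors: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
mem_used_pct() {
  awk '
    $1 == "node_memory_MemTotal_bytes"     { total = $2 }
    $1 == "node_memory_MemAvailable_bytes" { avail = $2 }
    END {
      if (total == 0) { print "MemTotal not found"; exit 1 }
      printf "%.1f\n", 100 * (1 - avail / total)
    }'
}

# Example:
#   curl -s http://h121.wzk.icu:9100/metrics | mem_used_pct
```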

Error Quick Reference

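| Symptom | Root cause | Fix |
|---|---|---|
| /targets shows DOWN with "connection refused" | Exporter not started, or nothing listening on the port | Start the exporter and confirm it listens on 9100; verify from the Prometheus host with curl http://host:9100/metrics |
| /targets shows "context deadline exceeded" | Slow or unstable network, or the target responds too slowly | Look for scrape errors in the Prometheus log; raise scrape_timeout (it must not exceed scrape_interval) or speed up the target |
| Scrape returns 404 | Target serves metrics on a path other than /metrics | Set metrics_path in the job to the correct path |
| Prometheus startup error: error loading config file | YAML indentation or field name error | Run promtool check config prometheus.yml |
| Targets are UP but queries return no data | Wrong time range in the query, or clock drift between hosts | Query up in the Graph tab to confirm data exists; sync clocks with NTP |
| Prometheus memory spikes or queries are slow | Label cardinality too high | Check Status → TSDB Status (or /api/v1/status/tsdb) for the highest-cardinality metrics and labels; watch prometheus_tsdb_head_series |
| Disk usage grows quickly | Retention too long or scrape interval too short | Monitor the data directory size; lower --storage.tsdb.retention.time (default 15d) or cap it with --storage.tsdb.retention.size |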
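Several rows in the table start with the same step: fetching the target's metrics endpoint from the Prometheus host. A hypothetical helper that does that and maps the outcome to the matching row. It relies on curl reporting 000 as %{http_code} when no HTTP response arrives at all (refused connection, DNS failure, or timeout).

```shell
# Probe a target's metrics endpoint from the Prometheus host and suggest a fix.
# Usage: diagnose_target <host:port> [metrics_path]
diagnose_target() {
  local target="$1" path="${2:-/metrics}" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://$target$path")"
  case "$code" in
    200) echo "OK: $target$path answers; if still DOWN, compare with the job config" ;;
    404) echo "404: metrics not at $path; set metrics_path in the job" ;;
    000) echo "no response: exporter not running, port not listening, or timeout" ;;
    *)   echo "HTTP $code: check the target's own logs" ;;
  esac
}

# Example:
#   diagnose_target h122.wzk.icu:9100
```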