TL;DR

  • Scenario: Build a centralized logging system that covers log collection, storage, analysis, and visualization
  • Conclusion: The ELK stack provides a complete logging solution: Elasticsearch handles storage and search, Logstash handles collection and parsing, and Kibana handles visualization
  • Output: Architecture design, core concepts, Query DSL, and an error quick reference

Centralized Logging System

Core Functions

| Function | Description |
|---|---|
| Collection | Multi-source log collection |
| Transmission | Log transmission pipeline |
| Storage | Distributed storage |
| Analysis | Full-text search and aggregation |
| Alerts | Anomaly monitoring and alerting |

Architecture

Log Source → Logstash → Elasticsearch ← Kibana

Elasticsearch Core Concepts

Cluster and Node

| Node Type | Responsibility |
|---|---|
| Master Node | Cluster state management, index creation and deletion, shard allocation |
| Data Node | Data storage, document CRUD, search and aggregation execution |
| Coordinating Node | Request forwarding, result aggregation |

Index

  • Similar to a table in a relational database
  • Supports dynamic mapping (field types are inferred when a new field first appears)
  • Shard and replica counts are configurable per index
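
As a minimal sketch of the last point, the snippet below builds the body of a create-index request (`PUT /<index>`) that sets both counts. The values 3 and 1 are illustrative assumptions, not recommendations, and the helper name `index_settings` is hypothetical.

```python
import json

def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build the request body for creating an index with explicit shard counts."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and a non-negative replica count")
    return {
        "settings": {
            # Cannot be changed in place later (only via split, shrink, or reindex).
            "number_of_shards": primary_shards,
            # Can be updated at any time through the index settings API.
            "number_of_replicas": replicas,
        }
    }

body = index_settings(3, 1)
print(json.dumps(body, indent=2))
```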

Shard and Replica

| Type | Purpose |
|---|---|
| Primary Shard | Horizontal data scaling |
| Replica Shard | High availability, read concurrency |
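
The sketch below illustrates, in simplified form, how a document is routed to a primary shard and how many shard copies a cluster has to place. It assumes the simplified rule `hash(routing_key) % number_of_primary_shards`. `zlib.crc32` is only a stand-in hash for this example (Elasticsearch uses a Murmur3-based hash), so the shard numbers printed here will not match a real cluster.

```python
import zlib

def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Simplified routing: hash the key, then take it modulo the primary count."""
    # Stand-in hash for illustration only; not the hash Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

def total_shard_copies(num_primary_shards: int, num_replicas: int) -> int:
    """Every primary has `num_replicas` copies: total = primaries * (1 + replicas)."""
    return num_primary_shards * (1 + num_replicas)

for doc_id in ("log-1", "log-2", "log-3"):
    print(doc_id, "-> shard", shard_for(doc_id, 3))
print("shard copies to place:", total_shard_copies(3, 1))
```

Because the target shard depends on the primary count, changing that count would send existing document IDs to different shards. This is why the number of primaries is fixed at index creation, while the replica count can be adjusted freely.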

Query DSL

// Full-text search
{"query": {"match": {"content": "error"}}}

// Exact query
{"query": {"term": {"status": "failed"}}}

// Range query
{"query": {"range": {"@timestamp": {"gte": "now-1h"}}}}

// Boolean combination
{"query": {"bool": {"must": [{"match": {"level": "error"}}], "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}}}
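
To show how such a body might be assembled programmatically, here is a small sketch that builds the boolean query above from parameters. The function name `recent_errors_query` is hypothetical. The resulting dictionary would be serialized to JSON and sent to `/<index>/_search`, which is not done here.

```python
import json

def recent_errors_query(level: str = "error", since: str = "now-1h") -> dict:
    """Build a bool query: scored match on `level`, non-scored time filter."""
    return {
        "query": {
            "bool": {
                # `must` clauses contribute to the relevance score.
                "must": [{"match": {"level": level}}],
                # `filter` clauses only include or exclude documents and can be cached.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        }
    }

body = recent_errors_query()
print(json.dumps(body))
```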

Aggregation

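| Type | Use Case |
|---|---|
| Bucket | Grouping documents (e.g., by term or time interval) |
| Metric | Numeric calculation over a group (e.g., avg, sum, max) |
| Pipeline | Calculation over the output of other aggregations (e.g., derivative) |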
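
A sketch of one request body that combines all three types may make the distinction clearer. The numeric field `latency_ms` is a hypothetical example field, and `fixed_interval` assumes Elasticsearch 7.2 or later.

```python
import json

agg_body = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": {
        # Bucket: group documents into 5-minute time buckets.
        "per_5m": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "5m"},
            "aggs": {
                # Metric: average of a numeric field within each bucket.
                "avg_latency": {"avg": {"field": "latency_ms"}},
                # Pipeline: change of that average from one bucket to the next.
                "latency_change": {"derivative": {"buckets_path": "avg_latency"}},
            },
        }
    },
}

print(json.dumps(agg_body, indent=2))
```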

Logstash

Purpose

A real-time data collection engine that supports multiple data sources:

  • File logs
  • Network protocols (TCP/UDP/HTTP)
  • Databases (JDBC)
  • Message queues (Kafka, Redis)

Config File Structure

input {
  file {
    path => "/var/log/*.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}"}
  }
  date {
    match => ["timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss"]
  }
}

output {
  elasticsearch {
    hosts => ["h121.wzk.icu:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
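
To make the filter stage more concrete, the sketch below approximates in Python what the grok and date filters do to one line. The regular expression is a simplified stand-in for the grok pattern and only covers the space-separated `yyyy-MM-dd HH:mm:ss` form. The sample log line is invented for illustration.

```python
import re
from datetime import datetime

# Rough equivalent of: %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+) "
    r"(?P<content>.*)"
)

def parse_line(line: str):
    """Return the extracted fields, or None when the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Like the date filter: turn the text timestamp into the event time.
    event["@timestamp"] = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
    return event

event = parse_line("2024-01-01 12:00:00 ERROR connection refused")
# Like index => "app-logs-%{+YYYY.MM.dd}": one index per day of event time.
index_name = event["@timestamp"].strftime("app-logs-%Y.%m.%d")
print(event["level"], event["content"], index_name)
```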

Kibana

Purpose

A visualization and analysis platform for Elasticsearch:

  • Index management
  • Graphical queries
  • Visualization dashboards
  • Monitoring alerts

Comparison with Solr

| Feature | Elasticsearch | Solr |
|---|---|---|
| Distributed coordination | Built-in | Depends on ZooKeeper |
| JSON support | Native | Requires config |
| Real-time search | Near real-time by default | Requires config for near real-time |
| Document model | Flexible | Stricter schema |

Troubleshooting Checklist

Cluster Yellow/Red

Symptom: Cluster status is Yellow (some replica shards unassigned) or Red (at least one primary shard unassigned)

Root Causes:

  • Node offline
  • Disk watermark too high
  • Shard allocation failed

Fix:

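# Check cluster health (status, node count, unassigned shards)
curl -XGET 'h121.wzk.icu:9200/_cluster/health?pretty'

# Check per-node disk usage and shard counts
curl -XGET 'h121.wzk.icu:9200/_cat/allocation?v'

# Ask why a shard is unassigned
curl -XGET 'h121.wzk.icu:9200/_cluster/allocation/explain?pretty'

# Then free disk space, bring nodes back online, or reduce replica counts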
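
As a rough sketch of how to read the health response, the function below maps the `status` and `unassigned_shards` fields returned by `_cluster/health` to a one-line diagnosis. The sample dictionary is a trimmed, invented response, not output from a real cluster.

```python
def diagnose(health: dict) -> str:
    """Summarize a _cluster/health response in one line."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        # All primaries are assigned, so every unassigned shard is a replica.
        return f"yellow: {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: at least one primary unassigned ({unassigned} shard(s) unassigned in total)"
    return f"unknown status: {status!r}"

sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
print(diagnose(sample))
```

A single-node cluster with replicas configured is a common way to end up in Yellow, since a replica is never placed on the same node as its primary.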

Write Rejection

Symptom: Write requests are rejected (typically HTTP 429)

Root Causes:

  • JVM heap pressure
  • Field explosion

Fix:

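# Monitor JVM heap usage (the stats API reports heap_used_percent per node)
curl -XGET 'h121.wzk.icu:9200/_nodes/stats/jvm?pretty'

# Limit field count (index-level setting, default 1000)
index.mapping.total_fields.limit: 1000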
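
For the heap-pressure cause, a short sketch of how a node stats response might be scanned is shown below. It assumes the response shape of `GET /_nodes/stats/jvm`, where each node reports `jvm.mem.heap_used_percent`. The 75% threshold, the node names, and the sample values are illustrative assumptions.

```python
def nodes_under_heap_pressure(stats: dict, threshold: int = 75) -> list:
    """Return (node name, heap %) pairs at or above the threshold, highest first."""
    flagged = []
    for node in stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold:
            flagged.append((node["name"], used))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Trimmed, invented sample of a node stats response.
sample = {
    "nodes": {
        "a1": {"name": "es-data-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "b2": {"name": "es-data-2", "jvm": {"mem": {"heap_used_percent": 48}}},
    }
}
print(nodes_under_heap_pressure(sample))  # [('es-data-1', 91)]
```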

Slow Queries

Symptom: Queries take a long time to respond

Root Causes:

  • Wildcard queries, especially with a leading wildcard
  • Deep pagination with large from/size offsets
  • Querying fields that are not indexed

Fix:

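  • Avoid leading wildcards such as *error, which force a scan of every term in the field
  • For deep pagination, use search_after (or scroll) instead of large from/size offsets
  • Put exact-match and range conditions in filter context so they skip scoring and can be cached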
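
The `search_after` approach can be sketched as a loop that feeds the sort values of the last hit into the next request. In this sketch, `search` is a hypothetical callable that takes a request body and returns the parsed response, and `event_id` is a hypothetical unique keyword field used as a tiebreaker. Neither comes from the original setup.

```python
def iter_hits(search, page_size: int = 1000):
    """Yield every hit, one page at a time, using search_after instead of from/size."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        # A stable sort with a unique tiebreaker is required for search_after.
        "sort": [{"@timestamp": "asc"}, {"event_id": "asc"}],
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume right after the last document of this page.
        body["search_after"] = hits[-1]["sort"]

# In-memory stand-in for a real search call, for demonstration only.
docs = [{"_id": str(i), "sort": [i, str(i)]} for i in range(5)]

def fake_search(body):
    after = body.get("search_after")
    start = 0 if after is None else after[0] + 1
    return {"hits": {"hits": docs[start:start + body["size"]]}}

print([hit["_id"] for hit in iter_hits(fake_search, page_size=2)])
```

Each request costs roughly the same regardless of how far into the result set it is, whereas from/size has to collect and discard all preceding hits on every shard.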

Timeline Mess

Symptom: Log entries appear out of chronological order

Root Causes:

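  • Sources log in different timezones, or timestamps carry no timezone
  • Logstash fails to parse the timestamp (events are tagged _dateparsefailure and keep their ingestion time)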

Fix:

filter {
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Shanghai"
  }
}
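
The effect of the `timezone` option can be illustrated with a small conversion: a timestamp without zone information is interpreted as local time in the given zone and then stored as UTC. The sketch uses a fixed UTC+8 offset as a stand-in for "Asia/Shanghai" to stay within the standard library. The sample timestamp is invented.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, used here in place of the "Asia/Shanghai" zone.
SHANGHAI = timezone(timedelta(hours=8))

def to_utc(local_text: str, tz: timezone = SHANGHAI) -> datetime:
    """Interpret a zone-less timestamp as local time in `tz`, then convert to UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)

print(to_utc("2024-01-01 08:00:00").isoformat())  # 2024-01-01T00:00:00+00:00
```

Without the timezone hint, the same text would be treated as if it were written in the parser's own default zone, which shifts the event by hours when sources and the Logstash host disagree.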

Index Read-only

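Symptom: Index becomes read-only and writes fail with an index block error

Root Cause:

  • Disk flood-stage watermark triggered (default 95%), which sets index.blocks.read_only_allow_delete on affected indices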

Fix:

# After freeing disk space, remove the read-only block
curl -XPUT 'h121.wzk.icu:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

# Adjust disk watermarks if needed (values shown are the defaults)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
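
To keep the three disk watermarks apart, the sketch below classifies a node's disk usage against them. It assumes the default thresholds of 85% (low), 90% (high), and 95% (flood stage). The function name `disk_state` and the short state labels are hypothetical.

```python
def disk_state(used_percent: float, low: float = 85.0, high: float = 90.0,
               flood_stage: float = 95.0) -> str:
    """Classify a node's disk usage against the allocation watermarks."""
    if used_percent >= flood_stage:
        # Indices with a shard on this node get the read_only_allow_delete block.
        return "flood_stage"
    if used_percent >= high:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if used_percent >= low:
        # No new shards are allocated to this node.
        return "low"
    return "ok"

for pct in (70, 86, 92, 97):
    print(pct, "->", disk_state(pct))
```

Only the flood stage makes indices read-only. Crossing the low or high watermark changes shard placement but still accepts writes.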

ILM Index Lifecycle Management

Stages

| Stage | Actions |
|---|---|
| Hot | Frequent write/query |
| Warm | Periodic merge, read-only |
| Cold | Archive storage |
| Delete | Delete |

Config

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {"max_age": "1d", "max_size": "50gb"}
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {"forcemerge": {"max_num_segments": 1}}
      },
      "cold": {
        "min_age": "90d",
        "actions": {"freeze": {}}
      },
      "delete": {
        "min_age": "365d",
        "actions": {"delete": {}}
      }
    }
  }
}
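
As a rough illustration of how the `min_age` values in this policy translate into phases, the sketch below maps an index age in days to the phase it would be in. It is a simplification: when rollover is used, ILM measures age from the rollover time rather than from index creation, and phase transitions only happen when ILM next polls the index.

```python
# min_age per phase in days, taken from the policy above (dict order matters).
PHASE_MIN_AGE_DAYS = {"hot": 0, "warm": 30, "cold": 90, "delete": 365}

def phase_for_age(age_days: float) -> str:
    """Return the latest phase whose min_age the index has reached."""
    current = "hot"
    for phase, min_age in PHASE_MIN_AGE_DAYS.items():
        if age_days >= min_age:
            current = phase
    return current

for age in (0, 45, 90, 400):
    print(age, "days ->", phase_for_age(age))
```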