Elastic Stack (ELK) Practice: Architecture Key Points...

TL;DR

Scenario: Build a centralized logging system that covers log collection, storage, analysis, and visualization
Conclusion: The ELK stack provides a complete logging solution: Elasticsearch handles storage and search, Logstash handles collection and parsing, Kibana handles visualization
Output: Architecture design, core concepts, Query DSL, quick reference for common errors

Centralized Logging System

Core Functions

Function	Description
Collection	Multi-source log collection
Transmission	Log transmission pipeline
Storage	Distributed storage
Analysis	Full-text search and aggregation
Alerts	Anomaly monitoring and alerting

Architecture

Log Source → Logstash → Elasticsearch ← Kibana

Elasticsearch Core Concepts

Cluster and Node

Node Type	Responsibility
Master Node	Cluster state management, index creation and deletion, shard allocation
Data Node	Data storage, document CRUD
Coordinating Node	Request forwarding, result aggregation

Index

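Loosely similar to a table in a relational database
Supports dynamic mapping (field types are inferred from the first document that contains the field)
Shard and replica counts are set per index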
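As a minimal sketch of the last point, the body of a create-index request with explicit shard and replica counts could be built as follows. The helper name `index_settings` and the counts 3 and 1 are illustrative assumptions; the resulting JSON would be sent with `PUT /<index-name>`.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build the body of a create-index request (PUT /<index-name>)."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            # The primary shard count cannot be changed in place after creation.
            "number_of_shards": primary_shards,
            # The replica count can be updated on a live index.
            "number_of_replicas": replicas,
        }
    }


body = index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
```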

Shard and Replica

Type	Purpose
Primary Shard	Horizontal data scaling
Replica Shard	High availability, read concurrency
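The interaction between shard counts and node counts can be sketched with a little arithmetic. Elasticsearch never places a replica on the same node as its primary, so a single-node cluster with one replica per shard stays Yellow. The function below is a simplified model under that one rule only; it ignores disk watermarks and allocation filters, and the name `shard_layout` is hypothetical.

```python
def shard_layout(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Estimate shard totals and cluster color from counts alone."""
    copies_per_shard = 1 + replicas           # one primary plus its replicas
    total = primaries * copies_per_shard
    # Each copy of a given shard needs its own node.
    assignable = min(copies_per_shard, data_nodes)
    unassigned = primaries * (copies_per_shard - assignable)
    return {
        "total_shards": total,
        "unassigned_shards": unassigned,
        "expected_status": "green" if unassigned == 0 else "yellow",
    }


print(shard_layout(primaries=3, replicas=1, data_nodes=1))
print(shard_layout(primaries=3, replicas=1, data_nodes=2))
```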

Query DSL

// Full-text search
{"query": {"match": {"content": "error"}}}

// Exact-match query (term is not analyzed, so use it on keyword fields)
{"query": {"term": {"status": "failed"}}}

// Range query
{"query": {"range": {"@timestamp": {"gte": "now-1h"}}}}

// Boolean combination
{"query": {"bool": {"must": [{"match": {"level": "error"}}], "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}}}

Aggregation

Type	Use Case
Bucket	Grouping statistics
Metric	Numeric calculation
Pipeline	Computation over the output of other aggregations (e.g. derivative, max bucket)
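The three aggregation types can be combined in one request body. The sketch below groups documents by level (bucket), averages a duration per group (metric), and picks the group with the highest average (pipeline). The field names `level.keyword` and `duration_ms` are hypothetical and depend on the actual mapping.

```python
def level_duration_aggs() -> dict:
    """Build a search body combining bucket, metric, and pipeline aggregations."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "by_level": {
                # Bucket: one bucket per distinct level
                "terms": {"field": "level.keyword"},
                "aggs": {
                    # Metric: average duration inside each bucket
                    "avg_duration": {"avg": {"field": "duration_ms"}}
                },
            },
            # Pipeline: the bucket with the highest avg_duration
            "slowest_level": {
                "max_bucket": {"buckets_path": "by_level>avg_duration"}
            },
        },
    }


aggs_body = level_duration_aggs()
print(sorted(aggs_body["aggs"]))
```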

Logstash

Purpose

Logstash is a real-time data collection and processing engine that supports multiple data sources:

File logs
Network protocols (TCP/UDP/HTTP)
Databases (JDBC)
Message queues (Kafka, Redis)

Config File Structure

input {
  file {
    path => "/var/log/*.log"
    start_position => "beginning"
  }
}

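filter {
  # Split each line into timestamp, level, and content
  grok {
    match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}"}
  }
  # Use the parsed timestamp as the event's @timestamp
  date {
    match => ["timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss"]
  }
}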

output {
  elasticsearch {
    hosts => ["h121.wzk.icu:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
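To make the filter stage concrete, here is a rough Python approximation of what the grok and date filters do to one log line. It is not how Logstash is implemented: the regular expression only covers the space-separated timestamp form, the sample line is made up, and the index name is derived without the UTC conversion that Logstash applies to @timestamp.

```python
import re
from datetime import datetime

# Approximates "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+) "
    r"(?P<content>.*)$"
)


def parse_line(line: str):
    """Return the extracted fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    event = m.groupdict()
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
    # Mirrors index => "app-logs-%{+YYYY.MM.dd}"
    event["index"] = f"app-logs-{ts:%Y.%m.%d}"
    return event


print(parse_line("2024-01-15 10:30:00 ERROR Connection refused"))
```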

Kibana

Purpose

Visualization and analysis platform for Elasticsearch:

Index management
Graphical queries
Visualization dashboards
Monitoring alerts

Comparison with Solr

Feature	Elasticsearch	Solr
Distributed coordination	Built-in	Depends on ZooKeeper
JSON support	Native	Requires config
Real-time search	Near real-time by default (1s refresh interval)	Requires config for near real-time
Document model	Flexible (dynamic mapping)	Schema-first by default

Troubleshooting Checklist

Cluster Yellow/Red

Symptom: Cluster status is Yellow or Red

Root Causes:

Node offline
Disk watermark too high
Shard allocation failed

Fix:

# Check cluster health (status, node count, unassigned shards)
curl -XGET 'h121.wzk.icu:9200/_cluster/health'

# Check disk watermark
curl -XGET 'h121.wzk.icu:9200/_cat/allocation?v'

# Clean up disk or adjust shards
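The `_cluster/health` response can be interpreted programmatically. The sketch below reads the `status` and `unassigned_shards` fields of that response; the sample dictionary is illustrative, not real cluster output, and the function name `diagnose_health` is hypothetical.

```python
def diagnose_health(health: dict) -> str:
    """Turn a _cluster/health response into a one-line diagnosis."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        return f"yellow: all primaries allocated, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: at least one primary shard is unassigned ({unassigned} unassigned in total)"
    return f"unknown status: {status!r}"


# Illustrative response from a single-node cluster with replicas configured
sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 3}
print(diagnose_health(sample))
```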

Write Rejection

Symptom: Write request rejected

Root Causes:

JVM heap pressure
Field explosion

Fix:

# Monitor JVM heap usage
curl -XGET 'h121.wzk.icu:9200/_nodes/stats/jvm'

# Limit field count (index setting, default 1000)
index.mapping.total_fields.limit: 1000
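Field explosion can be spotted before the limit is hit by counting the fields in an index mapping. The sketch below walks the `properties` tree of a mapping and counts fields, object fields, and multi-fields. It is an approximation of how Elasticsearch counts toward the limit, and the sample mapping is made up for illustration.

```python
def count_fields(mapping: dict) -> int:
    """Approximate the number of mapped fields, including nested and multi-fields."""
    total = 0
    for field in mapping.get("properties", {}).values():
        total += 1                               # the field (or object) itself
        total += count_fields(field)             # sub-fields of an object
        total += len(field.get("fields", {}))    # multi-fields such as .keyword
    return total


# Illustrative mapping: one text field with a keyword multi-field, one object
sample_mapping = {
    "properties": {
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "user": {
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "text"},
            }
        },
    }
}
print(count_fields(sample_mapping))  # → 5
```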

Slow Queries

Symptom: Queries take a long time to respond

Root Causes:

Wildcard queries
Deep pagination
Unsuitable mapping (queried field not indexed or of the wrong type)

Fix:

Avoid leading wildcards such as *error
For deep pagination, use scroll or search_after instead of from/size
Narrow query conditions (time range, filters) so fewer documents are scanned
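The search_after approach can be sketched as follows: each page reuses the sort values of the last hit from the previous page. The sort needs a tiebreaker so that the order is stable. The field `event_id` is a hypothetical unique keyword field, and the query is only an example.

```python
def next_page_body(previous_hits: list, page_size: int = 100) -> dict:
    """Build the search body for the next page using search_after."""
    body = {
        "size": page_size,
        "query": {"match": {"level": "error"}},
        # Timestamp plus a unique tiebreaker keeps the order deterministic
        "sort": [{"@timestamp": "asc"}, {"event_id": "asc"}],
    }
    if previous_hits:
        # Each hit of a sorted search carries its sort values in "sort"
        body["search_after"] = previous_hits[-1]["sort"]
    return body


first = next_page_body([])
# Illustrative hits from a previous page
hits = [{"_id": "a", "sort": [1705312200000, "evt-001"]},
        {"_id": "b", "sort": [1705312260000, "evt-002"]}]
second = next_page_body(hits)
print(second["search_after"])  # → [1705312260000, 'evt-002']
```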

Out-of-Order Timestamps

Symptom: Logs appear out of chronological order

Root Causes:

Timezones not unified across log sources
Logstash date parsing errors

Fix:

filter {
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Shanghai"
  }
}
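The effect of the timezone setting can be checked with a short calculation. A timestamp without zone information that was written in Shanghai local time must be shifted by 8 hours to get UTC; if it is treated as UTC instead, every event lands 8 hours late. The sketch uses a fixed UTC+8 offset as a stand-in for "Asia/Shanghai", and the sample timestamp is made up.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used as a stand-in for the "Asia/Shanghai" zone (UTC+8, no DST)
SHANGHAI = timezone(timedelta(hours=8))


def to_utc(local_text: str, tz=SHANGHAI) -> datetime:
    """Parse a 'yyyy-MM-dd HH:mm:ss' string as local time in tz and convert to UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)


correct = to_utc("2024-01-15 08:00:00")
wrong = to_utc("2024-01-15 08:00:00", tz=timezone.utc)  # timezone ignored
print(correct.isoformat())  # → 2024-01-15T00:00:00+00:00
print(wrong - correct)      # → 8:00:00
```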

Index Read-only

Symptom: Index becomes read-only

Root Cause:

Disk flood-stage watermark exceeded (default 95%)

Fix:

# Free disk space first, then remove the read-only block
curl -XPUT 'h121.wzk.icu:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

# Disk watermark settings (values shown are the defaults)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
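How the disk watermarks relate to each other can be sketched as a simple classification. The thresholds below are the Elasticsearch defaults (low 85%, high 90%, flood stage 95%); the function name `disk_state` and the one-word labels are illustrative, and the exact boundary handling is simplified.

```python
def disk_state(used_percent: float, low: float = 85.0, high: float = 90.0,
               flood_stage: float = 95.0) -> str:
    """Classify a node's disk usage against the three watermarks."""
    if used_percent >= flood_stage:
        # Indices with a shard on this node get a read_only_allow_delete block
        return "flood_stage"
    if used_percent >= high:
        # Elasticsearch tries to relocate shards away from this node
        return "high"
    if used_percent >= low:
        # No new replica shards are allocated to this node
        return "low"
    return "ok"


for pct in (70, 87, 92, 96):
    print(pct, disk_state(pct))
```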

ILM (Index Lifecycle Management)

Stages

Stage	Actions
Hot	Frequent writes and queries, rollover
Warm	Force merge, read-only
Cold	Archive storage, infrequent queries
Delete	Index removed

Config

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {"max_age": "1d", "max_size": "50gb"}
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {"forcemerge": {"max_num_segments": 1}}
      },
      "cold": {
        "min_age": "90d",
        "actions": {"freeze": {}}
      },
      "delete": {
        "min_age": "365d",
        "actions": {"delete": {}}
      }
    }
  }
}
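The `min_age` values in this policy determine which phase an index is in. As a sketch, the mapping from age to phase can be written as below. Note that when rollover is used, the age for the later phases is measured from the rollover time rather than from index creation. The function name `phase_for_age` is hypothetical and the thresholds are copied from the policy above.

```python
# min_age per phase in days, taken from the policy above
MIN_AGE_DAYS = {"hot": 0, "warm": 30, "cold": 90, "delete": 365}


def phase_for_age(age_days: float) -> str:
    """Return the latest phase whose min_age the index has reached."""
    current = "hot"
    for phase, min_age in MIN_AGE_DAYS.items():  # dicts keep insertion order
        if age_days >= min_age:
            current = phase
    return current


for age in (0, 45, 90, 400):
    print(age, phase_for_age(age))
```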