TL;DR
- Scenario: Build a centralized logging system that covers log collection, storage, analysis, and visualization
- Conclusion: The ELK stack provides a complete logging solution: Elasticsearch handles storage and search, Logstash handles collection and parsing, and Kibana handles visualization
- Output: Architecture design, core concepts, query DSL, troubleshooting quick reference
Centralized Logging System
Core Functions
| Function | Description |
|---|---|
| Collection | Multi-source log collection |
| Transmission | Log transmission pipeline |
| Storage | Distributed storage |
| Analysis | Full-text search and aggregation |
| Alerts | Anomaly monitoring and alerting |
Architecture
Log Source → Logstash → Elasticsearch ← Kibana
Elasticsearch Core Concepts
Cluster and Node
| Node Type | Responsibility |
|---|---|
| Master Node | Cluster state management, index creation and deletion, shard allocation |
| Data Node | Data storage, document CRUD |
| Coordinating Node | Request forwarding, result aggregation |
Index
- Roughly analogous to a table in a relational database
- Supports dynamic mapping (field types are inferred when a field is first written)
- Shard and replica counts are configurable per index
Shard and Replica
| Type | Purpose |
|---|---|
| Primary Shard | Horizontal data scaling |
| Replica Shard | High availability, read concurrency |
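To make the shard and replica counts concrete, here is a minimal Python sketch that builds the settings body for an index-creation request and counts how many shard copies the cluster has to place. The helper names and the example counts are illustrative assumptions, not values from this article.

```python
def index_settings(primaries: int, replicas: int) -> dict:
    """Build the body for an index-creation request (PUT /<index>)."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "number_of_shards": primaries,    # primary shards
            "number_of_replicas": replicas,   # replica copies per primary
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary is stored once, plus once more per replica."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    print(index_settings(3, 1))
    print(total_shard_copies(3, 1))
```

The replica count can be changed on a live index, while the primary shard count is fixed once the index is created; changing it means reindexing or using the split/shrink APIs.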
Query DSL
// Full-text search
{"query": {"match": {"content": "error"}}}
// Exact-match query (term is not analyzed; use it on keyword fields)
{"query": {"term": {"status": "failed"}}}
// Range query
{"query": {"range": {"@timestamp": {"gte": "now-1h"}}}}
// Boolean combination
{"query": {"bool": {"must": [{"match": {"level": "error"}}], "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}}}
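The boolean combination above can also be assembled programmatically. The following is a minimal Python sketch; the function name and the `window` parameter are hypothetical, and the field names are the ones used in the queries above. Clauses under `must` contribute to the relevance score, while clauses under `filter` only include or exclude documents, which is why the time range goes there.

```python
import json


def recent_level_query(level: str, window: str = "1h") -> dict:
    """Match a log level (scored) within a recent time window (unscored filter)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"level": level}}],
                "filter": [{"range": {"@timestamp": {"gte": f"now-{window}"}}}],
            }
        }
    }


if __name__ == "__main__":
    # The resulting dict can be sent as the JSON body of a _search request.
    print(json.dumps(recent_level_query("error"), indent=2))
```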
Aggregation
| Type | Use Case |
|---|---|
| Bucket | Grouping documents (terms, date histogram) |
| Metric | Numeric calculation (avg, sum, min, max) |
| Pipeline | Computation over the output of other aggregations |
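As an illustration of how the three aggregation types nest, the sketch below builds one request body that uses all of them. It assumes a hypothetical numeric field `latency_ms` and Elasticsearch 7.2 or later (for `fixed_interval`); the aggregation names are arbitrary.

```python
def latency_per_hour_body(window: str = "24h") -> dict:
    """Bucket by hour, average a metric per bucket, then pick the peak bucket."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "query": {"range": {"@timestamp": {"gte": f"now-{window}"}}},
        "aggs": {
            # Bucket aggregation: one bucket per hour
            "per_hour": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
                # Metric aggregation computed inside each bucket
                "aggs": {"avg_latency": {"avg": {"field": "latency_ms"}}},
            },
            # Pipeline aggregation: reads the per-bucket averages
            "peak_hour_latency": {
                "max_bucket": {"buckets_path": "per_hour>avg_latency"}
            },
        },
    }
```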
Logstash
Purpose
Logstash is a real-time data collection engine that supports multiple data sources:
- File logs
- Network protocols (TCP/UDP/HTTP)
- Databases (JDBC)
- Message queues (Kafka, Redis)
Config File Structure
input {
  file {
    path => "/var/log/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}"}
  }
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    hosts => ["h121.wzk.icu:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
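To show what the grok stage extracts from a line, here is a rough Python emulation using a regular expression. It is only an approximation: the real `TIMESTAMP_ISO8601` and `LOGLEVEL` grok patterns accept more variants than this simplified regex, and the sample log line is made up.

```python
import re
from typing import Optional

# Simplified stand-in for:
# %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL) "
    r"(?P<content>.*)"
)


def parse_line(message: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    m = LINE_RE.match(message)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_line("2024-05-01 12:30:45 ERROR disk full"))
```

When a line does not match, Logstash keeps the event and tags it with `_grokparsefailure`; the sketch returns `None` in that case.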
Kibana
Purpose
Kibana is the visualization and analysis platform for Elasticsearch:
- Index management
- Graphical queries
- Visualization dashboards
- Monitoring alerts
Comparison with Solr
| Feature | Elasticsearch | Solr |
|---|---|---|
| Distributed coordination | Built-in | Depends on ZooKeeper |
| JSON support | Native | Requires config |
| Real-time search | Near real-time by default | Near real-time requires commit configuration |
| Document model | Flexible (dynamic mapping) | Traditionally schema-first |
Troubleshooting Checklist
Cluster Yellow/Red
Symptom: Cluster status is Yellow or Red
Root Causes:
- A node is offline
- Disk usage exceeds a watermark
- Shard allocation failed
Fix:
# Check cluster health (status, node count, unassigned shards)
curl -XGET 'h121.wzk.icu:9200/_cluster/health'
# Check per-node disk usage and shard counts
curl -XGET 'h121.wzk.icu:9200/_cat/allocation?v'
# Then free disk space or reduce the shard/replica count
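The health response can be interpreted mechanically. The sketch below is a minimal example that reads only the `status` and `unassigned_shards` fields of a `_cluster/health` response; the function name and the sample payload are illustrative.

```python
def diagnose_health(health: dict) -> str:
    """Turn a _cluster/health response into a one-line diagnosis."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        # Yellow means every primary is allocated, so the gaps are replicas.
        return f"yellow: all primaries allocated, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: at least one primary shard is unassigned ({unassigned} unassigned in total)"
    return f"unexpected status: {status}"


if __name__ == "__main__":
    sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
    print(diagnose_health(sample))
```

For the reason behind a specific unassigned shard, the `_cluster/allocation/explain` API is usually the next stop.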
Write Rejection
Symptom: Write requests are rejected
Root Causes:
- JVM heap pressure
- Mapping explosion (too many fields in an index)
Fix:
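# Monitor JVM heap usage (the stats endpoint reports current usage)
curl -XGET 'h121.wzk.icu:9200/_nodes/stats/jvm'
# Limit field count (index-level setting; 1000 is the default)
index.mapping.total_fields.limit: 1000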
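A small helper can flag nodes under heap pressure. This sketch assumes the response shape of the node stats API (`_nodes/stats/jvm`), where each node reports `jvm.mem.heap_used_percent`; the 85% threshold is an arbitrary assumption, not an Elasticsearch default.

```python
def nodes_over_heap(stats: dict, threshold: int = 85) -> list:
    """Return (node name, heap %) pairs at or above the threshold, highest first."""
    hot = []
    for node in stats.get("nodes", {}).values():
        pct = node["jvm"]["mem"]["heap_used_percent"]
        if pct >= threshold:
            hot.append((node.get("name", "?"), pct))
    return sorted(hot, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = {
        "nodes": {
            "a": {"name": "n1", "jvm": {"mem": {"heap_used_percent": 91}}},
            "b": {"name": "n2", "jvm": {"mem": {"heap_used_percent": 40}}},
        }
    }
    print(nodes_over_heap(sample))
```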
Slow Queries
Symptom: Queries respond slowly
Root Causes:
- Wildcard queries
- Deep pagination
- Queried field is not indexed
Fix:
- Avoid leading wildcards such as *error
- Use scroll or search_after instead of from/size
- Optimize query conditions
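As a sketch of the search_after approach, the helper below builds the request body for each page. It assumes a hypothetical unique keyword field `event_id` as a sort tiebreaker; the function name and page size are also illustrative. The `after` argument is the `sort` array of the last hit on the previous page.

```python
from typing import Optional


def page_body(query: dict, size: int = 100, after: Optional[list] = None) -> dict:
    """Build one page of a search_after pagination request."""
    body = {
        "size": size,
        "query": query,
        # A deterministic sort is required; the second key breaks timestamp ties.
        "sort": [{"@timestamp": "asc"}, {"event_id": "asc"}],
    }
    if after is not None:
        body["search_after"] = after  # sort values of the previous page's last hit
    return body


if __name__ == "__main__":
    first = page_body({"match": {"level": "error"}})
    # After running the first search: last_sort = response["hits"]["hits"][-1]["sort"]
    second = page_body({"match": {"level": "error"}}, after=[1714566645000, "evt-42"])
    print(first)
    print(second)
```

Unlike from/size, whose reach is capped by `index.max_result_window` (10000 by default), each page here costs roughly the same regardless of depth.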
Out-of-Order Timestamps
Symptom: Log entries appear out of chronological order
Root Causes:
- Timezones are inconsistent across log sources
- Logstash fails to parse the timestamp
Fix:
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Shanghai"
  }
}
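The effect of the timezone option can be illustrated in Python: a timestamp without an offset is interpreted in the configured zone and then stored in UTC. This sketch uses a fixed UTC+8 offset as a stand-in for Asia/Shanghai to avoid depending on a timezone database; the function name and sample values are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, used here as a stand-in for Asia/Shanghai (assumption).
SHANGHAI = timezone(timedelta(hours=8))


def to_utc(local_text: str, tz: timezone = SHANGHAI) -> str:
    """Interpret an offset-less timestamp in tz and render it in UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    aware = naive.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Same wall-clock text, different assumed zones: the results differ by 8 hours.
    print(to_utc("2024-05-01 08:00:00"))
    print(to_utc("2024-05-01 08:00:00", timezone.utc))
```

If some sources are parsed as UTC and others as local time, events logged at the same moment end up hours apart, which is what produces the scrambled ordering.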
Index Read-only
Symptom: Index becomes read-only
Root Cause:
- Disk flood-stage watermark exceeded (default 95%), which makes Elasticsearch set index.blocks.read_only_allow_delete on the affected indices
Fix:
# After freeing disk space, remove the read-only block
curl -XPUT 'h121.wzk.icu:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'
# Adjust disk watermarks (the values shown are the defaults)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
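To make the watermark thresholds easier to reason about, the sketch below classifies a node's disk usage against the default low (85%), high (90%), and flood-stage (95%) watermarks. The function and stage names are illustrative; the percentage would come from something like the `disk.percent` column of `_cat/allocation`.

```python
def disk_stage(used_percent: float, low: float = 85.0,
               high: float = 90.0, flood: float = 95.0) -> str:
    """Classify disk usage against the three disk watermarks."""
    if used_percent >= flood:
        return "flood_stage"  # indices on this node get the read-only block
    if used_percent >= high:
        return "high"         # shards start being relocated away from the node
    if used_percent >= low:
        return "low"          # no new shards are allocated to the node
    return "ok"


if __name__ == "__main__":
    for pct in (70, 86, 92, 97):
        print(pct, disk_stage(pct))
```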
ILM Index Lifecycle Management
Stages
| Stage | Actions |
|---|---|
| Hot | Actively written and frequently queried |
| Warm | No longer written; force-merged, still queried |
| Cold | Rarely queried; archive storage |
| Delete | Index is removed |
Config
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {"max_age": "1d", "max_size": "50gb"}
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {"forcemerge": {"max_num_segments": 1}}
      },
      "cold": {
        "min_age": "90d",
        "actions": {"freeze": {}}
      },
      "delete": {
        "min_age": "365d",
        "actions": {"delete": {}}
      }
    }
  }
}
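To see how the `min_age` values in this policy translate into phases, here is a minimal Python sketch that maps an index age in days to the phase the policy would place it in. The function name is hypothetical and the thresholds are copied from the policy above; for an index that has rolled over, ILM measures age from the rollover time rather than from index creation.

```python
# (phase, min_age in days), taken from the policy above
PHASE_MIN_AGE_DAYS = [("hot", 0), ("warm", 30), ("cold", 90), ("delete", 365)]


def phase_for_age(age_days: float) -> str:
    """Return the latest phase whose min_age the index has reached."""
    current = "hot"
    for name, min_age in PHASE_MIN_AGE_DAYS:
        if age_days >= min_age:
            current = name
    return current


if __name__ == "__main__":
    for age in (0, 29, 30, 100, 400):
        print(age, phase_for_age(age))
```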