Big Data 187 - Logstash Filter Plugin Practice
1. Filter Plugin Overview
The filter stage parses, transforms, and filters events. When multiple filter plugins are configured, they execute in the order they appear in the config file, so ordering matters: for example, grok must extract a timestamp field before a date filter can parse it.
2. grok Regex Parsing
2.1 Syntax
Grok matches with predefined patterns via `%{SYNTAX:SEMANTIC}` (e.g. `%{LOGLEVEL:level}`), and also accepts custom regex named captures:

```
(?<field_name>pattern)
```
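Grok's custom-capture syntax is ordinary regex named capture (grok runs on the Oniguruma engine, which writes it `(?<name>...)`). A quick sketch of the same mechanism in Python, which spells it `(?P<name>...)`:

```python
import re

# A hypothetical log line; the named groups play the role of grok's
# (?<field_name>pattern) captures.
log_line = "2024-10-10 13:55:36 INFO Server started"

match = re.match(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+)",
    log_line,
)
print(match.group("level"))  # INFO
```

Each named group becomes an addressable field, which is exactly how grok turns an unstructured line into event fields.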
2.2 Console Data Parsing
```
input { stdin {} }

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{GREEDYDATA:msg}" }
  }
}

output { stdout { codec => rubydebug } }
```
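To see what this pattern extracts without running Logstash, here is a rough Python equivalent; the `TIMESTAMP_ISO8601` and `LOGLEVEL` regexes below are simplified stand-ins for the real grok library patterns, which are more permissive:

```python
import re

# Simplified approximations of the grok patterns used in the config above.
TIMESTAMP_ISO8601 = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
LOGLEVEL = r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)"

pattern = re.compile(
    rf"(?P<timestamp>{TIMESTAMP_ISO8601})\s+(?P<level>{LOGLEVEL})\s+(?P<msg>.*)"
)

event = pattern.match("2024-10-10T13:55:36 ERROR connection refused").groupdict()
print(event)
# {'timestamp': '2024-10-10T13:55:36', 'level': 'ERROR', 'msg': 'connection refused'}
```

The resulting dict mirrors the fields rubydebug would print for the same input line.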
3. Nginx Log Parsing
3.1 Nginx Combined Log Format
The sample below is Nginx's combined format: the Common Log Format (CLF) plus the trailing referrer and user-agent fields.

```
127.0.0.1 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"
```
3.2 grok Config
```
grok {
  match => {
    "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:agent}"'
  }
}
```
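A simplified Python regex sketch of the same extraction, applied to the sample line from section 3.1 (the group regexes are looser approximations of the grok patterns, not their exact definitions):

```python
import re

line = '127.0.0.1 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"'

pattern = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

event = pattern.match(line).groupdict()
# Mirror grok's %{NUMBER:response:int} type coercion by hand.
event["response"] = int(event["response"])
event["bytes"] = int(event["bytes"])
print(event["clientip"], event["verb"], event["response"])  # 127.0.0.1 GET 200
```

Note the `:int` suffixes in the grok config do this coercion automatically; without them, `response` and `bytes` arrive as strings.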
3.3 Parsed Fields
| Field | Meaning |
|---|---|
| clientip | Client IP |
| timestamp | Request time |
| verb | Request method |
| request | Request path |
| response | Status code |
| bytes | Response bytes |
| referrer | Referrer page |
| agent | User-Agent |
4. Verifying the Parse Results
The rubydebug codec pretty-prints every field of each event to the console, making it easy to confirm the grok pattern extracted what you expected:

```
output {
  stdout { codec => rubydebug }
}
```
5. Error Quick Reference
| Issue | Possible Cause | Solution |
|---|---|---|
| grok parse failure (`_grokparsefailure` tag) | Pattern doesn't match the input | Debug the pattern with the Grok Debugger (Kibana Dev Tools or an online tool) |
| Multi-line logs (e.g. stack traces) | One event spans several lines | Use the multiline codec on the input to merge lines first |
| Poor performance | Pattern too complex or unanchored | Anchor the pattern with `^`, simplify it, and avoid leading `GREEDYDATA` |
| `@timestamp` shows ingest time, not log time | Parsed timestamp field not converted | Use the date filter to set `@timestamp` from the parsed field |
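For the last row, a minimal date filter sketch (assuming the `timestamp` field produced by the Nginx grok config above; the format string is the standard HTTPDATE layout, and `@timestamp` is the date filter's default target):

```
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```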
6. Summary
- grok extracts fields with regex named captures and predefined `%{SYNTAX:SEMANTIC}` patterns
- Console output and Nginx access logs have different formats and therefore need different patterns
- The rubydebug codec on stdout is the quickest way to verify parsing results