Big Data 187 - Logstash Filter Plugin Practice

1. Filter Plugin Overview

The filter stage is responsible for parsing, transforming, and filtering events. When multiple filters are configured, they execute in the order they appear in the configuration file.

Note: Order matters.

2. grok Regex Parsing

2.1 Syntax

%{SYNTAX:SEMANTIC}        # reference a built-in pattern (SYNTAX) and name the extracted field (SEMANTIC)
(?<field_name>pattern)    # inline custom regex with a named capture group
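Grok's `(?<field_name>pattern)` form is an ordinary named capture group. As an illustration only (with a hypothetical sample line), Python's `re` module expresses the same idea with the `(?P<name>...)` spelling:

```python
import re

# Named captures in Python use (?P<name>...); grok itself runs on Oniguruma
# regexes inside Logstash, where the P is not required.
line = "2024-10-10 13:55:36 INFO service started"
m = re.match(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+)",
    line,
)
print(m.groupdict())
# {'date': '2024-10-10', 'time': '13:55:36', 'level': 'INFO'}
```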

2.2 Console Data Parsing

input { stdin {} }

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{GREEDYDATA:msg}" }
  }
}

output { stdout { codec => rubydebug } }
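Under the hood, the `%{...}` references above expand into one large regex. A minimal Python sketch, using simplified stand-ins for `TIMESTAMP_ISO8601`, `LOGLEVEL`, and `GREEDYDATA` (the real grok library patterns are more permissive), shows the fields the config would extract:

```python
import re

# Simplified equivalents of the grok patterns used in the config above.
TIMESTAMP_ISO8601 = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
LOGLEVEL = r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)"
GREEDYDATA = r".*"

pattern = re.compile(
    rf"(?P<timestamp>{TIMESTAMP_ISO8601})\s+(?P<level>{LOGLEVEL})\s+(?P<msg>{GREEDYDATA})"
)

event = pattern.match("2024-10-10T13:55:36 ERROR connection refused")
print(event.groupdict())
# {'timestamp': '2024-10-10T13:55:36', 'level': 'ERROR', 'msg': 'connection refused'}
```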

3. Nginx Log Parsing

3.1 Nginx CLF Format

127.0.0.1 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"

3.2 grok Config

grok {
  match => {
    "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:agent}"'
  }
}
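As a hedged approximation of what the grok expression above does, the sample CLF line from section 3.1 can be parsed with a plain Python regex whose group names mirror the grok fields:

```python
import re

# Approximation of the grok pattern; [^"]* and \S+ are looser stand-ins for
# DATA, IPORHOST, USER, etc.
CLF = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?P<bytes>\d+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = '127.0.0.1 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"'
fields = CLF.match(line).groupdict()

# Grok's %{NUMBER:response:int} also casts to integer; mimic that here.
fields["response"] = int(fields["response"])
fields["bytes"] = int(fields["bytes"])

print(fields["clientip"], fields["verb"], fields["response"], fields["bytes"])
# 127.0.0.1 GET 200 612
```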

3.3 Parsed Fields

Field        Meaning
-----------  -----------------
clientip     Client IP
timestamp    Request time
verb         Request method
request      Request path
response     Status code
bytes        Response bytes
referrer     Referrer page
agent        User-Agent

4. Verify the Parsing Result

output {
  stdout { codec => rubydebug }
}

5. Error Quick Reference

Issue                      Possible Cause           Solution
-------------------------  -----------------------  ----------------------------------------
grok fails                 Pattern doesn't match    Test the pattern in an online grok
(_grokparsefailure tag)                             debugger
Events split across lines  Multi-line log entries   Use the multiline codec on the input
Poor performance           Regex too complex        Simplify the pattern or anchor it with ^
Time field non-standard    @timestamp not parsed    Use the date filter to convert it
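For the last row: in Logstash the conversion is done with `date { match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"] }`. What that filter does for the Nginx `HTTPDATE` field can be sketched in Python (the date filter would write the normalized result into `@timestamp`):

```python
from datetime import datetime

# Parse the Nginx HTTPDATE timestamp (day/month/year:time offset) and
# normalize it to ISO 8601, as the date filter does for @timestamp.
ts = "10/Oct/2024:13:55:36 +0800"
parsed = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
print(parsed.isoformat())
# 2024-10-10T13:55:36+08:00
```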

6. Summary

  • grok uses named captures to extract fields from unstructured text
  • Console logs and Nginx logs have different formats, so they need different grok patterns
  • Use the rubydebug codec to quickly verify parsing results