Big Data 185 - Logstash 7 Getting Started

1. Logstash Architecture

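A Logstash pipeline has three stages. Inputs collect events, filters (optional) transform them, and outputs ship them to a destination: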

Input → Filter → Output
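
As a minimal sketch of the three stages wired together, the config below reads lines from stdin, tags each event in the filter stage, and prints the result. The field name "pipeline_stage" and its value are assumptions chosen purely for illustration; they are not part of the original example.

```
input {
  stdin { }                       # Input: one event per line typed on the console
}

filter {
  mutate {
    # Filter: add a hypothetical marker field to every event
    add_field => { "pipeline_stage" => "filtered" }
  }
}

output {
  stdout { codec => rubydebug }   # Output: pretty-print each event
}
```

The filter block can be left out entirely; a pipeline with only input and output, like the stdin example in the next section, is still valid.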

2. stdin Input

bin/logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}'

3. file Input

3.1 Basic Config

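input {
  file {
    # File to tail; wildcards are allowed
    path => "/var/log/nginx/access.log"
    # Adds a "type" field to each event for later filtering
    type => "nginx"
    # "end" (the default) reads only lines appended after startup
    start_position => "end"
  }
}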

output {
  stdout { codec => rubydebug }
}

3.2 Key Parameters

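Parameter          Description
-----------------  ------------------------------------------------------------
path               File path(s) to read; supports wildcards
type               Adds a type field to each event, useful for later processing
start_position     Where to start reading a first-seen file: beginning or end (default end)
discover_interval  How often to look for new files matching path (default 15 seconds)
close_older        Close a file that has not been updated for this long (default 3600 seconds)
sincedb_path       Location of the sincedb file that stores read offsets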
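
The parameters above might be combined as in the following sketch. The wildcard path and the sincedb location are assumptions for illustration only; the numeric values simply restate the defaults listed in the table.

```
input {
  file {
    path => "/var/log/nginx/*.log"      # hypothetical wildcard over the nginx log directory
    type => "nginx"
    start_position => "beginning"       # read first-seen files from the top
    discover_interval => 15             # look for new matching files (table default)
    close_older => 3600                 # close files idle for one hour (table default)
    sincedb_path => "/var/lib/logstash/nginx.sincedb"   # hypothetical location
  }
}

output {
  stdout { codec => rubydebug }
}
```

Setting sincedb_path explicitly makes the offset file easy to find when troubleshooting; the Logstash user needs write access to that location.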

4. sincedb Mechanism

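  • Records the read offset of each file being tracked
  • Lets Logstash resume from the last offset after a restart, which prevents duplicate collection
  • Each record is keyed on the file's inode; recent versions also store the file path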

4.1 When start_position Takes Effect

  • It applies only to files Logstash is seeing for the first time
  • If sincedb already holds a record for the file, start_position is ignored and reading resumes from the recorded offset
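
Because a sincedb record overrides start_position, re-reading a file during testing requires that no offset be remembered. One common approach, sketched here on the assumption of a Linux or macOS host, is to point sincedb_path at /dev/null so that offsets are never persisted. This is a testing convenience, not something to run in production, since every restart re-collects the whole file.

```
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    # Assumption: Unix-like host. Offsets written here are discarded,
    # so after every restart the file looks first-seen again.
    sincedb_path => "/dev/null"
  }
}

output {
  stdout { codec => rubydebug }
}
```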

5. Error Quick Reference

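Issue                     Possible Cause               Solution
------------------------  ---------------------------  ------------------------------------------------------
Duplicate data            sincedb record is incorrect  Stop Logstash, delete the sincedb file, then restart
                                                       (files are re-read according to start_position)
Can't collect data        Permission issue             Check that the Logstash user can read the file
File no longer collected  close_older is too short     Increase close_older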

6. Summary

  • A Logstash pipeline runs Input → Filter → Output
  • sincedb stores read offsets so the file input can resume where it left off
  • start_position applies only to files that have no sincedb record yet