Big Data 185 - Logstash 7 Getting Started
1. Logstash Architecture
A Logstash pipeline has three stages:
Input → Filter → Output
2. Quick Start
bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'
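The same three stages can be spelled out in a config file. A minimal sketch (the `mutate` filter and the `pipeline` field name are illustrative, not required):

```conf
# Three-stage pipeline: read lines from stdin, tag each event, print it.
input {
  stdin { }
}
filter {
  mutate {
    add_field => { "pipeline" => "demo" }   # illustrative field
  }
}
output {
  stdout { codec => rubydebug }
}
```

Save it as a `.conf` file and pass it to Logstash with `-f` instead of the inline `-e` string.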
3. File Input Plugin
3.1 Basic Config
input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx"
    start_position => "end"
  }
}
output {
  stdout { codec => rubydebug }
}
3.2 Key Parameters
| Parameter | Description |
|---|---|
| path | File path to read; wildcards supported |
| type | Adds a `type` field to each event for downstream filtering |
| start_position | Where to start reading a new file: `beginning` or `end` (default `end`) |
| discover_interval | How often to check for new files matching `path` (default 15 s) |
| close_older | Close a file once it has not been modified for this long (default 3600 s) |
| sincedb_path | Path of the sincedb file that stores read offsets |
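Putting the parameters above together, a sketch of a tuned file input (paths and values here are illustrative; exact accepted value forms may vary slightly by plugin version):

```conf
input {
  file {
    path => "/var/log/nginx/*.log"       # wildcards are supported
    type => "nginx"
    start_position => "beginning"
    discover_interval => 30              # look for new matching files every 30 s
    close_older => 7200                  # close files idle for 2 hours
    sincedb_path => "/var/lib/logstash/sincedb-nginx"  # illustrative path
  }
}
```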
4. sincedb Mechanism
- Records the read offset (byte position) of each watched file
- Enables resuming from the last position after a restart, preventing duplicate collection
- Entries are keyed by inode (plus device numbers); the file path is also recorded
4.1 When start_position Takes Effect
- Only applies to files Logstash has not seen before
- If the file already has a sincedb entry, start_position is ignored
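Because of this, a common pattern for re-reading a whole file on every start is to combine `start_position => "beginning"` with a sincedb path of `/dev/null`, which discards the offset state:

```conf
# Re-read the entire file each time Logstash starts:
# start from the beginning and keep no sincedb state.
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```

This is useful for testing and backfills; in production you normally keep a persistent sincedb so restarts resume instead of re-collecting.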
5. Error Quick Reference
| Issue | Possible Cause | Solution |
|---|---|---|
| Duplicate data | Stale or wrong sincedb entry | Delete the sincedb file and restart Logstash |
| No data collected | Insufficient permissions | Ensure Logstash can read the file |
| File updates not picked up | close_older too short | Increase close_older |
6. Summary
- A Logstash pipeline has three stages: Input → Filter → Output
- sincedb records read offsets, enabling resume-on-restart without duplicates
- start_position only affects files that do not yet have a sincedb entry