Big Data 185 - Logstash 7 Getting Started
1. Logstash Architecture
A Logstash pipeline has three stages:
Input → Filter → Output
2. Quick Start
bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'
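The same three stages can be spelled out in a config file. A minimal sketch (the `mutate` filter and the `pipeline` field name are illustrative, not required):

```conf
# Three-stage pipeline: read lines from stdin, tag each event, print it.
input {
  stdin { }
}
filter {
  mutate {
    add_field => { "pipeline" => "demo" }   # illustrative field
  }
}
output {
  stdout { codec => rubydebug }
}
```

Save it as a `.conf` file and pass it to Logstash with `-f` instead of the inline `-e` string.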
3. File Input Plugin
3.1 Basic Config
input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx"
    start_position => "end"
  }
}
output {
  stdout { codec => rubydebug }
}
3.2 Key Parameters
| Parameter | Description |
|---|---|
| path | File path to read; wildcards supported |
| type | Adds a `type` field to each event for downstream filtering |
| start_position | Where to start reading a new file: `beginning` or `end` (default `end`) |
| discover_interval | How often to check for new files matching `path` (default 15 s) |
| close_older | Close a file once it has not been modified for this long (default 3600 s) |
| sincedb_path | Path of the sincedb file that stores read offsets |
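Putting the parameters above together, a sketch of a tuned file input (paths and values here are illustrative; exact accepted value forms may vary slightly by plugin version):

```conf
input {
  file {
    path => "/var/log/nginx/*.log"       # wildcards are supported
    type => "nginx"
    start_position => "beginning"
    discover_interval => 30              # look for new matching files every 30 s
    close_older => 7200                  # close files idle for 2 hours
    sincedb_path => "/var/lib/logstash/sincedb-nginx"  # illustrative path
  }
}
```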
4. sincedb Mechanism
- Records the read offset (byte position) of each watched file
- Enables resuming from the last position after a restart, preventing duplicate collection
- Entries are keyed by inode (plus device numbers); the file path is also recorded
4.1 When start_position Takes Effect
- Only applies to files Logstash has not seen before
- If the file already has a sincedb entry, start_position is ignored
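Because of this, a common pattern for re-reading a whole file on every start is to combine `start_position => "beginning"` with a sincedb path of `/dev/null`, which discards the offset state:

```conf
# Re-read the entire file each time Logstash starts:
# start from the beginning and keep no sincedb state.
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```

This is useful for testing and backfills; in production you normally keep a persistent sincedb so restarts resume instead of re-collecting.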
5. Error Quick Reference
| Issue | Possible Cause | Solution |
|---|---|---|
| Duplicate data | Stale or wrong sincedb entry | Delete the sincedb file and restart Logstash |
| No data collected | Insufficient permissions | Ensure Logstash can read the file |
| File updates not picked up | close_older too short | Increase close_older |
6. Summary
- A Logstash pipeline has three stages: Input → Filter → Output
- sincedb records read offsets, enabling resume-on-restart without duplicates
- start_position only affects files that do not yet have a sincedb entry