Introduction to Logstash
What is Logstash?
A tool for receiving data, transforming it, and outputting it somewhere else.
Downloading/Installing Logstash
Downloading is as easy as getting it from http://logstash.net. I'll be using version 1.4, which is currently in beta as of this writing.
To install, untar the package somewhere, or use the .deb/.rpm repositories for your respective operating system.
Running Logstash
There are two main modes of running Logstash, and both can be run at once.
Agent
Running as an agent collects information and forwards it to the backend (in our case, Elasticsearch).
Web
Runs the web UI (known as Kibana) bundled in Logstash
Logstash architecture
Logstash is a collection of:
- Inputs
- Codecs
- Filters
- Outputs
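These plugin types chain together into a pipeline: inputs produce events, filters transform them, and outputs ship them somewhere, with codecs handling how raw data is decoded and encoded along the way. A minimal sketch of how the pieces fit in one config file (the plugin choices here are just placeholders):

input {
  # inputs produce events (stdin, file, tcp, ...)
  stdin {
    # a codec lives inside an input or output and controls (de)serialization
    codec => plain
  }
}

filter {
  # filters modify events in flight (grok, mutate, ...)
  mutate { add_tag => [ "example" ] }
}

output {
  # outputs ship events somewhere (stdout, elasticsearch, ...)
  stdout { codec => rubydebug }
}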
Simplest configuration
Starting with the simplest input, standard in:
input { stdin {} }
And the simplest output, standard out (no filters for now):
output { stdout {} }
To run this, you can do:
bin/logstash agent -f logstash-simple.conf
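Once the JVM spins up (this can take a few seconds), anything you type becomes an event and is echoed back. A rough sketch of the interaction; the exact output format depends on the codec and version:

$ bin/logstash agent -f logstash-simple.conf
hello world
2014-03-17T02:52:39.789+0000 Xanadu.local hello world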
Changing the way data is represented
Let's change the codec (data representation) to print more information:
input { stdin {} }

output {
  stdout { codec => rubydebug }
}
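Typing "hello world" now prints the whole event as a pretty Ruby hash, roughly like this (your host and timestamp will differ):

{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2014-03-17T02:52:39.789Z",
          "host" => "Xanadu.local"
}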
Reading input from files on disk
This time, instead of reading in from stdin, read from a file:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
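One caveat: by default the file input tails the file, so only lines appended after Logstash starts will show up. To read an existing file from the top, the file input has a start_position setting (a sketch, assuming the 1.4 file input options):

input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
    # read from the beginning instead of tailing new lines only
    start_position => "beginning"
  }
}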
Outputting to an embedded Elasticsearch
Logstash can output to many more places than just stdout. It ships with an elasticsearch output, which can even run an embedded Elasticsearch node:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch { embedded => true }
}
Add a few logs to the file:
echo "this is a log message about foo" >> example.log echo "this is a log message about bar" >> example.log echo "this is a log message about baz" >> example.log
Logstash creates daily indices by default; notice that the index here is named for the day this was run:
curl 'localhost:9200/_cat/health?v'
echo "------"
curl 'localhost:9200/_cat/shards?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1395046372 02:52:52  elasticsearch yellow          2         1      5   5    0    0        5
------
index               shard prirep state      docs  store ip             node
logstash-2014.03.17 2     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 2     r      UNASSIGNED
logstash-2014.03.17 0     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 0     r      UNASSIGNED
logstash-2014.03.17 3     p      STARTED       2  4.2kb 172.22.255.231 Multiple Man
logstash-2014.03.17 3     r      UNASSIGNED
logstash-2014.03.17 1     p      STARTED       1  3.9kb 172.22.255.231 Multiple Man
logstash-2014.03.17 1     r      UNASSIGNED
logstash-2014.03.17 4     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 4     r      UNASSIGNED
And you can search for the log messages. Here's an example query:
{ "query": { "simple_query_string": { "query": "foo | bar", "fields": ["message"] } }, "size": 3 }
And get back the results:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 1248

{
  "took" : 76,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.35355338,
    "hits" : [ {
      "_index" : "logstash-2014.03.17",
      "_type" : "apache",
      "_id" : "q8Eq-Ck2RjWwB70-rxz7bw",
      "_score" : 0.35355338,
      "_source" : {"message":"this is a log message about foo","@version":"1","@timestamp":"2014-03-17T08:52:39.789Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    }, {
      "_index" : "logstash-2014.03.17",
      "_type" : "apache",
      "_id" : "e0KXf2eCQjmm302UB6n60g",
      "_score" : 0.35355338,
      "_source" : {"message":"this is a log message about bar","@version":"1","@timestamp":"2014-03-17T08:52:39.791Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    }, {
      "_index" : "logstash-2014.03.13",
      "_type" : "apache",
      "_id" : "DXwFHMvTTsauxjr9lJ5Xcg",
      "_score" : 0.25427115,
      "_source" : {"message":"this is a log message about bar","@version":"1","@timestamp":"2014-03-13T08:45:22.446Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    } ]
  }
}
Outputting to a separate Elasticsearch
Embedded is great for development, but outputting to a different Elasticsearch server is better for production:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    port => 9300
    node_name => "logstash-agent-007"
    workers => 2
  }
}
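Port 9300 is Elasticsearch's transport port; with these settings the output joins the cluster as a client node rather than speaking HTTP. If you'd rather go through the REST API on port 9200, Logstash 1.4 also ships an elasticsearch_http output. A minimal sketch, with all other options left at their defaults:

output {
  elasticsearch_http {
    host => "localhost"
    # Elasticsearch's HTTP port (the 9300 transport port is not used here)
    port => 9200
  }
}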
Adding a filter into the mix
Filters allow you to modify events as they flow through the pipeline. The most useful is grok, but let's start with mutate. So, the standard input/output configuration first:
input {
  stdin {}
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
Adding the mutate filter to add a field as well as lowercase the "message" field:
filter {
  mutate {
    add_field => [ "myhost", "Hello from %{host}!" ]
    lowercase => [ "message" ]
  }
}
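Typing an uppercase line like "HELLO THERE" should now come out roughly like this (host and timestamp will differ); note the lowercased message and the new myhost field:

{
       "message" => "hello there",
      "@version" => "1",
    "@timestamp" => "2014-03-17T02:52:39.789Z",
          "host" => "Xanadu.local",
        "myhost" => "Hello from Xanadu.local!"
}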
The "grok" filter
Arguably Logstash's most useful filter, grok ships with ~120 predefined patterns that can be combined to match almost any log format.
Again, standard boilerplate:
input {
  stdin {}
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
And then a grok filter meant to match the text "name: John" in the inputs:
filter {
  grok {
    match => [ "message", "name: %{WORD:custom_name}" ]
  }
  mutate {
    lowercase => [ "custom_name" ]
  }
}
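Feeding in a line containing "name: John" should produce an event with a new custom_name field, roughly (host and timestamp will differ):

{
        "message" => "name: John",
       "@version" => "1",
     "@timestamp" => "2014-03-17T02:52:39.789Z",
           "host" => "Xanadu.local",
    "custom_name" => "john"
}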
Combining everything together
Read from a file (this time an Elasticsearch log file) and use the es-log type when putting the log messages into Elasticsearch. Output will be written to a separate Elasticsearch cluster at localhost on port 9300:
input {
  file {
    type => "es-log"
    path => "/Users/hinmanm/intro-to-logstash/es/logs/elasticsearch.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    port => 9300
  }
}
This example filter will match Elasticsearch's log format and extract the useful pieces of the log (time, level, package, node_name, and log message).
The mutate filter will then:
- lowercase the log level (INFO => info)
- strip the whitespace from the package ("indices.recovery " => "indices.recovery")

Additionally, the multiline filter will match lines that look like a Java exception and collapse them into the previous event, so a multi-line stack trace becomes a single message.
filter {
  grok {
    match => [ "message", "^\[%{TIMESTAMP_ISO8601:time}\]\[%{LOGLEVEL:level}.*\]\[%{DATA:package}\] \[%{DATA:node_name}\] %{DATA:logmsg}$" ]
  }
  mutate {
    lowercase => [ "level" ]
    strip => [ "package" ]
  }
  multiline {
    pattern => "(org\.elasticsearch\.Exception.+|(at .+))"
    what => "previous"
  }
}
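To make the pattern concrete, here's an illustrative line in the shape Elasticsearch writes (not taken from a real log):

[2014-03-17 02:50:12,112][INFO ][indices.recovery         ] [Multiple Man] recovery completed

After the grok and mutate filters run, this should yield roughly time => "2014-03-17 02:50:12,112", level => "info", package => "indices.recovery", node_name => "Multiple Man", and logmsg => "recovery completed".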
The UI for Logstash
Logstash bundles Kibana, which can be used for visualizing data. Running it is as easy as:
bin/logstash web
or both at once with:
bin/logstash agent -f logstash.conf -- web
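Once it's running, point a browser at the bundled Kibana, which by default listens on port 9292:

open http://localhost:9292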