Introduction to Logstash
What is Logstash?
A tool for receiving data, transforming it, and outputting it somewhere else.
Downloading/Installing Logstash
Downloading is as easy as getting it from http://logstash.net. I'll be using version 1.4, which is currently in beta as of this writing.
To install, untar the package somewhere, or use the .deb/.rpm repositories for your respective operating system.
Running Logstash
There are two main modes of running Logstash, and both can be run at once.
Agent
Running as an agent collects information and forwards it to the backend (in our case, Elasticsearch).
Web
Runs the web UI (known as Kibana) bundled in Logstash
Logstash architecture
Logstash is a collection of:
- Inputs
- Codecs
- Filters
- Outputs
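These plugin types chain together into a pipeline: inputs produce events, filters transform them, and outputs ship them somewhere, with codecs handling how raw data is decoded and encoded along the way. A minimal sketch of how the pieces fit in one config file (the plugin choices here are just placeholders):

input {
  # inputs produce events (stdin, file, tcp, ...)
  stdin {
    # a codec lives inside an input or output and controls (de)serialization
    codec => plain
  }
}

filter {
  # filters modify events in flight (grok, mutate, ...)
  mutate { add_tag => [ "example" ] }
}

output {
  # outputs ship events somewhere (stdout, elasticsearch, ...)
  stdout { codec => rubydebug }
}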
Simplest configuration
Starting with the simplest input, standard in:
input { stdin {} }
And the simplest output, standard out (no filters for now):
output { stdout {} }
To run this, you can do:
bin/logstash agent -f logstash-simple.conf
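Once the JVM spins up (this can take a few seconds), anything you type becomes an event and is echoed back. A rough sketch of the interaction; the exact output format depends on the codec and version:

$ bin/logstash agent -f logstash-simple.conf
hello world
2014-03-17T02:52:39.789+0000 Xanadu.local hello world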
Changing the way data is represented
Let's change the codec (data representation) to print more information:
input { stdin {} }

output {
  stdout { codec => rubydebug }
}
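Typing "hello world" now prints the whole event as a pretty Ruby hash, roughly like this (your host and timestamp will differ):

{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2014-03-17T02:52:39.789Z",
          "host" => "Xanadu.local"
}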
Reading input from files on disk
This time, instead of reading in from stdin, read from a file:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
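One caveat: by default the file input tails the file, so only lines appended after Logstash starts will show up. To read an existing file from the top, the file input has a start_position setting (a sketch, assuming the 1.4 file input options):

input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
    # read from the beginning instead of tailing new lines only
    start_position => "beginning"
  }
}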
Outputting to an embedded Elasticsearch
Logstash can output to many more places than just stdout. It ships with an elasticsearch output, which can even run an embedded Elasticsearch node:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch { embedded => true }
}
Add a few logs to the file:
echo "this is a log message about foo" >> example.log echo "this is a log message about bar" >> example.log echo "this is a log message about baz" >> example.log
Logstash creates daily indices by default; notice that the index here is named for the day this was run:
curl 'localhost:9200/_cat/health?v'
echo "------"
curl 'localhost:9200/_cat/shards?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1395046372 02:52:52  elasticsearch yellow          2         1      5   5    0    0        5
------
index               shard prirep state      docs  store ip             node
logstash-2014.03.17 2     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 2     r      UNASSIGNED
logstash-2014.03.17 0     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 0     r      UNASSIGNED
logstash-2014.03.17 3     p      STARTED       2  4.2kb 172.22.255.231 Multiple Man
logstash-2014.03.17 3     r      UNASSIGNED
logstash-2014.03.17 1     p      STARTED       1  3.9kb 172.22.255.231 Multiple Man
logstash-2014.03.17 1     r      UNASSIGNED
logstash-2014.03.17 4     p      STARTED       0    99b 172.22.255.231 Multiple Man
logstash-2014.03.17 4     r      UNASSIGNED
And you can search for the log messages. Here's an example query:
{ "query": { "simple_query_string": { "query": "foo | bar", "fields": ["message"] } }, "size": 3 }
And get back the results:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 1248

{
  "took" : 76,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.35355338,
    "hits" : [ {
      "_index" : "logstash-2014.03.17",
      "_type" : "apache",
      "_id" : "q8Eq-Ck2RjWwB70-rxz7bw",
      "_score" : 0.35355338,
      "_source" : {"message":"this is a log message about foo","@version":"1","@timestamp":"2014-03-17T08:52:39.789Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    }, {
      "_index" : "logstash-2014.03.17",
      "_type" : "apache",
      "_id" : "e0KXf2eCQjmm302UB6n60g",
      "_score" : 0.35355338,
      "_source" : {"message":"this is a log message about bar","@version":"1","@timestamp":"2014-03-17T08:52:39.791Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    }, {
      "_index" : "logstash-2014.03.13",
      "_type" : "apache",
      "_id" : "DXwFHMvTTsauxjr9lJ5Xcg",
      "_score" : 0.25427115,
      "_source" : {"message":"this is a log message about bar","@version":"1","@timestamp":"2014-03-13T08:45:22.446Z","type":"apache","host":"Xanadu.local","path":"/Users/hinmanm/intro-to-logstash/example.log"}
    } ]
  }
}
Outputting to a separate Elasticsearch
Embedded is great for development, but outputting to a different Elasticsearch server is better for production:
input {
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    port => 9300
    node_name => "logstash-agent-007"
    workers => 2
  }
}
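Port 9300 is Elasticsearch's transport port; with these settings the output joins the cluster as a client node rather than speaking HTTP. If you'd rather go through the REST API on port 9200, Logstash 1.4 also ships an elasticsearch_http output. A minimal sketch, with all other options left at their defaults:

output {
  elasticsearch_http {
    host => "localhost"
    # Elasticsearch's HTTP port (the 9300 transport port is not used here)
    port => 9200
  }
}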
Adding a filter into the mix
Filters allow you to modify events as they flow through the pipeline. The most useful is grok, but let's start with mutate. So, the standard input/output configuration first:
input {
  stdin {}
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
Adding the mutate filter to add a field as well as lowercase the "message" field:
filter {
  mutate {
    add_field => [ "myhost", "Hello from %{host}!" ]
    lowercase => [ "message" ]
  }
}
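Typing an uppercase line like "HELLO THERE" should now come out roughly like this (host and timestamp will differ); note the lowercased message and the new myhost field:

{
       "message" => "hello there",
      "@version" => "1",
    "@timestamp" => "2014-03-17T02:52:39.789Z",
          "host" => "Xanadu.local",
        "myhost" => "Hello from Xanadu.local!"
}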
The "grok" filter
Arguably Logstash's most useful filter, grok ships with ~120 predefined patterns that can be combined to match almost any log format.
Again, standard boilerplate:
input {
  stdin {}
  file {
    type => "apache"
    path => "/Users/hinmanm/intro-to-logstash/example.log"
  }
}

output {
  stdout { codec => rubydebug }
}
And then a grok filter meant to match the text "name: John" in the inputs:
filter {
  grok {
    match => [ "message", "name: %{WORD:custom_name}" ]
  }
  mutate {
    lowercase => [ "custom_name" ]
  }
}
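Feeding in a line containing "name: John" should produce an event with a new custom_name field, roughly (host and timestamp will differ):

{
        "message" => "name: John",
       "@version" => "1",
     "@timestamp" => "2014-03-17T02:52:39.789Z",
           "host" => "Xanadu.local",
    "custom_name" => "john"
}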
Combining everything together
Read from a file (this time an Elasticsearch log file) and use the es-log type when putting the log messages into Elasticsearch. Output will be written to a separate Elasticsearch cluster at localhost on port 9300:
input {
  file {
    type => "es-log"
    path => "/Users/hinmanm/intro-to-logstash/es/logs/elasticsearch.log"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    port => 9300
  }
}
This example filter will match Elasticsearch's log format and extract the useful pieces of the log (time, level, package, node_name, and log message).
The mutate filter will then:
- lowercase the log level (INFO => info)
- strip the whitespace from the package ("indices.recovery " => "indices.recovery")

Additionally, the multiline filter will match lines that look like a Java exception and collapse them into the previous event, so a multi-line stack trace becomes a single message.
filter {
  grok {
    match => [ "message", "^\[%{TIMESTAMP_ISO8601:time}\]\[%{LOGLEVEL:level}.*\]\[%{DATA:package}\] \[%{DATA:node_name}\] %{DATA:logmsg}$" ]
  }
  mutate {
    lowercase => [ "level" ]
    strip => [ "package" ]
  }
  multiline {
    pattern => "(org\.elasticsearch\.Exception.+|(at .+))"
    what => "previous"
  }
}
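To make the pattern concrete, here's an illustrative line in the shape Elasticsearch writes (not taken from a real log):

[2014-03-17 02:50:12,112][INFO ][indices.recovery         ] [Multiple Man] recovery completed

After the grok and mutate filters run, this should yield roughly time => "2014-03-17 02:50:12,112", level => "info", package => "indices.recovery", node_name => "Multiple Man", and logmsg => "recovery completed".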
The UI for Logstash
Logstash bundles Kibana, which can be used for visualizing data. Running it is as easy as:
bin/logstash web
or both at once with:
bin/logstash agent -f logstash.conf -- web
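Once it's running, point a browser at the bundled Kibana, which by default listens on port 9292:

open http://localhost:9292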