The ELKBeats Stack: Sounds Like a Good Idea ...

Read the first item in this Table of Contents if you haven't been here before.

Table of Contents


This is a rough introduction to the ELK Stack and its associated software: it's a big subject, so it's divided into multiple blog posts. Those are listed in the Table of Contents above: jump about as you see fit, but you should probably read this introduction first. If you're trying to build the entire stack, read the whole damn thing in sequence, and note that none of the later posts will work without the Ground Work. The process given here was valid in March of 2016 for Debian systems using the Debian packages, not the tarballs. elastic.co's process and packages seem to be extremely volatile: if you're using another distro or are significantly removed in time, I suspect the process I've outlined won't work for you.

The ELK Stack is composed of Elasticsearch (kind of the star of the show, the hero search engine of open source), Logstash (handles incoming information, usually logs), and Kibana (Elasticsearch may be the star, but Kibana is what you see: it does the visualization). As Linode puts it, "The ELK Stack is an open-source platform that makes structured and unstructured data easy to collect, store and visualize." Logstash runs in the background, monitoring and parsing logs in (near) real time and feeding the data into Elasticsearch as JSON. Elasticsearch stores the data and performs analytics, while Kibana is a JavaScript-based front end for visualizing your statistics. But practically speaking you need one more puzzle piece to make this work: a way of getting logs (or any other form of information) from your other servers to the Elasticsearch server/cluster. That software is "Beats." It's not technically part of "the stack," since it runs on the other machines, but it's an important part of this infrastructure.

Trying to get ELK running has been an education in packaging and documentation. The most significant problem I can articulate so far is that elastic.co (who provide all of these pieces of software) offer both tarballs and packages for a variety of platforms (which is great), but the installation instructions occasionally fail horribly right after "unpack the tarball" or "install package X," because they usually describe how to configure the tarball, and the package behaves totally differently. The reasons for the difference are sound: the package installation creates an unprivileged user to run the associated binaries and provides a full start-stop system (i.e. control scripts for Windows or Linux's systemd), but it also means the configuration and behaviour differ significantly between tarball and package, to the point that if you're working with the one they're not describing, nothing works at all.
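To make the package side concrete: on a systemd-based Debian install, a package-installed Elasticsearch is driven like any other service, roughly as below. The unit name is an assumption based on the stock package; the other packages have equivalents, though exactly which init system each one registers with has varied.

    # The package registers a service; you enable and start it yourself.
    sudo systemctl daemon-reload
    sudo systemctl enable elasticsearch.service
    sudo systemctl start elasticsearch.service
    sudo systemctl status elasticsearch.service   # confirm it's actually up

None of that exists with the tarball, which expects you to run the binaries by hand.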

elastic.co has extensive, and imperfect, documentation at http://www.elastic.co/guide/. There's plenty of documentation elsewhere as well, but all of it caused me a great deal of pain, so - like any good open source author who believes in fragmentation - I'm going to write my own. This is an outline: installation and configuration will follow in further blog posts. Keep in mind that I'm using Debian, and that what I'm describing is the current Debian experience.

Logstash

I'm going to let elastic.co make the case for their product:

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice. Cleanse and democratize all your data for diverse advanced downstream analytics and visualization use cases.

While Logstash originally drove innovation in log collection, its capabilities extend well beyond that use case. Any type of event can be enriched and transformed with a broad array of input, filter, and output plugins, with many native codecs further simplifying the ingestion process. Logstash accelerates your insights by harnessing a greater volume and variety of data.

Ouch - the documentation author drank the marketing Kool-Aid. In plain English? Logstash is the information gatherer. You point it at logs and other data sources via its complex and unpleasant configuration (there's a sample below), and it converts those sources to JSON.
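To give you a taste of that configuration, here's a minimal, illustrative pipeline: it listens for traffic from Beats (more on those below), parses Apache-style log lines with the grok filter, and ships the results onward. The plugin names and default ports are the stock ones; the file path and the grok pattern are examples, not prescriptions.

    # /etc/logstash/conf.d/example.conf -- a minimal illustrative pipeline
    input {
      beats {
        port => 5044                    # the conventional Beats port
      }
    }
    filter {
      grok {
        # Example pattern: parse Apache "combined" format access logs
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]     # assumes a local Elasticsearch
      }
    }

Three sections, each a list of plugins: inputs collect, filters transform, outputs deliver. And the usual output plugin feeds everything to ...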

Elasticsearch

elastic.co says "Elasticsearch is an open-source search engine built on top of Apache Lucene, a full-text search-engine library." A bit more detail: "Elasticsearch is a real-time, distributed storage, search, and analytics engine. It can be used for many purposes, but one context where it excels is indexing streams of semi-structured data, such as logs or decoded network packets."

"Distributed," huh? Let's see if we can even get one instance running ... My experience was that it's just as tricky to configure as Logstash.

Kibana

Kibana is (mostly) the good news of this stack. It's relatively small (at least compared to the other packages) and easy to install. And it either works or it doesn't: one of my attempts to install it failed completely, and I have no explanation for that. I doubt it's a piece of software you'd use with anything other than Elasticsearch.
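Its configuration is also mercifully small. At the time of writing the Debian package put kibana.yml in /opt/kibana/config/ (an assumption worth verifying on your system), and a single-node setup needs little more than these two settings (key names as of Kibana 4.x; the values shown are the defaults):

    # kibana.yml: where Kibana listens, and where it finds Elasticsearch
    server.port: 5601
    elasticsearch.url: "http://localhost:9200"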

Server Overload

With all three services running (and not yet afflicted with any data or actually doing anything!), an otherwise entirely unloaded virtual machine has less than half of its 1 GB of memory still available. Make sure any machine running an ELK stack is well equipped.
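Elasticsearch is the biggest of the three consumers, and the easiest to rein in: the Debian package reads /etc/default/elasticsearch, where (in the 2.x era) the JVM heap is capped with ES_HEAP_SIZE. elastic.co's own advice is to give the heap no more than half of physical RAM, since the rest is put to good use as filesystem cache. The value below is an example for a small test VM, not a recommendation:

    # /etc/default/elasticsearch (Debian package, Elasticsearch 2.x era)
    ES_HEAP_SIZE=256m    # cap the JVM heap; the default is greedier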

Beats

Once again quoting elastic.co's documentation:

The Beats are open source data shippers that you install as agents on your servers to send different types of operational data to Elasticsearch. Beats can send data directly to Elasticsearch or send it to Elasticsearch via Logstash, which you can use to enrich or archive the data.

For my purposes, and probably yours too, Filebeat is the most likely starting point:

Filebeat is a log data shipper initially based on the Logstash-Forwarder source code. Installed as an agent on your servers, Filebeat monitors the log directories or specific log files, tails the files, and forwards them either to Logstash for parsing or directly to Elasticsearch for indexing.

Prepare yourself for terms like "harvester" and "prospector."
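To show where those terms live, here's roughly what a minimal Filebeat configuration looked like at the time, in the 1.x layout (it has changed in later versions; the paths and host below are placeholders):

    # /etc/filebeat/filebeat.yml -- Filebeat 1.x layout
    filebeat:
      prospectors:
        -                        # one "prospector" watching these paths
          paths:
            - /var/log/syslog
            - /var/log/auth.log
          input_type: log        # tail the files as plain log lines
    output:
      logstash:                  # or "elasticsearch:" to bypass Logstash
        hosts: ["elk.example.com:5044"]

Each file a prospector turns up is handed to a "harvester," which tails it line by line: hence the terminology.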


Continue to The ELKBeats Stack: the Ground Work, the next article in this series.