In order to understand the idea of log aggregation, you need to understand the pain it alleviates. You’ve almost certainly felt this pain, even if you don’t realize it.
Let’s consider a scenario that every programmer has probably experienced. You’re staring at some gigantic, dusty log file, engaging in what I like to think of as “programming archaeology.” And you have a headache.
Log aggregation is the practice of gathering up disparate log files for the purposes of organizing the data in them and making them searchable.
In this post, I’ll walk you through what, exactly, that looks like. But I’ll also describe the backstory and motivation for doing this, both of which are essential to really understanding how it works.
So with that in mind, let’s consider a programmer’s tale of logging woe.
A Tale of Logger Woe
It started innocently enough.
A few users reported occasionally seeing junk text on the account settings screen. It’s not a regular bug, and it’s not particularly important. But it is embarrassing, and it seems like it should be easy enough to track down and fix. So you start trying to do just that.
You start by searching the database for the junk text in their screenshot. You find nothing.
Reasoning that application code must somehow have assembled the text in production, you figure you’ll head for the log files. When you open one up, it crashes your text editor. Oops. Too big for that editor.
After using a little shell script magic to slice and dice the log file, you open it up and search for the text in question. That takes absolutely forever and yields no results.
So you start searching for parts of the text, and eventually you have some luck. There’s a snippet of the text on line 429,012 and then another on line 431,114, with all sorts of indecipherable debug junk in between.
But you can’t find all of the text. And you have a headache.
You then realize there’s a second log file for certain parts of the data access layer from before the Big Refactoring of ’15, and the rest of the text is probably in there. Your headache gets worse.
Log Aggregation to the Rescue
You don’t need a name for it to know that a better approach has to exist.
Of course, you’ll pull levers in your own code and configuration files first. Once you’ve tracked down that maddening issue and sorted it, you’ll parlay your lessons learned into suggestions for the team.
From now on, we should really turn off all debugging info in prod because it’s just noise. And we should really audit our codebase to get rid of spurious logging calls. Oh, and while we’re doing that, we should establish some standards for the information that goes onto each line and consolidate to a single file. That should do the trick.
Or should it?
With these types of approaches, you’re addressing a poor signal-to-noise ratio. Frustrated by what you deem worthless information in the log file, you seek to organize and reduce the raw volume. You cut down on the total noise, hoping to leave only signal.
But here’s the trouble.
Your noise in solving the junk text problem may prove to be someone else’s signal next week when tracking down a different problem. If you’re a code archaeologist, those log entries are your fossils; you don’t want to toss them in the garbage because they’re not helping with your project right now.
You don’t want to put your logs on a diet. Rather, you want to get better at managing them. That’s where log aggregation comes in.
Aggregating Your Logs
Today you have that main log and the other leftover one from the days before the Big Refactoring of ’15. You want to consolidate those in application code to make life easier. But then again, you’ll still have the entirely separate server log to deal with in some cases.
And what about the inevitable wave of consultants who come in and tell you to break your monolith into microservices? All your log file consolidation efforts become moot as your application becomes a bunch of small applications, each writing its own logs.
The real solution to your problem lies not in an enforced standard of dumping all information into a single file.
Instead, you want to find an efficient way to gather the entries from your various log files into one single, organized place. That may seem like a lot of extra work for you, but it really isn’t because someone has solved this problem already — and solved it well.
Tools exist to handle your log aggregation.
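To make the idea of gathering entries into one organized place concrete, here is a minimal sketch in Python. It is not how any particular aggregation tool works. It assumes every line starts with an ISO 8601 timestamp followed by a space, and that each log is already in chronological order. The names `Entry`, `read_entries`, and `aggregate`, along with the sample lines, are invented for illustration.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Mapping, NamedTuple


class Entry(NamedTuple):
    timestamp: datetime
    source: str   # which log the entry came from
    message: str


def read_entries(source: str, lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from one log, assuming '<ISO timestamp> <message>' lines."""
    for line in lines:
        stamp, _, message = line.rstrip("\n").partition(" ")
        yield Entry(datetime.fromisoformat(stamp), source, message)


def aggregate(logs: Mapping[str, Iterable[str]]) -> Iterator[Entry]:
    """Lazily merge several chronological logs into one chronological stream."""
    streams = [read_entries(name, lines) for name, lines in logs.items()]
    return heapq.merge(*streams, key=lambda entry: entry.timestamp)


# Hypothetical sample data standing in for the main log and the leftover one.
main_log = [
    "2015-03-01T10:00:00 settings page requested",
    "2015-03-01T10:00:02 settings page rendered",
]
legacy_log = [
    "2015-03-01T10:00:01 legacy data access: loaded account row",
]

merged = list(aggregate({"main": main_log, "legacy": legacy_log}))
for entry in merged:
    print(entry.timestamp.isoformat(), entry.source, entry.message)
```

Because `heapq.merge` is lazy, the same approach works with open file handles instead of lists, so nothing has to fit in memory (or in your text editor) all at once.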
And Parsing Them, While You’re At It
Introducing log aggregation tooling will be a game changer for you. Think ahead to the sorts of things you’d want after aggregating your logs, and chances are the tool has already taken care of them.
For instance, you probably think, “Well, slamming the log files together is all well and good, but all the different formats would just get confusing.”
And if you roll your own solution, that’s absolutely true. At least, until you write some sort of parser to extract structured data.
But people have already written that, and it comes along for the ride with log aggregation tooling.
There’s an important concept at play here. Generally, developers treat log files as text and do simple searches. But there’s data in those files, waiting for extraction and meaningful ordering. The aggregator treats your log file as data and lets you meaningfully query that data.
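Here is a rough sketch of what treating a log file as data can look like, using only Python's standard `re` module. The line format (timestamp, level, bracketed component, message) is an assumption made up for this example, as are the names `LINE_PATTERN` and `parse_line`. Real aggregation tools ship parsers for many formats, and this is not a reproduction of any of them.

```python
import re
from typing import Optional

# Hypothetical line format: "<timestamp> <LEVEL> [<component>] <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+(?P<message>.*)"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw line into a dict of named fields, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


raw_lines = [
    "2015-03-01T10:00:00 DEBUG [settings] rendering account settings",
    "2015-03-01T10:00:01 ERROR [data-access] unexpected text in display name",
    "not a structured line",
]

# Keep only the lines that parsed, then query them as data rather than text.
records = [r for r in map(parse_line, raw_lines) if r is not None]
errors = [r for r in records if r["level"] == "ERROR"]
print(errors)
```

Once each line is a record with named fields, "show me every error from the data access layer" becomes a filter on `level` and `component` instead of a guess at which substring to search for.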
Chances are you’ve logged into a server somewhere and issued a command like “tail -f some.log”. And I’m betting that’s the sum total of the real-time log monitoring that you’ve done.
With a feature-rich log aggregator, you can achieve this same effect.
But with the aggregator, you can bring all of the structured data and gathered log files along for the ride. So instead of a scrolling wall of text, you can keep your eye on a scrolling set of organized, meaningful, color-coded entries. In this sense, it’s a lot more like looking at a dashboard than a text file dump.
Intelligent, Fast Search
All of that sets the stage for truly alleviating the headache of the code archaeologist. You can get all of the log data in one place, parse it into meaningful data, and keep an eye on it. So, not surprisingly, you can get a lot more sophisticated with your querying than simple text searches.
The log aggregation tool treats your logs as data.
That means conceptual schema and indexing. Put another way, that means that you can execute semantically meaningful searches that are also fast.
Forget relying on your text editor’s wildcard/regex feature to make your search smart and then going for lunch while it cranks through a 10-gig log file. You can look for things based on the nature of the data in question, and you can do so quickly.
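To show why indexing makes that kind of search fast, here is a toy sketch of an inverted index over already-parsed entries. It illustrates the general technique and nothing more. Real tools use far more sophisticated storage, and the record fields and the names `build_index` and `search` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical parsed log entries.
records = [
    {"level": "DEBUG", "component": "settings", "message": "rendering page"},
    {"level": "ERROR", "component": "data-access", "message": "bad display name"},
    {"level": "ERROR", "component": "settings", "message": "junk text rendered"},
]


def build_index(records, fields):
    """Map each (field, value) pair to the positions of the records that have it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for field in fields:
            index[(field, record[field])].add(position)
    return index


def search(index, records, **criteria):
    """Return records matching every field=value criterion via set intersection."""
    if not criteria:
        return list(records)
    position_sets = [index.get((f, v), set()) for f, v in criteria.items()]
    matches = set.intersection(*position_sets)
    return [records[i] for i in sorted(matches)]


index = build_index(records, ["level", "component"])
hits = search(index, records, level="ERROR", component="settings")
print(hits)
```

The index is built once, up front. After that, each query is a few dictionary lookups and a set intersection rather than a scan through every line of every file.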
The Value Proposition of Log Aggregation
You can think of everything I’ve talked about so far in terms of features. Literally speaking, log aggregation just means “gathering log files into one place.”
Log aggregation tool makers have taken that to its logical conclusion, adding things like parsing, search, and indexing. You now have sophisticated, robust options to help you keep track of the information that your applications leave behind as they run.
But getting to the real core value proposition — the “what’s in it for me” angle — requires you to consider these features as a whole. It means you have to think of the code archaeologist with the headache.
That developer wades through a swamp of noise, looking for a signal. Log aggregation turns the log files into proper data, thus taking the noise and hiding it until you need it. Left with only the signal, you can now use your logs as a quick and efficient tool for chasing down production issues — without any headaches.