Imagine this: You have your services up and running in production. Then you get a call. Something’s broken.
What’s the first thing you do? Where do you start so you can figure out if your services are in good working order?
Logs. The logs are your connection to your services. They should tell you what’s going right and what’s going wrong.
In this post, we’ll go over what logs are, why they matter, and most importantly, why you probably need a service to offload the burden of managing those logs.
What Is Logging?
Just to be clear on what logging is, let’s talk about why logging is necessary and how to make it as useful as possible.
Each component in your environment has, or should have, some way of reporting what it’s doing. In general, that’s a log. A component’s logs should describe what it’s doing and provide a heartbeat that tells you whether it’s doing its job.
The logs should also tell you whether the problem lies in the service itself. For example, is the service out of memory? Is it out of disk space?
Logs should also tell you whether the service is failing because of an outside dependency. For example, maybe a service it calls is returning errors or corrupt data.
Whether or not the service is part of your cloud environment, logs are an important part of your system. Make sure the logs in your services communicate useful information and help you diagnose problems quickly, and make sure they’re formatted so that a person or a tool can extract data from them quickly.
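One common way to make logs easy to extract data from is structured logging, where each entry is a machine-readable record instead of free text. Here’s a minimal sketch using Python’s standard `logging` module that writes one JSON object per line. The `JsonFormatter` class, the `make_logger` helper, and field names such as `order_id` are illustrative assumptions, not requirements of any particular logging service.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 timestamp in UTC, so entries from different hosts sort consistently.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}} (a convention assumed here).
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        return json.dumps(entry)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Hypothetical helper: a logger that emits JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("payment declined", extra={"fields": {"order_id": "A123", "status_code": 402}})
```

Because every line is valid JSON, a log service or a simple script can filter on `order_id` or `status_code` directly instead of pattern-matching free text.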
Why Would I Want a Logging Service?
Once you have sources of logs, you or someone else has to aggregate them in a central place where an operator can extract information from them. At a minimum, the entries need to be sorted by timestamp so they appear in chronological order. You could order them in many other ways, of course. But the important thing is that you need to ship features! Don’t spend your valuable time on tasks you could easily solve with an off-the-shelf tool.
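To show what even the simplest part of that aggregation involves, here’s a minimal sketch that merges several per-source log streams into one chronological stream. It assumes each source is already sorted by time and that entries are `(timestamp, source, message)` tuples; that shape and the `merge_chronologically` name are assumptions for illustration. A real pipeline would also have to parse each service’s format, handle clock skew between hosts, and cope with late-arriving entries, which is exactly the work an off-the-shelf service takes off your plate.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# (timestamp, source, message): an assumed entry shape for this sketch.
LogEntry = Tuple[datetime, str, str]


def merge_chronologically(*sources: Iterable[LogEntry]) -> Iterator[LogEntry]:
    """Merge streams that are each already time-ordered into a single
    time-ordered stream, lazily, without loading everything into memory."""
    return heapq.merge(*sources, key=lambda entry: entry[0])


if __name__ == "__main__":
    api = [
        (datetime(2024, 1, 1, 12, 0, 0), "api", "request received"),
        (datetime(2024, 1, 1, 12, 0, 3), "api", "responded 500"),
    ]
    db = [(datetime(2024, 1, 1, 12, 0, 1), "db", "connection pool exhausted")]
    for ts, source, message in merge_chronologically(api, db):
        print(ts.isoformat(), source, message)
```

`heapq.merge` only compares the head of each stream, so it works on unbounded iterators such as tailed files as well as on lists.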
This is where logging as a service comes in. Logging as a service helps you get your logs into a central location. Also, it lets you quickly sift through those logs and find the signals among all the noise. This service makes finding the information you need quick and easy.
What Should I Look for in a Service?
What makes a good logging service? Well, you want to know that if you put your logs into this service, you can get the data you need out of them as soon as possible and without much difficulty. You have three pillars of observability to cover: logs, metrics, and tracing.
- Why do logs matter? They’re your source of information. You want to be able to view your logs and inspect them for information that helps you solve whatever problem you’re facing. Scalyr has an excellent search mechanism that lets you slice and dice your logs to find the information you need. Plus, because Scalyr doesn’t rely on predefined indexes, you don’t have to worry about adding fields to an index. Everything is searchable as soon as it’s ingested.
- What’s so important about metrics? With metrics, you want to be able to analyze how your system is performing. You can often use this performance data as a “canary in a coal mine,” in other words, an early warning sign that a system is in trouble. If your metrics cross certain thresholds (for example, throughput drops too low or latency climbs too high), a logging service can alert you before your customers even notice. Scalyr can help you set up alerts on those thresholds and notify you through a number of different methods.
- Why should I be concerned about tracing? It helps you track down errors when they happen. As you build up services, and especially if you use a microservices architecture, you want to be able to see the flow of information or requests through your system. You need to be able to identify where in your system a request went wrong so you can react and fix the problem.
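The “canary in a coal mine” idea for metrics boils down to comparing current values against lower and upper bounds. Here’s a minimal sketch of that check; it isn’t Scalyr’s alerting API, and the `Threshold` class, `check_thresholds` function, metric names, and limits are all hypothetical. A hosted service would run a check like this continuously and route the results to email, chat, or paging tools.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    metric: str
    min_value: Optional[float] = None  # alert if the value falls below this
    max_value: Optional[float] = None  # alert if the value rises above this


def check_thresholds(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return a human-readable alert for every metric outside its bounds."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # A metric that stopped reporting is itself a warning sign.
            alerts.append(f"{t.metric}: no data")
            continue
        if t.min_value is not None and value < t.min_value:
            alerts.append(f"{t.metric}={value} is below {t.min_value}")
        if t.max_value is not None and value > t.max_value:
            alerts.append(f"{t.metric}={value} is above {t.max_value}")
    return alerts


if __name__ == "__main__":
    rules = [
        Threshold("requests_per_second", min_value=50),
        Threshold("p99_latency_ms", max_value=800),
        Threshold("error_rate", max_value=0.01),
    ]
    current = {"requests_per_second": 12.0, "p99_latency_ms": 450.0}
    for alert in check_thresholds(current, rules):
        print("ALERT:", alert)
```

Treating a missing metric as an alert is a deliberate choice here: a service that has stopped reporting is often in worse shape than one reporting bad numbers.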
Do I Log Everything, or Do I Try to Be Specific?
You want to log as much as you need without being verbose. With this approach, when there’s a problem, you can quickly identify and solve it. This usually requires you to at least make sure your services log the right information. You’ll also want information about your platform, which means tapping into its information streams to find out whether it’s healthy and performing correctly. Most platforms have some kind of logging mechanism that you can monitor for this.
For instance, if you’re running on Kubernetes, you’d want to capture logs from the Kubernetes system itself. That way, you can tell whether a problem lies in your service layer or at the platform level. For example, you could capture events when pods are created and destroyed to verify that resources are being allocated correctly. Another useful source is whether persistent volumes are being created and attached to pods correctly. If disk problems start to appear, you receive an alert and can resolve them before they grow into bigger problems.
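As a rough sketch of what watching those platform signals can look like, the function below reads the JSON produced by `kubectl get events -o json` and separates pod lifecycle events from volume problems. It relies on the standard Event fields `reason`, `message`, and `involvedObject`; the reason sets are a starting point chosen for illustration, not an exhaustive list, and the `classify_events` name is hypothetical.

```python
import json
from typing import Dict, List

# A starting point, not a complete catalogue of Kubernetes event reasons.
POD_LIFECYCLE_REASONS = {"Scheduled", "Created", "Started", "Killing"}
VOLUME_PROBLEM_REASONS = {"FailedMount", "FailedAttachVolume"}


def classify_events(events_json: str) -> Dict[str, List[str]]:
    """Split the output of `kubectl get events -o json` into pod lifecycle
    events and volume problems, summarizing each as one line."""
    result: Dict[str, List[str]] = {"lifecycle": [], "volume_problems": []}
    for event in json.loads(events_json).get("items", []):
        reason = event.get("reason", "")
        obj = event.get("involvedObject", {})
        summary = f"{obj.get('kind')}/{obj.get('name')}: {reason} - {event.get('message', '')}"
        if reason in POD_LIFECYCLE_REASONS:
            result["lifecycle"].append(summary)
        elif reason in VOLUME_PROBLEM_REASONS:
            result["volume_problems"].append(summary)
    return result


if __name__ == "__main__":
    import sys

    # Usage: kubectl get events -o json | python classify_events.py
    report = classify_events(sys.stdin.read())
    for line in report["volume_problems"]:
        print("VOLUME PROBLEM:", line)
```

In practice a log service would ingest these events continuously and alert on the volume-problem bucket, rather than relying on someone running a script by hand.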
Sometimes the platform itself may be more of a black box. In that case, you’ll have to use other methods to extract information. Black-box monitoring tools can help you identify possible problems and alert you when they occur. These tools may not be as robust a solution as reviewing the raw logs. But at least you’ll have some monitoring that distinguishes what’s healthy and working correctly from what may require manual intervention.
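At its simplest, black-box monitoring means probing a system from the outside and judging it only by its responses. Here’s a minimal sketch using Python’s standard `urllib`; the `probe` function, the `ProbeResult` type, and the default timeout and latency limits are assumptions for illustration, and the health-check URL would be whatever endpoint your platform exposes.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    healthy: bool
    status: Optional[int]
    latency_s: float
    error: Optional[str] = None


def probe(url: str, timeout_s: float = 5.0, max_latency_s: float = 1.0) -> ProbeResult:
    """Black-box check: healthy only if the target answers with a 2xx status
    within max_latency_s. It knows nothing about the target's internals."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError but still carry a status code.
        return ProbeResult(False, exc.code, time.monotonic() - start, str(exc))
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout, and similar network errors.
        return ProbeResult(False, None, time.monotonic() - start, str(exc))
    latency = time.monotonic() - start
    return ProbeResult(200 <= status < 300 and latency <= max_latency_s, status, latency)
```

A scheduler would call `probe` every few seconds and raise an alert after several consecutive unhealthy results, which avoids paging someone over a single dropped packet.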
With your services, you want logs to be helpful. This usually means working closely with your developers and asking them to emit logs that contain the right data. You may need to sit down with your dev team and show them what you’re seeing come out of the system and what actions you take when those messages appear. For example, you might have developers add a mechanism that lets you adjust how verbose your logs are, because in certain debugging situations you want to know every step the service takes. Another example is working with the developers to structure the data correctly, so you can search on certain parameters to pull out data relevant to the problem. This might include trace IDs or other session information that ties the logs together in a logical way.
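Both of those developer-side mechanisms are small to build. The sketch below uses Python’s standard `logging` and `contextvars` modules to make verbosity adjustable through an environment variable and to stamp every log line with a trace ID. The `LOG_LEVEL` variable, the `orders` logger name, and the `configure_logging` and `handle_request` functions are hypothetical; in a real service the incoming trace ID would come from a request header set by the upstream caller.

```python
import contextvars
import logging
import os
import uuid
from typing import Optional

# Trace ID for the request currently being handled. A ContextVar keeps IDs
# separate across threads and asyncio tasks.
current_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record so each line carries it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def configure_logging(stream=None) -> logging.Logger:
    # LOG_LEVEL is a hypothetical setting: DEBUG shows every step, INFO is the default.
    level = getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unrecognized values
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
    logger = logging.getLogger("orders")
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_trace_id: Optional[str] = None) -> None:
    # Reuse the caller's trace ID if one was passed along; otherwise start a new trace.
    token = current_trace_id.set(incoming_trace_id or uuid.uuid4().hex)
    try:
        logger.debug("validating order")
        logger.info("order accepted")
    finally:
        current_trace_id.reset(token)
```

Searching your log service for a single `trace_id` value then returns every line a request produced, across every service that passed the ID along.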
Logs are your window into your system. You need logs to get a sense of what’s going on and what problems may need to be solved. If there’s a problem, you need to be able to recover from it as quickly as possible. You could monitor the logs yourself, but your time is too valuable to spend reinventing the wheel. There are services out there that can make logging a breeze. This blog post has hopefully outlined what makes logging as a service a worthwhile investment. I recommend taking Scalyr for a free spin.
Logging as a service is a cost-effective way to keep an eye on your platform. It can also help you recover faster, and even react faster to events that need manual intervention. This kind of service is usually the backbone of your monitoring solutions, and it’s the first place to go when you need data because things have gone wrong. Invest in a well-rounded system, and you’re likely to see dividends.
This post was written by Erik Lindblom. Erik has been a full-stack developer for the last 13 years. During that time, he’s tried to understand everything that’s required to deliver high-quality, valuable software. Today, that means using cloud services, microservices techniques, and container technologies. Tomorrow? Well, he’s ready to find out.