Looking at the performance of a new app you manage, you probably wonder what metrics matter most. Maybe you’ve implemented a monitoring tool. Or maybe not. Either way, what should you be monitoring? What metrics are important?
Monitoring your applications effectively is challenging. If your applications have moved from a monolithic architecture to microservices, it’s an even bigger challenge. Instead of running on a few servers, the parts of your application are now spread across many instances. What monitoring metrics do you focus on in that environment?
When faced with these types of challenges, a good start is usually a template. Or some system or method that’s tried and true. You need something you can start with and build on later if you need to.
That’s where the RED method can help. Let’s talk about using RED to help you get started with app monitoring and why you should consider it.
We all love acronyms and jargon. They can help us templatize the process of how to do things. The RED method is another one of those kinds of acronyms.
RED is a monitoring philosophy created by Tom Wilkie while he was working at Weaveworks. Tom spent some time at Google and was an SRE for Google Analytics. In that job, he learned how Google monitored their applications. A lot of that information was published in the Google SRE book.
In that book, the authors describe four types of metrics you should consider checking for when monitoring a user-facing system. These are called the four golden signals. These metrics include the following:
- Latency: the amount of time it takes to respond to a request, whether successful or failed
- Traffic: the amount of demand your system is handling, in the form of requests, sessions, etc.
- Errors: the number of requests that fail, whether explicitly with an error response or implicitly, such as a success response with the wrong content
- Saturation: the percentage use of system resources with constraints, such as CPU or bandwidth utilization
The RED method is a subset of these golden signals. It’s focused on providing only three metrics for monitoring your microservice applications.
Give Me an R
The R in RED stands for “rate.” This is the rate at which requests are coming in, usually measured on a per-second basis.
If you’re monitoring a web application, the first thing you want to look at is how many HTTP requests per second you’re seeing. This metric is simply a raw count of each request. With HTTP and its chatty nature, this can be a lot of requests. So you want to focus on how the request rate changes from the norm. For example, you could have a webpage that normally generates 25 HTTP requests for text, images, scripts, and other things. If you later notice that the rate has jumped to 50 requests, something has changed. You can now use your monitoring tool to find out what those additional requests are and check the difference.
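To make the rate metric concrete, here's a minimal sketch in Python of a sliding-window request-rate meter. The class name `RequestRateMeter` and its interface are illustrative, not from any particular monitoring tool; real systems like Prometheus derive rates from monotonically increasing counters instead.

```python
from collections import deque
import time


class RequestRateMeter:
    """Tracks request timestamps and reports requests per second
    over a sliding window (illustrative sketch, not a real library)."""

    def __init__(self, window_seconds=1.0):
        self.window = window_seconds
        self.timestamps = deque()

    def record(self, now=None):
        """Record one incoming request at time `now` (defaults to the current time)."""
        self.timestamps.append(time.time() if now is None else now)

    def rate(self, now=None):
        """Return requests per second over the most recent window."""
        now = time.time() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] < now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window
```

In practice you'd record a rate like this on every request and alert when it deviates sharply from the baseline, as in the 25-to-50-requests example above.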
You Got Your E
Now that you know the number of requests per second, you might want to know how many are succeeding or failing. You have the E for this. It stands for “errors.” This metric is a count of the number of requests that have failed.
In the case of a web application, you’ll be looking for HTTP responses with status codes in the 400s or 500s. So if you’re seeing the number of 500 responses from your Apache or NGINX web server increasing, get to troubleshooting!
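Counting errors from status codes can be sketched in a few lines of Python. These helper names are made up for illustration; the point is simply that anything in the 4xx or 5xx range counts as a failed request.

```python
def count_errors(status_codes):
    """Count responses whose HTTP status indicates a client (4xx)
    or server (5xx) error."""
    return sum(1 for code in status_codes if 400 <= code <= 599)


def error_rate(status_codes):
    """Fraction of requests that failed; 0.0 if there were no requests."""
    if not status_codes:
        return 0.0
    return count_errors(status_codes) / len(status_codes)
```

Tracking the error count as a ratio of the request rate, rather than a raw number, keeps the metric meaningful when traffic rises and falls.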
Now Give Me a D
When all of those requests come in, you also want to know how long it’s taking for users to get a response. This is the D in RED. It stands for the “duration” and is the amount of time it takes for a request to be processed.
The last thing a user wants is for a website to take too long to process their clicks. Studies from Google, Akamai, and others have shown that typical web users prefer one- or two-second response times for requests. If you’re constantly seeing times higher than those, some users are probably complaining.
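Because average duration hides slow outliers, duration is usually reported as a percentile, such as the 95th. Here's a small sketch of a nearest-rank percentile over a list of request durations; the function name and the nearest-rank method are illustrative choices, not the only way to compute this.

```python
import math


def percentile(durations, pct):
    """Return the pct-th percentile of request durations (in seconds),
    using the simple nearest-rank method."""
    if not durations:
        raise ValueError("no duration samples")
    ordered = sorted(durations)
    # Nearest rank: ceil(pct/100 * n), converted to a 0-based index.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

If the 95th percentile sits above the one- to two-second mark users expect, a meaningful share of your users are feeling the slowness even when the average looks fine.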
Why RED Matters
These three letters are easy to remember. Together, they’ll help you figure out what metrics to look at when monitoring your applications. But why should using the RED method matter to you?
Here are four reasons why using the RED method for your monitoring matters.
Reduces Decision Fatigue
When you sit in front of your computer to access whatever monitoring system you use to look at the performance of an app, you have some decisions to make. What do you look at? Is your black box monitoring tool already collecting it? Is that metric being emitted by your white box monitoring tool?
What and how you monitor are decisions you need to make. When you’re in front of a new application, you may not have time to make those decisions. If you’re on call, you have even less time to decide what metrics to look at.
With the RED method, your decision is made. You start looking at request rate, request errors, and request duration. Period. End of story.
You can tweak it later if you want, but to get started monitoring your microservices, use RED.
Drives Standardization and Consistency
One of the reasons Tom Wilkie created RED, as he explains it, is because a new colleague asked about Weaveworks’ monitoring philosophy. When you have a method that you use, it creates a standard and level of consistency across your team.
Using RED for your monitoring will allow you and your team to be consistent in how monitoring is done. Everyone will be looking at the same metric regardless of the service they’re responsible for.
When new members get added to the team, this consistency will help them get up to speed more quickly on how to monitor their service. When dashboards are created, you can have a standard and consistent look across all of them, regardless of the application.
This consistency makes working as a team much easier. Monitoring comes with enough hard problems already; a shared standard takes one of them off the list.
Helps With Automation
Similar to driving consistency, the RED method can help you automate tasks. Repeating the same tasks over and over becomes monotonous: you perform the same action and expect the same result every time.
Once that happens, you have an opportunity to script it. You can automate the repeated tasks. It could be writing a script that creates dashboards with the RED metrics from your monitoring tool. Or it could be creating an alert that fires automatically once one of the metrics crosses a threshold.
Repeated tasks do have some benefits, regardless of how boring they can be sometimes.
Serves as a Proxy for User Happiness
At the end of the day, the reason you’re monitoring your applications is to help keep users happy. Whether the users are your superiors, colleagues, or your organization’s customers, you need something that will help you understand how happy they are.
There are things like the Net Promoter Score and Apdex that have been used over the years to help determine this. But using RED, what you see in the metrics for rate, errors, and duration is what the user is experiencing. If your web app has a high duration, it likely means the app is slow and the users are also likely unhappy.
So using RED is a good proxy for how happy your application users probably are.
RED Means Go
Using the RED method can help you identify what metrics to monitor quickly. The reasons highlighted above can help you and your team become better at monitoring your applications. Instead of trying to figure out the next steps, you can just get going.
Once you’ve started with RED, you’ll become more acquainted with how your applications work. You can then look at other signals, like the saturation of system resources. Moving into other types of metrics can lead you to create a method of your own for you and your team.
And this is where you want to get to, eventually. You want to develop a monitoring method that suits your applications. But if you’re not sure where to begin, just start with RED.
Red doesn’t always mean “stop.” Sometimes, it means “go.” So go get monitoring with RED.
This post was written by Jean Tunis. Jean is the principal consultant and founder of RootPerformance, a performance engineering consultancy that helps technology operators minimize cost and lost productivity. He has worked in this space since 1999 with various companies, helping clients solve and plan for application and network performance issues.