In our guides Zen and the Art of System Monitoring and How to Monitor Nginx: The Essential Guide, we cover our monitoring philosophy and recommend a specific set of metrics to monitor and alerts to set for maximum nginx happiness.
Here, we'd like to dive into the nitty-gritty of those essential nginx metrics and discuss more about what exactly they mean and why they're important. This will also serve as a primer for those of you who want a bit more familiarity with some of the (perhaps esoteric) terminology associated with web servers.
For now, this guide covers only the metrics available via
ngx_http_stub_status_module and variables associated with the F/OSS version of nginx. More comprehensive metrics are available via
ngx_http_status_module, which is included with the commercial version (NGINX Plus). A later revision of this guide will be expanded to include those NGINX Plus metrics.
So roll up your sleeves, grab your Slanket or Snuggie, and let's talk nginx metrics...
Metrics are available from two sources:
Nginx Status Modules - The most direct way to get the goods. Data is available either through polling a configurable web page (such as /status) or via embedded variables that can be output to log files. Note: Polling is the preferred method of access, as nginx does not provide embedded variables for all of the status module's metrics.
Log Files - Nginx, like most web servers, maintains an "access log" with a record of each request handled. Additional metrics can be synthesized from the access log if your monitoring tools are able to analyze log data. You can use the
log_format directive to instruct nginx to include variable values in each log record. Your monitoring tools can then use those variable values as the inputs for your synthetic metrics.
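As a sketch of the polling approach, the stub status page can be exposed with a location block like the following (the /status path, port, and allowed address are assumptions to adapt for your environment):

```nginx
server {
    listen 8080;

    location /status {
        stub_status;        # serve the status page (use "stub_status on;" before nginx 1.7.5)
        allow 127.0.0.1;    # restrict access to local monitoring agents
        deny all;
    }
}
```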
The status module metrics are mostly about connections, so it's worth diving into some depth about what those are exactly.
An nginx server generally plays two roles. Its primary role, as the name "server" would imply, is to serve content (web pages, images, video, or raw API data) to clients. Clients historically have been web browsers, but now include native mobile apps and other servers that consume your APIs.
Nginx's secondary role is as middleman, or proxy between clients and "upstream servers". Upstream servers do more of the heavy lifting for a given request. They run application servers (like Unicorn for Ruby on Rails, or Tomcat for Java) and generally handle the "dynamic" part of dynamic web pages.
As a proxy, nginx can do a few things really, really well. It can serve local static content (images and non-dynamic HTML files) super quickly, it can act as a cache for content from upstream servers, and it can load-balance requests among several upstream servers.
All this communication by and between clients, nginx, and upstream servers is done through connections -- the network-level establishment of formal communications between two systems. Once a network-level connection is established, application-level data can flow via the higher-level protocols that nginx supports (most commonly HTTP, but also WebSockets, SPDY, TCP streams, and several mail protocols.)
Connections are handled one layer down from the application data (at the TCP layer of networking) and negotiated through a "handshake" process. The details of the TCP protocol & handshake are beyond the scope of this article, but some great details are available on Wikipedia if you're so inclined. For even more on the specifics of connections and how they relate to HTTP, see A Software Developer's Guide to HTTP - Connections
Nginx connections can be in one of several states:
Accepted - After a completed TCP handshake between the client and nginx, a connection is considered Accepted. It then takes one of three sub-states:
Handled - Nginx has finished writing data to the client and the request has been completed successfully and closed.
Dropped - Nginx ended the connection prior to successfully completing the request. (This usually happens because of a resource or configuration limit.)
Idle / Waiting - The connection remains open, but no request is currently in flight; nginx is waiting for the next request from the client.
With this background information in mind, let's take a look at the metrics available via ngx_http_stub_status_module:
accepts - The total number of accepted connections from clients since the nginx master process started. Note that reloading configuration or restarting worker processes does not reset this metric, but terminating and restarting the master process does.
handled - The total number of handled connections from clients since the nginx master process started. This will be lower than accepts only in cases where a connection is dropped before it is handled.
(You might wonder: shouldn't handled be lower than accepts by the number of currently "in-flight" requests that are being processed by upstream servers? No: a connection is counted as handled as soon as nginx successfully accepts it, so the two counters diverge only when connections are dropped.)
requests - The total number of requests from clients since the nginx master process started. A request is an application-level (HTTP, SPDY, etc.) event and is defined as a client requesting a resource via the application protocol. A single connection can (and often does) make multiple requests, so this number will generally be larger than the number of accepted/handled connections.
Reading - The current number of (accepted) connections from clients where nginx is reading the request (at the time the status module was queried.)
Writing - The current number of connections from clients where nginx is writing a response back to the client.
Waiting - The current number of connections from clients that are in the Idle / Waiting state (waiting for a request.)
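To make these numbers concrete, here is a minimal Python sketch that parses stub_status output into the metrics above. The sample text mimics the status page's layout; note that dropped connections are derived, since nginx does not report them directly:

```python
import re

# Sample output in the shape of nginx's stub_status page. In a real
# deployment, your monitoring agent would fetch this over HTTP.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630946 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text):
    """Parse stub_status output into a dict of metric name -> int."""
    active = int(re.search(r"Active connections:\s+(\d+)", text).group(1))
    accepts, handled, requests = map(
        int, re.search(r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups())
    reading, writing, waiting = map(
        int, re.search(r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)",
                       text).groups())
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        # Dropped connections are not reported directly; derive them:
        "dropped": accepts - handled,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }

metrics = parse_stub_status(SAMPLE)
print(metrics["dropped"])   # 2 dropped connections in this sample
```

A monitoring agent polling this page on an interval can then turn the raw counters into rates (connections per second, requests per connection, and so on).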
It's beyond the scope of this guide to dive into every nginx log variable. Instead, we're going to take a close look at a few variables that are of particular interest in the context of monitoring.
The default nginx access log (obtained by declaring
log_format combined in your configuration file) uses the following variables:
$body_bytes_sent - The number of bytes sent to the client as the response body (not including the response header). This allows you to monitor individual response sizes. In aggregate, it also gives you a rough measure of outbound bandwidth.
$http_referer - The
HTTP Referer header from the incoming HTTP request. This is determined by the requesting browser and identifies the URL of the page that linked to the resource being requested. Two interesting notes on this:
The word "referrer", at least in English, is spelled with two Rs, but the original misspelling from the HTTP spec (RFC 1945) managed to stick around.
Since the header is set by the client making the request, it can't always be trusted. Lately this has resulted in what's known as "referrer spam", where malicious clients will spoof their referrer headers to instead show spammy websites for consumers of analytics or monitoring software to see, be tempted by, and (they hope), visit.
$http_user_agent - The
User-Agent header from the incoming HTTP request. This identifies the specific browser, bot, or other software that issued the request, and may also include other information such as the client's operating system. The format is slightly different between human-operated web browsers and automated agents (bots), but the theme is the same. Wikipedia has some awesome nitty-gritty details on user agent strings.
$remote_addr - The IP address of the client making the request. If the request passed through an intermediate device, such as a NAT firewall, web proxy, or your load balancer, this will be the address of the last device to relay the request.
$remote_user - The username supplied if HTTP Basic authentication is used for the request.
$request - The raw HTTP request line. An example of a (familiar) request line is as follows:
GET /community/guides/an-in-depth-guide-to-nginx-metrics/ HTTP/1.1
This is actually a compound variable, composed of 3 sub-variables (each of which is accessible individually if needed):
$request_method $request_uri $server_protocol
A breakdown of these variables:
$request_method - The HTTP method of the request. The most common methods used by browsers are GET and POST, but the spec also includes HEAD, PUT, DELETE, OPTIONS, TRACE and CONNECT (details on each available at the preceding link.)
$request_uri - The URI of the requested page, including query arguments. Note that if the ultimate resource returned is different from the one requested (due to use of a module like mod_rewrite), this field will still log the originally requested URI.
$server_protocol - The application-level protocol and version used in the request. You'll most commonly see HTTP/1.0 or HTTP/1.1, but nginx supports SPDY, WebSockets, and several mail protocols as well.
$time_local - The local (server) time the request was received, in the Common Log Format (e.g. 04/Mar/2016:22:12:51 +0000).
$status - The numeric HTTP Status Code of the response. This is an important variable to monitor, as it provides information regarding errors, missing pages, and other unusual events.
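For reference, the built-in combined format is equivalent to declaring the following log_format explicitly (as given in the nginx documentation; combined is predefined, so you don't need to declare it yourself):

```nginx
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
```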
These variables are not included in the default combined log format.
$bytes_sent - The total number of bytes sent to the client in the response, including headers. This is similar to $body_bytes_sent, but provides a more complete picture.
$connection - The connection serial number. This is a unique number assigned by nginx to each connection. If multiple requests are received on a single connection, they will all have the same connection serial number. Serial numbers reset when the master nginx process is terminated, so they will not be unique over long periods of time.
$connection_requests - The number of requests made through this connection.
$content_length - the HTTP Content-Length request header field. This is the total size (in bytes) of the body of the request being made by the client, as reported by the client.
$request_length - the full request length (in bytes) - including the request line, header, and body, as calculated by nginx.
If you're interested in monitoring overall incoming bandwidth, use $request_length rather than $content_length: since $content_length is drawn from a request header, it's calculated by the client and, therefore, has the potential to be spoofed (in the case of a DDoS attack, for example.)
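As a sketch, if a custom log_format ended each line with $request_length, inbound bytes could be estimated like this (the sample lines and field layout are hypothetical, not a real nginx default):

```python
# Hypothetical log lines whose last field is $request_length (bytes).
log_lines = [
    '203.0.113.7 - - [04/Mar/2016:22:12:51 +0000] "GET / HTTP/1.1" 200 612 431',
    '203.0.113.8 - - [04/Mar/2016:22:12:52 +0000] "POST /api HTTP/1.1" 201 87 1024',
]

# Sum the trailing $request_length field to estimate inbound bytes.
inbound_bytes = sum(int(line.rsplit(" ", 1)[1]) for line in log_lines)
print(inbound_bytes)  # 1455
```

Dividing that sum by the log window's duration gives a rough inbound bandwidth figure.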
$gzip_ratio - The compression ratio of the response (the ratio between the original and compressed response sizes.) Applicable if you have enabled gzip response compression.
Gzip compression is a feature that nginx provides (through
ngx_http_gzip_module) that pipes responses through gzip before sending them to the client. This can reduce the size of responses by 50% or more and provide a significant outbound bandwidth savings. Gzip is extremely fast, but there is still material overhead in the compression process (both in terms of CPU usage and response time), though some of this overhead is recovered in the transfer time savings from the smaller file.
It's a delicate balance to strike so be sure to monitor your resource usage closely if you decide to use gzip compression.
Further details about compression and decompression are available from NGINX directly.
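If you do decide to enable compression, a minimal configuration might look like the following (the compression level, size threshold, and MIME types here are illustrative assumptions, not recommendations):

```nginx
gzip            on;
gzip_comp_level 5;       # 1 (fastest) .. 9 (smallest); a mid-level trade-off
gzip_min_length 1024;    # skip tiny responses, where overhead dominates savings
gzip_types      text/css application/json application/javascript;
```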
$host - The DNS name that the client used to find your server (as presented in the
Host HTTP header). If that header is empty or missing, nginx substitutes the name in the first
server_name directive in your nginx configuration.
$upstream_http_HEADERNAME - Following the pattern of
$http_user_agent above, nginx allows you to log any HTTP request headers by referencing
$http_ and the header name (converted to lowercase, with dashes replaced by underscores).
Similarly, you can access the headers returned by any upstream server by prepending "upstream_" to the front of the variable name.
$msec - The current time in seconds, with millisecond resolution, measured from the Unix epoch (January 1, 1970). This allows you to determine the exact time at which a request took place.
$pid - The process ID (PID) of the nginx worker that handled the request. This can be used to track the workload and associated metrics of each worker individually.
Note that worker processes can (and do) crash or be reaped and restarted by the nginx master process, so PIDs can come and go. Eventually, an old PID may even be reused.
$request_time is the total time taken for nginx (and any upstream servers) to process a request and send a response. Time is measured in seconds, with millisecond resolution. This is the primary source you should use for your server's response time metric.
The clock starts as soon as the first bytes are read from a client and stops after the last bytes have been sent. Note that this includes the processing time for upstream servers -- if you're interested in breaking out those metrics then you can use
$upstream_response_time, which only measures the response time of the upstream server.
To clarify: The language can be a bit confusing here -- even though the variable is "request" time, it actually measures the elapsed time of the full request-response cycle (from the nginx server's perspective.)
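A quick sketch of how the two timings fit together (the values here are hypothetical, standing in for fields parsed from one log record):

```python
# Hypothetical values from a log record that includes both
# $request_time and $upstream_response_time (seconds, ms resolution).
request_time = 0.245            # full request-response cycle seen by nginx
upstream_response_time = 0.203  # time spent waiting on the upstream server

# Time attributable to nginx itself plus client-side network transfer:
nginx_overhead = round(request_time - upstream_response_time, 3)
print(nginx_overhead)  # 0.042
```

Tracking this difference over time helps distinguish a slow upstream application from a struggling nginx server or slow clients.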
$server_addr / $server_name - The IP address ($server_addr) or name ($server_name) of the nginx server that accepted a request. This is useful in a multi-server (load-balanced) environment where you'll need to monitor which requests (and, therefore, which metrics) are handled by each server.
Note: Computation of the IP address requires a system call unless you specifically bind an address using the
listen directive in your configuration. Keep this in mind as the addition of system calls can add significant overhead and impact server performance accordingly. For this reason, $server_name may be a better choice for many installations unless the full IP is specifically required.
$uri - The current URI of the request. Internal to nginx, this value may change during request processing (i.e. in the case of rewrites). The value that is logged represents the URI of the resource that was ultimately sent to the client.
This differs from $request_uri in that $request_uri does not reflect any URL rewrites internal to the nginx server.
Scalyr offers a fast, powerful server monitoring, alerting, and log management service. We're a team of ex-Google engineers with years of DevOps experience and we know what it's like to be on call, get an alert, and not have enough information to track down the problem. So we decided to fix that. If you like what you've read on this site, you'll probably like using Scalyr.