Archive Logs to Amazon S3

This Solution describes how to archive a copy of your logs to Amazon's S3 storage service for long-term storage. (Note that the archived copy cannot be viewed, searched or analyzed from within Scalyr.)

A Solution is a step-by-step guide for accomplishing a specific task, designed to make sense even if you're just getting started with Scalyr server monitoring. If you're new to Scalyr, you should read the short Getting Started guide. For help with other tasks, see the Solutions directory.

Prerequisites

1. An Amazon AWS account.

2. An Amazon S3 bucket (in the us-east-1 region) to hold your archived logs. To create a bucket: log into the AWS console, navigate to the S3 service page, and click Create Bucket.
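
If you prefer to create the bucket from a script, here is a minimal sketch using Python and boto3 (the bucket name mycorp-log-archive is a placeholder; substitute your own):

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# us-east-1 is the default bucket location, so no CreateBucketConfiguration is passed.
s3.create_bucket(Bucket="mycorp-log-archive")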

Steps

1. Give Scalyr permission to add logs to the S3 bucket you'll be using:

  • Log into the AWS console and navigate to the S3 service page.
  • Click on the name of the bucket where archived logs should be written.
  • Click on Permissions.
  • Click Add User.
  • Enter the canonical ID for aws@scalyr.com (c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb), check the Object Access Read and Write permission checkboxes, and click Save.

When you're done, the Add User entry should show the Scalyr canonical ID with the Object Access Read and Write boxes checked.

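If you manage bucket permissions from code rather than the console, the same grants can be added with boto3. This is a minimal sketch, assuming your AWS credentials are allowed to modify the bucket ACL; the bucket name is a placeholder:

import boto3

SCALYR_CANONICAL_ID = "c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb"
bucket = "mycorp-log-archive"  # placeholder; use your own bucket name

s3 = boto3.client("s3")

# Fetch the existing ACL so the owner's grants are preserved.
acl = s3.get_bucket_acl(Bucket=bucket)
grants = acl["Grants"]

# Add Read and Write object access for the Scalyr canonical user.
for permission in ("READ", "WRITE"):
    grants.append({
        "Grantee": {"Type": "CanonicalUser", "ID": SCALYR_CANONICAL_ID},
        "Permission": permission,
    })

s3.put_bucket_acl(
    Bucket=bucket,
    AccessControlPolicy={"Owner": acl["Owner"], "Grants": grants},
)
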
2. Open the /scalyr/logs configuration file. Add an s3ArchiveBucket field, containing the name of your S3 bucket. Optionally, you can also add an s3ArchivePathPrefix field (see "Organizing Logs by Day or Month", below). The file should look something like this:

{
  s3ArchiveBucket: "mycorp-log-archive",
  s3ArchivePathPrefix: "{yyyy}/{mm}/{dd}",

  ...
}

Click Update File to save the change.

3. The first batch of logs should be uploaded within a few minutes. To check, go back to the S3 service page in the AWS console and click on the bucket name. If you don't see any logs, wait a few more minutes and then click the refresh icon in the upper-right corner of the S3 console.
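
You can also check from a script. A minimal boto3 sketch (bucket name is a placeholder) that lists the first few objects in the archive bucket:

import boto3

s3 = boto3.client("s3")

# List up to 20 archived objects so you can confirm uploads are arriving.
resp = s3.list_objects_v2(Bucket="mycorp-log-archive", MaxKeys=20)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])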

Archive Format

Scalyr archives logs in one-hour increments. 10 to 30 minutes after the end of each hour, logs for that hour are written to S3. Each log file from each host creates a separate object in S3. Logs are compressed in gzip format, and are named according to the following pattern:

HOSTNAME_FILENAME_YYYYMMDD_HH.gz

Here YYYY, MM, DD, and HH give the year, month, day, and hour at the beginning of the time period covered by that object, in UTC. Special characters in the file name are replaced with dashes. For instance:

my-server-1_-var-log-messages_20140924_22.gz
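
If you post-process archives with your own scripts, the pieces of the name can be recovered with a simple pattern. A minimal Python sketch, assuming the hostname itself contains no underscores:

import re
from datetime import datetime, timezone

key = "my-server-1_-var-log-messages_20140924_22.gz"

# HOSTNAME_FILENAME_YYYYMMDD_HH.gz; assumes the hostname contains no underscores.
m = re.match(r"(?P<host>[^_]+)_(?P<file>.+)_(?P<date>\d{8})_(?P<hour>\d{2})\.gz$", key)
if m:
    start = datetime.strptime(m["date"] + m["hour"], "%Y%m%d%H").replace(tzinfo=timezone.utc)
    print(m["host"], m["file"], start)  # my-server-1 -var-log-messages 2014-09-24 22:00:00+00:00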

If you would like to use a different format for log archives, please let us know.

Organizing Logs by Day or Month

You can use the s3ArchivePathPrefix setting to group logs into multiple directories, avoiding one gigantic directory in S3. For instance, if you specify {yyyy}/{mm}/{dd}, then a directory will be created for each year, month, and day. All of these directories reside in the same S3 bucket.

You can use the following substitution tokens in s3ArchivePathPrefix:

Token    Replacement
{yyyy}   The four-digit year, e.g. 2014
{yy}     The two-digit year, e.g. 14
{mm}     The two-digit month, e.g. 04 for April
{dd}     The two-digit day of the month, e.g. 09 for April 9
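
Scalyr performs the substitution on its servers; the following Python sketch is only an illustration of how the tokens expand for a given archive hour:

from datetime import datetime, timezone

archive_hour = datetime(2014, 9, 24, 22, tzinfo=timezone.utc)

# Map each prefix token to its value for this archive hour.
tokens = {
    "{yyyy}": archive_hour.strftime("%Y"),
    "{yy}": archive_hour.strftime("%y"),
    "{mm}": archive_hour.strftime("%m"),
    "{dd}": archive_hour.strftime("%d"),
}

prefix = "{yyyy}/{mm}/{dd}"
for token, value in tokens.items():
    prefix = prefix.replace(token, value)

print(prefix)  # 2014/09/24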

Troubleshooting

Scalyr logs a brief message each time it uploads a batch of logs to S3. You can view these messages to confirm that everything is working correctly, or to troubleshoot problems. To view these messages, go to the Search page and search for tag='s3-log-archive'.

If you haven't properly configured S3 permissions, you'll see an error like this:

errorMessage=Amazon S3 error: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: AccessDenied, AWS Error Message: Access Denied

Review step 1 above ("Give Scalyr permission to add logs to the S3 bucket..."), and make sure that the bucket name you've specified in the configuration file is correct.

After correcting a configuration error, you will usually have to wait an hour or so until the next batch of logs is uploaded.

If you don't see any records at all with tag='s3-log-archive', check your usage plan (https://www.scalyr.com/plan). S3 log archiving is not enabled for accounts on the Startup plan.

Turning off S3 Archives

To turn off S3 archives, return to the /scalyr/logs configuration file and remove the s3ArchiveBucket field. This will not affect existing archives in S3. You can re-enable archiving at any time by adding the s3ArchiveBucket field again.

Further Reading

If you would like to delete your archived logs after some period of time, you can use S3's Object Lifecycle Management feature. See the S3 documentation for details.
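
For example, here is a minimal boto3 sketch (the bucket name, rule ID, and 365-day retention period are placeholders) that expires archived objects one year after they are created:

import boto3

s3 = boto3.client("s3")

# Attach a lifecycle rule that deletes every object in the bucket after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="mycorp-log-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-log-archives",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            }
        ]
    },
)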