Skip to main content
Sumo Logic

Collect logs for the Amazon S3 Audit App

Amazon Simple Storage Service (S3) provides a simple web services interface that can be used to store and retrieve any amount of data from anywhere on the web.

This topic details how to collect logs for Amazon S3 Audit and ingest them into Sumo Logic.

Log Types

Amazon S3 Audit uses Server Access Logs (activity logs). For more information on the log format, see:

http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html

Configure Logging in AWS

Before you can begin to collect logs from an S3 bucket, perform the following steps:

  1. Grant Access to an AWS S3 Bucket.
  2. Enable logging using the AWS Management Console.
  3. Confirm that logs are being delivered to the S3 bucket.

Configure a Collector

In Sumo Logic, configure a Hosted Collector.

Configure a Source

  1. Add an Amazon S3 Audit Source.
  2. Configure the Source fields as follows:
    1. Name. Enter a name for your Source.
    2. Path. Enter the string that matches the S3 objects you'd like to collect. A wildcard (*) can be used in this string. DO NOT use a leading forward slash. For more information, see Amazon Path Expressions.  
    3. Source Category. Enter any string to tag the output collected from this Source. For example, use prod/aws/s3audit. (The Source Category metadata field is a fundamental building block to organize and label Sources. For details see Best Practices.)
    4. Key ID. Enter the AWS Access Key ID number granted to Sumo Logic. (For details, see Grant Access to an S3 Bucket.)
    5. Secret Key. Enter the AWS Secret Access Key that Sumo Logic should use to access the S3 bucket. (For details, see Grant Access to an S3 Bucket.)
    6. Scan Interval. Use the default of 5 minutes. Alternately, enter the frequency Sumo Logic will scan your S3 bucket for new data. (For details, see About setting the S3 Scan Interval.)
  3. Configure the Advanced section:
    1. Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
    2. Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose an option in case time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's important to have the proper time zone set, no matter which option you choose. If the log’s time zone can't be determined, Sumo Logic assigns logs UTC; if the rest of your logs are from another time zone, your search results will be affected.
    3. Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. (For details, see Timestamps, Time Zones, Time Ranges, and Date Formats.)
    4. Enable Multiline Processing. Multiline processing is enabled by default. Use this option if you're working with multi-line messages (for example, log4J or exception stack traces). Deselect this option if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log).
    5. Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message. If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field to use for detecting the entire first line of multi-line messages.
    6. Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression for the full first line of every multi-line message in your log files. (For an example, see the Boundary RegEx section in Local File Source.)
  4. Click Save.

Field Extraction Rules

Field Extraction Rules (FERs) tell Sumo Logic which fields to parse out automatically. For instructions, see Create a Field Extraction Rule.

Use the following Parse Expression:

parse "* * [*] * * * * * \"* HTTP/1.1\" * * * * * * * \"*\" *" as bucket_owner, bucket, time, remoteIP, requester, request_ID, operation, key, request_URI, status_code, error_code, bytes_sent, object_size, total_time, turn_time, referrer, user_agent, version_ID

Sample Log Messages

The server access log files consist of a sequence of new-line delimited log records. Each log record represents one request and consists of space delimited fields. The following is an example log consisting of six log records.


79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 7 - "-" "S3Console/0.4" -
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 891CE47D2EXAMPLE REST.GET.LOGGING_STATUS - "GET /mybucket?logging HTTP/1.1" 200 - 242 - 11 - "-" "S3Console/0.4" -
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be A1206F460EXAMPLE REST.GET.BUCKETPOLICY - "GET /mybucket?policy HTTP/1.1" 404 NoSuchBucketPolicy 297 - 38 - "-" "S3Console/0.4" -
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:01:00 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 7B4A0FABBEXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 33 - "-" "S3Console/0.4" -
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:01:57 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be DD6CC733AEXAMPLE REST.PUT.OBJECT s3-dg.pdf "PUT /mybucket/s3-dg.pdf HTTP/1.1" 200 - - 4406583 41754 28 "-" "S3Console/0.4" -
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:03:21 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be BC3C074D0EXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 28 - "-" "S3Console/0.4" -

Sample Query

| parse "* * [*] * * * * * \"* HTTP/1.1\" * * * * * * * \"*\" *" as bucket_owner, bucket, time, remoteIP, requester, request_ID, operation, key, request_URI, status_code, error_code, bytes_sent, object_size, total_time, turn_time, referrer, user_agent, version_ID
| parse regex field=operation "[A-Z]+\.(?<operation>[\w.]+)"
| count by operation

Sumo Logic App

Now that you have configured log collection for Amazon S3 Audit, install the Sumo Logic App for Amazon S3, and take advantage of predefined searches and Dashboards. The Sumo Logic App for Amazon S3 Audit presents details from access logs that contain information about the request type, the average response time, and the inbound and outbound data volume.