Skip to main content
Sumo Logic

Collect logs for the Amazon S3 Audit App

Amazon Simple Storage Service (S3) provides a simple web services interface that can be used to store and retrieve any amount of data from anywhere on the web.

This topic details how to collect logs for Amazon S3 Audit and ingest them into Sumo Logic.

Log Types

Amazon S3 Audit uses Server Access Logs (activity logs). For more information on the log format, see:

Configure Logging in AWS

Before you can begin to collect logs from an S3 bucket, perform the following steps:

  1. Grant Access to an AWS S3 Bucket.
  2. Enable logging using the AWS Management Console.
  3. Confirm that logs are being delivered to the S3 bucket.

Configure a Collector

In Sumo Logic, configure a Hosted Collector.

Configure an S3 Audit Source

When you create an AWS Source, you associate it with a Hosted Collector. Before creating the Source, identify the Hosted Collector you want to use, or create a new Hosted Collector. For instructions, see Configure a Hosted Collector.


  • Sumo Logic supports log files (S3 objects) that do NOT change after they are uploaded to S3. Support is not provided if your logging approach relies on updating files stored in an S3 bucket.
  • Glacier objects will not be collected and are ignored.

S3 Event Notifications Integration

Sumo’s S3 integration combines scan based discovery and event based discovery into a unified integration that gives you the ability to maintain a low-latency integration for new content and provide assurances that no data was missed or dropped. When you enable event based notifications S3 will automatically publish new files to Amazon Simple Notification Service (SNS) topics which Sumo Logic can be subscribed. This notifies Sumo Logic immediately when new files are added to your S3 bucket so we can collect it. For more information about SNS, see the Amazon SNS product detail page.

AWS event notifications diagram.png

Enabling event based notifications is an S3 bucket-level operation that subscribes to an SNS topic. An SNS topic is an access point that Sumo Logic can dynamically subscribe to in order to receive event notifications. When creating a Source that collects from an S3 bucket Sumo assigns an endpoint URL to the Source. The URL is for you to use in the AWS subscription to the SNS topic so AWS notifies Sumo when there are new files. See Configuring Amazon S3 Event Notifications for more information.

You can adjust the configuration of when and how AWS handles communication attempts with Sumo Logic. See Setting Amazon SNS Delivery Retry Policies for details.

Create an AWS Source

These configuration instructions apply to log collection from all AWS Source types. Select the correct Source type for your Source in Step 3.

  1. In Sumo Logic select Manage Data > Collection > Collection
  2. On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector, or one you have created for this purpose.
  3. Select your AWS Source type.
  4. Enter a name for the new Source. A description is optional.
  5. For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS, for example:
  6. For Path Expression, enter the wildcard pattern that matches the S3 objects you'd like to collect. You can use one wildcard (*) in this string. Recursive path expressions use a single wildcard and do NOT use a leading forward slash. See About Amazon Path Expressions for details.
  7. Collection should begin. Choose or enter how far back you'd like to begin collecting historical logs. You can either:
    • Choose a predefined value from dropdown list, ranging from "Now" to “72 hours ago” to “All Time”, or
    • Enter a relative value. To enter a relative value, click the Collection should begin field and press the delete key on your keyboard to clear the field. Then, enter a relative time expression, for example -1w. You can define when you want collection to begin in terms of months (M), weeks (w), days (d), hours (h) and minutes (m).
  8. For Source Category, enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)
  9. For AWS Access you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred, this was completed in the prerequisite step Grant Sumo Logic access to an AWS Product.
    • For Role-based access enter the Role ARN that was provided by AWS after creating the role. 
      Role based access input roleARN.png
    • For Key access enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
  10. Log File Discovery. You have the option to set up Amazon Simple Notification Service (SNS) to notify Sumo Logic of new items in your S3 bucket. A scan interval is required and automatically applied to detect log files.
    • SNS Subscription Endpoint (Optional). New files will be collected by Sumo Logic as soon as the notification is received. This will provide faster collection versus having to wait for the next scan to detect the new file.
      1. To set up the subscription you need to get an endpoint URL from Sumo to provide to AWS. This process will save your Source and begin scanning your S3 bucket when the endpoint URL is generated. Click on Create URL and use the provided endpoint URL when creating your subscription in step C.
        log file discovery input.png

Set up SNS in AWS (Optional)
  1. Go to Services > Simple Notification Service and click Create Topic. Enter a Topic name and click Create topic. Copy the provided Topic ARN, you’ll need this for the next step.

  2. Again go to Services > Simple Notification Service and click Create Subscription. Paste the Topic ARN from step B above. Select HTTPS as the protocol and enter the Endpoint URL provided while creating the S3 source in Sumo Logic. Click Create subscription and a confirmation request will be sent to Sumo Logic. The request will be automatically confirmed by Sumo Logic.

  3. Select the Topic created in step B and navigate to Actions > Edit Topic Policy. Use the following policy template, replace the SNS-topic-ARN and bucket-name placeholders in the Resource section of the JSON policy with your actual SNS topic ARN and S3 bucket name:

        "Version": "2008-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            "Action": [
            "Resource": "SNS-topic-ARN",
            "Condition": {
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:s3:*:*:bucket-name"

  4. Go to Services > S3 and select the bucket to which you want to attach the notifications. Navigate to Properties > Events > Add Notification. Enter a Name for the event notification. In the Events section select ObjectCreate (All). In the Send to section (notification destination) select SNS Topic. An SNS section becomes available, select the name of the topic you created in step B from the dropdown. Click Save.

Complete set up in Sumo
  • Scan Interval. Sumo Logic will periodically scan your S3 bucket for new items in addition to SNS notifications. Automatic is recommended to not incur additional AWS charges. This sets the scan interval based on if subscribed to an SNS topic endpoint and how often new files are detected over time. You may enter a set frequency to scan your S3 bucket for new data. To learn more about Scan Interval considerations, see About setting the S3 Scan Interval.
  1. Set any of the following under Advanced:
  • Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
    • Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose an option in case time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's very important to have the proper time zone set, no matter which option you choose. If the time zone of logs can't be determined, Sumo Logic assigns logs UTC; if the rest of your logs are from another time zone your search results will be affected.
    • Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. See Timestamps, Time Zones, Time Ranges, and Date Formats for more information.
  • Enable Multiline Processing. See Collecting Multiline Logs for details on multiline processing and its options. This is enabled by default. Use this option if you're working with multiline messages (for example, log4J or exception stack traces). Deselect this option if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log). Choose one of the following:  
    • Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message. If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field to use for detecting the entire first line of multiline messages.
    • Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression that matches the entire first line of every multiline message in your log files.
  1. Create any Processing Rules you'd like for the AWS Source.
  2. When you are finished configuring the Source click Save.

Update Source to use S3 Event Notifications

  1. In Sumo Logic select Manage Data > Collection > Collection.
  2. On the Collection page navigate to your Source and click Edit. Scroll down to Log File Discovery and note the Endpoint URL provided, you will use this in step 10.C when creating your subscription.
  3. Complete steps 10.B through 10.E for configuring SNS Notifications.

Troubleshoot S3 Event Notifications

In the web interface under Log File Discovery it shows a red exclamation mark with "Sumo Logic has not received a validation request from AWS".

SNS error, Sumo Logic has not received a validation request from AWS.

Steps to troubleshoot:

  1. Refresh the Source’s page to view the latest status of the subscription in the SNS Subscription section by clicking Cancel then Edit on the Source in the Collection tab.
  2. Verify you have enabled sending Notifications from your S3 bucket to the appropriate SNS topic. This is done in step 10.E.
  3. If you didn’t use CloudFormation check that the SNS topic has a confirmed subscription to the URL in AWS console. A "Pending Confirmation" state likely means that you entered the wrong URL while creating the subscription.

In the web interface under Log File Discovery it shows a green check with "Sumo Logic has received an AWS validation request at this endpoint." but you still have high latencies.

SNS green tick mark, Sumo Logic has received an AWS validation request at this endpoint.

The green check confirms that the endpoint was used correctly, but it does not mean Sumo is receiving notifications successfully.

Steps to troubleshoot:

  1. AWS writes CloudTrail and S3 Audit Logs to S3 with a latency of a few minutes. If you’re seeing latencies of around 10 minutes for these Sources it is likely because AWS is writing them to S3 later than expected. 
  2. Verify you have enabled sending Notifications from your S3 bucket to the appropriate SNS topic. This is done in step 10.E.

Field Extraction Rules

Field Extraction Rules (FERs) tell Sumo Logic which fields to parse out automatically. For instructions, see Create a Field Extraction Rule.

Use the following Parse Expression:

parse "* * [*] * * * * * \"* HTTP/1.1\" * * * * * * * \"*\" *" as bucket_owner, bucket, time, remoteIP, requester, request_ID, operation, key, request_URI, status_code, error_code, bytes_sent, object_size, total_time, turn_time, referrer, user_agent, version_ID

Sample Log Messages

The server access log files consist of a sequence of new-line delimited log records. Each log record represents one request and consists of space delimited fields. The following is an example log consisting of six log records.

79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 7 - "-" "S3Console/0.4" - 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 891CE47D2EXAMPLE REST.GET.LOGGING_STATUS - "GET /mybucket?logging HTTP/1.1" 200 - 242 - 11 - "-" "S3Console/0.4" - 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be A1206F460EXAMPLE REST.GET.BUCKETPOLICY - "GET /mybucket?policy HTTP/1.1" 404 NoSuchBucketPolicy 297 - 38 - "-" "S3Console/0.4" - 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:01:00 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 7B4A0FABBEXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 33 - "-" "S3Console/0.4" - 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:01:57 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be DD6CC733AEXAMPLE REST.PUT.OBJECT s3-dg.pdf "PUT /mybucket/s3-dg.pdf HTTP/1.1" 200 - - 4406583 41754 28 "-" "S3Console/0.4" - 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:03:21 +0000] 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be BC3C074D0EXAMPLE REST.GET.VERSIONING - "GET /mybucket?versioning HTTP/1.1" 200 - 113 - 28 - "-" "S3Console/0.4" -

Sample Query

| parse "* * [*] * * * * * \"* HTTP/1.1\" * * * * * * * \"*\" *" as bucket_owner, bucket, time, remoteIP, requester, request_ID, operation, key, request_URI, status_code, error_code, bytes_sent, object_size, total_time, turn_time, referrer, user_agent, version_ID
| parse regex field=operation "[A-Z]+\.(?<operation>[\w.]+)"
| count by operation