Amazon S3 Source
Amazon Simple Storage Service (Amazon S3) provides a web services interface that can be used to store and retrieve any amount of data from anywhere on the web. Use an Amazon S3 Source to upload data to Sumo Logic from S3.
One Amazon S3 Source can collect data from a single S3 bucket. However, you can configure multiple S3 Sources to collect from one S3 bucket. For example, you could use one S3 Source to collect one particular data type, and then configure another S3 Source to collect another data type.
For information on S3 performance optimization, see Request Rate and Performance Considerations.
Compressed data
An S3 Source can collect either plain text or gzip-compressed text. Zip files are not supported.
Data is treated as plain text by default, but gzip decompression will be used if both of the following conditions apply:
- The target file has a .gz or .gzip extension, or no file extension.
- The target file's initial bytes match the gzip file format.
Files are transferred in their compressed form and decompressed when ingested. Data volume is calculated from the decompressed size of your data.
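The two gzip conditions above can be expressed as a short check. This is a minimal Python sketch of the documented rules, not Sumo Logic's implementation. `should_gunzip`, `decode_object`, and `ingested_volume` are hypothetical names. The sketch assumes the extension comparison is case-sensitive and that the text is UTF-8. The magic-number test uses the two bytes that begin every gzip stream (0x1f 0x8b, per RFC 1952).

```python
import gzip
import posixpath

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"

# Extensions that permit decompression; "" means the key has no extension.
# Assumption: compared case-sensitively.
GZIP_EXTENSIONS = (".gz", ".gzip", "")


def should_gunzip(key: str, head: bytes) -> bool:
    """True only if BOTH documented conditions hold for an S3 object."""
    # posixpath treats "/" as the separator, matching S3 key conventions.
    ext = posixpath.splitext(key)[1]
    return ext in GZIP_EXTENSIONS and head[:2] == GZIP_MAGIC


def decode_object(key: str, body: bytes) -> str:
    """Decompress when both conditions hold; otherwise treat the body as plain text."""
    if should_gunzip(key, body):
        body = gzip.decompress(body)
    return body.decode("utf-8")


def ingested_volume(key: str, body: bytes) -> int:
    """Volume is measured on the decompressed data, not the transferred bytes."""
    if should_gunzip(key, body):
        return len(gzip.decompress(body))
    return len(body)
```

Under these rules, an object named `logs/app.log` that contains gzip data is still treated as plain text, because `.log` is not one of the allowed extensions.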
Configure an Amazon S3 Source
- Grant Sumo Logic access to an Amazon S3 bucket.
- Enable logging in AWS using the Amazon Console.
- Confirm that logs are being delivered to the Amazon S3 bucket.
- Add an Amazon S3 Source to collect objects from your Amazon S3 bucket. See below for details.
Amazon S3 Source
When you create an Amazon S3 Source, you add it to a Hosted Collector. Before creating the Source, identify the Hosted Collector you want to use or create a new one. For instructions, see Configure a Hosted Collector.
Rules
- If you're editing the Collection should begin date on a Source, the new date must be after the current Collection should begin date.
If you set Collection should begin to a time that overlaps with data the Source previously ingested, duplicate data may be ingested into Sumo Logic.
- Sumo Logic supports log files (S3 objects) that do NOT change after they are uploaded to S3. Support is not provided if your logging approach relies on updating files stored in an S3 bucket. S3 has no concept of updating an existing file; you can only overwrite it. When you overwrite a file, S3 treats the result as a new file object, or a new version of the file, and that object gets its own unique version ID.
Sumo Logic scans an S3 bucket based on the supplied path expression, or receives an SNS notification when a new file object is created. Either way, Sumo Logic receives a file name (key) and the object's ID, which it compares against the list of file objects already ingested. If no matching file ID is found, the entire contents of the file are ingested.
When you overwrite a file in S3, the file object gets a new version ID, so Sumo Logic sees it as a new file and ingests all of it. If each version you upload simply appends to the end of the file, this leads to duplicate messages being ingested, one copy for each version of the file you created in S3.
- Duplicate logs are collected if you change the AWS versioned APIs setting from Yes to No while the S3 bucket has versioning enabled.
- Glacier objects are ignored and not collected.
- If you're using SNS, you need to create a separate topic and subscription for each Source.
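The duplicate-ingestion rules above follow from tracking objects by identity rather than by content. The toy Python model below shows that effect. It is a hypothetical illustration, not Sumo Logic's code: `IngestLedger` and its `(key, version_id)` identity are assumptions chosen to mirror the description of comparing each file's ID against a list of already-ingested objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class IngestLedger:
    """Hypothetical model of 'already ingested' tracking; not Sumo Logic's code.

    An object's identity is (key, version_id). Overwriting a key in a versioned
    bucket creates a new version ID, so the new object never matches an old entry.
    """
    seen: Set[Tuple[str, Optional[str]]] = field(default_factory=set)
    messages: List[str] = field(default_factory=list)

    def ingest(self, key: str, version_id: Optional[str], body: str) -> None:
        identity = (key, version_id)
        if identity in self.seen:
            return  # same object seen before: nothing is collected
        self.seen.add(identity)
        # A new identity means the WHOLE object is ingested, not only the new lines.
        self.messages.extend(body.splitlines())


ledger = IngestLedger()
ledger.ingest("app.log", "v1", "line1\nline2")
# The application appends a line and re-uploads; S3 assigns a new version ID.
ledger.ingest("app.log", "v2", "line1\nline2\nline3")
print(ledger.messages)  # ['line1', 'line2', 'line1', 'line2', 'line3']
```

In the same toy model, changing the versioned-APIs setting from Yes to No could change the identity from `(key, version_id)` to `(key, None)`. Objects that were already collected would then no longer match and would be collected again. This only illustrates the duplicate-logs rule above; it does not describe Sumo Logic internals.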
Cisco Umbrella
Cisco Umbrella offers logging to a Cisco-managed S3 bucket. Collection from these buckets has the following limitations:
- AWS versioned APIs are not supported. The Use AWS versioned APIs setting on the Source must be disabled.
- S3 Event Notifications Integration is not supported.
- Access must be provided with an Access ID and Key. Role-based access is not supported.
- Use a prefix in the path expression so it doesn't point to the root directory.
- Ensure that your path expression ends in /*. Otherwise, you will get a ListBucket error. For example: s3://cisco-managed-us-east-1/PREFIX/*
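You can check the Cisco Umbrella path expression rules above mechanically before saving the Source. The sketch below is a hypothetical helper, not part of Sumo Logic or Cisco tooling. It treats the path expression as the part after the bucket name (for example, `PREFIX/*` for `s3://cisco-managed-us-east-1/PREFIX/*`). It also applies the general rule that path expressions must not start with a forward slash.

```python
from typing import List


def check_cisco_path_expression(path_expression: str) -> List[str]:
    """Return the problems found in a Cisco Umbrella path expression (empty if none).

    Hypothetical pre-flight check encoding only the documented rules.
    """
    problems = []
    if path_expression.startswith("/"):
        problems.append("must not start with a forward slash")
    if not path_expression.endswith("/*"):
        problems.append("must end in '/*' to avoid a ListBucket error")
    # Everything before the trailing "/*" is the prefix; it must not be empty,
    # otherwise the expression points at the bucket root.
    prefix = path_expression[:-2] if path_expression.endswith("/*") else path_expression
    if not prefix.strip("/*"):
        problems.append("must include a prefix instead of pointing at the root directory")
    return problems
```

For example, `PREFIX/*` passes. A bare `*` fails twice: it lacks the trailing `/*` and has no prefix.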
S3 Event Notifications Integration
Sumo Logic's S3 integration combines scan-based discovery and event-based discovery into a unified integration. It keeps collection of new content low-latency while providing assurance that no data was missed or dropped. When you enable event-based notifications, S3 automatically publishes notifications about new files to Amazon Simple Notification Service (SNS) topics to which Sumo Logic can subscribe. These notifications tell Sumo Logic immediately when new files are added to your S3 bucket, so they can be collected. For more information about SNS, see the Amazon SNS product detail page.
Enabling event-based notifications is an S3 bucket-level operation that sends notifications to an SNS topic. An SNS topic is an access point that Sumo Logic can dynamically subscribe to in order to receive event notifications. When you create a Source that collects from an S3 bucket, Sumo Logic assigns an endpoint URL to the Source. You use this URL in the AWS subscription to the SNS topic, so that AWS notifies Sumo Logic when there are new files. See Configuring Amazon S3 Event Notifications for more information.
You can adjust the configuration of when and how AWS handles communication attempts with Sumo Logic. See Setting Amazon SNS Delivery Retry Policies for details.
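The sketch below makes the SNS flow concrete by showing what a subscribed endpoint receives. Each delivery is an SNS HTTP/S notification whose `Message` field is a JSON string containing the S3 event records. The sketch illustrates the message shapes AWS documents: the SNS `Type` and `Message` fields, and the S3 `Records[].s3.bucket.name`, `s3.object.key`, and `s3.object.versionId` fields. It is not Sumo Logic's endpoint implementation, and the function name `new_objects_from_sns` is hypothetical. Object keys in S3 event notifications are URL-encoded, so the sketch decodes them with `unquote_plus`.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import unquote_plus


def new_objects_from_sns(body: str) -> List[Tuple[str, str, Optional[str]]]:
    """Extract (bucket, key, version_id) for newly created objects from an SNS POST body."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        # e.g. "SubscriptionConfirmation", which must be confirmed separately.
        return []
    event = json.loads(envelope["Message"])  # the S3 event is a JSON string inside SNS
    results = []
    for record in event.get("Records", []):  # s3:TestEvent messages carry no Records
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        # versionId is only present when the bucket has versioning enabled.
        results.append((s3["bucket"]["name"], key, s3["object"].get("versionId")))
    return results
```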
Create an Amazon S3 Source
- In Sumo Logic, select Manage Data > Collection > Collection.
- On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector, or one you have created for this purpose.
- Select Amazon S3.
- Enter a name for the new Source. A description is optional.
- Select an S3 region or keep the default value of Others. The S3 region must match the region of the S3 bucket in your Amazon account. Selecting an AWS GovCloud region means your data will be leaving a FedRAMP-high environment. Use responsibly to avoid information spillage. See Collection from AWS GovCloud for details.
- Use AWS versioned APIs? Select Yes to collect from buckets where versioning is enabled. This uses the list-object-versions and get-object-version Amazon S3 APIs. Selecting Yes requires your credentials to have permission to call ListObjectVersions and GetObjectVersion (in IAM policies, the s3:ListBucketVersions and s3:GetObjectVersion actions).
- For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS.
- For Path Expression, enter the wildcard pattern that matches the S3 objects you'd like to collect. You can use more than one wildcard (*) in this string. Recursive path expressions use a multiple wildcard. Do NOT use a leading forward slash. See About Amazon Path Expressions for details.
For a Cisco-managed S3 bucket, the bucket name and path expression together make up the S3 bucket data path. For more information, see S3 Bucket Data Path in the Cisco documentation.
- Collection should begin. Choose or enter how far back you'd like to begin collecting historical logs. You can either:
  - Choose a predefined value from the dropdown list, ranging from "Now" to "72 hours ago" to "All Time".
  - Enter a relative value. To enter a relative value, click the Collection should begin field and press the Delete key on your keyboard to clear the field. Then enter a relative time expression, for example -1w. You can define when you want collection to begin in terms of months (M), weeks (w), days (d), hours (h), and minutes (m).
Note: If you paused the Source and want to skip some data when you resume, update the Collection should begin setting to a time after it was paused.
Note: If you set Collection should begin to a time that overlaps with data the Source previously ingested, duplicate data may be ingested into Sumo Logic.
- For Source Category, enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)
- Fields. Click the +Add Field link to define the fields you want to associate. Each field needs a name (key) and a value.
  - A green circle with a check mark is shown when the field exists in the Fields table schema.
  - An orange triangle with an exclamation point is shown when the field doesn't exist in the Fields table schema. In this case, you're given an option to automatically add the nonexistent fields to the Fields table schema. If a field that does not exist in the Fields schema is sent to Sumo Logic, it is ignored (dropped).
- For AWS Access, you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred; you set it up in the prerequisite step Grant Sumo Logic access to an AWS Product. If you're collecting from a Cisco Umbrella bucket, you must use Key access.
  - For Role-based access, enter the Role ARN that was provided by AWS after creating the role.
  - For Key access, enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
- Log File Discovery. You have the option to set up Amazon Simple Notification Service (SNS) to notify Sumo Logic of new items in your S3 bucket. A scan interval is required and is automatically applied to detect log files.
Info: Sumo Logic highly recommends using an SNS Subscription Endpoint because it maintains low-latency collection, which is essential for up-to-date Alerts.
If you're collecting from a Cisco Umbrella bucket, SNS Subscription Endpoint is not supported.
  - Scan Interval. Sumo Logic periodically scans your S3 bucket for new items, in addition to receiving SNS notifications. Automatic is recommended to avoid incurring additional AWS charges. It sets the scan interval based on whether the Source is subscribed to an SNS topic endpoint and how often new files are detected over time. If the Source is not subscribed to an SNS topic and is set to Automatic, the scan interval is 5 minutes. You can instead enter a set frequency at which to scan your S3 bucket for new data. To learn more about Scan Interval considerations, see About setting the S3 Scan Interval.
  - SNS Subscription Endpoint (Highly Recommended). Sumo Logic collects new files as soon as the notification is received, which is faster than waiting for the next scan to detect them.
To set up the subscription, you need an endpoint URL from Sumo Logic to provide to AWS. This process saves your Source and begins scanning your S3 bucket when the endpoint URL is generated. Click Create URL and use the provided endpoint URL when you create your subscription in the Set up SNS in AWS step.
  - Set up SNS in AWS (Highly Recommended)
The following steps use the Amazon SNS Console. Alternatively, you can use AWS CloudFormation: follow the instructions to use CloudFormation to set up an SNS Subscription Endpoint.
    - Go to Services > Simple Notification Service and click Create Topic. Enter a Topic name and click Create topic. Copy the provided Topic ARN; you'll need it for the next step. Make sure that the topic and the bucket are in the same region.
    - Again go to Services >