Sumo Logic


Archive allows you to forward log data from Installed Collectors to AWS S3 buckets so you can collect it at a later time. If you have logs that you don't need to search immediately, you can archive them for later use. You can ingest from your Archive on demand with hourly granularity.


To archive your data, you need a Processing Rule configured to send to an AWS Archive Destination. First, create an AWS Archive Destination, then create Archive processing rules to start archiving. Any data that matches the filter expression of an Archive processing rule is not sent to Sumo Logic. Instead, it is sent to your AWS Archive Destination.

Create an AWS Archive Destination

  1. Follow the instructions on Grant Access to an AWS Product to grant Sumo permission to send data to the destination S3 bucket.
  2. In Sumo Logic, select Manage Data > Settings > Data Forwarding.
  3. Click + to add a new destination.
  4. Select AWS Archive bucket for Destination Type.
  5. Configure the following:
    • Destination Name. Enter a name to identify the destination.
    • Bucket Name. Enter the exact name of the S3 bucket.
    • Description. You can provide a meaningful description of the connection.
    • Access Method. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred. This was completed in step 1, Grant Sumo Logic access to an AWS Product.
      • For Role-based access enter the Role ARN that was provided by AWS after creating the role. 
      • For Key access enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
    • S3 Region. Select the S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
  6. Click Save.

If Sumo Logic is able to verify the S3 credentials, the destination will be added to the list of destinations and you can start archiving to the destination via processing rules.

Create a Processing Rule

A new processing rule type named Archive messages that match allows you to archive log data at the Source level on Installed Collectors.

An Archive processing rule acts like an exclude (blacklist) filter: matching data is not sent to Sumo Logic and is instead sent to your AWS Archive bucket.

To configure processing rules for Archive using the web application, follow these steps:

  1. Go to Manage Data > Collection > Collection.
  2. Search for the Source that you want to configure, and click the Edit link for the Source.
  3. Scroll down to the Processing Rules section and click the arrow to expand the section.
  4. Click Add Rule.
  5. Type a Name for this rule. (Names have a maximum of 32 characters.)
  6. For Filter, type a regular expression that defines the messages you want to filter. The rule must match the whole message.
    • For multi-line log messages, to get the lines before and after the line containing your text, wrap the segment with (?s).* such as: (?s).*matching text(?s).*
  7. Select Archive messages that match as the rule type. This option is visible only if you have defined at least one AWS Archive bucket destination, as described in the previous section. 
  8. Select the Destination from the drop-down menu.
  9. Click Apply.
    • The new rule is listed along with any other previously defined processing rules.
  10. Click Save to save the rules you defined and start archiving data that matches the rule.
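
The whole-message requirement in step 6 can be illustrated with a short sketch. It uses Python's re module purely as a stand-in: the Collector's regex engine is not Python's, and the sample messages and the rule_matches helper are hypothetical. Python 3.11 and later require inline flags such as (?s) at the start of the pattern, so the sketch uses a single leading (?s), which applies to the whole expression.

```python
import re

def rule_matches(filter_expr: str, message: str) -> bool:
    """Return True only if the expression matches the entire message."""
    return re.fullmatch(filter_expr, message) is not None

# Hypothetical sample messages.
single_line = "2019-11-15 13:26:41 ERROR disk full"
multi_line = "2019-11-15 13:26:41 ERROR disk full\n  at Store.write\n  at Main.run"

# A bare keyword does not cover the whole message, so it does not match.
print(rule_matches("ERROR", single_line))         # False
# Wrapping the keyword with .* lets it span a single-line message.
print(rule_matches(".*ERROR.*", single_line))     # True
# Without (?s), "." stops at newlines, so a multi-line message is missed.
print(rule_matches(".*ERROR.*", multi_line))      # False
# With (?s), "." also matches newlines, covering the lines before and after.
print(rule_matches("(?s).*ERROR.*", multi_line))  # True
```

Testing a filter against a few representative messages like this before saving the rule can help avoid archiving more or less data than intended.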

Archive format

Forwarded Archive files are prepended with a filename prefix based on the receipt time of your data with the following format:


Example format of an Archived log message:

{"_id":"763a9b55-d545-4564-8f4f-92fe9db9acea","date":"2019-11-15T13:26:41.293Z","sourceName":"/Users/sumo/Downloads/Logs/ingest.log","sourceHost":"sumo","sourceCategory":"logfile","message":"a log line"}
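
As a minimal sketch of reading a message in this shape, the following Python parses the example above and converts its date field to a timezone-aware timestamp. The parse_archived_line helper is hypothetical, and the sketch assumes every archived message carries the same keys and date layout as the example.

```python
import json
from datetime import datetime, timezone

# The example message from the text above.
SAMPLE = (
    '{"_id":"763a9b55-d545-4564-8f4f-92fe9db9acea",'
    '"date":"2019-11-15T13:26:41.293Z",'
    '"sourceName":"/Users/sumo/Downloads/Logs/ingest.log",'
    '"sourceHost":"sumo","sourceCategory":"logfile","message":"a log line"}'
)

def parse_archived_line(line: str) -> dict:
    """Parse one archived message and convert its date to a datetime."""
    record = json.loads(line)
    # The trailing "Z" in the example denotes UTC.
    parsed = datetime.strptime(record["date"], "%Y-%m-%dT%H:%M:%S.%fZ")
    record["date"] = parsed.replace(tzinfo=timezone.utc)
    return record

record = parse_archived_line(SAMPLE)
print(record["sourceCategory"], record["date"].isoformat(), record["message"])
```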


By default, the Collector will complete writing logs to an archive file either once every hour or when the file has grown to a size of 100MB, whichever comes first. You can configure batch time and size with the following parameters.

Batch parameters

| Parameter | Description | Data Type | Default |
| | The time interval in milliseconds of batch files. The interval must be between five minutes and one hour. | Integer | 3600000 |
| archive.disk.flush.max.size.bytes | The maximum size of an archive file in bytes (binary format). The file size must be between 1MB and 100MB. The size of the log file is calculated before compression. | Integer | 104857600 |
| buffer.max.disk.bytes | The maximum size in bytes of the on-disk buffer per archive destination. When the maximum is reached the oldest modified file(s) are deleted. | Integer | 1073741824 |
| buffers.max.memory.bytes | The maximum size in bytes for in-memory consumption by messages across all archive destinations before messages are flushed to disk. | Integer | 10485760 |
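
The stated limits are easy to get wrong when converting units, so a small check can help. The sketch below validates a batch interval and a maximum file size against the ranges given above. The function and argument names are hypothetical, the limits are treated as inclusive by assumption, and the sizes use binary units (1MB = 1,048,576 bytes), which is consistent with the 104857600 default.

```python
MS_PER_MINUTE = 60 * 1000
MIB = 1024 * 1024

# Ranges taken from the parameter descriptions; inclusive by assumption.
INTERVAL_RANGE_MS = (5 * MS_PER_MINUTE, 60 * MS_PER_MINUTE)
FILE_SIZE_RANGE_BYTES = (1 * MIB, 100 * MIB)

def check_batch_settings(interval_ms: int, max_file_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    low, high = INTERVAL_RANGE_MS
    if not low <= interval_ms <= high:
        problems.append(f"interval {interval_ms} ms is outside {low}..{high}")
    low, high = FILE_SIZE_RANGE_BYTES
    if not low <= max_file_bytes <= high:
        problems.append(f"file size {max_file_bytes} bytes is outside {low}..{high}")
    return problems

# The documented defaults (one hour, 100MB) pass the check.
print(check_batch_settings(3600000, 104857600))
# A one-minute interval is below the five-minute minimum.
print(check_batch_settings(60000, 104857600))
```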

Ingest data from Archive

You can ingest a specific time range of data from your Archive at any time with an AWS S3 Archive Source. First, create an AWS S3 Archive Source, then create an ingestion job.


  • Filenames or object key names must be in either of the following formats:
    • Sumo Logic Archive format
    • prefix/dt=YYYYMMDD/hour=HH/fileName.json.gz
  • If a Field is tagged to an archived log message and the ingesting Collector or Source has a different value for the Field, the ingesting Collector and Source values take precedence.
  • If the Collector or Source that Archived the data is deleted the ingesting Collector and Source metadata Fields are tagged to your data.
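
For files that were not written by a Sumo Logic Archive rule, the second naming format can be produced and checked with a few lines of Python. This is a sketch under assumptions: the helper names, the "logs" prefix, and the file name are hypothetical, and the check treats the .json.gz extension as required because the format string above ends with it.

```python
import re
from datetime import datetime

# prefix/dt=YYYYMMDD/hour=HH/fileName.json.gz
KEY_PATTERN = re.compile(
    r"^(?P<prefix>.+)/dt=(?P<date>\d{8})/hour=(?P<hour>\d{2})/(?P<file>[^/]+\.json\.gz)$"
)

def build_key(prefix: str, when: datetime, file_name: str) -> str:
    """Build an object key in the dt=/hour= layout for the given timestamp."""
    return f"{prefix}/dt={when:%Y%m%d}/hour={when:%H}/{file_name}"

def is_valid_key(key: str) -> bool:
    """Check the layout and that the date and hour are real calendar values."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return False
    try:
        datetime.strptime(match["date"] + match["hour"], "%Y%m%d%H")
    except ValueError:
        return False
    return True

key = build_key("logs", datetime(2019, 11, 15, 13, 26), "part-0001.json.gz")
print(key, is_valid_key(key))
print(is_valid_key("logs/dt=20191115/hour=24/part-0001.json.gz"))
```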

Create an AWS S3 Archive Source

An AWS S3 Archive Source allows you to ingest your Archived data. Configure it to access the AWS S3 bucket that has your Archived data.

  1. In Sumo Logic select Manage Data > Collection > Collection.
  2. On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector or one you have created for this purpose.
  3. Select AWS S3 Archive.
  4. Enter a name for the new Source. A description is optional.
  5. Select an S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
  6. For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS.
  7. For Path Expression, enter the wildcard pattern that matches the Archive files you'd like to collect. The pattern:
    • can use one wildcard (*).
    • can specify a prefix so only certain files from your bucket are ingested.
      • For example, if your filename is:
        you could use prefix* to only ingest from those matching files.
    • can NOT use a leading forward slash.
    • can NOT have the S3 bucket name.
  8. For Source Category, enter any string to tag to the data collected from this Source. Category metadata is stored in a searchable field called _sourceCategory.
  9. Fields. Click the +Add Field link to add custom metadata Fields.
    • Define the fields you want to associate. Each field needs a name (key) and value.
      • A green circle with a check mark is shown when the field exists and is enabled in the Fields table schema.
      • An orange triangle with an exclamation point is shown when the field doesn't exist, or is disabled, in the Fields table schema. In this case, an option to automatically add or enable the nonexistent fields in the Fields table schema is provided. If a field sent to Sumo does not exist in the Fields schema or is disabled, it is ignored (dropped).
  10. For AWS Access, you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred. This was completed in the prerequisite step Grant Sumo Logic access to an AWS Product.
    • For Role-based access enter the Role ARN that was provided by AWS after creating the role. 
    • For Key access enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
  11. Create any Processing Rules you'd like for the AWS Source.
  12. When you are finished configuring the Source click Save.
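
The Path Expression rules in step 7 can be checked before you enter a value. The sketch below is only an illustration: check_path_expression and the bucket name my-archive-bucket are hypothetical, and detecting the bucket name as a leading path component is a heuristic, not how Sumo Logic validates the field.

```python
def check_path_expression(expr: str, bucket_name: str) -> list[str]:
    """Return the rules from step 7 that the expression appears to break."""
    problems = []
    if expr.count("*") > 1:
        problems.append("uses more than one wildcard (*)")
    if expr.startswith("/"):
        problems.append("starts with a forward slash")
    # Heuristic: the bucket name appears as the first path component.
    if expr == bucket_name or expr.startswith(bucket_name + "/"):
        problems.append("includes the S3 bucket name")
    return problems

print(check_path_expression("prefix*", "my-archive-bucket"))
print(check_path_expression("/my-archive-bucket/prefix*", "my-archive-bucket"))
print(check_path_expression("my-archive-bucket/a*/b*", "my-archive-bucket"))
```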

Archive page

The Archive page provides a table of all the existing AWS S3 Archive Sources in your account and ingestion jobs. An ingestion job is a request to pull data from your S3 bucket. The job begins immediately and provides statistics on its progress. To ingest from your Archive you need an AWS S3 Archive Source configured to access your AWS S3 bucket with the archived data.


Details pane

Click on a table row to view the Source details. This includes:

  • Name
  • Description
  • AWS S3 bucket
  • All ingestion jobs, current and past, that were created on the Source.
    • Each ingestion job shows the name, time window, and volume of data processed by the job. Use the Open in Search link to open a Search showing the data that was ingested by the job.
    • Hover your mouse over the information icon to view who created the job and when.


Create an ingestion job

  1. In Sumo Logic select Manage Data > Collection > Archive.
  2. On the Archive page, search for and select the AWS S3 Archive Source that has access to your archived data.
  3. Click New Ingestion Job and a window appears where you:
    1. Define a mandatory job name that is unique to your account.
    2. Select the date and time range of archived data to ingest. A maximum of 3 days is supported.
  4. Click Ingest Data to begin ingestion. The status of the job is visible in the Details pane of the Source in the Archive page.
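
Because ingestion works at hourly granularity and a job covers at most 3 days, it can be useful to see which hours a chosen range touches before creating the job. The sketch below is an illustration only: hourly_buckets is a hypothetical helper, and it does not represent how Sumo Logic selects objects internally.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(days=3)  # "A maximum of 3 days is supported."

def hourly_buckets(start: datetime, end: datetime) -> list[datetime]:
    """List the start of every hour touched by the range [start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_WINDOW:
        raise ValueError("time range exceeds the 3-day maximum")
    # Floor the start to the top of its hour, then step one hour at a time.
    current = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while current < end:
        buckets.append(current)
        current += timedelta(hours=1)
    return buckets

window = hourly_buckets(datetime(2019, 11, 15, 13, 26), datetime(2019, 11, 15, 16, 0))
print([f"dt={b:%Y%m%d} hour={b:%H}" for b in window])
```
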

Job Status

An ingestion job will have one of the following statuses:

  • Pending - The job is queued before scanning has started.
  • Scanning - The job is actively scanning for objects from your S3 bucket. Your objects could be ingesting in parallel.
  • Ingesting - The job has completed scanning for objects and is still ingesting your objects.
  • Failed - The job has failed to complete. Partial data may have been ingested and is searchable.
  • Succeeded - The job completed ingesting and your data is searchable.
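
If you track job progress in your own tooling, the statuses above map naturally onto an enumeration. The sketch below is hypothetical throughout: the JobStatus type and helper names are not part of any Sumo Logic API, and treating Failed and Succeeded as the only end states is an assumption drawn from the descriptions above.

```python
from enum import Enum

class JobStatus(Enum):
    PENDING = "Pending"
    SCANNING = "Scanning"
    INGESTING = "Ingesting"
    FAILED = "Failed"
    SUCCEEDED = "Succeeded"

# Assumption: a job stops changing once it has failed or succeeded.
END_STATES = {JobStatus.FAILED, JobStatus.SUCCEEDED}

def is_finished(status: JobStatus) -> bool:
    """True when the job is in one of the assumed end states."""
    return status in END_STATES

def is_fully_searchable(status: JobStatus) -> bool:
    """Only Succeeded guarantees all data is searchable; Failed may be partial."""
    return status is JobStatus.SUCCEEDED

for label in ("Scanning", "Failed", "Succeeded"):
    status = JobStatus(label)
    print(label, is_finished(status), is_fully_searchable(status))
```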