Archive

Archive allows you to forward log data from Installed Collectors to AWS S3 buckets and collect it at a later time. If you have logs that you don't need to search immediately, you can archive them for later use. You can ingest from your Archive on demand with hourly granularity.

Archive

To archive your data, you need a Processing Rule configured to send data to an AWS Archive Destination. First, create an AWS Archive Destination, then create Processing Rules to start archiving.

Create an AWS Archive Destination

  1. Follow the instructions on Grant Access to an AWS Product to grant Sumo permission to send data to the destination S3 bucket.
  2. In Sumo Logic, choose Manage Data > Settings > Data Forwarding.
  3. Click + to add a new destination.
  4. Select AWS Archive bucket for Destination Type.
  5. Configure the following:
    • Destination Name. Enter a name to identify the destination.
    • Bucket Name. Enter the exact name of the S3 bucket.
    • Description. You can provide a meaningful description of the connection.
    • Access Method. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred; the role or keys were created in step 1, Grant Access to an AWS Product.
      • For Role-based access, enter the Role ARN that was provided by AWS after you created the role.
      • For Key access, enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
    • S3 Region. Select the S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
  6. Click Save.

If Sumo Logic is able to verify the S3 credentials, the destination will be added to the list of destinations and you can start archiving to the destination via processing rules.
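
The role or user you granted access to in step 1 needs permission to write objects into the destination bucket. As a minimal sketch only (the authoritative policy is listed in Grant Access to an AWS Product, and your-archive-bucket is a hypothetical bucket name), the IAM policy attached to that role might include a statement like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::your-archive-bucket/*"
    }
  ]
}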

Create a Processing Rule

A new processing rule type named Archive messages that match allows you to archive log data at the Source level on Installed Collectors.

To configure processing rules for Archive using the web application, follow these steps:

  1. Go to Manage Data > Collection > Collection.
  2. Search for the Source that you want to configure, and click the Edit link for the Source.
  3. Scroll down to the Processing Rules section and click the arrow to expand the section.
  4. Click Add Rule.
  5. Type a Name for this rule. (Names have a maximum of 32 characters.)
  6. For Filter, type a regular expression that defines the messages you want to filter. The rule must match the whole message. (See the example after these steps.)
    • For multi-line log messages, to get the lines before and after the line containing your text, wrap the segment with (?s).* such as: (?s).*matching text(?s).*
  7. Select Archive messages that match as the rule type. This option is visible only if you have defined at least one AWS Archive bucket destination, as described in the previous section. 
  8. Select the Destination from the drop-down menu.
  9. Click Apply.
    • The new rule is listed along with any other previously defined processing rules.
  10. Click Save to save the rules you defined and start archiving data that matches the rule.
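
For example, a rule that archives every message containing the text DEBUG (a hypothetical match string), including multi-line messages, could use the following whole-message filter:

(?s).*DEBUG(?s).*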

Archive format

Forwarded Archive files are written with a filename prefix based on the receipt time of your data, in the following format:

dt=<date>/hour=<hour>/minute=<minute>/<deploymentName>/<collectorId>/<sourceId>/v1/<fileName>.txt.gzip
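
For example, an object key for data received on November 15, 2019 at 13:26 might look like the following; the deployment name, Collector ID, Source ID, and file name shown are hypothetical:

dt=20191115/hour=13/minute=26/us2/000000000ABC1234/000000000DEF5678/v1/example.txt.gzip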

Example format of an Archived log message:

{"_id":"763a9b55-d545-4564-8f4f-92fe9db9acea","date":"2019-11-15T13:26:41.293Z","sourceName":"/Users/sumo/Downloads/Logs/ingest.log","sourceHost":"sumo","sourceCategory":"logfile","message":"a log line"}

Batching

By default, the Collector will complete writing logs to an archive file either once every hour or when the file has grown to a size of 100MB, whichever comes first. You can configure batch time and size with the following collector.properties parameters.

collector.properties batch parameters

  • archive.cloud.flush.time.ms (Integer, default 3600000). The time interval, in milliseconds, at which batch files are flushed. The interval must be between five minutes and one hour.
  • archive.disk.flush.max.size.bytes (Integer, default 104857600). The maximum size of an archive file in bytes. The file size must be between 1MB and 100MB. The size of the log file is calculated before compression.
  • buffer.max.disk.bytes (Integer, default 1073741824). The maximum size in bytes of the on-disk buffer per archive destination. When the maximum is reached, the oldest modified file(s) are deleted.
  • buffers.max.memory.bytes (Integer, default 10485760). The maximum size in bytes for in-memory consumption by messages across all archive destinations before messages are flushed to disk.
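
For example, to flush archive files every 15 minutes or at 50 MB, whichever comes first, you could add the following lines to collector.properties (the values shown are illustrative):

archive.cloud.flush.time.ms=900000
archive.disk.flush.max.size.bytes=52428800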

Ingest data from Archive

You can ingest a specific time range of data from your Archive at any time with an AWS S3 Archive Source. First, create an AWS S3 Archive Source, then ingest on demand.

Rules

  • Filenames or object key names must be in either of the following formats:
    • Sumo Logic Archive format
    • prefix/dt=YYYYMMDD/hour=HH/fileName.json.gz (see the example after this list)
  • If a Field is tagged to an archived log message and the ingesting Collector or Source has a different value for the Field, the ingesting Collector and Source values take precedence.
  • If the Collector or Source that archived the data is deleted, the ingesting Collector and Source metadata Fields are tagged to your data.
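
For example, an object key in the second format, with a hypothetical prefix and file name, could be:

myapp/dt=20230401/hour=07/logs-0001.json.gz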

Create an AWS S3 Archive Source

An AWS S3 Archive Source allows you to ingest your Archived data. Configure it to access the AWS S3 bucket that has your Archived data.

  1. In Sumo Logic select Manage Data > Collection > Collection.
  2. On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector or one you have created for this purpose.
  3. Select AWS S3 Archive.
  4. Enter a name for the new Source. A description is optional.
  5. Select an S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
  6. For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS.
  7. For Path Expression, enter the wildcard pattern that matches the Archive files you'd like to collect. The pattern:
    • can use one wildcard (*).
    • can specify a prefix so only certain files from your bucket are ingested.
      • For example, if your filename is:
            prefix/dt=<date>/hour=<hour>/minute=<minute>/<collectorId>/<sourceId>/v1/<fileName>.txt.gzip
        you could use prefix* to only ingest from those matching files.
    • can NOT use a leading forward slash.
    • can NOT have the S3 bucket name.
  8. For Source Category, enter any string to tag the data collected from this Source. Category metadata is stored in a searchable field called _sourceCategory.
  9. Fields. Click the +Add Field link to add custom metadata Fields.
    • Define the fields you want to associate, each field needs a name (key) and value. 
      • A green circle with a check mark is shown when the field exists and is enabled in the Fields table schema.
      • An orange triangle with an exclamation point is shown when the field doesn't exist, or is disabled, in the Fields table schema. In this case, an option to automatically add or enable the nonexistent fields to the Fields table schema is provided. If a field is sent to Sumo that does not exist in the Fields schema, or is disabled, it is ignored, known as dropped.
  10. For AWS Access you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred; this was completed in the prerequisite step, Grant Access to an AWS Product.
    • For Role-based access, enter the Role ARN that was provided by AWS after you created the role.
    • For Key access, enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
  11. Create any Processing Rules you'd like for the AWS Source.
  12. When you are finished configuring the Source click Save.

On-demand ingestion

To ingest from your Archive you need an AWS S3 Archive Source configured to access your AWS S3 bucket with the archived data.

  1. In Sumo Logic select Manage Data > Collection > Collection.
  2. On the Collectors page search for the AWS S3 Archive Source that has access to your archived data.
  3. Click the Ingest link for the Source.
  4. A window appears where you can select the date and time range of archived data to ingest.
  5. When finished selecting the date and time range, click Ingest Data to begin ingestion.