File Format for Data Forwarding to an Amazon S3 Bucket

 

After you start forwarding data to S3, file objects should begin to appear in your configured bucket. Log messages are accumulated and written to the bucket after Sumo ingests them.

The log messages are saved as gzip-compressed CSV files, named according to the convention you specified when you configured Sumo to start data forwarding, as described in step 4 of Start data forwarding to S3. (The file naming convention for legacy data forwarding is described below in Legacy File Naming Format.)

Messages are buffered during data ingest for approximately 5 minutes, or until 100 MB of data is received. Each file object contains the raw messages received, along with the system metadata for each message, including:

  • messageId: The unique ID for the specific message within Sumo Logic.

  • sourceName: The Source file name metadata applied to the message on ingest.

  • sourceHost: The name of the host that originally sent the data to the service.

  • sourceCategory: The Source Category metadata applied to the message on ingest.

  • messageTime: The parsed message time from the log message, as epoch time in milliseconds.

  • receiptTime: The time the service originally received the message, as epoch time in milliseconds.

  • sourceId: The unique ID of the Source configured to send the message to the service.

  • collectorId: The unique ID of the Collector configured to send the message to the service.

  • count: The message number within the specific log Source Name. Numbers should be sequential for a given Source file.

  • format: The timestamp format used to parse the message time from the log message.

  • encoding: The encoding of the original file contents.

  • message: The raw log message as read from the original Source.

Example

Metadata fields:

messageId,sourceName,sourceHost,sourceCategory,messageTime,receiptTime,sourceId,collectorId,count,format,view,encoding,message

Sample object:

"-9223371513354977010","/usr/sumo/logs/cqsplitter/cqsplitter.log","nite-cqsplitter-1","cqsplitter","1472590091453","1472590094034","101688020","100607825","979","plain:atp:o:0:l:29:p:yyyy-MM-dd HH:mm:ss,SSSZZZZ","JchenTest2","UTF8","2016-08-30 13:48:11,453 -0700 WARN  [hostId=nite-cqsplitter-1] [module=cqsplitter] [localUserName=cqsplitter] [logger=cqsplitter.engine.CQsMultiMatchersManager] [thread=DTP-cqsplitter.receiver.consumer.v2.threadpool-6] MultiMatcher queue for customer 0000000000000131 is at capacity, adding element will block."
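
Because every field is quoted and values such as format and message can contain commas, a forwarded file is easier to read with a CSV parser than by splitting on commas. The following is a minimal Python sketch, not an official Sumo Logic tool. The function names read_forwarded_records and epoch_ms_to_utc are hypothetical, the column order is taken from the metadata field list above, and the code assumes (without confirmation from this page) that a file may or may not begin with that header row, so it skips the row if present. The millisecond interpretation of messageTime is inferred from the sample object, where 1472590091453 corresponds to 2016-08-30 13:48:11,453 -0700.

```python
import csv
import gzip
from datetime import datetime, timedelta, timezone

# Column order copied from the "Metadata fields" line in this article.
FIELDS = [
    "messageId", "sourceName", "sourceHost", "sourceCategory",
    "messageTime", "receiptTime", "sourceId", "collectorId",
    "count", "format", "view", "encoding", "message",
]

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_forwarded_records(fileobj):
    """Yield one dict per message from a gzip-compressed CSV file.

    fileobj may be a path or a binary file object (hypothetical helper).
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.reader(text):
            if row == FIELDS:
                # Assumption: skip a header row if the file contains one.
                continue
            yield dict(zip(FIELDS, row))


def epoch_ms_to_utc(value):
    """Convert an epoch-milliseconds string to a timezone-aware UTC datetime."""
    return _EPOCH + timedelta(milliseconds=int(value))
```

For example, iterating over read_forwarded_records("forwarded-file.csv.gz") (a placeholder path) yields dictionaries keyed by the field names above, and epoch_ms_to_utc(record["messageTime"]) gives the parsed message time in UTC.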

Legacy File Naming Format

The file naming convention for legacy data forwarding (prior to January 2017) is: 

<start_epoch>-<end_epoch>--<objectid>.csv.gz

Where:

  • start_epoch is the epoch time representing the parsed message time of the first message contained within the file.
  • end_epoch is the epoch time representing the parsed message time of the last message contained within the file.
  • objectid is a unique ID for the file object, which is generated by Sumo at creation time.
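
To select legacy files by time range, it can help to split a file name into its three parts. The sketch below is one way to do that in Python under stated assumptions: the names parse_legacy_name and LegacyName are hypothetical, the epoch values are kept as plain integers because this page does not state their unit, and objectid is treated as an opaque string because its format is not documented here.

```python
import re
from typing import NamedTuple

# Matches <start_epoch>-<end_epoch>--<objectid>.csv.gz
_LEGACY_NAME = re.compile(
    r"(?P<start>\d+)-(?P<end>\d+)--(?P<objectid>.+)\.csv\.gz"
)


class LegacyName(NamedTuple):
    start_epoch: int
    end_epoch: int
    objectid: str


def parse_legacy_name(key):
    """Parse a legacy file name (or an S3 key ending in one)."""
    name = key.rsplit("/", 1)[-1]  # drop any key prefix before the file name
    match = _LEGACY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a legacy data forwarding file name: {name!r}")
    return LegacyName(int(match["start"]), int(match["end"]), match["objectid"])
```

With a made-up name such as "1472590091453-1472590391453--abc123.csv.gz", the function returns the start value, the end value, and "abc123" as the object ID. Names that do not follow the convention raise ValueError.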