
Local File Source

To collect log messages from the same machine where a Collector is installed, create a Local File Source. If you are editing a Source, metadata changes apply only to logs collected after the change. Metadata for previously collected logs is not changed retroactively.

Local File Sources can collect logs that use the following encodings:

  • UTF-8 (default)
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE

UTF-16 formats are often used internationally; additionally they are common with logs from Microsoft services, such as MS SQL Server and MS SharePoint. When using UTF-16 encoding, the setting applies to all logs collected by that Source. For example, when using a wildcard path expression, ensure that all the files that meet the filter are using the same content encoding.
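Because one encoding setting covers every file a Source collects, it can help to check the files behind a wildcard path before saving the Source. The following is a minimal sketch, not part of Sumo Logic, that groups files by their byte order mark (BOM). The function names and the example pattern are hypothetical, and the check is only a heuristic: files without a BOM cannot be classified this way.

```python
import codecs
import glob
import os
from collections import defaultdict

# BOM signatures. UTF-32 is checked before UTF-16 because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def guess_encoding_from_bom(head):
    """Return the encoding implied by a leading BOM, or "no BOM"."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return "no BOM"


def group_files_by_bom(pattern):
    """Group the regular files matching a glob pattern by detected BOM."""
    groups = defaultdict(list)
    for path in glob.glob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        with open(path, "rb") as f:
            groups[guess_encoding_from_bom(f.read(4))].append(path)
    return dict(groups)
```

Calling group_files_by_bom with the same pattern you plan to use as the File Path returns a dictionary keyed by detected encoding. More than one key suggests the files are mixed and may need separate Sources.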

To configure a Local File Source:

  1. In the Web Application, select Manage > Collection.
  2. Find the name of the installed Collector to which you'd like to add a Source. Click Add... then choose Add Source from the pop-up menu.

  3. Select Local for the Source type. 
  4. Set the following choices:
  • Name. Type the name you'd like to display for the new Source. Description is optional.
  • File Path. List the full path to the file you want to collect. For files on Windows systems (not including Windows Events), enter the absolute path including the drive letter. Escape special characters and spaces with a backslash (\). If you are collecting from Windows using CIFS/SMB, see Prerequisites for Remote Windows Event Log Collection.
    Use a single asterisk wildcard [*] for file or folder names [var/foo/*.log]. Use two asterisks [**] to recurse within directories and subdirectories [var/**/*.log].
  • Collection start time. Choose how far back you'd like to begin collecting historical logs. For example, choose 7 days ago to begin collecting logs with a last modified date within the last seven days. To begin more recently, choose 24 hours.
    IMPORTANT: This setting cannot be changed. 
  • Source Host. Sumo Logic uses the hostname assigned by the OS unless you enter a different host name. Hostname metadata is stored in a searchable field called _sourceHost. Avoid using spaces in metadata tags so that you do not have to quote the source host or the source category in the search query field. For more information, see Metadata Naming Conventions. The hostname can be a maximum of 128 characters.
  • Source Category. Enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)
  5. Set any of the following options under Advanced:
  • Blacklist. In the Blacklist field, enter the path for files to exclude from the Source collection. Wildcard syntax is allowed when specifying unwanted files. For example, if you are collecting /var/log/*.log but don’t want to collect unwanted*.log, specify /var/log/unwanted*.log in the blacklist. You can also exclude subdirectories. For example, if you are collecting /var/log/**/*.log but do not want to collect anything from the /var/log/unwanted directory, specify /var/log/unwanted.
    You don't need to blacklist compressed files. Sumo Logic automatically excludes compressed files when collecting data.
  • Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
  • Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose an option in case time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's very important to have the proper time zone set, no matter which option you choose. If the time zone of logs can't be determined, Sumo Logic assigns logs UTC; if the rest of your logs are from another time zone, your search results will be affected.
  • Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. See Timestamps, Time Zones, Time Ranges, and Date Formats for more information.
  • Encoding. UTF-8 is the default, but you can choose another encoding format from the menu.
  • Enable Multiline Processing. Multiline processing is enabled by default. Use this option if you're working with multi-line messages (for example, log4J or exception stack traces). Deselect this option if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log).
  • Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message.
    If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field that matches the entire first line of each multi-line message.
  • Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression for the full first line of every multi-line message in your log files. For an example, see the information on boundary regex in this topic. For more information, see Define Boundary Regex for Multi-line Messages.
  6. Create any processing rules you'd like for the new Source.
  7. When you are finished configuring the Source, click Save.
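
To illustrate the Boundary Regex option in the steps above, here is a rough sketch of how a regular expression that matches the full first line of each message can be used to group lines into multi-line messages. It assumes a hypothetical log format in which every message starts with a timestamp like "2024-01-15 10:32:01". It uses Python's re module, whose syntax may differ from what the Boundary Regex field accepts, and it is not Sumo Logic's implementation.

```python
import re

# Hypothetical boundary regex: matches the entire first line of a message
# that begins with a "YYYY-MM-DD HH:MM:SS" timestamp.
BOUNDARY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*$")


def group_messages(lines):
    """Start a new message at each boundary line; append other lines to the current one."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if BOUNDARY.match(line) or not messages:
            messages.append(line)
        else:
            messages[-1] += "\n" + line
    return messages


# Made-up sample input: one stack trace followed by a single-line message.
SAMPLE = [
    "2024-01-15 10:32:01 ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.App.run(App.java:42)",
    "2024-01-15 10:32:02 INFO Recovered",
]
```

With this sample, group_messages(SAMPLE) yields two messages: the ERROR line together with its two stack trace lines, and the INFO line on its own.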

You can return to this dialog and edit the settings for the Source at any time.
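
The File Path and Blacklist settings described above interact, so it can be useful to reason about which files a given combination selects. The sketch below approximates the wildcard rules as stated on this page: a single asterisk matches within one file or folder name, and two asterisks recurse through subdirectories. The helper names are hypothetical, and the exact matching behavior of the Collector may differ. In particular, letting a blacklist entry without wildcards exclude everything beneath that directory is an assumption based on the /var/log/unwanted example.

```python
import re


def glob_to_regex(pattern):
    """Translate a path expression into a compiled regex (approximation)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")     # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_collected(path, path_expression, blacklist=()):
    """Return True if path matches the expression and no blacklist entry."""
    if not glob_to_regex(path_expression).fullmatch(path):
        return False
    for entry in blacklist:
        # Assumption: an entry excludes an exact match and, when it names
        # a directory, every file beneath that directory.
        under_dir = glob_to_regex(entry.rstrip("/") + "/**")
        if glob_to_regex(entry).fullmatch(path) or under_dir.fullmatch(path):
            return False
    return True
```

For example, is_collected("/var/log/unwanted/x.log", "/var/log/**/*.log", ["/var/log/unwanted"]) returns False under these assumptions, while the same call with "/var/log/app/x.log" returns True.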

How does Sumo Logic handle log file rotation?

Sumo Logic handles log file rotation without any additional configuration. For example, suppose an active log file is named error.log and is rotated to error.log.timestamp every night. In this case, Sumo Logic detects that the file has been rotated and continues to monitor both the rotated file and the new error.log file, provided that the first 2048 bytes of the new error.log file and the rotated file are different.
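
The 2048-byte condition can be checked by hand when rotation does not seem to be detected, for instance when every log file begins with the same long header. The following is a minimal sketch of such a check. The function names are hypothetical, and it only tests the stated condition; it does not reproduce how the Collector tracks files internally.

```python
FINGERPRINT_BYTES = 2048  # size of the leading region named in the text above


def head_bytes(path, size=FINGERPRINT_BYTES):
    """Read up to `size` bytes from the start of a file."""
    with open(path, "rb") as f:
        return f.read(size)


def looks_like_distinct_files(path_a, path_b):
    """Return True if the first 2048 bytes of the two files differ."""
    return head_bytes(path_a) != head_bytes(path_b)
```

If looks_like_distinct_files returns False for the active file and its rotated copy, the two files share an identical leading 2048 bytes and may not be told apart.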