Local File Source

To collect log messages from the same machine where a Collector is installed, create a Local File Source. If you are editing a Source, metadata changes are reflected going forward. Metadata for previously collected logs will not be retroactively changed.

Supported encoding for local file sources

Local File Sources can collect logs that use the following encoding:

  • US-ASCII
  • UTF-8 (default)
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE

UTF-16 formats are often used internationally; additionally they are common with logs from Microsoft services, such as MS SQL Server and MS SharePoint. When using UTF-16 encoding, the setting applies to all logs collected by that Source. For example, when using a wildcard path expression, ensure that all the files that meet the filter are using the same content encoding.
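
One way to check that every file matched by a wildcard shares the same encoding is to inspect each file's byte order mark (BOM) before configuring the Source. The following Python sketch is not part of Sumo Logic; the function names sniff_bom and group_by_bom are hypothetical. A missing BOM does not prove an encoding (plain ASCII, UTF-8, and BOM-less UTF-16 files all lack one), so treat the result as a hint only.

```python
import codecs

# BOM signatures. UTF-32LE must be tested before UTF-16LE because the
# UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(path):
    """Return the encoding implied by the file's BOM, or None if it has no BOM."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def group_by_bom(paths):
    """Group file paths by detected BOM; more than one key suggests mixed encodings."""
    groups = {}
    for path in paths:
        groups.setdefault(sniff_bom(path), []).append(str(path))
    return groups
```

For example, passing the result of glob.glob("/var/log/myapp/*.log") (a hypothetical path) to group_by_bom and getting back more than one key suggests the wildcard matches files with different encodings, which may call for separate Sources.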

Avoiding file contention

When the Sumo collector accesses a log file to read its content, the collector opens the file in non-exclusive read mode. The file is opened for read access only, and no read or write locks are requested. File contention issues are still possible, however. For example, if another process attempts to open a file with a read lock while the collector is reading it, that attempt will fail. The Add-Content PowerShell cmdlet is known to require such a read lock, and should therefore never be used to populate a file being watched by a Sumo collector.

Configure a local file source

  1. In Sumo Logic, select Manage Data > Collection > Collection.
  2. Find the name of the installed Collector to which you'd like to add a Source. Click Add... then choose Add Source from the pop-up menu.
  3. Select Local for the Source type. 
  4. Set the following choices:
  • Name. Type the name you'd like to display for the new Source. Description is optional.
  • File Path. List the full path to the file you want to collect. For files on Windows systems (not including Windows Events), enter the absolute path including the drive letter. Escape special characters with a backslash (\). If you are collecting from Windows using CIFS/SMB, see Prerequisites for Remote Windows Event Log Collection.
    Use a single asterisk wildcard [*] for file or folder names [var/foo/*.log]. Use two asterisks [**] to recurse within directories and subdirectories [var/**/*.log].
  • Collection should begin. Choose or enter how far back you'd like to begin collecting historical logs. You can either:
    • Choose a predefined value from the dropdown list, ranging from "Now" to "72 hours ago" to "All Time", or
    • Enter a relative value. To enter a relative value, click the Collection should begin field and press the delete key on your keyboard to clear the field. Then, enter a relative time expression, for example, -1w. You can define when you want collection to begin in terms of months (M), weeks (w), days (d), hours (h), and minutes (m).
  • Source Host. Sumo Logic uses the hostname assigned by the OS unless you enter a different host name. Hostname metadata is stored in a searchable field called _sourceHost. Avoid using spaces in metadata tags so that you do not have to quote the source host or the source category in the search query field. The hostname can be a maximum of 128 characters. 

    If the source you are configuring is on the same installed collector as a Docker log source, you can construct the Source Host metadata field using Docker variables. For more information, see Configuring sourceCategory and sourceHost using variables below.

  • Source Category. Enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)

    If the source you are configuring is on the same installed collector as a Docker log source, you can construct the Source Category metadata field using Docker variables. For more information, see Configuring sourceCategory and sourceHost using variables below.

  5. Set any of the following options under Advanced:
  • Blacklist. In the Blacklist field, enter the path for files to exclude from the Source collection. Wildcard syntax is allowed when specifying unwanted files. For example, if you are collecting /var/log/*.log but don’t want to collect unwanted*.log, specify /var/log/unwanted*.log in the blacklist. You can also exclude subdirectories. For example, if you are collecting /var/log/**/*.log but do not want to collect anything from the /var/log/unwanted directory, specify /var/log/unwanted.
    Compressed Files
    You don't need to blacklist compressed files that end with the file extensions tar, bz2, z, zip, jar, war, 7z, rar, gz, or tar.gz. Sumo Logic automatically excludes files with those compressed file extensions when collecting data.
  • Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
    • Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose a fallback time zone to use when time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's very important to have the proper time zone set, no matter which option you choose. If the time zone of logs can't be determined, Sumo Logic assigns UTC to those logs; if the rest of your logs are from another time zone, your search results will be affected.
    • Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. See Timestamps, Time Zones, Time Ranges, and Date Formats for more information.
  • Encoding. UTF-8 is the default, but you can choose another encoding format from the menu.
  • Enable Multiline Processing. This option is enabled by default. Use it if you're working with multiline messages (for example, Log4j or exception stack traces). Deselect it if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log). See Collecting Multiline Logs for details on multiline processing and its options. Choose one of the following:
    • Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message.
      If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field to use for detecting the entire first line of multiline messages.
    • Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression that matches the entire first line of every multiline message in your log files.
  6. Create any processing rules you'd like for the new Source.
  7. When you are finished configuring the Source, click Save.
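
To reason about how a File Path expression and a Blacklist entry interact, it can help to model the matching rules described in the steps above. The Python sketch below is an illustration only, not the collector's implementation. It assumes POSIX-style paths, that a single asterisk never crosses a directory separator, that a double asterisk followed by a slash matches zero or more directories, and that a blacklist entry without wildcards excludes everything beneath that directory. The function names path_expr_to_regex and is_collected are hypothetical.

```python
import re


def path_expr_to_regex(expr):
    """Translate a path expression with * and ** into a compiled regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(expr):
        if expr.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more intermediate directories
            i += 3
        elif expr.startswith("**", i):
            parts.append(".*")  # anything, including directory separators
            i += 2
        elif expr[i] == "*":
            parts.append("[^/]*")  # anything within a single file or folder name
            i += 1
        else:
            parts.append(re.escape(expr[i]))
            i += 1
    return re.compile("".join(parts))


def is_collected(path, path_expr, blacklist=()):
    """Return True if path matches path_expr and is not excluded by the blacklist."""
    if not path_expr_to_regex(path_expr).fullmatch(path):
        return False
    for entry in blacklist:
        # A blacklist entry may be a wildcard pattern that matches the file itself...
        if path_expr_to_regex(entry).fullmatch(path):
            return False
        # ...or a directory whose whole subtree is excluded.
        if path.startswith(entry.rstrip("/") + "/"):
            return False
    return True
```

Under these assumptions, is_collected("/var/log/unwanted1.log", "/var/log/*.log", ["/var/log/unwanted*.log"]) is False, while the same call with "/var/log/app.log" is True.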

You can return to this dialog and edit the settings for the Source at any time.
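
The relative time expressions accepted by the Collection should begin field (for example, -1w) can be previewed with a small helper that converts an expression into a start time. This Python sketch is not Sumo Logic code. It handles only a single number and unit, and it assumes a month is approximated as 30 days, which may differ from how the service interprets M. The names parse_relative and collection_start are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit letters as listed in the step above. Note that "M" (months) and
# "m" (minutes) differ only by case.
_UNITS = {
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "M": timedelta(days=30),  # assumption: one month is treated as 30 days
}


def parse_relative(expr):
    """Convert an expression such as '-1w' into a positive timedelta."""
    match = re.fullmatch(r"-(\d+)([Mwdhm])", expr.strip())
    if match is None:
        raise ValueError(f"unsupported relative time expression: {expr!r}")
    count, unit = match.groups()
    return int(count) * _UNITS[unit]


def collection_start(expr, now=None):
    """Return the point in time at which collection would begin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_relative(expr)
```

For instance, collection_start("-1w") returns the current UTC time minus seven days.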

Configuring sourceCategory and sourceHost using variables

In collector version 19.216-22 and later, if you have a Docker logs source on the same installed collector where you are configuring the new source, you can define the sourceCategory (and sourceHost, if the source supports that field) for the new source using system environment variables defined on the collector’s host. To do so, specify the environment variables to include in the metadata field in this form:

{{sys.VAR_NAME}} 

Where VAR_NAME is an environment variable name, for example:

{{sys.PATH}}

You can use multiple variables, for example:

{{sys.PATH}} - {{sys.YourEnvVar}}

You can incorporate text in the metadata expression, for example:

AnyTextYouWant {{sys.PATH}} - {{sys.YourEnvVar}}

If a user-defined variable doesn’t exist, that portion of the metadata field will be blank.
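
The substitution behavior described above can be previewed locally before saving a Source. The following Python sketch only imitates the documented rules (each {{sys.VAR_NAME}} placeholder is replaced by the variable's value, and a missing variable becomes an empty string); it is not the collector's code, and the name expand_metadata and the accepted variable-name characters are assumptions.

```python
import os
import re

# Matches {{sys.VAR_NAME}}; the allowed characters in VAR_NAME are an assumption.
_PLACEHOLDER = re.compile(r"\{\{sys\.([A-Za-z_][A-Za-z0-9_]*)\}\}")


def expand_metadata(template, env=None):
    """Replace each {{sys.VAR}} with its value, or with '' if VAR is not defined."""
    if env is None:
        env = os.environ
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), ""), template)
```

For example, with PATH set to /usr/bin and YourEnvVar undefined, expand_metadata("AnyTextYouWant {{sys.PATH}} - {{sys.YourEnvVar}}") yields "AnyTextYouWant /usr/bin - ", which shows how an undefined variable leaves a blank portion in the field.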

How does Sumo Logic handle log file rotation?

Sumo Logic handles log file rotation without any additional configuration. For example, suppose an active log file is named error.log and is rotated to error.log.timestamp every night. In this case, Sumo Logic detects that the file has been rotated and continues to monitor both the rotated file and the new error.log file, provided that the first 2048 bytes of the new error.log file and the rotated file are different.
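
To check whether two files would satisfy the 2048-byte condition above, you can compare their leading bytes directly. This Python sketch is a diagnostic aid under that stated condition, not a description of how the collector fingerprints files internally; the names head_bytes and looks_like_distinct_files are hypothetical.

```python
FINGERPRINT_BYTES = 2048  # size taken from the rotation condition described above


def head_bytes(path, size=FINGERPRINT_BYTES):
    """Read up to `size` bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(size)


def looks_like_distinct_files(active_path, rotated_path):
    """Return True if the two files differ somewhere within their first 2048 bytes."""
    return head_bytes(active_path) != head_bytes(rotated_path)
```

If this returns False for a freshly rotated pair, for example because every file starts with the same long banner, the two files may not be told apart by their first 2048 bytes.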