Sumo Logic Search supports metadata tags in your messages, such as Source Host and Source Category. This metadata is attached to your log messages at collection time and is determined by the values you enter when you configure a Source. These tags matter because they provide the keywords and terms you can use to find targeted results in search queries.
Suggestions for the following logical taxonomies are explained in the next sections:
- Collector. The name of the Collector entered at activation time.
- Source. The name of the Source entered when the Source is created.
- Source Category. This is a completely open tag determined by what you enter in the "Category" field when you configure a Source. The tags you enter can help you search by data type, machine type, function, location, or any category you choose, without needing to specify which Collector or Source the messages belong to.
- Source Host. For Remote and Syslog Sources, this is a fixed value determined by the hostname you enter in the "Hostname" field (your actual system values for hosts). For a Local File Source, you can overwrite the host system value with a new value of your choice. The hostname can be a maximum of 128 characters.
- Source Name. A fixed value determined by the path you enter in the "File" field when configuring a Source. This metadata tag cannot be changed.
Metadata names must use alphanumeric characters, and may use delimiters such as underscores, hyphens, and periods. Spaces are allowed, but not recommended. If you use spaces, you will be required to use quotation marks in your searches. For details, see Best Practices.
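The naming rules above can be checked before a value is entered into a Source configuration. The following is a minimal sketch, not part of Sumo Logic itself: `check_metadata_value` is a hypothetical helper, and the accepted character set (letters, digits, underscores, hyphens, periods, and slashes) is an assumption drawn from the delimiters this article mentions.

```python
import re

# Assumption: the recommended character set is letters, digits, and the
# delimiters named in this article (underscore, hyphen, period, slash).
_RECOMMENDED = re.compile(r"[A-Za-z0-9_\-./]+")
MAX_HOSTNAME_LENGTH = 128  # limit stated for Source Host in this article


def check_metadata_value(value, is_hostname=False):
    """Return a list of warnings for a proposed metadata value (empty if none)."""
    warnings = []
    if not value.strip():
        return ["value is empty"]
    if " " in value:
        # Spaces are allowed, but searches must then quote the value.
        warnings.append("contains spaces: searches will need quotation marks")
    if not _RECOMMENDED.fullmatch(value.replace(" ", "")):
        warnings.append("contains characters outside the recommended set")
    if is_hostname and len(value) > MAX_HOSTNAME_LENGTH:
        warnings.append("hostname exceeds 128 characters")
    return warnings
```

A value such as `OS_Application_Mail` produces no warnings, while `prod firewall` produces a single warning about quoting.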
Source Category metadata is a completely open metadata tag, stored in a Sumo Logic field called _sourceCategory. This field is created when you enter text into the Source Category field at Source configuration time. If you prefer to set the tags higher up in the logical hierarchy, you can instead enter text in the Collector configuration, which applies to all Sources belonging to that Collector. For example, if you have three Syslog Sources feeding into one Collector, you might want to set a list of tags at the Collector level rather than tagging each Source separately. Note that the more specific Source-level tags override the more general Collector-level tags.
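The override rule can be pictured as a simple fallback. This sketch only models the behavior described above and is not Sumo Logic code; the function name, the Source names, and the tag values are all hypothetical.

```python
def effective_source_category(collector_category, source_category):
    """Return the Source-level tag when set, otherwise the Collector-level tag."""
    if source_category:
        return source_category
    return collector_category


# Hypothetical setup: one Collector-level tag shared by three Syslog Sources,
# one of which sets its own, more specific tag.
collector_category = "OS_Syslog"
sources = {
    "syslog_a": None,
    "syslog_b": None,
    "syslog_c": "Security_Firewall",
}

resolved = {
    name: effective_source_category(collector_category, category)
    for name, category in sources.items()
}
```

Here `syslog_a` and `syslog_b` inherit `OS_Syslog`, while `syslog_c` keeps its own `Security_Firewall` tag.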
Log categories can be somewhat complex, as many log files may belong to more than one logical category. For example, you may collect Apache logs for several reasons, including performance monitoring, security, and audit compliance. Some of the major categories include:
- Security (for security-related logs)
- Application (for application logs)
- Audit (logs for audit compliance)
- Performance (for performance-related logs)
- Debug (for application development debugging)
- Health (for system health logs)
- OS (for operating-system-level logs)
In many cases, it may be difficult to foresee all the searches, reports, and use cases you will eventually have for these log files. You may want to try chaining metadata tags using underscores (or slashes) in a general-to-specific order, for example OS_Application_Mail.
The implied distinction between OS_Application_Mail and Application_Mail is one of purpose. OS_Application_Mail covers cases where you are simply running the Mail Transfer Agent (MTA) that came with your flavor of Linux by default, in order to send system notifications from cron jobs. Application_Mail covers cases where you are running an MTA as a service to provide email capabilities to your organization or customers.
This allows you to do searches such as:
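As an illustration of how chained tags narrow or widen a result set, the sketch below applies wildcard patterns to a small list of categories. It uses Python's `fnmatch` module as a stand-in for Sumo Logic's wildcard matching, which is an approximation and may differ in details such as case handling. The helper names and the `OS_Security` tag are hypothetical.

```python
from fnmatch import fnmatchcase


def build_category(*parts):
    """Chain tag parts in general-to-specific order using underscores."""
    return "_".join(parts)


def match_categories(pattern, categories):
    """Return the categories that a wildcard pattern would select."""
    return [c for c in categories if fnmatchcase(c, pattern)]


categories = [
    build_category("OS", "Application", "Mail"),
    build_category("Application", "Mail"),
    build_category("OS", "Security"),  # hypothetical extra tag
]

os_only = match_categories("OS_*", categories)   # everything at the OS level
all_mail = match_categories("*Mail", categories)  # mail logs of either kind
```

Because the most general term comes first, a prefix pattern such as `OS_*` selects a whole branch of the taxonomy, while a pattern such as `*Mail` cuts across branches.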
For more information, see Best Practices: Good Source Category, Bad Source Category.
Hostname metadata is stored in a Sumo Logic field called _sourceHost. For the hostname, Sumo Logic retrieves and uses the host’s actual OS-level hostname. For Local Sources, you can enter a different value in this field if you choose. If you choose to overwrite the system hostname, Sumo Logic recommends that you carefully select a meaningful name that uniquely identifies the host from which data is collected.
Remote collections present a special circumstance for Source Host metadata since one Remote Source can be configured to collect the same file from multiple hosts. In this case, Sumo Logic will tag each message with just one hostname (the host from which the message originated).
If you choose to overwrite the system names with custom metadata, the recommended best practice is to organize your hostnames in an easy-to-follow hierarchy such as:
You will then be able to use wildcards to refine a search by any one of the chained terms in the Source Host metadata. For example:
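The sketch below shows the idea with an invented naming convention of `<datacenter>_<environment>_<role>_<index>`. The hostnames and helper functions are hypothetical, and `fnmatch` only approximates how Sumo Logic evaluates wildcards. The `_sourceHost=*term*` form mirrors the wildcard syntax this article uses for `_sourceCategory`.

```python
from fnmatch import fnmatchcase

# Hypothetical hostnames following <datacenter>_<environment>_<role>_<index>.
hosts = [
    "sfo_prod_web_01",
    "sfo_prod_db_01",
    "nyc_prod_web_01",
    "nyc_test_web_01",
]


def host_search_term(chained_term):
    """Build a wildcard search term for one part of a chained hostname."""
    return f"_sourceHost=*{chained_term}*"


def hosts_matching(chained_term, hostnames):
    """Return the hostnames that contain the chained term (local approximation)."""
    pattern = f"*{chained_term}*"
    return [h for h in hostnames if fnmatchcase(h, pattern)]


web_hosts = hosts_matching("web", hosts)
nyc_prod_hosts = hosts_matching("nyc_prod", hosts)
```

Under this convention, `host_search_term("web")` yields `_sourceHost=*web*`, which would select web servers in every datacenter and environment, while `nyc_prod` narrows the search to production hosts in one location.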
When it comes to Source Host metadata, it's usually best to stick to your organization's current conventions.
The metadata field called Source Name (_sourceName) contains the file path entered when you created your Source. If your Source points to more than one file path, then messages from each file path are tagged with the specific path from which they were collected.
When entering Source metadata, it is strongly recommended that you use underscores to string together your metadata tags without using any space characters. Though spaces and special characters are allowed in the metadata fields, if you use spaces:
- You will need to quote the metadata exactly as you entered it at Source configuration time to find results.
- You will not be allowed to use wildcards since wildcards cannot be used with quotes.
If you use underscores instead of spaces, you will be able to use wildcards in your searches to match a partial tag for the metadata field. So, for example, we recommend you enter metadata into the Source Category field like this:
You can then search using wildcards by typing something like _sourceCategory=*firewall* in the query. Note that metadata tags can be changed later, but the changes are not retroactive. The new tags will only be applied going forward and cannot be changed for data that is already in the Sumo Logic Cloud.
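The trade-off between spaces and wildcards can be made concrete with a small term builder. This is a sketch under the rules stated in this section (values with spaces must be quoted, and quoted values cannot use wildcards); `metadata_term` is a hypothetical helper, not a Sumo Logic API.

```python
def metadata_term(field, value, wildcard=False):
    """Build a metadata search term, quoting values that contain spaces."""
    if " " in value:
        if wildcard:
            # Quoted values cannot be combined with wildcards.
            raise ValueError("values containing spaces cannot be wildcard-matched")
        return f'{field}="{value}"'
    if wildcard:
        return f"{field}=*{value}*"
    return f"{field}={value}"
```

For example, `metadata_term("_sourceCategory", "firewall", wildcard=True)` returns `_sourceCategory=*firewall*`, whereas a value such as `prod firewall` can only be searched as the exact quoted string `_sourceCategory="prod firewall"`.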
You would still use slashes to separate levels of metadata, such as: