Search optimization tools speed the search process, delivering query results in less time and improving productivity for forensic analysis and log management.
Search speed generally depends on the amount of data and the type of query run against the data. Search optimization tools segment the data and queue it up for quick results.
An index, or proper subset of the data, is central to search optimization. When you run a search against an index, search results are returned more quickly and efficiently because the search is run against a smaller data set.
Sumo Logic supports index-based and field-based methods for search optimization.
Even with these methods, you still need to follow our best practices for queries. The index-based methods are:
- Partitions route unstructured data into an index (see also: How to Optimize Your Search with Partitions)
- Scheduled Views pre-aggregate data and then index it
When metadata tags are assigned to your logs, you can reference them in the scope (keyword search expression) of a query to significantly improve search performance.
Metadata typically comes from your system or environment. It adds context about what produced the data, where it came from, and any associated services or apps. Logs and metrics use metadata that you can customize as needed.
Metadata gives you more data to reference in query operations. It also lets you define a narrower scope of data in search expressions, which improves search performance, and lets you write more specific search filters in Roles and routing expressions in Partitions.
- Log metadata is configured in Sumo as fields consisting of key-value pairs that are tagged to logs during collection.
- You can define fields with Field Extraction Rules by parsing fields when log messages are ingested.
- You can define fields on data sent to Sumo by manually defining them on Sources and Collectors.
- You can provide custom fields through HTTP headers.
- Our AWS Metadata Source allows you to collect tags from EC2 instances running on AWS.
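As a rough illustration of the HTTP header option, the sketch below builds (but does not send) a request that tags a log line with custom fields. It assumes the `X-Sumo-Fields` header with comma-separated `key=value` pairs, which is how we understand HTTP Sources to accept fields. The endpoint URL, field names, and helper functions are placeholders invented for this example, so check your Source's settings before relying on it.

```python
from urllib.request import Request


def encode_fields(fields):
    """Encode a dict of fields as comma-separated key=value pairs."""
    # Sorting keeps the header value stable between runs.
    return ",".join(f"{key}={value}" for key, value in sorted(fields.items()))


def build_log_request(endpoint_url, message, fields):
    """Build a POST request carrying one log line and its custom fields."""
    return Request(
        endpoint_url,
        data=message.encode("utf-8"),
        # Assumed header name; fields must also exist in your Fields schema.
        headers={"X-Sumo-Fields": encode_fields(fields)},
        method="POST",
    )


# Placeholder URL: a real HTTP Source has its own unique endpoint.
request = build_log_request(
    "https://collector.example.invalid/receiver/v1/http/PLACEHOLDER",
    "user login failed",
    {"environment": "dev", "cluster": "k8s"},
)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, because the URL above is not a real endpoint.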
Sumo Logic provides a number of features you can use to enrich the metrics you collect with metadata. Metric metadata provides considerable benefits when you query your metrics: you can scope your metric queries to return only the metrics of interest. Metric metadata can also give you insight that can't be gleaned from unadorned metrics, especially in highly containerized and orchestrated environments.
- Metric metadata is referenced in Sumo with selectors consisting of key-value pairs that are tagged to metrics during collection.
- You can use the metric rules editor to tag metrics with data derived from the metric identifier, and then use those tags in metric queries.
- You can attach custom metadata through HTTP headers.
- You can use the AWS Metadata (Tag) Source for Metrics to apply tags from your EC2 instances to host metrics, Graphite metrics, and Carbon 2.0 metrics you collect.
Search optimization process
When data enters Sumo Logic, search optimization is done in the following order:
- Metadata is applied to your data as Fields. The order of precedence for field assignment, from highest to lowest, is:
- Field Extraction Rule (FER)
- Amazon EC2 resource tags
- Amazon EC2 instance information
- HTTP Header
- Partitions and Scheduled Views are applied. If both Partitions and Scheduled Views are defined, the Partitions are applied first.
- The data is indexed.
- The optimized and indexed data is available for use with other Sumo Logic features.
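The field precedence in the first step can be pictured with a small sketch. This is only a model of the ordering listed above, not how Sumo Logic implements it. The source names in `PRECEDENCE` and the `resolve_fields` helper are hypothetical labels chosen for this example.

```python
# Sources of field values, highest precedence first (order taken from the list above).
PRECEDENCE = [
    "field_extraction_rule",
    "ec2_resource_tags",
    "ec2_instance_info",
    "http_header",
]


def resolve_fields(candidates):
    """Merge field values from several sources, letting higher precedence win."""
    resolved = {}
    # Apply the lowest-precedence source first so later updates overwrite it.
    for source in reversed(PRECEDENCE):
        resolved.update(candidates.get(source, {}))
    return resolved


example = resolve_fields({
    "http_header": {"environment": "dev", "team": "payments"},
    "ec2_resource_tags": {"environment": "staging"},
    "field_extraction_rule": {"environment": "prod"},
})
print(example)
```

Here `environment` is supplied by three sources, so the Field Extraction Rule value is kept, while `team` survives from the HTTP header because no higher-precedence source sets it.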
Is there such a thing as creating too many indexes?
Yes. Indexes can be overused, and in some situations, they can even slow the search process. When designing your organization's indexes, think about the minimal amount of data it makes sense to index, regardless of the tool. When running a search on non-indexed data, Sumo Logic might need to process all indexed data as well, which can take a long time.
How do Partitions and Scheduled Views differ?
A Partition begins building a non-aggregate index on the date it starts, and only indexes data from that point forward.
Scheduled Views backfill, which means you can query all data going back to the start date of the Scheduled View.
Choosing the right indexed search optimization tool
Here's a quick look at how to choose the right indexed search optimization tool.
|I want to...
|Run queries against a certain set of data
|Choose if the quantity of data to be indexed is more than 2% of the total data.
|Choose if the quantity of data to be indexed is less than 2% of the total data.
|Use data to identify long-term trends
|Segregate data by sourceCategory
|Have aggregate data ready to query
|Use RBAC to deny or grant access to the data set
|Reuse the fields that I'm parsing for other searches against this same sourceCategory
How is data added to Partitions and Scheduled Views?
As data enters Sumo Logic, it is first routed to any Partitions for indexing. It is then checked against Scheduled Views, and any data that matches the Scheduled Views is indexed.
Data can be in both a Partition and a Scheduled View because the two tools are used differently (and are indexed separately). Although Partitions are indexed first, the process does not slow the indexing of Scheduled Views.
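The routing order described here can be sketched as a toy model. It assumes each Partition and Scheduled View can be reduced to a simple predicate over a message, which is a simplification: real Partitions use routing expressions and real Scheduled Views use queries. All names below are hypothetical.

```python
def index_message(message, partitions, scheduled_views):
    """Return every index a message lands in, checking Partitions first."""
    destinations = []
    # Partitions are evaluated first.
    for name, matches in partitions.items():
        if matches(message):
            destinations.append(("partition", name))
    # Scheduled Views are evaluated next and are indexed separately,
    # so a message already in a Partition can still match a view.
    for name, matches in scheduled_views.items():
        if matches(message):
            destinations.append(("scheduled_view", name))
    return destinations


partitions = {"prod_web": lambda m: m["_sourceCategory"].startswith("prod/web")}
scheduled_views = {"server_errors": lambda m: m["status"] >= 500}

message = {"_sourceCategory": "prod/web/api", "status": 503}
destinations = index_message(message, partitions, scheduled_views)
print(destinations)
```

The example message matches both predicates, so it appears in the Partition and in the Scheduled View, mirroring the statement that data can be in both.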