Best Practices: Search Rules to Live By
Rule 1 - Be specific with search scope
At a minimum, all searches should use one or more metadata tags in the scope, for example: _sourceCategory, _source, _sourceName, _sourceHost, or _collector.
If possible, also use one or more keywords to limit the scope.
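As an illustration of a tightly scoped search, the sketch below combines two metadata tags with a keyword. The source category is borrowed from the examples later in this article; the host name web-01 and the keyword error are hypothetical placeholders, not values from any real deployment.

```
_sourceCategory=Prod/User/Eventlog and _sourceHost=web-01 and error
```

Each additional metadata tag or keyword in the scope narrows the set of messages that later operators have to process.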
Rule 2 - Limit search time range
Use the smallest time range required for your use case. When reviewing data over long time ranges, build and test your search against a shorter time range first, then extend the time range once the search is finalized.
Rule 3 - Use fields extracted by FERs and avoid the where operator
Whenever possible, use keyword searches and fields already extracted using Field Extraction Rules (FERs) to filter data instead of using the where operator. If it is not possible to only use a keyword or pre-extracted field, use both a keyword search AND the where clause.
Best approach - Field Extraction Rule field AND keyword
_sourceCategory=foo and fielda=valuea
Good approach - Keyword search AND where operator
_sourceCategory=foo and valuea
| parse "somefield *" as somefield
| where somefield="valuea"
Least preferred approach - No keyword search, no pre-extracted field
_sourceCategory=foo
| parse "somefield *" as somefield
| where somefield="valuea"
Rule 4 - Filter your data before aggregation
When filtering data, make the result set you are working with as small as possible before conducting aggregate operations like sum, min, max, and average. As stated in Rule 1, keywords and metadata in your search scope are the priority. If you must use a where clause, refer to Rule 3.
Best approach
_sourceCategory=Prod/User/Eventlog user="john"
| count by user
Least preferred approach
_sourceCategory=Prod/User/Eventlog
| count by user
| where user="john"
Rule 5 - Use parse anchor instead of parse regex for structured messages
As Rule 3 states, it is best to use pre-extracted fields. If you need to parse a field that is not pre-extracted, use parse anchor. If you are dealing with unstructured messages that are more complex, leverage parse regex and place it in a Field Extraction Rule.
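A minimal sketch of parse anchor, assuming an Apache-style access log line such as 52.87.131.109 - - [2016-09-12 20:13:52.870 +0000] "GET /blog/index.php HTTP/1.1" 304 8932. The source category and the field names url, status_code, and size are illustrative choices.

```
_sourceCategory=Apache/Access and GET
| parse "\"GET * HTTP/1.1\" * *" as url, status_code, size
| count by status_code
```

The literal text around each * anchors the match, so no regular expression is needed. For the sample line above, this should yield url=/blog/index.php, status_code=304, and size=8932.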
Rule 6 - When using parse regex avoid expensive tokens
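If you need to use parse regex, avoid expensive tokens like .* wherever you can. A greedy .* matches as much of the message as it can and then backtracks, which costs more than a pattern that describes the field exactly. Just as Rule 1 states for your search scope, be as specific as you can with your regular expressions as well.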
Example log message
52.87.131.109 - - [2016-09-12 20:13:52.870 +0000] "GET /blog/index.php HTTP/1.1" 304 8932
Best approach
| parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s"
Least preferred approach
| parse regex "(?<client_ip>.*)\s-"
Rule 7 - Use partitions and scheduled views
Sumo provides two index-based search optimization features: partitions and scheduled views. When you run a search against a partition or scheduled view, results are returned more quickly and efficiently because the search runs against a smaller data set. For more information, see Optimize Search Performance.
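As a rough sketch, a partition is typically referenced in the search scope with _index and a scheduled view with _view. The names apache_access and apache_errors_by_host below are hypothetical; substitute the partition and view names defined in your own account.

```
_index=apache_access and GET

_view=apache_errors_by_host
```

These are two separate searches. The first runs only against the messages routed to the partition, and the second runs against the pre-aggregated results stored in the scheduled view.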
Rule 8 - Use Search Parameters
If your search contains filtering criteria that could change each time the search is executed, take advantage of Search Templates. Search templates make it easier for less experienced users to obtain search results, and they also reduce the risk that such users will run expensive searches.
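The sketch below shows what a templated search might look like, assuming parameters are written in double curly braces. The parameter name user_name is hypothetical, and the exact placeholder syntax should be checked against the Search Templates documentation before you rely on it.

```
_sourceCategory=Prod/User/Eventlog user={{user_name}}
| count by user
```

The person running the search supplies only the value for user_name, while the scope and the aggregation stay fixed.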
Rule 9 - Aggregate before a lookup
Whenever possible, you should aggregate data prior to doing a lookup. In some cases, this will significantly reduce the amount of data the lookup is referencing.
Best approach
| count by client_ip
| lookup is_bad_ip from shared/bad/ips on client_ip=ip
Less preferred approach
| lookup is_bad_ip from shared/bad/ips on client_ip=ip
| count by is_bad_ip
Rule 10 - Put pipe-delimited operations on separate lines
For readability, use a soft return in the query field to put each new pipe-delimited operation on a separate line.
Best approach
_sourceCategory=Apache/Access and GET
| parse "\"GET * HTTP/1.1\" * * \"*\"" as url,status_code,size,referrer
| count by status_code,referrer
| sort _count
Less preferred approach
_sourceCategory=Apache/Access and GET | parse "\"GET * HTTP/1.1\" * * \"*\"" as url,status_code,size,referrer | count by status_code,referrer | sort _count
Rule 11 - Pin searches with long time ranges
A query with a longer time range can run past the default time window for Sumo Logic. To protect against an interruption in a query with a significant time range, pin it. A pinned search can run in the background for up to 24 hours.