Sumo Logic

outlier

Given a series of time-stamped numerical values, the Outlier operator identifies values in the sequence that seem unexpected. You can use it to detect conditions that should raise an alert or flag a violation, for example, in a scheduled search.

To do this, the Outlier operator tracks the moving average and standard deviation of the value, and detects or alerts when the difference between the value and the mean exceeds some multiple of the standard deviation, for example, 3 standard deviations.

For a comparison to Anomaly Detection, see Use Cases for Anomaly Detection vs. Outlier.

Syntax:

  • ... | timeslice 1m | max(x) as response_time by _timeslice | outlier response_time
  • ... | timeslice 1m | count by _timeslice, _sourcehost | outlier _count by _sourcehost

To make the line chart option available, make sure that your query groups by only one key field: _timeslice.

The second syntax example uses an additional “group by” clause to find outliers for multiple values of _sourcehost. See the example below for details.

This syntax adds the following fields to the output:

  • response_time_error - This is the response_time - mean.
  • response_time_lower - This is the mean - threshold*standard deviation.
  • response_time_upper - This is the mean + threshold*standard deviation.
  • response_time_indicator - This is 1 when the value falls outside the lower and upper boundaries.
  • response_time_violation - This is 1 when the specified number of consecutive indicators has been reached.
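
The relationship between these fields can be sketched in a few lines of Python. This is only an illustration of the formulas listed above, not the operator's implementation: the function name outlier_fields and the sample numbers are hypothetical, and treating a value exactly on a boundary as "inside" is an assumption.

```python
def outlier_fields(value, mean, std, threshold=3.0):
    """Compute the band-related output fields for a single data point."""
    lower = mean - threshold * std  # <field>_lower
    upper = mean + threshold * std  # <field>_upper
    return {
        "error": value - mean,  # <field>_error
        "lower": lower,
        "upper": upper,
        # Assumption: only values strictly outside the band set the indicator.
        "indicator": 1 if (value < lower or value > upper) else 0,
    }


# Hypothetical data point: response_time of 250 against a mean of 100
# and a standard deviation of 20.
fields = outlier_fields(value=250.0, mean=100.0, std=20.0)
print(fields)
```

With these sample numbers the band runs from 40 to 160, so a value of 250 sets the indicator.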

There are defaults for all parameters, but you can configure parameters through keyword arguments, such as window length or threshold.

For example, this query would set the following parameters:

... | outlier response_time window=5,threshold=3,consecutive=2,direction=+-

  • window - Use the trailing 5 data points to calculate mean and sigma. The default is 10.
  • threshold - Calculate violation based on +/- 3 standard deviations. The default is 3.0.
  • consecutive - Only set response_time_violation to 1 if 2 or more consecutive data points are observed further than 3 standard deviations from the rolling average. The default is 1.
  • direction - Specifies which direction of deviation triggers violations (+-, +, or -):
    • Use +- for positive or negative deviations. This is the default.
    • Use + for only positive deviations (more than expected).
    • Use - for only negative deviations (less than expected).
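
To make the interaction between window, threshold, consecutive, and direction concrete, here is a rough Python sketch of a trailing-window detector. It is not Sumo Logic's implementation. It assumes that the baseline is computed from the window points before the current one, that the standard deviation is the population form, and that points without a full window of history are never flagged; none of these details are confirmed by this page.

```python
from statistics import mean, pstdev


def outlier(values, window=10, threshold=3.0, consecutive=1, direction="+-"):
    """Flag outliers against a trailing-window baseline (illustrative only)."""
    results = []
    run = 0  # length of the current streak of indicators
    for i, value in enumerate(values):
        # Assumption: the baseline excludes the current point.
        history = values[max(0, i - window):i]
        indicator = 0
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)  # assumption: population standard deviation
            above = value > mu + threshold * sigma
            below = value < mu - threshold * sigma
            if (above and "+" in direction) or (below and "-" in direction):
                indicator = 1
        run = run + 1 if indicator else 0
        # A violation needs at least `consecutive` indicators in a row.
        violation = 1 if run >= consecutive else 0
        results.append({"value": value, "indicator": indicator, "violation": violation})
    return results


# Hypothetical series with two spikes in a row.
values = [10, 11, 9, 10, 10, 50, 200, 10]
scored = outlier(values, window=5, threshold=3, consecutive=2, direction="+-")
print([p["indicator"] for p in scored])
print([p["violation"] for p in scored])
```

In this sample, the first spike sets the indicator but not the violation, because consecutive=2 requires a second indicator immediately after it. Note also that the first spike enters the trailing window and widens the band, so the second point has to deviate much further to be flagged; this is a property of any moving baseline, not only of this sketch.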

Rules:

  • The Outlier operator must appear after a group by aggregator, such as count, min, max, or sum.
  • The original target field must be numeric.

Limitations

  • Because the most recent time bucket in a query may have incomplete data, it is ignored by outlier. Consequently, if an alert is set to trigger on <field_name>_violation changing to 1, this alert will trigger one timeslice later.

Examples

IIS logs

Run the following query to find outlier values in IIS logs over the last 6 hours.

_sourceCategory=IIS/Access
| parse regex "\d+-\d+-\d+ \d+:\d+:\d+ (?<server_ip>\S+) (?<method>\S+) (?<cs_uri_stem>/\S+?) \S+ \d+ (?<user>\S+) (?<client_ip>[\.\d]+) "
| parse regex "\d+ \d+ \d+ (?<response_time>\d+)$"
| timeslice 15m 
| max(response_time) as response_time by _timeslice
| outlier response_time window=5,threshold=3,consecutive=2,direction=+-

The outlier values are represented by the pink triangles in the resulting chart.

Apache logs - Server Errors Over Time

Run the following query to find outlier values in Apache logs over the last 3 hours.

_sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where status_code matches "5*"
| timeslice 5m 
| count(status_code) as status_code by _timeslice
| outlier status_code window=5,threshold=3,consecutive=1,direction=+-

The outlier values are represented by the pink triangles in the resulting chart.

Use an additional “group by” clause to find outliers for multiple values of _sourcehost.

You can also run a query like this:

_sourcecategory=database
| timeslice 1m
| count by _timeslice,_sourcehost
| outlier _count by _sourcehost

This way, you can run outlier analysis separately for each value of _sourcehost, as shown.

This example produces only an aggregation table, not a chart, but the indicator and violation fields are calculated correctly for each _sourcehost.

Multidimensional Outlier Detection

The Outlier operator supports multidimensional or multi-time series detection. Multidimensional outlier detection is useful when you want to monitor the behavior of each user, server, application feature, or other single “entity”, rather than some aggregation across all entities.

For example, you could detect failed logins by user. To do so, you would want to know whether any individual user account has experienced an unusual number of failed logins, not whether there has been a spike in the average or total number of failed logins across all users. The latter may be useful, but with hundreds or thousands of users (entities), a spike in failed logins for one specific user can get lost in the noise of a “normal” total number of failed logins.

Other examples include:

  • Detecting anomalies while tracking page faults, disk operation, or CPU utilization for all the nodes in a cluster simultaneously.
  • Monitoring the performance of every workstation simultaneously, without the need to build an outlier report for each one.
  • Monitoring failed image uploads for every user of an application (not total failed uploads across all users).

If you have used the outlier operator, it is easy to create a multidimensional outlier operation. Just add by <dimension> to the end of the query.

For example, the following query produces multiple time series, one for each _sourcehost:

_sourcecategory=database
| timeslice 1m
| count by _timeslice,_sourcehost
| outlier _count by _sourcehost 
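
Conceptually, the by clause splits the aggregated rows into one series per dimension value and scores each series against its own baseline. The Python sketch below illustrates that idea only. The helper names flag_outliers and outlier_by, the host names, and the counts are hypothetical, and the trailing-window baseline is an assumed form rather than the operator's actual algorithm.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_outliers(values, window=3, threshold=3.0):
    """Return a 0/1 indicator per value using a trailing-window baseline (assumed form)."""
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(0)  # not enough history to form a baseline
            continue
        mu, sigma = mean(history), pstdev(history)
        flags.append(1 if abs(value - mu) > threshold * sigma else 0)
    return flags


def outlier_by(rows, key, field, **params):
    """Split rows into one series per `key` value and score each series separately."""
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["_timeslice"]):
        series[row[key]].append(row)
    scored = []
    for group in series.values():
        flags = flag_outliers([r[field] for r in group], **params)
        for row, flag in zip(group, flags):
            scored.append({**row, f"{field}_indicator": flag})
    return scored


# Hypothetical data: db-1 is steady, db-2 spikes in its last timeslice.
rows = [
    {"_timeslice": t, "_sourcehost": host, "_count": count}
    for host, counts in {"db-1": [100, 101, 99, 100], "db-2": [5, 6, 5, 40]}.items()
    for t, count in enumerate(counts)
]
flagged = [
    (r["_sourcehost"], r["_timeslice"])
    for r in outlier_by(rows, "_sourcehost", "_count")
    if r["_count_indicator"]
]
print(flagged)
```

Because each host is compared only with its own history, the jump on db-2 is flagged even though its absolute count stays well below the normal level on db-1.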

You can display the raw results of a multidimensional time series in a table chart, but currently other chart options are not available.

In the following table chart, a value of 1 in the _count_violation column indicates that the data point corresponding to that timeslice is an outlier.

multi_outlier_group_by.png

Alerts Based on Multidimensional Outlier Results

To create an alert based on the multi-series outlier table above, extract _count_violation.

This way, you won’t need to build an alert for each series of data (each _sourcehost in the previous example), and you can automatically monitor a dynamic series for deviating behavior.

The following example query allows you to monitor when application users experience failures. It tracks all user accounts by unique user ID, and applies outlier to the number of “Fail” results that occur for each user account:

_sourceCategory=Prod
| parse "UserID:* " as user_id
| parse "Result:* " as result
| where result = "Fail"
| timeslice 1h
| count by user_id, _timeslice
| outlier _count by user_id
| fields _timeslice,user_id, _count_violation
| transpose row _timeslice column user_id

Once you have run the query, you can click Save As to create a Scheduled Search and configure it to send an alert when any user account experiences an unusual number of failures, or whatever other event you want to monitor in each series of data.
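
The reshaping done by the last two operators in the query above can be pictured with a small Python sketch. It only illustrates the pivot from one row per (timeslice, user) pair to one row per timeslice with one column per user. The transpose function here is a hypothetical stand-in, not the Sumo Logic operator, and filling missing cells with 0 is an assumption. The user names and values are made up.

```python
def transpose(rows, row_field, column_field, value_field, fill=0):
    """Pivot long-format rows into one row per `row_field` value."""
    columns = sorted({r[column_field] for r in rows})
    table = {}
    for r in rows:
        # Assumption: cells with no matching input row default to `fill`.
        line = table.setdefault(r[row_field], {c: fill for c in columns})
        line[r[column_field]] = r[value_field]
    return table


# Hypothetical outlier output for two users over two hourly timeslices.
rows = [
    {"_timeslice": "10:00", "user_id": "alice", "_count_violation": 0},
    {"_timeslice": "10:00", "user_id": "bob", "_count_violation": 0},
    {"_timeslice": "11:00", "user_id": "alice", "_count_violation": 1},
    {"_timeslice": "11:00", "user_id": "bob", "_count_violation": 0},
]
table = transpose(rows, "_timeslice", "user_id", "_count_violation")

# Timeslices in which at least one user has a violation.
alerting = [ts for ts, cols in table.items() if any(cols.values())]
print(table)
print(alerting)
```

In the pivoted form, every user becomes its own column, which is what lets a single chart or alert condition cover all users at once.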

To visualize your results, on the Search page, you can create a column chart, then change the stacking property to normal to display alerts by unique user_id (the multidimensional aspect).

outlier_alert.png

Chart Multidimensional Outlier Results

This section provides two examples of how to display multidimensional outlier results in charts.

Example 1: Outlier Distribution Across Time

In this example, we extract _count_violation from the multi-series outlier table and chart it. This shows how outliers are distributed across the various time series.

error (_sourceCategory=mix* or _sourceCategory=con*)
| timeslice 1m
| count by _timeslice, _sourcecategory
| outlier _count by _sourcecategory  
| fields _timeslice,_sourcecategory, _count_violation
| transpose row _timeslice column _sourcecategory

When you select a line chart, this example will display something like the following:

multi_outlier_example1.png

Example 2: Outlier Ranking

This example query uses _count_error (the distance from the expected value for that timeslice) and the standard deviation of the baseline (_count_std) to determine how many standard deviations a data point is from its expected value.

This way, you can display outliers visually in terms of deviation from the expected value.

_view=customer_events_adhoc_search timezone=america*
| timeslice 1h
| count by _timeslice, timezone
| outlier _count by timezone
| where _count_std > 0
| if(_count_violation=1, abs(_count_error)/_count_std, 0) as deviation
| fields _timeslice, timezone, deviation
| transpose row _timeslice column timezone

When you select a line chart, this example will display something like the following:

multi_outlier_example2.png

In the line chart, you can see which series is producing the most “deviating” outliers.

This approach effectively displays the severity of the outlier, because the spikes represent the magnitude (how many standard deviations the value is from the mean) in one time-series compared to another time-series.
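
The deviation score in the Example 2 query is simple arithmetic, and can be sketched in Python as follows. The function name deviation and the sample numbers are hypothetical. Returning None for a non-positive standard deviation is just this sketch's way of mirroring the where _count_std > 0 filter, which drops those rows from the results.

```python
def deviation(count_error, count_std, count_violation):
    """Standard deviations from the expected value, or 0 for non-violations."""
    if count_std <= 0:
        # Mirrors `where _count_std > 0`: such rows are filtered out,
        # which also avoids dividing by zero.
        return None
    if count_violation == 1:
        return abs(count_error) / count_std
    return 0


# Hypothetical rows: (error, std, violation)
print(deviation(-30.0, 10.0, 1))  # a violation 30 below the expected value
print(deviation(30.0, 10.0, 0))   # same distance, but not a violation
print(deviation(5.0, 0.0, 1))     # flat baseline, row would be dropped
```

Taking the absolute value means that points far below the expected value rank as high as points far above it, so the chart shows severity regardless of direction.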