Sumo Logic


Given a series of time-stamped numerical values, the Outlier operator identifies values in the sequence that seem unexpected. You can use it in a scheduled search, for example, to flag an alert or violation.

To do this, the Outlier operator tracks the moving average and standard deviation of the value, and detects or alerts when the difference between the value and the mean exceeds some multiple of the standard deviation, for example, 3 standard deviations.
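The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the operator's implementation: the function name `is_outlier` is hypothetical, and the use of the population standard deviation over a trailing window that excludes the current value is an assumption.

```python
from statistics import mean, pstdev

def is_outlier(history, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations
    away from the mean of the trailing values in `history`.

    Assumption: population standard deviation; the operator may differ.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return abs(value - mu) > threshold * sigma

# Trailing window of recent values (mean is 100).
window = [100, 102, 98, 101, 99]

normal = is_outlier(window, 103)   # small deviation from the mean
spike = is_outlier(window, 150)    # far outside 3 standard deviations
```

Here `normal` is False and `spike` is True, because only the second value lies beyond three standard deviations of the window mean.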


  • ... | timeslice 1m | max(x) as response_time by _timeslice | outlier response_time
  • ... | timeslice 1m | count(_sourceHost) by _timeslice, _sourceHost | outlier _count by _sourceHost

Make sure that your syntax includes only one key field: _timeslice. This is necessary in order to make the line chart option available.

The second syntax example uses an additional “group by” clause to find outliers for multiple values of _sourceHost. See the example below for details.

This syntax adds the following fields to the output:

  • response_time_error - This is the response_time - mean.
  • response_time_lower - This is the mean - threshold*standard deviation.
  • response_time_upper - This is the mean + threshold*standard deviation.
  • response_time_indicator - This is 1 for a value outside the lower and upper boundaries.
  • response_time_violation - This is 1 when the specified number of consecutive indicators is reached.
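As a rough illustration of how the first four fields relate to each other, the following Python sketch computes them for a single data point. The function name `outlier_fields` is hypothetical, the mean and standard deviation are supplied directly rather than derived from a window, and treating the boundaries as exclusive is an assumption.

```python
def outlier_fields(value, mean, std, threshold=3.0):
    """Compute the error, lower, upper, and indicator values for one point.

    Assumption: a value exactly on a boundary is not flagged.
    """
    lower = mean - threshold * std
    upper = mean + threshold * std
    return {
        "error": value - mean,
        "lower": lower,
        "upper": upper,
        "indicator": 1 if (value < lower or value > upper) else 0,
    }

# A response time of 130 against a mean of 100 and a standard deviation of 5.
fields = outlier_fields(value=130.0, mean=100.0, std=5.0)
```

With the default threshold of 3, the boundaries are 85 and 115, so the value 130 has an error of 30 and an indicator of 1.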

There are defaults for all parameters, but you can configure parameters through keyword arguments, such as window length or threshold.

For example, this query would set the following parameters:

... | outlier response_time window=5,threshold=3,consecutive=2,direction=+-

  • window - Use the trailing 5 data points to calculate mean and sigma. The default is 10.
  • threshold - Calculate violation based on +/- 3 standard deviations. The default is 3.0.
  • consecutive - Only set response_time_violation to 1 if 2 or more consecutive data points are observed further than 3 standard deviations from the rolling average. The default is 1.
  • direction - Specifies which direction of deviation triggers violations: +-, +, or -.
    • Use +- for positive or negative deviations. This is the default.
    • Use + for only positive deviations (more than expected).
    • Use - for only negative deviations (less than expected).
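The consecutive and direction parameters can be pictured with the following Python sketch. The helper names `indicator` and `violations` are hypothetical, and it is an assumption that the violation stays at 1 for as long as the run of indicators continues.

```python
def indicator(value, lower, upper, direction="+-"):
    """Return 1 if value is outside the boundaries in the chosen direction."""
    above = value > upper
    below = value < lower
    if direction == "+":
        return int(above)      # only positive deviations count
    if direction == "-":
        return int(below)      # only negative deviations count
    return int(above or below) # "+-": either direction counts

def violations(indicators, consecutive=1):
    """Turn a sequence of 0/1 indicators into 0/1 violations.

    A violation is raised once `consecutive` indicators in a row are seen.
    """
    run = 0
    result = []
    for flag in indicators:
        run = run + 1 if flag else 0
        result.append(1 if run >= consecutive else 0)
    return result

flags = violations([0, 1, 1, 1, 0, 1], consecutive=2)
```

In this sketch `flags` is `[0, 0, 1, 1, 0, 0]`: the single indicator at the end does not raise a violation because it is not preceded by another one.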


  • The Outlier operator must appear after a group by aggregator, such as count, min, max, or sum.
  • The original target field must be numeric.


  • Because the most recent time bucket in a query may have incomplete data, it is ignored by outlier. Consequently, if an alert is set to trigger on <field_name>_violation changing to 1, this alert will trigger one timeslice later.
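The effect of ignoring the latest bucket can be sketched as follows. This is an illustrative Python model only; the function name `complete_buckets` and the sample counts are hypothetical.

```python
def complete_buckets(buckets):
    """Return buckets sorted by timeslice, without the most recent one,
    which may still be receiving data."""
    ordered = sorted(buckets, key=lambda bucket: bucket[0])
    return ordered[:-1]

# (timeslice start in minutes, count); the 30-minute bucket is still filling up.
buckets = [(0, 120), (15, 118), (30, 7)]
evaluated = complete_buckets(buckets)
```

Only the first two buckets are evaluated here. The low count of 7 in the incomplete bucket is not judged until the next timeslice begins, which is why an alert arrives one timeslice later.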


IIS logs

Run the following query to find outlier values in IIS logs over the last 6 hours.

| parse regex "\d+-\d+-\d+ \d+:\d+:\d+ (?<server_ip>\S+) (?<method>\S+) (?<cs_uri_stem>/\S+?) \S+ \d+ (?<user>\S+) (?<client_ip>[\.\d]+) "
| parse regex "\d+ \d+ \d+ (?<response_time>\d+)$"
| timeslice 15m 
| max(response_time) as response_time by _timeslice
| outlier response_time window=5,threshold=3,consecutive=2,direction=+-


The outlier values are represented by the pink triangles in the resulting chart.

Apache logs - Server Errors Over Time

Run the following query to find outlier values in Apache logs over the last 3 hours.

| parse "HTTP/1.1\" * " as status_code
| where status_code matches "5*"
| timeslice 5m 
| count(status_code) as status_code by _timeslice
| outlier status_code window=5,threshold=3,consecutive=1,direction=+-


The outlier values are represented by the pink triangles in the resulting chart.

Use an additional “group by” clause to find outliers for multiple values of _sourceHost.

You can also run a query like this:

| timeslice 1m
| count by _timeslice, _sourceHost
| outlier _count by _sourceHost

This way, you can run outlier analysis separately for each value of _sourceHost, as shown.

[Figure: Group by]

This example produces only an aggregation table, not a chart, but the indicator and violation fields correctly reflect the processing for each _sourceHost.

Alert on an outlier

This query counts the number of errors over time and sends an alert when an outlier is detected. When an outlier is detected, the value of <field_name>_violation is set to 1. In the example below, the <field_name> is delta, because that is the field passed to outlier. By creating a Real Time Alert that sends a notification when more than 0 results are found, you can alert on an outlier.

| timeslice by 15m
| count as today by _timeslice
| compare timeshift -1d as vs_yesterday
// create a delta field that represents the difference between historical and current data
| (today - today_vs_yesterday) as delta
// use an Outlier to statistically monitor spikes or dips in the delta
| outlier delta
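The reason for running outlier on the delta rather than on the raw count can be illustrated with a small Python sketch. The function name `deltas` and the sample counts are hypothetical and chosen only to show the idea.

```python
def deltas(today, yesterday):
    """Difference between today's count and yesterday's count per timeslice."""
    return [t - y for t, y in zip(today, yesterday)]

# Both days follow the same daily pattern, except for the third timeslice.
today = [20, 80, 200, 90]
yesterday = [22, 78, 120, 88]

delta = deltas(today, yesterday)
```

Here `delta` is `[-2, 2, 80, 2]`. The regular rise and fall of the daily pattern cancels out, so the third timeslice stands out much more clearly in the delta than it would in the raw counts.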

Multidimensional Outlier Detection

The Outlier operator supports multidimensional or multi-time series detection. Multidimensional outlier detection is useful when you want to monitor the behavior of each user, server, application feature, or other single “entity”, rather than some aggregation across all entities.

For example, you could detect failed logins by user. To do so, you would want to understand whether any user account, individually, has experienced an unusual number of failed logins, not whether there has been a spike in the average or total number of failed logins across all users. The latter may be useful, but with hundreds or thousands of users (entities), a spike in failed logins may get lost in the noise of a “normal” number of total failed logins, and you could miss a spike in failed logins for one specific user.

Other examples include:

  • Detecting anomalies while tracking page faults, disk operation, or CPU utilization for all the nodes in a cluster simultaneously.
  • Monitoring the performance of every workstation simultaneously, without the need to build an outlier report for each one.
  • Monitoring failed image uploads for every user of an application (not total failed uploads across all users).

If you have used the outlier operator, it is easy to create a multidimensional outlier operation. Just add by <dimension> to the end of the query.

For example, the following query produces many time series, one for each _sourceHost:

| timeslice 1m
| count by _timeslice,_sourceHost
| outlier _count by _sourceHost

You can display the raw results of a multidimensional time series in a table chart, but currently other chart options are not available.
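Conceptually, adding by <dimension> means that each series is evaluated against its own baseline. The Python sketch below models that idea. It is not the operator's implementation: the function name `outliers_by_dimension`, the sample data, the use of the population standard deviation, and the rule of skipping points with fewer than two prior values are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def outliers_by_dimension(rows, threshold=3.0, window=10):
    """rows: (timeslice, key, value) tuples.
    Returns sorted (timeslice, key, indicator) tuples, where each key is
    compared only against its own trailing window."""
    series = defaultdict(list)
    for timeslice, key, value in sorted(rows):
        series[key].append((timeslice, value))

    result = []
    for key, points in series.items():
        for i, (timeslice, value) in enumerate(points):
            history = [v for _, v in points[max(0, i - window):i]]
            if len(history) < 2:
                flag = 0  # assumption: too little history to judge
            else:
                mu, sigma = mean(history), pstdev(history)
                flag = int(abs(value - mu) > threshold * sigma)
            result.append((timeslice, key, flag))
    return sorted(result)

# Host "a" normally sees about 10 events, host "b" about 50.
rows = [(t, "a", v) for t, v in enumerate([10, 11, 9, 10, 50], start=1)]
rows += [(t, "b", v) for t, v in enumerate([50, 51, 49, 50, 50], start=1)]

flagged = [(t, k) for t, k, flag in outliers_by_dimension(rows) if flag == 1]
```

In this sketch `flagged` is `[(5, "a")]`. A count of 50 is ordinary for host "b" but an outlier for host "a", which is the distinction a per-dimension analysis is meant to capture.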

In the following table chart, a value of 1 in the _count_violation column indicates that the data point corresponding to that timeslice is an outlier.


Alerts Based on Multidimensional Outlier Results

To create an alert based on the multi-series outlier table above, extract _count_violation.

This way, you won’t need to build an alert for each series of data (each _sourceHost in the previous example), and you can automatically monitor a dynamic set of series for deviating behavior.

The following example query allows you to monitor when application users experience failures. It monitors all user accounts by unique user ID, and applies outlier to the number of “fail” messages that occur for each user account:

| parse "\"UserId\":\"*\"" as user_id
| parse "\"ResultStatus\":\"*\"" as result
| where result = "fail"
| timeslice 30m
| count by user_id, _timeslice
| outlier _count by user_id
| fields _timeslice,user_id, _count_violation
| transpose row _timeslice column user_id

Once you have run the query, you can click Save As to create a Scheduled Search and configure it to send an alert when any user account experiences an unusual number of failures, or whatever other event you want to monitor in each series of data.
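The shape of the transposed result can be pictured with the following Python sketch, which pivots rows of (timeslice, user_id, violation) into one row per timeslice and one column per user. The function name `transpose`, the sample rows, and filling missing cells with 0 are assumptions made for illustration.

```python
def transpose(rows, fill=0):
    """Pivot (timeslice, key, value) rows into {timeslice: {key: value}}.

    Assumption: cells with no data are filled with `fill`.
    """
    columns = sorted({key for _, key, _ in rows})
    table = {}
    for timeslice, key, value in rows:
        row = table.setdefault(timeslice, dict.fromkeys(columns, fill))
        row[key] = value
    return table

rows = [
    ("10:00", "alice", 0),
    ("10:00", "bob", 0),
    ("10:30", "alice", 1),
]
table = transpose(rows)

# An alert condition: any user has a violation in any timeslice.
alert = any(v == 1 for row in table.values() for v in row.values())
```

The resulting table has a row for each timeslice with a column per user, and `alert` is True because one user has a violation at 10:30.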

To visualize your results, on the Search page, you can create a column chart, then change the stacking property to normal to display alerts by unique user_id (the multidimensional aspect).


Chart Multidimensional Outlier Results

This section provides two examples of how to display multidimensional outlier results in charts.

Example 1: Outlier Distribution Across Time

In this example, we’ll extract _count_violation from the multi-series outlier table and display that. This allows you to display the distribution of outliers among various time-series.

error (_sourceCategory=Apache* or _sourceCategory=IIS*)
| timeslice 1m
| count by _timeslice, _sourceCategory
| outlier _count by _sourceCategory
| fields _timeslice,_sourceCategory, _count_violation
| transpose row _timeslice column _sourceCategory

When you select a line chart, this example will display something like the following:

[Figure: Outlier Distribution]

Example 2: Outlier Ranking

This example query uses the _count_error (distance from the expected value for that timeslice) and the value of the standard deviation for the baseline, then determines how many standard deviations a data point is from its expected value.

This way, you can display outliers visually in terms of deviation from the expected value.

| timeslice 30m
| count by _timeslice, status_code
| outlier _count by status_code
| where _count_std >0
| if(_count_violation=1,abs(_count_error)/_count_std, 0) as deviation
| fields _timeslice, status_code, deviation
| transpose row _timeslice column status_code

When you select a line chart, this example will display something like the following:

[Figure: Outlier Ranking]

In the line chart, you can see which series is producing the most “deviating” outliers.

This approach effectively displays the severity of the outlier, because the spikes represent the magnitude (how many standard deviations the value is from the mean) in one time-series compared to another time-series.
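The deviation calculation in the query above can be mirrored in Python as a quick sanity check. The function name `deviation` is hypothetical, and returning None for a non-positive standard deviation stands in for the rows that the where clause filters out.

```python
def deviation(error, std, violation):
    """Number of standard deviations from the expected value for a
    violating point, or 0 for a non-violating point."""
    if std <= 0:
        return None  # such rows are dropped by `where _count_std >0`
    return abs(error) / std if violation == 1 else 0

severe = deviation(error=-12.0, std=4.0, violation=1)
ignored = deviation(error=12.0, std=4.0, violation=0)
```

Here `severe` is 3.0 and `ignored` is 0. The sign of the error is discarded, so dips and spikes of the same size are ranked equally.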