Sumo Logic

Metrics Monitors and Alerts

You can set a monitor on the time series returned by a metrics query to alert when a metric crosses a static threshold, and then send an email or Webhook notification. Each monitor can have at most one critical alert, one warning alert, and one missing data alert, each with one or more notification destinations.

A monitor can alert on a single time series, multiple time series, or a join of two metrics queries. If your monitor query produces 10 different time series, each of those time series is evaluated individually and can trigger its own alert. If your monitor applies only to the join of two queries, you will receive a separate alert on the joined value.

For example, if you create a monitor to alert on CPU across 10 hosts, you will receive a separate alert for each individual host that crosses the threshold you set.
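The per-host behavior described above can be pictured with a minimal Python sketch. This is only an illustration of the idea, not how Sumo Logic evaluates monitors; the host names, readings, threshold, and the `hosts_over_threshold` function are all hypothetical.

```python
from typing import Dict, List


def hosts_over_threshold(latest_cpu: Dict[str, float], threshold: float) -> List[str]:
    """Return the hosts whose latest value is above the threshold, sorted by name."""
    return sorted(host for host, value in latest_cpu.items() if value > threshold)


# Hypothetical latest CPU readings (percent), one time series per host.
latest_cpu = {"host-01": 42.0, "host-02": 91.5, "host-03": 88.0}

# Each host in the result would correspond to one separate alert.
alerting_hosts = hosts_over_threshold(latest_cpu, threshold=80.0)
print(alerting_hosts)  # ['host-02', 'host-03']
```

Because each time series is checked on its own, two hosts crossing the threshold produce two alerts rather than one combined alert.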

You can create a metrics monitor and alert in either of the following ways:

  • On the metric query page when you initially create your query.
  • On the Manage Data > Alerts > Metrics Monitors page.

Limitations and tips

  • The timeshift operator is not supported in metric monitor queries.
  • Depending on how your metrics are collected, some queries may produce a large number of time series and may require aggregation for more accurate alerting. For example, metric=CPU_User may result in eight different time series on an 8-core machine (leading to eight different alerts from a single host). In this example, you would need to aggregate by node by adding | avg by _sourceHost to your query.
  • Each account can create up to 1,500 monitors.
  • Each monitor is limited to sending 1,200 notifications within the last 24-hour period. If this limit is reached, the monitor is muted for six hours and the monitor owner is notified by email. Notifications are re-enabled after six hours or when the monitor is manually unmuted.
  • Consider the following when selecting a time window:
    • For infrequently reported metrics, specify a time period greater than the reporting frequency to avoid false alerts. For example, if you expect to receive a metric every 5 minutes, select a time period of 10 minutes or greater.
    • For metrics with ingest latency, such as AWS CloudWatch metrics, the time window applies to Sumo Logic's receipt time of the data point, not the timestamp of the data point. For example, given a 15 minute time window, if Sumo Logic receives a data point at 9:15 am (and the timestamp of the data point is 9:00 am), an alert will be triggered if no new data points are received by 9:30 am. In this scenario, specify a time period greater than the reporting frequency of the metric.
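To see why the aggregation tip in the list above reduces alert volume, here is a minimal Python sketch of what averaging by host does conceptually. It is not Sumo Logic's implementation of `avg by _sourceHost`; the sample tuples, host and core names, and the `avg_by_host` function are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def avg_by_host(samples: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Collapse per-core values into one averaged value per host.

    Each sample is a (host, core, value) tuple.
    """
    by_host = defaultdict(list)
    for host, _core, value in samples:
        by_host[host].append(value)
    return {host: mean(values) for host, values in by_host.items()}


# Hypothetical CPU_User readings: two cores on each of two hosts (four time series).
samples = [
    ("web-1", "cpu0", 90.0),
    ("web-1", "cpu1", 70.0),
    ("web-2", "cpu0", 20.0),
    ("web-2", "cpu1", 40.0),
]

per_host = avg_by_host(samples)
print(per_host)  # {'web-1': 80.0, 'web-2': 30.0}
```

After aggregation there is one time series per host, so a threshold rule can fire at most once per host instead of once per core.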

Create a monitor and alert

To create a monitor and alert on the Metrics Monitors page:

  1. Create a metrics query.
  2. Click the alert icon.
  3. The Set Rules pane appears.
    set-rules.png
  4. To create a critical alert:
    1. Click Add threshold for Critical.
    2. The critical threshold dialog appears.
      critical-threshold.png
    3. In the field labeled 1, select greater than or less than.
    4. In the field labeled 2, enter a threshold metric value.
    5. In the field labeled 3, select one of the following:
      • for. Choose this if you want to trigger the alert if the metric value crosses the threshold value and stays beyond the threshold continuously for a period of time.
      • at least once in. Choose this if you want to trigger the alert if the metric crosses the threshold value at least once during a period of time.
    6. In the field labeled 4, select a time window.
    7. For Send Notification via:
      • If you want to send an email notification, leave Email selected and enter one or more email addresses, separated by commas.
      • If you want to send a Webhook notification, click the pull-down and select a Webhook connection from the list.
    8. To add an additional notification to be sent when the critical alert is triggered, select + while hovering over the rule.
  5. To create a warning alert:
    1. Click Add threshold for Warning.
    2. The warning threshold dialog appears.
      warning-threshold.png
    3. In the field labeled 1, select greater than or less than.
    4. In the field labeled 2, enter a threshold metric value.
    5. In the field labeled 3, select one of the following:
      • for. Choose this if you want to trigger the alert if the metric value crosses the threshold value and stays beyond the threshold continuously for a period of time.
      • at least once in. Choose this if you want to trigger the alert if the metric crosses the threshold value at least once during a period of time.
    6. In the field labeled 4, select a time window.
    7. For Send Notification via:
      • If you want to send an email notification, leave Email selected and enter one or more email addresses, separated by commas.
      • If you want to send a Webhook notification, click the pull-down and select a Webhook connection from the list.
    8. To add an additional notification to be sent when the warning alert is triggered, select + while hovering over the rule.
  6. To create a missing data alert:
    1. Click Add threshold for Missing Data.
    2. The missing data threshold dialog appears.
      missing-data-alert.png
    3. In the field labeled 1, select one of the following:
      • all time series. Notifications trigger when Sumo Logic has not received any new data points on the entire monitor for the time period specified. For example, if you are monitoring CPU across 5 hosts, you will only be notified when all 5 of the hosts stop reporting data.
      • any time series. Notifications trigger when Sumo Logic has not received any new data points on a single time series in the monitor. For example, if you are monitoring CPU across 5 hosts, you will be notified when any 1 of the 5 hosts stops reporting data.
    4. In the field labeled 2, select a time window.
    5. For Send Notification via:
      • If you want to send an email notification, leave Email selected and enter one or more email addresses, separated by commas.
      • If you want to send a Webhook notification, click the pull-down and select a Webhook connection from the list.
    6. To add an additional notification to be sent when the missing data alert is triggered, select + while hovering over the rule.
  7. Under Set Name and Description, add a name for your monitor. The description is optional.
  8. Click Save.
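The difference between the all time series and any time series options in the missing data step can be sketched in Python as follows. This is a simplified model under stated assumptions, not Sumo Logic's actual logic: it assumes a series counts as missing once the time since its last received data point reaches the time window, and the function names, host names, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def missing_series(last_received: Dict[str, datetime], now: datetime,
                   window: timedelta) -> List[str]:
    """Return the series with no data point received within the window (by receipt time)."""
    return sorted(name for name, received in last_received.items()
                  if now - received >= window)


def missing_data_triggered(last_received: Dict[str, datetime], now: datetime,
                           window: timedelta, mode: str) -> bool:
    """Evaluate a missing data rule; mode is "all" or "any"."""
    missing = missing_series(last_received, now, window)
    if mode == "all":
        return bool(last_received) and len(missing) == len(last_received)
    if mode == "any":
        return bool(missing)
    raise ValueError("mode must be 'all' or 'any'")


# Hypothetical receipt times for two hosts, evaluated at 9:30 am with a 15 minute window.
now = datetime(2024, 1, 1, 9, 30)
window = timedelta(minutes=15)
last_received = {
    "host-1": datetime(2024, 1, 1, 9, 15),  # nothing received for 15 minutes
    "host-2": datetime(2024, 1, 1, 9, 28),  # still reporting
}

any_triggered = missing_data_triggered(last_received, now, window, "any")
all_triggered = missing_data_triggered(last_received, now, window, "all")
print(any_triggered, all_triggered)  # True False
```

With any time series, one silent host is enough to trigger the rule; with all time series, the rule stays quiet until every host has stopped reporting.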

Your new Monitor is saved and displayed in the Manage Data > Alerts > Metrics Monitors page.

View monitor status 

The Manage Data > Alerts > Metrics Monitors page displays the following information about your metrics monitors:

  • Status. Status is a visual representation of the state of each time series in the monitor. It is shaded according to the proportion of Critical, Warning and OK time series within that monitor.
  • Critical. Number of time series in Critical condition.
  • Warning. Number of time series in Warning condition.
  • OK. Number of time series in OK condition.
  • Muted. Displays the mute icon if the monitor is muted.
  • Missing. The number of time series that Sumo Logic previously tracked that have disappeared. For example, if Sumo Logic had been monitoring CPU_User for 100 servers, and two of them stop sending metrics, the value of Missing would be 2.
  • Name. Name of the monitor given when it was created.
  • Created By. The user who created the monitor.
  • Status Since. The time that the monitor status most recently changed. For example, if the status changed from OK to Critical at 3:43:17 AM, the column reflects that time.
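As a rough illustration of how the Status shading relates to the Critical, Warning, and OK counts, the following Python sketch computes the proportion of time series in each state. The page's actual rendering logic is not documented here; the `status_proportions` function and the sample states are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def status_proportions(states: List[str]) -> Dict[str, float]:
    """Return the fraction of time series in each state (0.0 if there are none)."""
    counts = Counter(states)
    total = len(states)
    return {label: (counts[label] / total if total else 0.0)
            for label in ("Critical", "Warning", "OK")}


# Hypothetical monitor with ten time series.
states = ["OK"] * 7 + ["Warning"] * 2 + ["Critical"]

proportions = status_proportions(states)
print(proportions)  # {'Critical': 0.1, 'Warning': 0.2, 'OK': 0.7}
```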

You can perform the following actions from this page:

  • Search monitors using the Search Monitors field. You can search by:
    • Name
    • Description
    • Created By
    • Query
  • From the Showing action menu, search monitors by status.
  • To edit a monitor, hover over a monitor in the list and click Edit.
  • To create a new monitor, click +.

You can also select a monitor from the list to display the information dialog on the right. From here you can:

  • View monitor details.
  • Click Edit to edit a monitor.
  • Click Mute to mute a monitor. You can mute a monitor for a specific period of time, or indefinitely. Muting a monitor turns off notifications. The monitor rules will still be evaluated, and monitor status will still appear on the Metrics Monitors page.
  • Click the Delete icon to delete a monitor.

Role capability for managing monitors

To manage monitors, a user must be an admin or have the Manage Monitors role capability. This capability allows users to create, edit, mute, and delete monitors. Without this capability, users can view existing monitors on the Metrics Monitors page, but cannot create, edit, mute, or delete monitors.

Metrics monitor alert email

The following is an example of the metrics monitor email alert that is sent to recipients.

Example monitor

Monitors evaluate metrics in real time and trigger alerts as soon as the rules are satisfied. The rules that you set are continuously evaluated over a rolling time window. The following example illustrates how a monitor would trigger alerts based on these rolling time windows.

At 3:00 PM, the Critical rule ("Greater than 60 at least once in the last 5 minutes") triggers immediately and sends an email notification because the time series exceeds 60. The time series remains in the Critical state for at least 5 minutes, since the rolling window captures that data point until 3:05 PM.

At 3:01 PM, the time series equals 85, so it remains in the Critical state until at least 3:06 PM.

From 3:01 to 3:06 PM, the Critical rule is still satisfied and no further notifications are sent.

At 3:06 PM, the Critical rule is no longer satisfied because the last Critical data point occurred at 3:01 PM. The monitor then checks the Warning rule ("Greater than 20 for all of the last 5 minutes"). Because the time series stayed above 20 from 3:01 to 3:06 PM, the time series state changes to Warning and a Slack notification is sent.
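The timeline above can be reproduced with a small Python simulation. It is a sketch under stated assumptions rather than Sumo Logic's evaluation engine: the series has one data point per minute, the rolling window at minute t covers the data points after t - 5 up to and including t, and all values other than the 3:01 PM reading of 85 are made up. Minutes are counted relative to 3:00 PM.

```python
from typing import Dict, List, Tuple

WINDOW_MINUTES = 5


def window_values(series: Dict[int, float], now: int) -> List[float]:
    """Values whose timestamps fall in the half-open window (now - 5, now]."""
    return [value for minute, value in series.items()
            if now - WINDOW_MINUTES < minute <= now]


def state_at(series: Dict[int, float], now: int) -> str:
    """Check the Critical rule first, then the Warning rule, else OK."""
    values = window_values(series, now)
    if any(value > 60 for value in values):          # greater than 60 at least once
        return "Critical"
    if values and all(value > 20 for value in values):  # greater than 20 for the whole window
        return "Warning"
    return "OK"


def state_changes(series: Dict[int, float], start: int, end: int) -> List[Tuple[int, str]]:
    """Return (minute, new_state) for each change of state, starting from OK."""
    changes = []
    previous = "OK"
    for minute in range(start, end + 1):
        current = state_at(series, minute)
        if current != previous:
            changes.append((minute, current))
            previous = current
    return changes


# Hypothetical data: minute 0 is 3:00 PM, minute 1 is 3:01 PM, and so on.
series = {-5: 10, -4: 12, -3: 15, -2: 10, -1: 18,
          0: 65, 1: 85, 2: 40, 3: 35, 4: 30, 5: 30, 6: 25}

changes = state_changes(series, start=-1, end=6)
print(changes)  # [(0, 'Critical'), (6, 'Warning')]
```

The state changes only at 3:00 PM and 3:06 PM, which matches the two notifications in the example; between those times the Critical rule stays satisfied and nothing new is sent.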