Sumo Logic

Metrics Monitors and Alerts

For your metrics query, you can set a monitor on a time series to alert when the metric has crossed a static threshold, and then send an email or Webhook notification. You can set a maximum of one critical alert, one warning alert, and one missing data alert for each monitor, each with one or more notification destinations.

A monitor can alert on a single time series, multiple time series, or a Join of two metrics queries. In other words, if your monitor query produces 10 different time series, you will get alerts on all of those time series individually. If your query applies to the Join condition only, you will receive a separate alert on that Joined value.

For example, if you create a monitor to alert on CPU across 10 hosts, you will receive a separate alert for each individual host that crosses the threshold you set.
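The per-time-series behavior can be pictured with a minimal sketch. It is an illustration only, not Sumo Logic code: the function name, host names, and CPU values are all hypothetical.

```python
from typing import Dict, List


def series_over_threshold(latest: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of the time series whose latest value exceeds the threshold."""
    return sorted(name for name, value in latest.items() if value > threshold)


# Hypothetical latest CPU readings, one time series per host.
cpu_by_host = {"host-01": 92.5, "host-02": 41.0, "host-03": 88.1}

# Each host that crosses the threshold produces its own alert.
alerting_hosts = series_over_threshold(cpu_by_host, 80.0)
for host in alerting_hosts:
    print(f"ALERT: CPU on {host} is above 80")
```

Here two of the three hypothetical hosts exceed 80, so two separate alerts are produced.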

You can create a metrics monitor and alert in either of the following ways:

  • On the Metrics page when you initially create your query.
  • On the Alerts > Metrics Monitors page (Manage > Metrics Monitors in the classic UI).

Create a Monitor and Alert

To create a monitor and alert on the Metrics Monitors page:

  1. Create a metrics query.
  2. Click the alert icon.

  3. Under Set Rules, you can create a maximum of one critical alert, one warning alert, and one missing data alert for this monitor.
    • Add threshold for Critical. Click and select the parameters for your critical alert.
      If the metric of any time series is:
      • Greater than/less than a number. Select an option. For example, 80.
      • For all of (or a percentage of) the time series.
      • The last. Select a time period between 5 minutes and an hour. For example, the last 5 minutes.
      • Send notification via. Select Email or a pre-configured Webhook Connection. For email, enter recipients' email addresses, separated by commas.
    • Add threshold for Warning. Click and select the parameters for your warning alert.
      Else if the metric of any time series is:
      • Greater than/less than a number. Select an option. For example, 60.
      • For all of (or a percentage of) the time series.
      • The last. Select a time period between 5 minutes and an hour. For example, the last 5 minutes.
      • Send notification via. Select Email or a pre-configured Webhook Connection. For email, enter recipients' email addresses, separated by commas.
    • Add threshold for Missing Data. Click and select the parameters for your missing data alert.
      If data has not been seen for:
      • All time series / any time series. Choose whether to be notified when no data has been seen on the entire monitor ("all time series") or when no data has been seen for any single time series ("any time series").
      • The last. Select a time period between 5 minutes and 1 hour. For example, the last 5 minutes.
      • Send notification via. Select Email or a pre-configured Webhook Connection. For email, enter recipients' email addresses, separated by commas.

        Notes about missing data alerts:
        • All time series: these notifications trigger when Sumo Logic has not received any new data points on the entire monitor for the time period specified. For example, if you are monitoring CPU across 5 hosts, you will only be notified when all 5 of the hosts stop reporting data.
        • Any time series: these notifications trigger when Sumo Logic has not received any new data points on a single time series in the monitor. For example, if you are monitoring CPU across 5 hosts, you will be notified when any 1 of the 5 hosts stops reporting data.
        • For infrequently reported metrics, specify a time period greater than the reporting frequency to avoid false alerts. For example, if you expect to receive a metric every 5 minutes, select a time period of 10 minutes or greater.
        • For metrics with ingest latency (e.g., AWS CloudWatch), the Missing Data time period applies to Sumo Logic's receipt time of the data point, not the timestamp of the data point. 
          Example: With a 15 minute missing data rule, if Sumo Logic receives a data point at 9:15am (and the timestamp of the data point is 9:00am), an alert will be triggered if no new data points are received by 9:30am. In this scenario, specify a time period greater than the reporting frequency of the metric.
    • To add additional notifications per threshold, select + while hovering over the rule. To delete the rule entirely, click the Delete icon.
  4. Under Set Name and Description, add a name for your monitor. The description is optional.
  5. Click Save.

Your new monitor is saved and displayed on the Manage Data > Alerts > Metrics Monitors page (Manage > Alerts > Metrics Monitors in the classic UI).
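The difference between the "all time series" and "any time series" missing data options described in step 3 can be sketched as follows. This is a simplified model under stated assumptions, not the product's implementation: the function name and sample hosts are hypothetical, and the check uses the time each data point was received rather than the data point's own timestamp, as the notes on ingest latency describe.

```python
from datetime import datetime, timedelta
from typing import Dict


def missing_data_triggered(
    last_received: Dict[str, datetime],
    now: datetime,
    window: timedelta,
    mode: str = "all",
) -> bool:
    """Decide whether a missing data rule fires.

    last_received maps each time series to the time its most recent data
    point was received (receipt time, not the data point timestamp).
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    silent = [now - received >= window for received in last_received.values()]
    if not silent:
        return False  # Assumption: a monitor with no time series never fires.
    return all(silent) if mode == "all" else any(silent)


# Hypothetical receipt times for two hosts, checked at 9:30 AM with a 15 minute rule.
now = datetime(2024, 1, 1, 9, 30)
last_received = {
    "host-01": datetime(2024, 1, 1, 9, 15),  # silent for 15 minutes
    "host-02": datetime(2024, 1, 1, 9, 28),  # still reporting
}
any_fires = missing_data_triggered(last_received, now, timedelta(minutes=15), "any")
all_fires = missing_data_triggered(last_received, now, timedelta(minutes=15), "all")
```

With these sample values, the "any time series" rule fires because host-01 has been silent for the full window, while the "all time series" rule does not because host-02 is still reporting.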

Manage Metrics Monitors

The Metrics Monitors page displays the following information about your metrics monitors:

  • Status. Status is a visual representation of the state of each time series in the monitor. It is shaded according to the proportion of Critical, Warning and OK time series within that monitor.
  • Critical. Number of time series in Critical condition.
  • Warning. Number of time series in Warning condition.
  • OK. Number of time series in OK condition.
  • Muted. Displays the mute icon if the monitor is muted.
  • Name. Name of the monitor given when it was created.
  • Description. Displays the monitor description, if available.
  • Status Since. The time that the monitor status most recently changed. For example, if the status changed from OK to Critical at 3:43:17 AM, the column reflects that time.
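As a rough illustration of how the Status, Critical, Warning, and OK columns relate, the sketch below computes the proportion of time series in each state. The function name and sample states are hypothetical, and how the page actually renders the shading is not specified here.

```python
from collections import Counter
from typing import Dict, List


def status_summary(states: List[str]) -> Dict[str, float]:
    """Return the fraction of time series in each state for one monitor."""
    counts = Counter(states)
    total = len(states)
    return {
        name: (counts[name] / total if total else 0.0)
        for name in ("Critical", "Warning", "OK")
    }


# Hypothetical monitor with four time series.
summary = status_summary(["Critical", "OK", "OK", "Warning"])
```

For this sample monitor, one quarter of the time series are Critical, one quarter are Warning, and half are OK.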

You can perform the following actions from this page:

  • Filter monitors using the Filter Monitors search field. You can filter on:
    • Name
    • Description
    • Created By
    • Query
  • From the Showing action menu, filter monitors by status.
  • To edit a monitor, hover over a monitor in the list and click Edit.
  • To create a new monitor, click +.

You can also select a monitor from the list to display the information dialog on the right. From here you can:

  • View monitor details.
  • Click Edit to edit a monitor.
  • Click Mute to mute a monitor.
  • Click the Delete icon to delete a monitor.

Monitor permissions

By default, all users have the Manage Monitors capability enabled in their roles. To find this capability, select Manage > Users and Roles > Roles (Manage > Roles in the classic UI). This capability allows users to create, edit, mute, and delete monitors. Without it, users can still view existing monitors on the Metrics Monitors page, but they cannot create, edit, mute, or delete any monitors.

Metrics Monitor Alert Email

The following is an example of the metrics monitor email alert that is sent to recipients.

Metrics Monitor Limitations

These limitations apply to metrics monitors.

  • Monitors: Each account can create up to 50 monitors. To create more than 50 monitors, contact Support.
  • Notifications: Each monitor is restricted to sending 1,200 notifications in any 24-hour period. If this limit is reached, the monitor will be muted for 6 hours and the monitor owner will be emailed. Notifications will be re-enabled after 6 hours or if the monitor is manually unmuted.
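The notification limit can be modeled with a small sliding-window counter. This is a hedged sketch, not the actual mechanism: the class and method names are hypothetical, and it assumes the notification count starts over once the monitor is unmuted, which is not stated above.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Optional


class NotificationLimiter:
    """Mute a monitor for a fixed time once it reaches its notification limit."""

    def __init__(
        self,
        limit: int = 1200,
        window: timedelta = timedelta(hours=24),
        mute_for: timedelta = timedelta(hours=6),
    ) -> None:
        self.limit = limit
        self.window = window
        self.mute_for = mute_for
        self.sent: Deque[datetime] = deque()
        self.muted_until: Optional[datetime] = None

    def unmute(self) -> None:
        """Manually unmute; assumption: the count restarts afterwards."""
        self.muted_until = None
        self.sent.clear()

    def try_send(self, now: datetime) -> bool:
        """Return True if a notification may be sent at the given time."""
        if self.muted_until is not None:
            if now < self.muted_until:
                return False
            self.unmute()  # The mute period has expired.
        # Forget notifications that have left the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        self.sent.append(now)
        if len(self.sent) >= self.limit:
            self.muted_until = now + self.mute_for
        return True


# A small limit keeps the demonstration short.
start = datetime(2024, 1, 1)
limiter = NotificationLimiter(limit=3)
results = [limiter.try_send(start + timedelta(minutes=i)) for i in range(4)]
```

With a limit of 3, the first three notifications are sent and the fourth is suppressed because the monitor is muted.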

Example Monitor

Monitors evaluate metrics in real time and trigger alerts as soon as the rules are satisfied. The rules that you set are continuously evaluated with a rolling time window. The following example illustrates how a monitor would trigger alerts based on these rolling time windows.

At 3:00 PM, the Critical rule ("Greater than 60 at least once in the last 5 minutes") immediately triggers and sends an email notification because the time series exceeds 60. It will remain in this Critical state for at least 5 minutes, since the rolling window will capture that data point until 3:05 PM.

At 3:01 PM, the time series equals 85, so it will remain in the "Critical" state until at least 3:06 PM.

From 3:01-3:06 PM, the "Critical" rule is still satisfied and no further notifications are sent.

At 3:06 PM, the Critical rule is no longer satisfied since the last "Critical" data point occurred at 3:01 PM. The Monitor will then check the Warning rule ("Greater than 20 for all of the last 5 minutes"). Since the time series was above 20 from 3:01-3:06 PM, the time series state changes to "Warning" and a Slack notification is sent.
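The walkthrough above can be reproduced with a minimal sketch of rolling-window evaluation. The data values are hypothetical, chosen only to match the narrative (above 60 at 3:00 PM, 85 at 3:01 PM, then between 20 and 60), and the function names are illustrative. The sketch assumes the window at time t covers data points after t minus 5 minutes up to and including t.

```python
from typing import Dict, List, Tuple

WINDOW_MINUTES = 5


def window_values(series: Dict[int, float], now: int) -> List[float]:
    """Values whose minute falls in the rolling window (now - 5, now]."""
    return [v for t, v in series.items() if now - WINDOW_MINUTES < t <= now]


def evaluate(series: Dict[int, float], now: int) -> str:
    """Check the Critical rule first, then fall through to the Warning rule."""
    values = window_values(series, now)
    if not values:
        return "OK"
    if any(v > 60 for v in values):  # Greater than 60 at least once
        return "Critical"
    if all(v > 20 for v in values):  # Greater than 20 for all of the window
        return "Warning"
    return "OK"


# Hypothetical data points keyed by minutes relative to 3:00 PM.
series = {-2: 10, -1: 15, 0: 70, 1: 85, 2: 50, 3: 45, 4: 40, 5: 35, 6: 30}

# Record only state changes, since a notification is sent on a change of state.
transitions: List[Tuple[int, str]] = []
previous = "OK"
for minute in sorted(series):
    state = evaluate(series, minute)
    if state != previous:
        transitions.append((minute, state))
        previous = state
```

With this sample data, the state changes to Critical at minute 0 (3:00 PM) and to Warning at minute 6 (3:06 PM), with no changes in between.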