
SLO Dashboards


SLO Dashboards provide an active view into the health and status of services and systems based on your SLI and SLO configurations.

Viewing SLO Dashboards

After you set up your SLO monitors, you start receiving notifications. You can configure them to be delivered by email, to a Slack channel, or through other options. To begin reviewing the data behind an alert in Sumo Logic, click View SLO Dashboard.

[Image: SLO alert email]

The dashboard opens in Sumo Logic with the alert's time period in view, along with the key information you need to start investigating the service. For example, clicking View SLO Dashboard for this alert opens the following dashboard. Here you can review the current SLI and target, the remaining error budget, the compliance settings, and trending issues caught by the SLO.

[Image: SLO dashboard opened from an alert]

Dashboard Metrics

Each dashboard contains the following information:

[Image: annotated SLO dashboard, sections A to E]

A. General information, including the SLO name/description and SLI information:

  • Signal Type: Latency, Error, Availability, Throughput, Other

  • Evaluation Type: Windows-based, Request-based

B. Panels showing:

  • Current SLI: The currently tracked SLI, calculated from the configured SLI, SLO, and queries

  • Target: The configured SLO target

  • Error Budget Remaining (relative and absolute): The error budget left out of the configured total, shown as a percentage and as an absolute amount. A negative value, such as 0% (-2h), means you have exceeded your error budget. In this example, there were 2 hours more downtime than allowed.

  • Compliance: The configured compliance type (Rolling or Calendar) and the selected compliance window

C. Error Budget Burndown: A chart tracking the error budget and the events that consumed it during the compliance period. Hover over the timeline for more information.

D. Event History: Events tracked during the compliance period, classified as successful (good) or unsuccessful (bad). Hover over the chart to see the total number of good or bad events, the time frame, and more.

E. Compliance History/Historical Data: Displays the SLI and SLO for up to 30 compliance periods.
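The relationship between the relative and absolute values in the Error Budget Remaining panel (B) can be sketched in Python for a window-based SLO. This is an illustration under stated assumptions, not Sumo Logic's implementation:

  • It assumes 1-minute windows and takes the target as a percentage.
  • It clamps the relative value at 0% but lets the absolute value go negative, which matches the 0% (-2h) example.
  • The function name and parameters are hypothetical.

```python
def error_budget_remaining(target_pct, total_windows, bad_windows, window_minutes=1):
    """Return (relative_pct, absolute_minutes) for a window-based SLO.

    Hypothetical helper: assumes every window in the compliance period is
    eligible and that each window lasts `window_minutes` minutes.
    """
    # Number of windows allowed to be bad before the target is missed.
    budget_windows = total_windows * (100 - target_pct) / 100
    remaining_windows = budget_windows - bad_windows
    # The relative value is clamped at 0%; the absolute value may go negative.
    relative_pct = max(0.0, 100 * remaining_windows / budget_windows)
    absolute_minutes = remaining_windows * window_minutes
    return relative_pct, absolute_minutes


# 12,000 one-minute windows at a 99% target allow 120 bad minutes.
print(error_budget_remaining(99, 12_000, 60))   # (50.0, 60.0)
print(error_budget_remaining(99, 12_000, 240))  # (0.0, -120.0)
```

The second call corresponds to the 0% (-2h) case: 240 bad minutes against a 120-minute budget leaves the budget 120 minutes (2 hours) overspent.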

Setting Granularity for SLO Time Ranges

To narrow the time range, select and drag across a range of dates to zoom in. This is useful for examining details, especially in charts with longer compliance periods.

[Animation: zooming in on an SLO chart]

You can also filter by compliance period to review past activity and plan ahead.

Refreshing Your Data

You can also refresh individual charts or the entire page by clicking the Reset button.

[Image: SLO dashboard Reset button]

To revise or review your SLO parameters, click Go to SLO Definition to open the specific SLO.

[Image: Go to SLO Definition button]

SLO as Code

You can use the Sumo Logic Terraform provider to automate the creation of SLO folders and SLOs. This is useful for organizations that want to templatize SLOs, standardize SLO configuration, monitors, and dashboards, and automate SLO-related workflows. To create monitors associated with SLOs, use the Monitor Terraform provider.

SLO as Log Messages

Sumo Logic continuously computes data for your SLO behind the scenes. This data, which powers your SLO dashboard, is also made available as log messages that conform to the following schema:

  • Time: timestamp
  • sloId: Id of the SLO, as displayed in the SLO dashboard URL
  • goodCount: count of good requests for request-based SLOs, or of good windows for windows-based SLOs, based on the SLO query definition
  • totalCount: count of eligible requests for request-based SLOs, or of eligible windows for windows-based SLOs, based on the SLO query definition
  • sloVersion: version of SLO definition

View the schema by executing the following query:

_view=sumologic_slo_output sloId="000000000000008A" // replace with a valid SLO Id

These log messages are delayed by 1 hour. The delay lets the system account for ingest delays in the source telemetry and keep the data consistent.
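To work with these messages outside Sumo Logic, for example from search results you have exported, you can model one record as a small Python data class. This is a hypothetical sketch:

  • The snake_case field names, the field types, and the `sli_percent` helper are assumptions.
  • Keeping one record per timestamp only approximates the `dedup by _messagetime` step used in the recipes below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloOutputRecord:
    """One message from _view=sumologic_slo_output (field types are assumed)."""
    time: int          # message timestamp, assumed to be epoch milliseconds
    slo_id: str        # sloId
    good_count: int    # goodCount: good requests or good windows
    total_count: int   # totalCount: eligible requests or eligible windows
    slo_version: int   # sloVersion


def sli_percent(records):
    """Return the SLI in percent, keeping only the first record per timestamp."""
    seen_times = set()
    good = total = 0
    for record in records:
        if record.time in seen_times:
            continue  # skip repeats, similar in spirit to `dedup by _messagetime`
        seen_times.add(record.time)
        good += record.good_count
        total += record.total_count
    return 100 * good / total if total else None


records = [
    SloOutputRecord(1_000, "000000000000008A", 57, 60, 1),
    SloOutputRecord(1_000, "000000000000008A", 57, 60, 1),  # duplicate timestamp
    SloOutputRecord(61_000, "000000000000008A", 60, 60, 1),
]
print(sli_percent(records))  # 97.5
```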

Use Case Recipes

Suppose a developer responsible for a microservice wants to build dashboard panels showing the SLI and error budget trends for that service. The recipes below show how to recreate the panels of the pre-built SLO dashboard for different combinations of evaluation type (window-based or request-based) and compliance type (calendar or rolling). You can add the resulting panels to any dashboard.

Scenario 1

The following query computes the hourly SLI and error budget trend for a window-based SLO:

_view=sumologic_slo_output sloId="<your SLO Id>"
| dedup by _messagetime
| timeslice 1h  // data granularity for this panel
| sum(goodCount) as goodWindows, sum(totalCount) as totalWindows by _timeslice
| totalWindows - goodWindows as badWindows
| sort by _timeslice asc
| accum(totalWindows) as totalWindowsSoFar
| accum(badWindows) as badWindowsSoFar
| 7 * 24 * 60 as totalWindowsInCompliance // total number of windows in the compliance period: each window is 1m and the compliance period is 7d. Replace 7 with the number of days in your compliance period.
| 99.9 as slo // replace with your SLO target, in percent
| 100 - 100 * badWindowsSoFar / totalWindowsInCompliance as sli
| 100 * (sli - slo) / (100 - slo) as error_budget_pct_remaining
| fields _timeslice, sli, error_budget_pct_remaining, slo

This query can recreate a panel similar to the one below without the Trend Forecast:

[Image not available: example SLI and error budget trend panel]

In the line chart's visualization settings, set the “maximum value” to 100 and the “minimum value” to the lowest value you want to track.
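You can check the arithmetic in this query offline with a short Python sketch. It makes these assumptions:

  • Windows are 1 minute long.
  • The input is a list of hourly (good, total) counts.
  • The SLI and target use the same unit (percent).

Accumulated bad windows are divided by every window in the compliance period, not by the windows seen so far, so the SLI starts at 100 and only drops as bad windows accumulate. The function name is hypothetical.

```python
def window_sli_trend(hourly_counts, target_pct, compliance_days):
    """Return (sli, error_budget_pct_remaining) per hour for a window-based SLO.

    hourly_counts: list of (good_windows, total_windows) per hour.
    Assumes 1-minute windows, as in the query above.
    """
    total_windows_in_compliance = compliance_days * 24 * 60
    bad_windows_so_far = 0
    trend = []
    for good, total in hourly_counts:
        bad_windows_so_far += total - good
        sli = 100 - 100 * bad_windows_so_far / total_windows_in_compliance
        budget_pct = 100 * (sli - target_pct) / (100 - target_pct)
        trend.append((sli, budget_pct))
    return trend


trend = window_sli_trend([(60, 60), (50, 60)], target_pct=99, compliance_days=7)
print(trend[0])  # (100.0, 100.0)
```

A 99% target over 7 days (10,080 windows) allows about 100.8 bad windows. The 10 bad windows in the second hour therefore leave roughly 90% of the budget.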

Scenario 2

The following query computes hourly SLI and error budget trend for a request-based SLO:

_view=sumologic_slo_output sloId="<your SLO Id>"
| dedup by _messagetime
| timeslice 1h  // data granularity for this panel
| sum(goodCount) as goodRequests, sum(totalCount) as totalRequests by _timeslice
| totalRequests - goodRequests as badRequests
| sort by _timeslice asc
| accum(totalRequests) as totalRequestsSoFar
| accum(badRequests) as badRequestsSoFar
| 99 as slo // replace with your SLO target, in percent
| 100 - 100 * badRequestsSoFar / totalRequestsSoFar as sli
| 100 * (sli - slo) / (100 - slo) as error_budget_pct_remaining
| fields _timeslice, sli, error_budget_pct_remaining, slo

This query can recreate a panel similar to the one below without the Trend Forecast:

[Image not available: example SLI and error budget trend panel]

In the line chart's visualization settings, set the “maximum value” to 100 and the “minimum value” to the lowest value you want to track.
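Unlike the window-based recipe, this query divides by the requests seen so far. This is presumably because the number of requests in the whole compliance period is not known in advance. The following is a minimal Python sketch of the same arithmetic. The function name is hypothetical, and it takes hourly (good, total) counts as input.

```python
def request_sli_trend(hourly_counts, target_pct):
    """Return (sli, error_budget_pct_remaining) per hour for a request-based SLO.

    hourly_counts: list of (good_requests, total_requests) per hour.
    """
    total_so_far = bad_so_far = 0
    trend = []
    for good, total in hourly_counts:
        total_so_far += total
        bad_so_far += total - good
        if total_so_far == 0:
            trend.append((None, None))  # no eligible requests yet
            continue
        sli = 100 - 100 * bad_so_far / total_so_far
        budget_pct = 100 * (sli - target_pct) / (100 - target_pct)
        trend.append((sli, budget_pct))
    return trend


print(request_sli_trend([(990, 1000), (1000, 1000)], 99))  # [(99.0, 0.0), (99.5, 50.0)]
```

In the first hour, the SLI sits exactly at the 99% target, so no budget remains. A clean second hour doubles the denominator and recovers half of the budget.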

Scenario 3

The following query computes the SLI, Error Budget Remaining, and Error Budget Remaining Minutes for a window-based SLO with a 30-day compliance period:

_view=sumologic_slo_output sloId="<your SLO Id>"
| dedup by _messagetime
| sum(goodCount) as goodCount, sum(totalCount) as totalCount
| goodCount*100/totalCount as sli
| 99.9 as target // replace the target here
| (sli - target)*100/(100 - target) as errorBudgetRemainingPercent
| errorBudgetRemainingPercent*30*86400*(100 - target)/100/100/60 as errorBudgetRemainingMins  // replace 30 with the number of days in your compliance period; applies to window-based SLIs only
| goodCount - (totalCount*target/100) as errorBudgetRemainingRequests // applicable for request-based SLIs only
| if (errorBudgetRemainingPercent < 0, 0, errorBudgetRemainingPercent) as errorBudgetRemainingPercent
| fields -goodCount, totalCount

This query can recreate panels similar to the ones below:

[Image not available: example SLI and error budget remaining panels]
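As a worked example of the minutes calculation, take the 99.9 target in the query. A 30-day period then allows 30 × 86400 × 0.1 / 100 / 60 = 43.2 minutes of bad windows in total, and errorBudgetRemainingMins is that total scaled by errorBudgetRemainingPercent / 100.

The Python sketch below reproduces the query's arithmetic, including computing the minutes before the percentage is clamped at 0. The function name is hypothetical, and it uses a 99% target to keep the numbers round.

```python
def error_budget_summary(good, total, target_pct, compliance_days):
    """Return (sli, budget_pct_remaining, budget_minutes_remaining).

    Window-based SLO; follows the field order of the query above, so the
    minutes are computed from the unclamped percentage.
    """
    sli = 100 * good / total
    budget_pct = (sli - target_pct) * 100 / (100 - target_pct)
    # Total budget in minutes: seconds in the period * allowed bad fraction / 60.
    total_budget_minutes = compliance_days * 86400 * (100 - target_pct) / 100 / 60
    budget_minutes = budget_pct * total_budget_minutes / 100
    return sli, max(budget_pct, 0), budget_minutes


print(error_budget_summary(995, 1000, 99, 30))  # (99.5, 50.0, 216.0)
```

Here the 30-day budget is 432 minutes, and half of it remains. When the SLI falls below the target, the percentage is reported as 0, but the minutes stay negative.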

Scenario 4

The following query computes event history for an SLO:

_view=sumologic_slo_output sloId="<your SLO Id>"
| dedup by _messagetime
| timeslice 1h // granularity of data
| sum(goodCount) as goodCount, sum(totalCount) as totalCount by _timeslice
| goodCount*100/totalCount as successfulWindows // for a request-based SLO, rename successfulWindows to successfulRequests here and unsuccessfulWindows to unsuccessfulRequests below
| 100-successfulWindows as unsuccessfulWindows
| fields -goodCount, totalCount

This query can recreate a panel similar to the one below:

[Image not available: example event history panel]

In the line chart's visualization settings, set the “maximum value” to 100 and the “minimum value” to the lowest value you want to track.
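Each hourly bucket in this query reduces to two complementary percentages. Below is a minimal Python sketch with a hypothetical function name. Skipping hours with no eligible events is an assumption of the sketch, not documented behavior of the query.

```python
def event_history(hourly_counts):
    """Return (successful_pct, unsuccessful_pct) per hour.

    hourly_counts: list of (good_count, total_count) per hour; hours with
    no eligible events are skipped.
    """
    rows = []
    for good, total in hourly_counts:
        if total == 0:
            continue
        successful = good * 100 / total
        rows.append((successful, 100 - successful))
    return rows


print(event_history([(45, 60), (60, 60), (0, 0)]))  # [(75.0, 25.0), (100.0, 0.0)]
```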

Scenario 5

The following query computes SLI trend over multiple 7d calendar compliance periods:

// REQUEST-BASED, CALENDAR COMPLIANCE
// Coffee Prep Latency should not exceed 1 second for 95% of requests in calendar 7d
// This query works for both request based and window based SLOs

_view=sumologic_slo_output sloId="<your SLO Id>"
| dedup by _messagetime
| timeslice 7d // put compliance period here
| sum(goodCount) as goodRequests, sum(totalCount) as totalRequests by _timeslice
| totalRequests - goodRequests as badRequests
| sort by _timeslice asc
| accum(totalRequests) as cumTotalRequests
| accum(badRequests) as cumBadRequests
| 95 as slo // replace your target here
| 100 - 100 * cumBadRequests / cumTotalRequests as sli
| 100* (sli - slo)/(100-slo) as error_budget_pct_remaining
| fields _timeslice, sli, error_budget_pct_remaining, slo

This query replicates the following panel:
[Image not available: example SLI trend panel across compliance periods]

We recommend selecting “change axis” and setting the “maximum value” to 100.

To use this correctly, ensure the following:

  • To render a single compliance period, make sure the dashboard's time range matches the compliance period
  • To build a dashboard of compliance history over multiple compliance periods, change the timeslice to match the compliance period and set the dashboard time range to multiple compliance periods

Dashboards built using such queries will show slightly different numbers from the pre-built dashboards due to differences in the storage backend for these two approaches. Use the pre-built dashboard if SLO precision is important.