This page is a high-level overview of metrics in Sumo and the terminology we use to talk about metrics. It provides a brief description of Sumo metric features.
What is a metric?
A metric is a set of data points that measure the value of something. For example, the weather service collects measurements of temperature over time. Perhaps you weigh yourself once a week, or check the height of your child every 3 months.
Sumo customers use Sumo to collect metrics that measure the availability, usage, and performance of their applications and computing resources, sometimes many times per minute.
In Sumo, we use the term time series to refer to a set of timestamped values of a specific measurement. More generally, we often refer to metrics as time series data.
How do metrics get into Sumo?
Sumo administrators set up metric sources to receive metrics. A metric source understands a particular type of metric. For example, Sumo’s host metrics source knows how to ingest system metrics (network, CPU, file system, and so on) from Linux and Windows. Our CloudWatch source knows how to ingest metrics from AWS.
Metrics flow into Sumo as individual data points. The frequency varies.
How does Sumo store metrics?
Sumo stores metrics as raw data points and also in summarized form, in rollup tables.
Raw data points are individual data points. We sometimes refer to the raw data points we store as the baseline table.
While the baseline table contains raw data, rollup tables contain aggregated metric values. Sumo has two sets of rollup tables: one with the metric values for each time series aggregated by minute, and one with the values aggregated by hour. Sumo performs five types of aggregation on raw data points: avg (the average value of the data points), max (the maximum value), min (the minimum value), count (the number of data points), and sum (the sum of all the values).
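To make the rollup idea concrete, here is a minimal Python sketch that groups raw data points into one-minute buckets and computes the five aggregates for each bucket. It illustrates the concept only and is not Sumo's implementation; the function name rollup and the sample data points are assumptions made for this example.

```python
from collections import defaultdict


def rollup(points, bucket_seconds=60):
    """Aggregate (timestamp, value) pairs into fixed-size time buckets.

    Returns {bucket_start: {"avg", "max", "min", "count", "sum"}}.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)

    return {
        start: {
            "avg": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
            "count": len(values),
            "sum": sum(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical raw data points: (epoch seconds, value).
raw_points = [
    (1460061300, 90.0),
    (1460061320, 96.0),
    (1460061340, 99.0),
    (1460061360, 80.0),
]
minute_rollup = rollup(raw_points, bucket_seconds=60)
```

Passing bucket_seconds=3600 to the same function would produce the hourly equivalent.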
The process of calculating aggregated values for the individual data points in a time bucket is called quantization. The quantization process is described in detail on Metric Quantization.
Sumo supports the Graphite, Carbon 2.0, and Prometheus metric formats. For more information, see Metric Formats.
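As a rough illustration of how the three formats differ, the Python sketch below renders one data point in each format. The helper names are hypothetical, and the layouts are simplified forms of the formats as they are commonly documented: Graphite as path, value, and timestamp; Carbon 2.0 as intrinsic tags, two spaces, optional meta tags, value, and timestamp; Prometheus as a metric name with labels, a value, and a timestamp in milliseconds. Check Metric Formats for the authoritative syntax.

```python
def to_graphite(path, value, timestamp):
    """Graphite plaintext: dot-separated path, value, epoch seconds."""
    return f"{path} {value} {timestamp}"


def to_carbon2(intrinsic_tags, meta_tags, value, timestamp):
    """Carbon 2.0: intrinsic tags, two spaces, optional meta tags, value, timestamp."""
    intrinsic = " ".join(f"{k}={v}" for k, v in intrinsic_tags.items())
    meta = " ".join(f"{k}={v}" for k, v in meta_tags.items())
    if meta:
        return f"{intrinsic}  {meta} {value} {timestamp}"
    # Assumption: with no meta tags, the two spaces still precede the value.
    return f"{intrinsic}  {value} {timestamp}"


def to_prometheus(name, labels, value, timestamp_ms):
    """Prometheus text format: name{label="value",...} value timestamp_ms."""
    label_text = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_text}}} {value} {timestamp_ms}"


# The same hypothetical data point in each format.
graphite_line = to_graphite("cluster-1.node-1.cpu-1.cpu-idle", 97.29, 1460061337)
carbon2_line = to_carbon2({"cluster": "cluster-1", "metric": "cpu_idle"}, {}, 97.29, 1460061337)
prometheus_line = to_prometheus("cpu_idle", {"cluster": "cluster-1"}, 97.29, 1460061337000)
```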
Currently available metric sources are:
HTTP Logs and Metrics source. You can use an HTTP source on a hosted collector to collect Graphite, Carbon 2.0, and Prometheus metrics from environments where it is impractical to deploy an installed collector.
Host Metrics source. You can use a host metrics source on an installed collector to collect CPU, memory, TCP, networking, and disk metrics on Linux and Windows machines.
Amazon CloudWatch Source for Metrics. You can use a CloudWatch source on a hosted collector to collect metrics for a variety of AWS resources.
Streaming Metrics Source. You can use Sumo’s streaming metrics source with an installed collector to collect metrics over TCP or UDP in Graphite, Carbon 2.0, or Prometheus format.
AWS Metadata (Tag) Source for Metrics. This is a special type of source, in that it doesn’t collect metrics, but instead collects tags from EC2 instances running on AWS. Sumo applies the collected tags to metrics ingested by two Sumo source types: the streaming metrics source and the host metrics source. Tagging metrics with the EC2 tags allows you to query metrics using EC2 tags.
Docker Stats source. You can use the Docker Stats source on an installed collector to collect Docker container metrics, such as CPU usage, memory usage, network I/O, and disk I/O.
Metric rules editor
You can use Sumo’s metric rules editor to tag metrics with key-value pairs derived from the metrics. Then, you can use those key-value pairs in metric queries. This is especially useful if you ingest Graphite-formatted metrics. For example, given a Graphite metric like this one:
cluster-1.node-1.cpu-1.cpu-idle 97.29 1460061337
You can use the rules editor to tag the metric with these key-value pairs:
cluster = cluster-1 node = node-1 cpu = cpu-1 measurement = cpu-idle
Sumo creates a key-value pair for each dot-separated segment of the metric path. This makes it a lot easier to query Graphite metrics.
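The effect of such a rule can be sketched in a few lines of Python. This only illustrates the resulting tags; it is not how you configure the rules editor. The function name tag_graphite_metric and its keys parameter are hypothetical.

```python
def tag_graphite_metric(line, keys=("cluster", "node", "cpu", "measurement")):
    """Split a Graphite line and pair each dot-separated path segment with a key.

    Returns (tags, value, timestamp).
    """
    path, value, timestamp = line.split()
    segments = path.split(".")
    if len(segments) != len(keys):
        raise ValueError(f"expected {len(keys)} path segments, got {len(segments)}")
    tags = dict(zip(keys, segments))
    return tags, float(value), int(timestamp)


tags, value, timestamp = tag_graphite_metric(
    "cluster-1.node-1.cpu-1.cpu-idle 97.29 1460061337"
)
# tags holds one key-value pair per path segment, for example tags["node"] == "node-1".
```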
To query metrics in Sumo, you open a metrics query tab. The metric tab below shows the results of two queries. At the top of the tab, you can see the quantization level (size of the time buckets over which the raw data points are aggregated) and the time range for the query. You can mouse over a time series to see the metric value for a time bucket.
What’s in a metric query?
Regardless of the metric format, metric queries in Sumo take the same form. A metric query is made up of selectors and operators.
You use selectors to identify which metrics you want to return. A metric query must have at least one selector. A selector can be one or more key-value pairs or a simple string. For example:
cluster=cluster-1 node=node-1 cpu=cpu-1 metric=cpu_idle
You can use any of the standard Sumo metadata fields (_source, _sourceCategory, _sourceHost, _sourceId, and _sourceName) in selectors.
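Conceptually, a key-value selector acts as a filter over the tags attached to each time series. The Python sketch below shows that idea under two assumptions that the text above does not state: that every pair in the selector must match, and that values are compared exactly. The function names and sample series are hypothetical, and simple-string selectors are not handled.

```python
def parse_selector(selector):
    """Turn 'key=value key=value' into a dict (key-value selectors only)."""
    return dict(pair.split("=", 1) for pair in selector.split())


def select(series_list, selector):
    """Return the series whose tags match every pair in the selector."""
    wanted = parse_selector(selector)
    return [
        series
        for series in series_list
        if all(series.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical time series, each identified by its tags.
all_series = [
    {"cluster": "cluster-1", "node": "node-1", "cpu": "cpu-1", "metric": "cpu_idle"},
    {"cluster": "cluster-1", "node": "node-2", "cpu": "cpu-1", "metric": "cpu_idle"},
    {"cluster": "cluster-2", "node": "node-1", "cpu": "cpu-1", "metric": "cpu_system"},
]
matched = select(all_series, "cluster=cluster-1 metric=cpu_idle")
```

A narrower selector, such as the four-pair example above, would match only the first series.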
You can use metric operators of various sorts to process the metric data that matches your selectors. For example, you can use aggregation operators like avg, max, min, sum, and count; statistical operators like pct and rate; and select time series that have the highest or lowest values of a particular metric using topk and bottomk.
For more information, see Metric Operators.
Here are some simple sample queries:
||Returns all CPU_Idle time series in the currently selected time range.|
||Returns cpu-system time series from node “search-1” in cluster “search” whose average over the currently selected time range is greater than 80%.|
||Returns the average of all of the cpu_system time series in deployment “prod”.|
Creating a query
You can simply type in a query on a metrics query tab. Or, you can let Sumo help you build a query. When you click in a query entry field on a metrics query tab, Sumo prompts you with a list of tags, as shown below.
After you select a tag, Sumo prompts you with a list of values for that tag, as shown below.
Joining metric queries
You can perform basic math operations (+, -, *, /) on two or more metric queries, and use an additional query to apply an operation to the results of the other queries. For example, in the metric query tab below, the first two queries return the incoming and outgoing network packets per second across all interfaces. The third query returns the difference between the incoming rate and the outgoing rate. Note that in the visualization, display of the first two queries is toggled off, so only the correlated results are shown.
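The arithmetic behind a joined query can be sketched in Python as a bucket-by-bucket operation on two result sets. The function name combine and the sample rates are hypothetical. How Sumo treats a time bucket that appears in only one of the series is not described here; this sketch simply keeps the buckets that both series share.

```python
import operator


def combine(series_a, series_b, operation):
    """Apply a binary operation to the values of two series at matching time buckets.

    Each series maps bucket start time -> value.
    """
    shared_buckets = sorted(series_a.keys() & series_b.keys())
    return {t: operation(series_a[t], series_b[t]) for t in shared_buckets}


# Hypothetical packets-per-second results, keyed by bucket start (seconds).
incoming = {0: 120.0, 5: 130.0, 10: 125.0}
outgoing = {0: 100.0, 5: 135.0, 10: 125.0}

difference = combine(incoming, outgoing, operator.sub)
```

Swapping operator.sub for operator.add, operator.mul, or operator.truediv covers the other three operations.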
Understanding query results
When you mouse over a data point, the value of the metric is displayed.
Note that each point that is charted is an aggregation of the values of the individual metric data points in a particular time bucket. The time bucket duration is the quantization value shown at the top of the metric query tab, 5 seconds in the example above.
For the sample query above, Sumo automatically determined an appropriate quantization level (based on the age of the data and the volume of data points) and aggregated the data points using the average function.
Changing quantization settings
When you run a metric query, you can optionally use the quantize operator to specify a desired quantization interval and the aggregation function you want Sumo to use to aggregate metric data points.
For example, if you’d like to aggregate data points over 30-second intervals, using the min aggregation function, you could rewrite the first query shown above like this:
_contentType=HostMetrics metric=CPU_Idle | quantize to 30s using min
For more information, see Choosing a Quantization Interval for a Chart.
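To see what a clause like quantize to 30s using min does to the data, consider this minimal Python sketch. It mimics the effect of choosing an interval and an aggregation function; it is not the operator's implementation. The function name quantize, the AGGREGATIONS table, and the sample points are assumptions made for the example.

```python
AGGREGATIONS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def quantize(points, interval_seconds, using="avg"):
    """Reduce (timestamp, value) pairs to one value per time bucket."""
    aggregate = AGGREGATIONS[using]
    buckets = {}
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets.setdefault(bucket_start, []).append(value)
    return {start: aggregate(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU_Idle data points: (seconds, percent idle).
cpu_idle = [(0, 97.0), (5, 95.5), (10, 96.0), (25, 94.0), (30, 98.0), (45, 91.0)]
per_30s_min = quantize(cpu_idle, interval_seconds=30, using="min")
```

With these sample points, the first 30-second bucket keeps its lowest value, 94.0, and the second keeps 91.0.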
You can configure a metric monitor for a metric query so that Sumo sends an alert notification when the query results match the rules that you define for the monitor. There are two notification types: email and WebHook.
You can set up several types of monitors: Critical, Warning, and Missing Data. Critical and Warning result in value-based alerts—they are triggered when a metric in a time series varies from a threshold for a specified period of time. A Missing Data monitor triggers an alert when no data is received for a specified period of time.
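The two kinds of trigger can be sketched in Python as follows. This is an illustration under stated assumptions, not Sumo's monitor logic: it treats "varies from a threshold" as "stays above the threshold", and the function names, thresholds, and sample data are hypothetical.

```python
def breaches_threshold(points, threshold, duration_seconds):
    """Return True if values stay above the threshold for at least duration_seconds."""
    breach_start = None
    for timestamp, value in sorted(points):
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= duration_seconds:
                return True
        else:
            # The value dropped back, so any breach in progress is over.
            breach_start = None
    return False


def is_missing_data(points, now, duration_seconds):
    """Return True if no data point has arrived within the last duration_seconds."""
    if not points:
        return True
    latest = max(timestamp for timestamp, _ in points)
    return now - latest >= duration_seconds


# Hypothetical CPU usage series: (seconds, percent).
cpu_usage = [(0, 70), (60, 85), (120, 90), (180, 88), (240, 60)]

critical = breaches_threshold(cpu_usage, threshold=80, duration_seconds=120)
missing = is_missing_data(cpu_usage, now=600, duration_seconds=300)
```

A Warning monitor would use the same value-based check with a less severe threshold.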
Sumo provides a UI that helps you define a metric monitor, as shown below.
For more information, see Monitors and Alerts.
With Sumo metrics, an account has a Data Points per Minute (DPM) limit, which is shown on the Account Page in the Sumo web app. To allow for spikes in metrics ingestion, Sumo applies a multiplier to your DPM limit, which lets you send metrics at a higher rate, referred to as your DPM burst limit, before Sumo starts to throttle your sources. The multiplier depends on your daily DPM account limit. When you exceed your DPM burst limit, Sumo throttles your metric sources: your ingestion is slowed down until the rate of ingestion is back within the allowable contracted limits. For more information, see Metric Throttling.
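The relationship between the DPM limit, the multiplier, and throttling comes down to simple arithmetic, sketched here in Python. The limit and multiplier values are hypothetical placeholders; the actual multiplier depends on your account, as noted above.

```python
def burst_limit(dpm_limit, multiplier):
    """The DPM burst limit is the account's DPM limit scaled by the multiplier."""
    return dpm_limit * multiplier


def is_throttled(current_dpm, dpm_limit, multiplier):
    """Sources are throttled once the ingestion rate exceeds the burst limit."""
    return current_dpm > burst_limit(dpm_limit, multiplier)


# Hypothetical account: 10,000 DPM limit with a multiplier of 2.
account_burst_limit = burst_limit(10_000, 2)
spike_ok = is_throttled(15_000, 10_000, 2)       # above the limit, below the burst limit
spike_too_big = is_throttled(25_000, 10_000, 2)  # above the burst limit
```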
Metrics ingest data volume index
An account’s metrics ingestion volume in data points is tracked in Sumo’s metrics ingest data volume index. You can query the index to see the total data points that were ingested over a time range, by collector, source, source name, source category, and source host. For more information, see Metrics Ingest Data Volume Index and Metric Volume Queries.