Sumo Logic

Lab 8 - Relating metrics to logs by using key-value pairs and advanced comparison operators

This lab shows an example of how to use business and operational data from a fictitious travel application to populate metrics.
You can also use metrics as troubleshooting indicators by connecting them to related logs. In this lab, we will connect our logs with the metrics and then explore the advanced metric operators timeshift, delta, and rate of change.

NOTE: We will be using Advanced Mode, which is introduced in Lab 6.


Key-Value Pair Usage

In this lab, we will look at metrics from our TravelLogic Demo. These metrics were generated using metric rules, which parse metrics from logs. To view these rules, click Manage Data > Metrics > Metrics Rules.

Let's begin by testing the Graphite Metrics Rule we reviewed during training.

1. Click +New and then click Metrics. To confirm that you are in Advanced Query Mode, click the More Actions button: if the menu shows Basic Query, you are already in Advanced Query Mode; otherwise, select Advanced Query Mode from that menu.

  1. First, let's query a metric using its raw Graphite name. In a metrics query, enter the following:

    travel.training.counters.travel-checkout*.bookings.success.count

  2. In a second query, run the following to query the exact same metric. This time, however, we take advantage of the key-value tags created by the Metrics Rule:

    type=bookings metric=success.count
    Remember to select Advanced Mode on the second query when you create it.

    [Screenshot: the raw Graphite query and the key-value query in Advanced Mode]

  3. Notice that you are graphing the exact same metric twice.
    [Screenshot: the same metric plotted by both queries]
    You can verify this by clicking the series listed in the lower-left corner to hide their traces.
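To see why both queries return the same series, it can help to model what a Graphite metrics rule does: it maps positions in the dot-separated metric name to key-value tags. The Python sketch below is a simplified model under stated assumptions, not Sumo Logic's implementation; the segment positions, the `service` tag name, and the `travel-checkout01` host name are all hypothetical.

```python
def tag_graphite_name(path: str) -> dict[str, str]:
    """Hypothetical rule: travel.training.counters.<service>.<type>.<metric...>"""
    segments = path.split(".")
    # Assumed layout only; a real metrics rule defines its own pattern and tag names.
    return {
        "service": segments[3],            # e.g. travel-checkout01 (hypothetical)
        "type": segments[4],               # e.g. bookings
        "metric": ".".join(segments[5:]),  # e.g. success.count
    }


def matches(tags: dict[str, str], selector: dict[str, str]) -> bool:
    """True if every key=value pair in the selector is present in the tags."""
    return all(tags.get(key) == value for key, value in selector.items())


names = [
    "travel.training.counters.travel-checkout01.bookings.success.count",
    "travel.training.counters.travel-checkout01.bookings.fail.count",
]
# Equivalent of the key-value query: type=bookings metric=success.count
selector = {"type": "bookings", "metric": "success.count"}
selected = [name for name in names if matches(tag_graphite_name(name), selector)]
print(selected)
```

In this model, the key-value selector picks out the same name that the raw Graphite query matched, while the `fail.count` series is excluded.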

Now let's learn how to use some additional operators. In the next steps, we'll plot data from an online travel website to compare successful and unsuccessful bookings.

Travel Logic Demo Usage  
  1. In a new Metrics tab, add a query to search for all your successful bookings. Make sure you are in Advanced Query Mode; if not, refer to step 1.

    type=bookings metric=success.count

  2. Select Last 60 minutes from the time range drop-down; otherwise, you may not see any results in the next step.

  3. In a second query underneath the first one, search for all failed bookings:

    type=bookings metric=fail.count

    [Screenshot: the success.count and fail.count queries]

  4. Click the Chart tab. Your results should look like this:
    [Screenshot: Chart tab showing successful and failed bookings]

  5. Explore the Display Options tab on the far right edge. It lets you change the Display, Display Overrides, Axes, Legend, and JSON settings that make up this chart. For more information, see Metric Charts.

[Screenshot: the Display Options tab]

  6. Back in the Query tab, toggle off success.count by clicking the metric=success.count
    _sourceHost=100.98.82.59 entry in the lower-left corner. NOTE: Your source host IP address may be different.

    [Screenshot: toggling off the success.count series]

Once you turn off success.count, you should see only the failure counts. The result should look something like this:
[Screenshot: chart showing only the failure counts]

NOTE: If desired, you can click the three gray dots in the top-right corner to view the query info, refresh the query, or add this chart to a dashboard.

Next, let's learn how to correlate metrics with relevant logs to identify the root cause.

Metrics allow you to identify symptoms in your environment (WHAT is going on?). Relevant logs help you identify the cause (WHY is this happening?). Let's again look for successful and failed bookings, but this time, let's take a look at the relevant logs to identify why we have failed bookings.

  1. Identify the counts of successful and failed bookings for your travel website.

    [Screenshot: the successful and failed booking queries]

  2. Click the icon at the end of the second query (#B) to add another query line, as shown below:

    [Screenshot: a new query line added below query #B]

  3. Switch the Metrics label to Logs as shown below:
    [Screenshot: switching the query type from Metrics to Logs]

  4. To overlay your metrics with the relevant logs, enter this log query as depicted below:

    _sourceCategory=*training/travel/checkout* error | timeslice 1h | count by _timeslice

  5. Change the time range to -24h, as shown below:
    [Screenshot: time range set to -24h]

    Notice that the log count line is shown, but because of the scale, the other lines are flattened at the bottom of the chart. Let's fix this.

  6. Select the Display Overrides tab on the far right and select #C as the query or series name. Under Style, select AxisYType and set it to Right Y-Axis.

    [Screenshot: Display Overrides settings for query #C]
    Ensure that Right Y-Axis is checked on the Axes tab. Your result should now look like this:

    [Screenshot: booking metrics and error log counts on separate Y axes]

You can now correlate metric data with log data: the fail.count values rise when errors appear in the logs.
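The log query in step 4 counts error messages per one-hour bucket so they can be plotted against the booking metrics. As a rough model of `timeslice 1h | count by _timeslice`, the Python sketch below truncates each timestamp to the hour and counts matching lines. The sample log lines are made up, and treating the `error` keyword as case-insensitive is an assumption about Sumo Logic's keyword matching.

```python
from collections import Counter
from datetime import datetime


def count_errors_by_hour(logs: list[tuple[datetime, str]]) -> dict[datetime, int]:
    """Model of `error | timeslice 1h | count by _timeslice` over (timestamp, message) pairs."""
    counts = Counter()
    for timestamp, message in logs:
        # Assumption: keyword matching is case-insensitive.
        if "error" in message.lower():
            # Truncate to the start of the hour, like a 1h timeslice bucket.
            bucket = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[bucket] += 1
    # Return buckets in time order so they can be plotted as a series.
    return dict(sorted(counts.items()))


# Hypothetical sample log lines, for illustration only.
sample_logs = [
    (datetime(2024, 1, 1, 10, 5), "ERROR payment gateway timeout"),
    (datetime(2024, 1, 1, 10, 40), "booking confirmed"),
    (datetime(2024, 1, 1, 10, 55), "error: card declined"),
    (datetime(2024, 1, 1, 11, 2), "error: inventory unavailable"),
]
print(count_errors_by_hour(sample_logs))
```

In this model, hours with no matching lines produce no bucket at all.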

Timeshift Operator


Let's now compare KPIs across time periods using the timeshift operator, which shifts the time series of your query by a given interval. This makes it useful for comparing the same metric across multiple time periods.

  1. In a new Metrics tab, add a query to search for your mean latency over the last 60 minutes.

    metric=latency.mean

  2. Compare that with your latency from 1 day ago.

    metric=latency.mean | timeshift 1d


[Screenshot: current and timeshifted latency queries]

  3. Select Chart. The results are shown below:

[Screenshot: chart comparing current latency with latency from 1 day ago]

This lets you view current metrics alongside metrics from another time frame, so you can compare them and spot differences.
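Conceptually, `timeshift 1d` takes data points from one day earlier and moves their timestamps forward by one day so they line up with the current series. The Python sketch below models only that alignment step and a point-by-point comparison, assuming both series are keyed by timestamp. The latency values are invented, the `difference` helper is hypothetical, and this is not how Sumo Logic computes the shift internally.

```python
from datetime import datetime, timedelta


def timeshift(series: dict[datetime, float], shift: timedelta) -> dict[datetime, float]:
    """Move each data point forward by `shift` so older data aligns with newer data."""
    return {timestamp + shift: value for timestamp, value in series.items()}


def difference(current: dict[datetime, float], baseline: dict[datetime, float]) -> dict[datetime, float]:
    """Hypothetical helper: current minus baseline at every timestamp present in both."""
    return {t: current[t] - baseline[t] for t in current if t in baseline}


# Hypothetical mean-latency samples (milliseconds), for illustration only.
yesterday = {datetime(2024, 1, 1, 12, 0): 120.0, datetime(2024, 1, 1, 12, 1): 118.0}
today = {datetime(2024, 1, 2, 12, 0): 150.0, datetime(2024, 1, 2, 12, 1): 117.0}

shifted = timeshift(yesterday, timedelta(days=1))
print(difference(today, shifted))
```

Once the two series share timestamps, a large positive difference (30 ms at 12:00 in this made-up data) is what stands out on the chart as a gap between the lines.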

As with logs, metrics support the usual operators (min, max, sum, count, avg). Often, however, what you want to measure is change.

Rate of Change Operator


In this next exercise, we will use the rate of change to get early warning of impending issues.

  1. In a new Metrics tab, add a query to search for a count of packets received in the last 60 minutes.

    type=packets_received metric=count

  2. To find the difference between one data point and the next, edit your query to show the delta.

    type=packets_received metric=count | delta

  3. To find the rate of change (in this case, packets received per second), edit your query as follows:

    type=packets_received metric=count | rate

With this last query, you can determine whether the rate at which packets are received is increasing gradually or spiking quickly. An outlier in the rate of change is often a better early indicator of an impending problem than the raw count itself.

[Screenshot: rate of packets received]
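The difference between `delta` and `rate` can be sketched in a few lines of Python. This is a simplified model, assuming each data point is a (timestamp in seconds, value) pair: `delta` is the backward difference between consecutive values, and `rate` divides that difference by the seconds elapsed, giving a per-second rate. It ignores details of Sumo Logic's operators such as counter resets and quantization, and the sample points are made up.

```python
Point = tuple[float, float]  # (timestamp in seconds, value)


def delta(points: list[Point]) -> list[Point]:
    """Backward difference: value[i] - value[i - 1], reported at timestamp[i]."""
    return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(points, points[1:])]


def rate(points: list[Point]) -> list[Point]:
    """Per-second rate of change: each delta divided by the seconds between points."""
    return [(t1, (v1 - v0) / (t1 - t0)) for (t0, v0), (t1, v1) in zip(points, points[1:])]


# Hypothetical packet counts sampled once a minute.
packets = [(0.0, 100.0), (60.0, 160.0), (120.0, 400.0)]
print(delta(packets))  # [(60.0, 60.0), (120.0, 240.0)]
print(rate(packets))   # [(60.0, 1.0), (120.0, 4.0)]
```

In this made-up series, the rate quadruples from one interval to the next, which is the kind of spike worth watching for.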

Congratulations! You have now seen how to correlate log data with metric data, as well as how to use the timeshift, delta, rate, and other operators to see and understand changes in your data.