Lab 13 - Create Meaningful Alerts
In this lab, rather than alerting on simple counts, you will learn to create an alert that notifies you when your 404s increase at a higher rate than your overall traffic (measured by 200s). See this blog post for a full explanation.
- Search only for messages with status code 200 or 404
- Count 200 messages as Successes and 404 messages as Fails
- Sum Successes and Fails to get a count
- Create a ratio of fails to successes
- Use outlier operator to identify anomalies in the ratio
_sourceCategory=Labs/Apache/Access (status_code=200 or status_code=404)
| timeslice 1m
| if (status_code = "200", 1, 0) as successes
| if (status_code = "404", 1, 0) as fails
| sum(successes) as success_cnt, sum(fails) as fail_cnt by _timeslice
| (fail_cnt/(success_cnt+fail_cnt)) * 100 as failure_rate_pct
| outlier failure_rate_pct window=5, threshold=3, consecutive=1, direction=+
- Adding this line keeps only the outliers (time slices where the ratio increased more than normal). You can now create a Scheduled Search that alerts when this query returns results.
| where failure_rate_pct_violation > 0
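
To see why the ratio is a better alert signal than a raw count, the steps above can be approximated offline. The Python sketch below is only an illustration, not Sumo Logic's implementation: the per-minute counts are made-up sample data, the function names are hypothetical, and the outlier rule (flag a value above the mean plus `threshold` standard deviations of the previous `window` values, upward direction only) is an assumed simplification of what the outlier operator does.

```python
from statistics import mean, pstdev


def failure_rate_pct(success_cnt, fail_cnt):
    """Percentage of 404s among all 200 and 404 responses in one time slice."""
    total = success_cnt + fail_cnt
    return 100.0 * fail_cnt / total if total else 0.0


def flag_outliers(values, window=5, threshold=3.0):
    """Assumed simplification of an upward-only outlier check.

    A value is flagged (1) when it exceeds the mean of the previous
    `window` values by more than `threshold` standard deviations.
    Slices without a full window of history are never flagged.
    """
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(0)
            continue
        upper = mean(history) + threshold * pstdev(history)
        flags.append(1 if value > upper else 0)
    return flags


# Hypothetical per-minute (200 count, 404 count) pairs.
# Slice 3 doubles the traffic but keeps the same ratio; slice 5 has a 404 spike.
buckets = [(1000, 10), (1100, 12), (950, 9), (2000, 20),
           (1050, 11), (1000, 60), (1020, 10)]

rates = [failure_rate_pct(s, f) for s, f in buckets]
violations = flag_outliers(rates, window=5, threshold=3.0)

for (s, f), rate, flag in zip(buckets, rates, violations):
    print(f"200s={s:5d} 404s={f:3d} failure_rate_pct={rate:5.2f} violation={flag}")
```

In this sample, the slice where traffic doubles (20 fails instead of 10) is not flagged because its failure rate is unchanged, while the slice with 60 fails at normal traffic is. An alert on the raw 404 count would have treated both as increases.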