Skip to main content
Sumo Logic

count, count_distinct, and count_frequent

Aggregating (group-by) functions are used in conjunction with the group operator and a field name.

Only the word by is required to represent the group operator. The count function is also an operator in its own right, and therefore can be used with or without the word by.

count

Counts total number of messages that match the keyword search within the time range analyzed.

Syntax:

  • count
  • count by

Rules:

  • Creates field named _count

Examples:

... | count as countOfPort group by srcAddress, tgtAddress

 

* | parse "GET * " as url
| count by url
| sort by _count

Note that when you want to count more than one field, you must create an alias using the as operator to rename the _count fields.

count_distinct

Counts only distinct occurrences of the value of a field being counted within the time range analyzed.

Syntax:

  • count_distinct(fieldname)

Rules:

  • Creates field named _count_distinct

Examples:

... | count_distinct(username) group by hostname

_sourceCategory=*apache*
| parse "* -" as src_ip
| count_distinct(src_ip)

By default, ordering is not defined inside of groups created using a group-by expression. To order your results, use the sort operator.

If the number of distinct items returned is less than 100, the count_distinct function provides an exact number. If the number of distinct items returned is larger than 100, count_distinct instead uses an approximate algorithm, and displays a message that explains: count_distinct saw more than 100 values, results may be approximate

The approximation algorithm uses a relative error parameter of 2%, for example:

  • 65% of the time, results are within +/- 2%.
  • 95% of the time, results are within +/- 4%.
  • 99% of the time, results are within +/- 6%.

So for example, if the true count of distinct items is 1,000, the result returned by the approximation algorithm is between 950 and 1050 about 95% of the time.

The error parameter value is important to making the count_distinct function return results quickly and in a scalable way.

Also, note that when you want to count the distinct occurrences of more than one field, you must create an alias using the as operator to rename the _count_distinct fields. See this example:

_sourceCategory=PaloAltoNetworks
| count_distinct(threatid) as cntthreatid, count_distinct(repeatcnt) as cntrepeatcnt

count_frequent

The count_frequent function can be used in cases where you want to identify the most common values for aggregations with over 10,000 distinct groups. This query returns the highest-count 10,000 results in sorted order. The resulting count field is called _approxcount because it is only an estimate of the true count; the estimate may be incorrect, but can only be over (it will never be under).

The count_frequent function is followed immediately by one or more field names.

You can use the count_frequent operator in Dashboard queries, but the number of results returned is limited to the top 100 most frequent results. All results are available when the search is run on the Search page, but only the top 100 are displayed in the Panel.

Syntax:

  • count_frequent fieldname
  • For multiple fields: count_frequent field1, field2, field3 (and so on)

Rules:

  • Creates field named _approxcount
  • Cannot be used with other aggregating functions like sum or avg.
  • Sort is built into the query and defaults to a most-to-least order.

Example:

* | parse "srcIP=*, url=*" as srcIP, url 
| count_frequent srcIP, url