Skip to main content
Sumo Logic

Subqueries

You can run a query on the results of another query using a subquery.

Subqueries allow for filtering within a query. Especially when you may not be sure of the exact filter criteria but can write a short query to do that for you. Subqueries help you correlate events across different Sumo queries, by allowing one query to pass results back to another query to narrow down the set of messages that are searched in that query. Sometimes this offers a faster approach than a join, where you would have to unite large sets of data and then search through the results to form a conclusion. If you can do some processing to narrow down the scope of data, you can form a subquery.

In subqueries, the parent query contains the main body of the query while the child query contains the results necessary for filtering the parent query.

  • Child query. Handles the filtering. Runs first and provides intermediate input for the parent query. This can be a query different time range than the parent query, but you will have to specify it.
  • Parent query. Depends on the input from a child query or queries to finish its execution.

After you have both parent and child queries set up and running independently, subqueries can begin filtering provided the child query has been added to the parent query in one of two places:

Syntax

You can only use subquery in the source expression of your query (before the first pipe, |). You can use subquery to specify values for built-in fields, FER fields, or to specify search keywords. You can also use subquery to build filter expressions for the if and where operators. The versatility of subquery  comes from the fact that it can control the output from the child query to match what is expected in the parent query.

You can use the subquery keyword to create a subquery in any Sumo query, using this syntax:

Parent query
[subquery [from=fromTime] [to=toTime] : <child query>
| compose <field1 [,field2,field3...]> [maxresults=<int>] keywords
]
Rest of parent query

Required arguments

compose controls the output of the child query that is plugged back into the parent query. The compose operator controls three things :

  •  Fields from the child query that are returned back to the parent query. You can return more than one field back to the parent query.
  •  Number of results returned from child query to the parent query. You can limit the results using the  maxResults clause with the compose operator. (Currently, you are limited to 10,000 results returned from a child query.)
  •  Format in which the values are returned. This is controlled using the keyword clause at the end of the query. This clause will be used if you are using child query to specify search clause in your source expression. The keyword cause returns the results of the given field by converting the tabular results into standard DNF format; no field information is returned to the parent.

Optional arguments

[from=fromTime] [to=toTime]  The subquery can contain a different time range for the child query. By default, the child query runs on the same time range as the parent query. To run subqueries on a different time range than the parent query, you must specify the range in your child query.  You can specify either relative time or absolute time. Use both from and to , as shown below to explicitly specify a time range.

[maxresults=<int>]  To increase performance there's a perference for returning only 2500 results, you can use maxresults to specify more results upt

[keyword] if the keyword clause is not specified then the results are returned in standard DNF format with field name information returned as key-value pairs back to parent, so those fields should be present in the parent as well in order for it to work properly. 

 

Example subquery 

Let’s say that that our company has a shopping website, and we want to track purchases made by our most active user. We can use subquery in this case to get the desired results. Below are the steps we would follow:

  1. Create a Sumo query that gives us items checked out and items purchased by a specific user (parent query).
  2. Create a Sumo query that tracks the most active user on the website (child query).
  3. Using a subquery, pass the user_id or user_ip from the child query to the parent query, so that the complete workflow happens within a single Sumo query.

Step 1: Create a parent query

Our parent query provides statistics such items checked out, and items purchased for a given user. For the example we are tracking users through their IP and we can use the following query over our custom application logs.

_sourceCategory=reinvent/travel/checkout 243.63.233.30
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

With a subquery we can pass the IP address in red dynamically to this query.

clipboard_e5a557c6cd8e8097e79c1598416032d8a.png

Step 2: Create a child query

Our child query provides the most active user, and we track it using IP addresses in our web server logs using the this query:

_sourceCategory=reinvent/travel/nginx
      | count by src_ip
      | topk(1,_count)

 I want to pass back this IP address to other query which tracks the user’s activity.

clipboard_ef2185c81c2d257a45fd46a6f5b9c178c.png

Step 3: Create a subquery

Now we combine the two queries into a subquery to allow the parent query to harness the child query results. There are two approaches to the subquery:

  • Specify it as keywords for your search criteria to narrow down the search scope.  We will build the subquery in place of the static IP address in the parent query by using the subquery syntax [subquery: ...]
  • Use filtering operators such as where and if. If we were filtering the IP address in a where clause then we can substitute it with subquery that dynamically generates the filter expression.   
Keywords

With keywords, you can replace the IP address with the subquery syntax and copy-paste the child query that we developed when we created a child query.

_sourceCategory=reinvent/travel/checkout
[subquery:_sourceCategory=reinvent/travel/nginx
     | count by src_ip
     | topk(1,_count)
     | compose src_ip keywords
]
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

We also have to append the compose operator at the end of the child query to let Sumo know what information to pass back to the parent query. In this case we only want to pass  the IP address back so we specify scr_ip field. At the end of the expression we have to mention keyword, which basically tells the system that the output will be inform of keyword expressions that can be plugged in the parent query. If you don't use keywords then src_ip field would be expected in your parent query and if not found will generate an error.

clipboard_e7da7cf36d758ecbd8383b8f6ff641261.png

Where and If

We can replicate the same query above using a filter operator such as where or if.

If we have to use a filter condition then we would need to parse out IP address in my parent query and make sure that the field name is consistent with child query. Once we have parsed out the ip address field I would need to create a where clause in the parent query and have the child query inside it, just like before. The only difference is the compose operator. In this case we don’t have the keywords at the end of the operator since we want to pass back the field_name information back to the parent query.

_sourceCategory=reinvent/travel/checkout
| json field=_raw "source_ip" as src_ip
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| where [subquery:_sourceCategory=reinvent/travel/nginx
      | count by src_ip
      | topk(1,_count)| compose src_ip]
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

clipboard_e80b31772b236645f5e6ae1417b9c7653.png

Subquery with a different time range

By default, the child query runs on the same time range as the parent query. To run subqueries on a different time range than the parent query, you must specify the child query time range range in your child query.  You can specify either relative time or absolute time.

Use both from and to , as shown below to explicitly specify a time range.

Parent query
[subquery [from=(fromTime)] [to=(toTime)] : <child query>
| compose <field1 [,field2,field3...]> [maxresults=<int>] [keywords]
]

Rest of parent query

To specify a relative time range from only. See the following section for an example.

Time range usage examples

Relative

[subquery from=(-15m):error
| count by _sourcehost
| topk(1, _count)
| compose _sourceHost]
| count by _sourceHost

Absolute
[subquery from=(2018/07/08 23:13:36) to=(2018/07/09 23:13:36):error
| count by _sourcehost
| topk(1, _count)
| compose _sourceHost]
| count by _sourceHost

Supported time formats

Relative 

  • s - seconds
  • m - minute
  • h - hour
  • d - day
  • w - week

Epoch

Timestamp in millis since 01/01/1970 00:00:00.000 UTC

Absolute Time

We support the ISO 8601 format. 

  • yyyy-MM-dd HH:mm:ss.SSS
  • yyyy-MM-dd HH:mm:ss
  • yyyy-MM-dd HH:mm
  • yyyy-MM-dd
  • MM-dd HH:mm:ss.SSS
  • MM-dd HH:mm:ss
  • MM-dd HH:mm
  • MM-dd
  • yyyy/MM/dd HH:mm:ss.SSS
  • yyyy/MM/dd HH:mm:ss
  • yyyy/MM/dd HH:mm
  • yyyy/MM/dd
  • MM/dd HH:mm:ss.SSS
  • MM/dd HH:mm:ss
  • MM/dd HH:mm
  • MM/dd

Better Alerting with Subquery

Generally when you build alert you have to perform some complex calculations to make sure that alerts only fire when something goes wrong. These complex calculations are only required for alert firing, but are generally not required in the actual results that are returned from the alert.

Before subquery this level of detail was not possible in your alerts, but with Subquery you can easily get it.  For example, you want to get alerted whenever you have an issue deploying an index. First see if there is a given index was not successfully deployed even after retrying _count > 3, if that happens then subquery will send useful information about the given index, like which host it was deployed on etc.

_sourceCategory=search "error while retrying to deploy index"
| parse \",name=*-*\" as cus,index | where [subquery: _sourceCategory=search "error while retrying to deploy index"
!info
         | parse ",indexName='*-*'" as cus,index
         | count by index // Ignore cases where retry might have happened.
         | where _count > 3
         | compose by index]
| count by _sourceHost , cus,index

Check Malicious Activity with Subquery

The following search allows a security analyst how to track logs related to a malicious IP address that was flagged by Guard Duty and also by a CrowdStrike Threat feed.

_sourceCategory=weblogs
[subquery:_sourceCategory="Labs/SecDemo/guardduty" "EC2 Instance" "communicating on an unusual server port 22"
| json field=_raw "service.action.networkConnectionAction.remoteIpDetails" as remoteIpDetails
| json field=_raw "service.action.networkConnectionAction.connectionDirection" as connectionDirection
| where connectionDirection = "OUTBOUND"
| json field=remoteipdetails "ipAddressV4" as src_ip
| lookup type, actor, raw, threatlevel from sumo://threat/cs on src_ip=threat
| where threatlevel = "high"
| compose src_ip]

Best practices

Here's a few tips and tricks to help you have a smoother experience with subqueries.

  • Your queries will perform better if you have the child query in the source expression (before the first pipe (|)), rather than having it in the filter clause. The below examples highlight this point. In the first here we use subquery before the first pipe and it executes in 17 seconds:
    17seconds.png
    compare that to where subquery is used in the where clause and you can see it takes 29 seconds to execute
    29Seconds.png
  •  If the child query is used to build the filter clause, try having the filter clause close to the source expression rather than having it further down in the query to improve performance. Your query should more like the one on the right
Less Efficient More Efficient

query scope
| operator 1
| operator 2
..
| operator n
| subquery
| some more operations

query scope
| subquery
| operator 1
| operator 2




..
| operator n

  • Run the child query in a separate tab first. Append the compose operator at the end of that query to check the results that are returned. When you are happy with the query, copy it into your main subquery. This pre-testing reduces the chances of creating queries that generate incorrect results. The screenshot show how you can build your child query in a separate tab using compose operator.

GuardDuty.png

 

  • If your subquery is generating too many records, try reducing the time range of the query.

Limitations

Subqueries have some limitations.

  • You cannot return more than 10,000 unique results (rows) from the child query. The number of columns returned by the child query does not influence the number of results.
  • The Log Search view does not present a histogram for child queries. The histogram that appears represents only the parent query.
  • Everything returned from the API will be from the final output. Child data is not returned.
  • Subquery is not supported in the following cases:
    •   In Scheduled Views
    •   Inside FERs
    •   Live Dashboards
    •   Real-time Alerts