Skip to main content
Sumo Logic

Beta - Subqueries

You can run a query on the results of another query.

Subqueries help you correlate events across different Sumo queries, by allowing one query to pass results back to another query to narrow down the set of messages that are searched in that query. Sometimes this offers a faster approach than a join, where you would have to unite large sets of data and then search through the results to form a conclusion. If you can do some processing to narrow down the scope of data , you can form a subquery.

In subqueries the parent query contains the main body of the query while the child query contains the results necessary for the parent query.

  •     Child query. Runs first and provides intermediate input for the parent query.
  •     Parent query. Depends on the input from a child query or queries to finish execution.

Syntax

You can use the following syntax to create a subquery in any Sumo query.

Parent query
[subquery: <child query>
| compose <field1 [,field2,field3...]> [maxresults=<int>] [keywords]
]
Rest of parent query

You can only use subquery in the source expression of your query (before the first  |) to specify values for built in fields, FER fields, or specify search keywords. Subquery can also be used to build filter expressions for if & where operators. This versatility of subquery to be used anywhere comes from the fact that it can control the output from the child query to match what is expected in the parent query.

The compose operator controls the output of the child query that is plugged back into the parent query. The compose operator basically controls three things :

  1.  What fields from the child query are returned back to the parent query. You can return more than one field back to the parent query.
  2.  Number of results returned from child query to the parent query. You can limit the results using maxResults clause with compose operator. (Currently there is a hard limit of 2500 results returned from a child query.)
  3.  Format in which the values are returned. This is controlled using the “keyword” clause at the end of the query. This clause will be used if you are using child query to specify search clause in your source expression. The keyword cause basically does the following:  
    • It returns the results of the given field by converting the tabular results into standard DNF format, no field information is returned back to the parent.
    • If the keyword clause is not specified then the results are returned in standard DNF format with field name information returned as key value pair back to parent, so those fields should be present in the parent as well in order for it to work properly. 

Example

Let’s say that that our company has a shopping website, and we want to track purchases made by our most active user. We can use subquery in this case to get the desired results. Below are the steps we would follow:
    1.    Create a Sumo query that gives us items checked out and items purchased by a specific user (parent query).
    2.    Create a Sumo query that tracks the most active user on the website (child query).
    3.    Using subquery pass the user_id or user_ip from the child query to the parent query, so that the complete workflow happens within a single Sumo query.

Step 1: Create Parent Query

Our parent query provides statistics such items checked out, and items purchased for a given user. For the example we are tracking users through their IP and we can use the following query over our custom application logs.

_sourceCategory=reinvent/travel/checkout 243.63.233.30
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

With subquery we can pass the IP address in red dynamically to this query.

clipboard_e5a557c6cd8e8097e79c1598416032d8a.png

Step 2: Create Child Query

Our child query provides the most active user, and we track it using IP addresses in our web server logs using the below mentioned query :

_sourceCategory=reinvent/travel/nginx
      | count by src_ip
     | topk(1,_count)

 I want to pass back this IP address to other query which tracks the user’s activity.

clipboard_ef2185c81c2d257a45fd46a6f5b9c178c.png

Step 3: Create a Subquery

Now we combine the two queries into a subquery to allow the parent query to harness the child query results. There are two approaches to the subquery:

  •     Specify it as keywords for your search criteria to narrow down the search scope.  We will build the subquery in place of the static IP address in the parent query by using the subquery syntax [subquery: ...]
  •     Use filter operators such as where and if. If we were filtering the IP address in a where clause then we can substitute it with subquery that dynamically generates the filter expression.   
Keywords Approach

With the keywords approach, you can replace the IP address with the subquery syntax and copy paste the child query that we developed when we created a child query.

_sourceCategory=reinvent/travel/checkout
[subquery:_sourceCategory=reinvent/travel/nginx
      | count by src_ip
     | topk(1,_count)
     | compose src_ip keywords
]
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

We also have to append the compose operator at the end of the child query to let Sumo know what information to pass back to the parent query. In this case we only want to pass  the IP address back so we specify scr_ip field. At the end of the expression we have to mention keyword, which basically tells the system that the output will be inform of keyword expressions that can be plugged in the parent query. If you don't use keywords then src_ip field would be expected in your parent query and if not found will generate an error.

clipboard_e7da7cf36d758ecbd8383b8f6ff641261.png

Filter Operator Approach

We can replicate the same query above using a filter operator such as where.

If we have to use a filter condition then we would need to parse out IP address in my parent query and make sure that the field name is consistent with child query. Once we have parsed out the ip address field I would need to create a where clause in the parent query and have the child query inside it, just like before. The only difference is the compose operator. In this case we don’t have the keywords at the end of the operator since we want to pass back the field_name information back to the parent query.

_sourceCategory=reinvent/travel/checkout
| json field=_raw "source_ip" as src_ip
| json field=_raw "funcName"
| where funcname in ("process_cart","charge")
| where [subquery:_sourceCategory=reinvent/travel/nginx
      | count by src_ip
      | topk(1,_count)| compose src_ip]
| if (funcname = "process_cart" , "Checkout", "Purchased") as funcname
| count by funcname

clipboard_e80b31772b236645f5e6ae1417b9c7653.png

Best Practices

Here's a few tips and tricks to help you have a smoother experience with subqueries.

  •  Your queries will perform better if you have the child query in the source expression (before the first pipe (|)), rather than having it in the filter clause
  •  If the child query is used to build the filter clause, then try having the filter clause close to the source expression rather than having it further down in the query as it would give better performance.
  •  Run the child query in a separate tab first. You can even append the compose operator there to check the results returned. Once you are confident with the query that you build, copy it into your main subquery. This reduces the chances of creating queries that provide wrong results. 
  •  If your subquery is generating too many records, try reducing the time range of the query.

Limitations

There are some limitations of subqueries

    1.    You cannot return more than 2500 unique results (rows) back from the child query. Number of columns sent back from the child query does not influence the number of results.
    2.    Currently, child query will run on the same time range as the parent. In the future we will support different time ranges for child queries.
    3.    Log Search Page will not show histogram for child queries.
    4.    If getting the status of an Search job thought API message Count, histogram Buckets,pending Errors, pending Warnings, recordCount will only reflect status of the parent query, and not the child queries.

Where can you use Subqueries?

Here's the list of what is support

  •   Search API
  •    Log Search Page (UI)
  •    Dashboards (Not supported in live mode)
  •    Scheduled Searches (but not real-time scheduled searches)

Where are Subqueries not supported?

  •    Scheduled Views
  •    FERs
  •    Live Dashboards
  •    Real-time Scheduled Searches

Known Issues

We're still working on some features during this Beta period. Here are some known issues that you might experience:

  •  If there are syntactical issues within the child query then the UI doesn't display correct warning messages like it does for normal Sumo queries.
  •  If your subquery returns no results, then you might get unexpected errors from your subquery. We recommend doing a null check before the results are returned by the child query.
  •  Our UI doesn’t provide autocomplete for the subquery syntax, and the inline help is also not available.
  •  There's no error message to prevent users from adding subquery to live dashboard. This can break the dashboard since subquery is not support in live mode.
  •  Currently you cannot cancel, pause, or resume a subquery.
  •  If you have to many subqueries running concurrently, then you might get a stack overflow error when you try to start a new subquery. In this case, you should wait for sometime for the earlier subqueries to finish before starting your new query.