Sumo Logic

Parse Variable Patterns Using Regex

The Parse Regex operator (also called the extract operator) enables users comfortable with regular expression syntax to extract more complex data from log lines. Parse regex can be used, for example, to extract nested fields.

User-added fields, such as extracted or parsed fields, can be named using alphanumeric characters as well as underscores ("_") and dashes ("-"). They must start and end with an alphanumeric character.
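
The naming rule can be checked mechanically. The following is a minimal Python sketch, not Sumo Logic code: the helper name is_valid_field_name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumed reading of the rule: ASCII letters, digits, "_" and "-" are allowed,
# and the first and last characters must be alphanumeric.
FIELD_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_field_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule sketched above."""
    return FIELD_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("src_ip", "trigger-rule", "_src", "src-"):
        print(candidate, is_valid_field_name(candidate))
```

Names such as src_ip pass the check, while _src and src- fail because they begin or end with a non-alphanumeric character.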

Syntax:

  • ... | parse regex "start_anchor_regex(?<field_name>.*?)stop_anchor_regex" | ...
  • ... | parse regex "start_anchor_regex(?<field_name>.*?)stop_anchor_regex" nodrop | ...
  • ... | parse regex field=<field_name> "start expression(?<fieldname>field expression) stop expression" | ...

Options:

  • parse field=fieldname: The parse field=fieldname option allows you to specify a field to parse other than the default message. For details, see Parse field.
  • parse nodrop: The parse nodrop option forces results to also include messages that do not match any segment of the parse term, for example * | parse "a=*," as a nodrop. For details, see Parse nodrop.
  • parse multi: The parse multi option allows you to parse multiple values within a single log message. See Parse multi. You can use the alternate term "extract" in place of "parse regex".
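
The effect of nodrop can be pictured with a small Python emulation. This is only a sketch under stated assumptions: parse_regex here is a hypothetical helper, not a Sumo Logic API, the sample messages are invented, and Python writes named groups as (?P<name>...) where the query language uses (?<name>...).

```python
import re


def parse_regex(messages, pattern, nodrop=False):
    """Roughly emulate how a parse step keeps or drops messages."""
    compiled = re.compile(pattern)
    results = []
    for message in messages:
        match = compiled.search(message)
        if match:
            # Matching messages gain the extracted fields.
            results.append({"_raw": message, **match.groupdict()})
        elif nodrop:
            # With nodrop, the message is kept without the extracted fields.
            results.append({"_raw": message})
    return results


logs = ["a=1, b=2", "no pairs here"]  # invented sample messages
dropped = parse_regex(logs, r"a=(?P<a>.*?),")
kept = parse_regex(logs, r"a=(?P<a>.*?),", nodrop=True)
```

In this sketch, dropped holds only the first message, while kept holds both, with the second carrying no value for a.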

For more information on regular expressions, see the Perl documentation, or try the regex tester at regex101.com.

Rules:

  • Regex must be a valid Java regular expression enclosed within quotes.
  • Matching is case sensitive. If any of the text segments cannot be matched, then none of the variables will be assigned.
  • If no field is specified, then the entire text of incoming messages is used.
  • Multiple parse expressions are processed in the order they are specified. Each expression always starts matching from the beginning of the message string.
  • Multiple parse expressions can be written with shorthand using comma-separated terms.
  • Can be used with the parse anchor operator.
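
Two of the rules above, case-sensitive matching and all-or-nothing assignment, can be illustrated with a short Python sketch. The pattern, field names, and messages are invented for illustration, and Python's re module stands in for the Java regex engine the operator uses.

```python
import re

# Hypothetical pattern with two text segments: "user=" and " host=".
PATTERN = re.compile(r"user=(?P<user>\w+) host=(?P<host>\w+)")


def extract(message: str) -> dict:
    """Return all named fields, or no fields at all if any segment fails."""
    match = PATTERN.search(message)
    return match.groupdict() if match else {}


full = extract("user=alice host=web01")        # both segments match
wrong_case = extract("User=alice host=web01")  # "User=" differs in case
partial = extract("user=alice port=22")        # " host=" segment is missing
```

Only the first call assigns fields. The other two return an empty result, since one unmatched segment prevents every variable from being assigned.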

Examples 

Parsing an IP address

Extracting IP addresses from logs is straightforward using a parse regex similar to:

... | parse regex "(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) " | ...
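
To experiment with this expression outside a query, the same pattern can be tried in Python. This is a sketch, not Sumo Logic code: extract_ip and the sample strings are hypothetical, and the named group is written (?P<ip_address>...) because Python does not accept the (?<name>...) form.

```python
import re
from typing import Optional

# Same expression as the query above, including the trailing space.
IP_PATTERN = re.compile(r"(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) ")


def extract_ip(message: str) -> Optional[str]:
    """Return the first dotted-quad followed by a space, or None."""
    match = IP_PATTERN.search(message)
    return match.group("ip_address") if match else None


hit = extract_ip("Connection from 10.1.2.3 accepted")
miss = extract_ip("denied tcp 10.1.2.3(1234)")  # no space after the address
```

Because the expression ends with a space, an address followed directly by another character, such as "(", is not extracted. The pattern also does not check that each octet is in the range 0 to 255.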

Parsing multiple fields in a single query

Parse regex supports parsing out multiple fields in one query. For example, say we want to parse username and host information from logs. Use a query similar to:

... | parse regex "user=(?<user>.*?):" 
| parse regex "host=(?<msg_host>.*?):" 
| ...
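
The following Python sketch mimics the two chained expressions to show that each one searches the whole message independently, so the order of fields in the log does not need to follow the order of the parse statements. The helper parse_all and the sample message are assumptions for illustration only.

```python
import re

# The two expressions from the query above, in Python's (?P<name>...) syntax.
PATTERNS = [r"user=(?P<user>.*?):", r"host=(?P<msg_host>.*?):"]


def parse_all(message: str):
    """Apply each expression to the full message; None if any fails to match."""
    fields = {}
    for pattern in PATTERNS:
        match = re.search(pattern, message)
        if match is None:
            return None  # without nodrop, a non-matching message is dropped
        fields.update(match.groupdict())
    return fields


# Invented message in which host appears before user.
result = parse_all("host=web01: user=alice: login ok")
```

Here result contains both user and msg_host even though host precedes user in the message, because every expression starts matching from the beginning of the string.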

Using non-capturing groups for an OR condition

When several alternatives may match at the same position in a regular expression (an OR condition), the best practice is to use a non-capturing group, (?:regex).

To specify a list of alternative strings in a regular expression, use the group syntax. For example, for the following two log lines:

Oct 11 18:20:49 host123.example.com 16234563: Oct 11 18:20:49: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.1.2.3(1234) -> 10.1.2.4(5678), 1 packet
Oct 11 18:20:49 host123.example.com 16234564: Oct 11 18:20:49: %SEC-6-IPACCESSLOGP: list 101 accepted tcp 10.1.2.5(4321) -> 10.1.2.6(8765), 1 packet


you can write the following query to extract the "protocol":

parse regex "list 101 (accepted|denied) (?<protocol>.*?) "


The Sumo Logic query language requires groups that are not captured to an alias to be marked explicitly as non-capturing groups, so you would actually write:

parse regex "list 101 (?:accepted|denied) (?<protocol>.*?) "


To also capture whether the action was "accepted" or "denied" into an alias, use a named group instead:

parse regex "list 101 (?<status>accepted|denied) (?<protocol>.*?) "
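
The difference between the two forms can be checked against the sample log lines with a short Python sketch. It is illustrative only: Python writes named groups as (?P<name>...), and unlike the Sumo Logic rule described above it would also accept a plain unnamed group.

```python
import re

LINES = [
    "Oct 11 18:20:49 host123.example.com 16234563: Oct 11 18:20:49: "
    "%SEC-6-IPACCESSLOGP: list 101 denied tcp 10.1.2.3(1234) -> 10.1.2.4(5678), 1 packet",
    "Oct 11 18:20:49 host123.example.com 16234564: Oct 11 18:20:49: "
    "%SEC-6-IPACCESSLOGP: list 101 accepted tcp 10.1.2.5(4321) -> 10.1.2.6(8765), 1 packet",
]

# The alternation is matched but not stored.
non_capturing = re.compile(r"list 101 (?:accepted|denied) (?P<protocol>.*?) ")
# The alternation is stored under the alias "status".
with_status = re.compile(r"list 101 (?P<status>accepted|denied) (?P<protocol>.*?) ")

protocols = [non_capturing.search(line).groupdict() for line in LINES]
statuses = [with_status.search(line).groupdict() for line in LINES]
```

The first list holds only protocol for each line, and the second also records whether the line was denied or accepted.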

Parse multi

In addition to parsing a field value, the multi option (also called parse multi) allows you to parse multiple values within a single log message. This means that the multi keyword instructs the parse regex operator to not just look for the first value in a log message, but for all of the values, even in messages with a varying number of values. As a part of this process, the multi keyword creates copies of each message so that each individual value in a field can be counted.

For example, say our firewall log messages look like this:

http://www.bigdata.biz" "CU1_4919|967:925:123

From this message, we'd like to extract the firewall codes. Use the multi keyword in the parse regex:

... parse url first
| parse regex "Firewall Rules: \|(?<trigger_rules>.*?)\|"
| parse regex field=trigger_rules "(?<trigger_rule>\d+)" multi

The output contains one result for each extracted value. As each value has its own message, you can use any of the parsed values in an aggregation.
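
The copying behavior of multi can be approximated in Python with re.finditer. This is a sketch under assumptions: parse_multi is a hypothetical helper, and the trigger_rules value "967:925:123" is assumed to be what the first parse step extracted from the sample message.

```python
import re
from collections import Counter


def parse_multi(record: dict, field: str, pattern: str) -> list:
    """Return one copy of `record` per match of `pattern` in record[field]."""
    return [
        {**record, **match.groupdict()}
        for match in re.finditer(pattern, record[field])
    ]


# Assumed result of the first parse step.
record = {"trigger_rules": "967:925:123"}
rows = parse_multi(record, "trigger_rules", r"(?P<trigger_rule>\d+)")

# Each copy carries a single value, so the values can be counted directly.
counts = Counter(row["trigger_rule"] for row in rows)
```

Each entry in rows is a copy of the original record carrying one trigger_rule value, which is what makes a per-value count possible.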