Sumo Logic

Lab 3 - Parsing Options

This lab teaches you about the parsing options available.
Parsing your logs allows you to give structure to your messages by identifying the fields that are meaningful to you.

 

  1. By default, Dynamic Parsing is enabled with Auto Parse Mode selected, as shown below. Dynamic Parsing extracts fields from your JSON log messages automatically at search time (run time), so you can view JSON fields without writing any parsing logic. Use the following query to count messages by awsregion from your AWS CloudTrail data source. After running the query, notice that the Hidden Fields section of the Field Browser lists all the JSON fields, including awsregion, because you are in Auto Parse Mode.

    [Screenshot: Dynamic Parsing mode selector with Auto Parse Mode selected]

_sourceCategory=Labs/AWS/CloudTrail
| count by awsregion

Image of JSON Auto field selections

  1. Now change the Dynamic Parsing mode to Manual Mode and rerun the query. Manual Mode requires all parsing to be contained in your query. Notice the warning "Field awsregion not found, please check the spelling and try again." It appears because count is grouping by a field, awsregion, that has not been created. To fix the query for Manual Mode, parse awsregion explicitly. Use the query below, which uses the json operator to parse awsregion.

_sourceCategory=Labs/AWS/CloudTrail
| json field=_raw "awsRegion" as awsregion
| count by awsregion
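
As a rough illustration of what the Manual Mode query asks for, the steps can be sketched in Python: read one key out of each raw JSON message, then count by its value. This is only an analogy, not how Sumo Logic implements the json operator; the sample messages and the `count_by_region` helper are hypothetical.

```python
import json
from collections import Counter

# Hypothetical CloudTrail-style messages; the values are made up for illustration.
raw_messages = [
    '{"eventName": "ConsoleLogin", "awsRegion": "us-east-1"}',
    '{"eventName": "ListBuckets", "awsRegion": "us-west-2"}',
    '{"eventName": "AssumeRole", "awsRegion": "us-east-1"}',
]


def count_by_region(messages):
    """Extract awsRegion from each raw JSON message, then count per value."""
    counts = Counter()
    for raw in messages:
        record = json.loads(raw)          # roughly: json field=_raw ...
        region = record.get("awsRegion")  # roughly: "awsRegion" as awsregion
        if region is not None:            # this sketch skips messages without the key
            counts[region] += 1           # roughly: count by awsregion
    return counts


region_counts = count_by_region(raw_messages)
print(region_counts)
```

The point of the analogy is that the counting step can only work once the extraction step has produced the field, which is why the query fails in Manual Mode without the explicit json parse.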

  1. The nodrop option for the parse operator lets you keep messages in your results that do not match the parse pattern. Run a search for Apache Error logs for the last 15 minutes and notice that not all messages have a client IP.

_sourceCategory=Labs/Apache/Error

  1. Run the same search, but this time parse the client IP. Notice how all messages without the [client *] pattern are dropped.

_sourceCategory=Labs/Apache/Error
| parse "[client *]" as client_ip

  1. Add the nodrop option. Notice how non-matched messages are kept, with an empty client_ip. Notice also how combining nodrop with additional parse statements lets you parse logs of varying patterns and formats.

_sourceCategory=Labs/Apache/Error
| parse "[client *]" as client_ip nodrop
| parse "mod_log_sql: *" as message

  1. Filter for messages parsed by one statement or the other by using the isEmpty, isBlank, or isNull operators.

_sourceCategory=Labs/Apache/Error
| parse "[client *]" as client_ip nodrop
| parse "mod_log_sql: *" as message
| where isBlank(client_ip)
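
The difference between dropping and keeping non-matching messages can be sketched in Python. This is an emulation under stated assumptions, not Sumo Logic's implementation: the `*` wildcard is approximated with a non-greedy regular expression, and the sample log lines and the `parse_client` helper are hypothetical.

```python
import re

# Stands in for: parse "[client *]" as client_ip
CLIENT = re.compile(r"\[client (.*?)\]")

# Made-up Apache error lines for illustration.
lines = [
    "[error] [client 10.0.0.5] File does not exist: /var/www/favicon.ico",
    "[notice] mod_log_sql: child spawned successfully",
    "[error] [client 10.0.0.9] mod_log_sql: insert failed",
]


def parse_client(messages, nodrop=False):
    """Return (message, client_ip) pairs, dropping misses unless nodrop is set."""
    results = []
    for msg in messages:
        match = CLIENT.search(msg)
        if match:
            results.append((msg, match.group(1)))
        elif nodrop:
            results.append((msg, ""))  # kept, with an empty client_ip
        # without nodrop, the non-matching message is simply left out
    return results


dropped = parse_client(lines)            # two messages survive
kept = parse_client(lines, nodrop=True)  # all three survive

# Roughly: where isBlank(client_ip)
blank = [msg for msg, ip in kept if not ip.strip()]
print(len(dropped), len(kept), blank)
```

Filtering on the empty field afterwards is what separates the messages that matched the first pattern from those that did not.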

  1. The parse field option lets you parse an already extracted field further. In this example, we want to identify the top 5 committers in GitHub. Search for committers in the last 30 days and parse their email addresses.

_sourceCategory=Labs/Github and "committer"
| parse "\"email\":\"*\"" as email

  1. Now use the parse field option to split the email address into the users and domain fields. Lastly, count by users and identify the top 5 committers.

_sourceCategory=Labs/Github and "committer"
| parse "\"email\":\"*\"" as email
| parse field=email "*@*" as users, domain
| count by users
| top 5 users by _count
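
The two-stage idea (extract a field, then parse that field again) can be sketched in Python as follows. The email addresses are made up, `split_email` and `top_committers` are hypothetical helpers, and splitting at the first `@` is an assumption about how the `*@*` pattern behaves.

```python
from collections import Counter

# Made-up email values, as if already extracted by the first parse statement.
emails = [
    "ana@example.com",
    "bo@example.com",
    "ana@example.com",
    "cy@example.org",
    "ana@example.com",
    "bo@example.com",
]


def split_email(email):
    """Roughly: parse field=email "*@*" as users, domain (split at the first @)."""
    user, _, domain = email.partition("@")
    return user, domain


def top_committers(addresses, n=5):
    """Roughly: count by users, then top n users by _count."""
    counts = Counter(split_email(address)[0] for address in addresses)
    return counts.most_common(n)


top = top_committers(emails)
print(top)
```

The second stage never looks at the raw message, only at the value produced by the first stage, which is the behavior the field= option provides.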

  1. The parse multi option lets you extract multiple occurrences of the same pattern within one message. By default, parse extracts only the first occurrence. First, search the Snort data and extract the IP address.

_sourceCategory=labs/snort
| parse regex "(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

  1. Now use parse multi and notice how each message is repeated for each occurrence of an IP address, allowing you to do accurate counts.

_sourceCategory=labs/snort
| parse regex "(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" multi
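
The first-occurrence versus every-occurrence distinction maps closely onto `re.search` and `re.finditer` in Python, which makes for a compact sketch. The sample message is made up, and note that Python spells named groups `(?P<name>...)` where the query above uses `(?<name>...)`.

```python
import re

# Same pattern as the query above, in Python's named-group syntax.
IP = re.compile(r"(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# Made-up Snort-style message containing two IP addresses.
message = "TCP 192.168.1.10:4431 -> 10.0.0.7:80"

# Without multi: only the first occurrence is extracted.
first_only = IP.search(message).group("ip_address")

# With multi: every occurrence is extracted.
all_ips = [m.group("ip_address") for m in IP.finditer(message)]

# The message is repeated once per occurrence, so counts by ip_address are accurate.
rows = [(message, ip) for ip in all_ips]

print(first_only)
print(all_ips)
print(len(rows))
```

Counting over `rows` rather than over the original messages is what lets the destination address in this example be counted at all.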

  1. Field Extraction Rules (FERs) extract fields at the time log messages are ingested. You can see all available FERs and their details under Manage Data → Logs → Field Extraction Rules. Using the fields extracted by the Apache Access rule, run a search to count 404s by source IP.

_sourceCategory=Labs/Apache/Access and status_code=404
| count by src_ip

Image of Field Extraction Rules settings

QUIZ: True or False

  1. csv, json, split, keyvalue are all parsing operators.

  2. Once a field has been parsed, it cannot be parsed any further.

  3. Fields parsed by the Field Extraction Rules are available in the Field Browser.