Sumo Logic

Parse XML Formatted Logs

The XML operator parses fields from XML logs. You specify what to extract with an XPath reference; the operator supports a subset of the XPath 1.0 specification.

Ingested XML files must be well-formed and valid in order to be parsed by the XML operator. If the XML is not valid, you will receive an error.

Syntax

  • | parse XML [field=<field_name>] "<xpath_expression>"[, "<xpath_expression>"] [as <fields>] [nodrop]

Options

  • field=<field_name> 

    The field=<field_name> option allows you to specify a field to parse other than the default message. For details, see Parse field.

  • nodrop 

    The nodrop option forces results to also include messages that do not match any segment of the parse term. For details, see Parse nodrop.

Rules

  • If no field is specified, then the entire text of incoming messages is used.
  • If the XPaths are not valid, an error is thrown.
  • If the number of field names doesn't match the number of specified XPaths, an error is thrown.
  • If the field is not well-formed XML, null is returned, unless you have specified nodrop.
  • If the XPath doesn't match anything in the log, then null is returned, unless you have specified nodrop.
  • If the XPath matches an element, then its string representation is returned.
  • If the XPath matches multiple elements, then the first one is returned.
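The matching rules above can be tried out locally before running a query. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, not the operator's actual implementation. The helper name first_text is hypothetical, and the path is written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

LOG = """<users>
  <user id="123" role="manager"><first_name>Sally</first_name></user>
  <user id="456" role="contributor"><first_name>Bob</first_name></user>
</users>"""


def first_text(xml_text, path):
    """Return the text of the first element matching `path`, or None.

    Hypothetical helper: `path` is relative to the root element.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML: mirror the "null is returned" rule.
        return None
    match = root.find(path)  # find() returns the first match, or None
    return None if match is None else match.text


first_match = first_text(LOG, "user/first_name")  # two elements match
no_match = first_text(LOG, "user/phone")          # nothing matches
malformed = first_text("<users><user></users>", "user/first_name")
```

Here first_match holds the value from the first user element, while no_match and malformed are both None, which corresponds to the null cases in the rules.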

Example 1

This example references the following log:

<users> <user id="123" role="manager"> <first_name>Sally</first_name> <last_name>Jones</last_name> <email>sally@emailplace.com</email> </user> <user id="456" role="contributor"> <first_name>Bob</first_name> <last_name>Smith</last_name> <email>bob@emailplace.com</email> </user> </users>

Parse a field

You can parse information using an XPath reference, such as the first_name element value:

* | parse xml "/users/user/first_name/text()" as first_name

The text() node test selects the text content of the element. Both user elements match this path, so the first match is returned: the results contain a field named first_name with the value Sally.

To parse the id attribute you'd use the following:

* | parse xml "/users/user/@id" as id

The results would return a field named id with a value of 123.

Parse a field in an array with the same name

To parse the second user element in the array you'd use the following:

* | parse xml "/users/user[2]/first_name/text()" as first_name

The results would return a field named first_name with a value of Bob.

To parse the id value for this element you'd use the following:

* | parse xml "/users/user[2]/@id" as id

To parse the last element in an array you'd use the following:

* | parse xml "/users/user[last()]/first_name/text()" as first_name

To parse an element based on an attribute, in this example where id="456", you'd use the following:

* | parse xml "/users/user[@id=456]/first_name/text()" as first_name
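To check which node each of the predicates above selects, the same log can be loaded locally. This is a rough sketch with Python's standard xml.etree.ElementTree, which is an assumption of this example and not part of Sumo Logic. ElementTree paths are relative to the root element, attribute values in predicates must be quoted, and text() is replaced by the .text property.

```python
import xml.etree.ElementTree as ET

LOG = """<users>
  <user id="123" role="manager">
    <first_name>Sally</first_name>
    <last_name>Jones</last_name>
    <email>sally@emailplace.com</email>
  </user>
  <user id="456" role="contributor">
    <first_name>Bob</first_name>
    <last_name>Smith</last_name>
    <email>bob@emailplace.com</email>
  </user>
</users>"""

root = ET.fromstring(LOG)

# Counterpart of /users/user[2]/first_name/text()
second = root.find("user[2]/first_name").text

# Counterpart of /users/user[2]/@id
second_id = root.find("user[2]").get("id")

# Counterpart of /users/user[last()]/first_name/text()
last = root.find("user[last()]/first_name").text

# Counterpart of /users/user[@id=456]/first_name/text()
by_id = root.find("user[@id='456']/first_name").text
```

All three first_name lookups resolve to the second user element in this log, and second_id is that element's id attribute.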

Example 2

This example references the following log:

<af type="nursery" id="102" timestamp="Nov 20 04:41:11 2013" intervalms="1089510.533">
<minimum requested_bytes="48" />
<time exclusiveaccessms="0.163" meanexclusiveaccessms="0.163" threads="0" lastthreadtid="0x0000000034520C00" />
<refs soft="40652" weak="35055" phantom="594" dynamicSoftReferenceThreshold="10" maxSoftReferenceThreshold="32" />
<nursery freebytes="0" totalbytes="324978688" percent="0" />
<tenured freebytes="61087704" totalbytes="553484288" percent="11" >
<soa freebytes="33414104" totalbytes="525810688" percent="6" />
<loa freebytes="27673600" totalbytes="27673600" percent="100" />
</tenured>
<refs soft="40619" weak="29867" phantom="586" dynamicSoftReferenceThreshold="10" maxSoftReferenceThreshold="32" />
<time totalms="91.622" />
</af>

Parse a field

You can parse information using an XPath reference, such as:

* | parse XML "/af/@type"

Because no field name is given with as, the XPath itself is used as the field name. This query adds a field named /af/@type that holds the value of the type attribute of the root af element: nursery.

Parse a more complex field

Use a query such as:

* | parse xml "/af/minimum/@requested_bytes"

This adds a field named /af/minimum/@requested_bytes that holds the value of the requested_bytes attribute of the minimum element under the root af element: 48.

Parse multiple fields

Use a query such as:

* | parse xml "/af/@type", "/af/@timestamp"

This adds two fields, /af/@type and /af/@timestamp, that hold the values of the type and timestamp attributes of the root af element: nursery and Nov 20 04:41:11 2013, respectively.
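The attribute lookups in this example can be approximated locally to confirm which values the XPaths point at. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the log. The function parse_attrs is a hypothetical helper written for this illustration, and it only handles simple absolute attribute paths such as /af/@type.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Example 2 log (only the elements used here).
LOG = """<af type="nursery" id="102" timestamp="Nov 20 04:41:11 2013" intervalms="1089510.533">
<minimum requested_bytes="48" />
<time totalms="91.622" />
</af>"""


def parse_attrs(xml_text, *xpaths):
    """Map each absolute attribute XPath to its value, or None if absent."""
    root = ET.fromstring(xml_text)
    fields = {}
    for xpath in xpaths:
        *steps, attr = xpath.strip("/").split("/")
        if not attr.startswith("@") or not steps or steps[0] != root.tag:
            fields[xpath] = None
            continue
        # Walk the child steps below the root; no child steps means the root.
        node = root.find("/".join(steps[1:])) if len(steps) > 1 else root
        fields[xpath] = None if node is None else node.get(attr[1:])
    return fields


fields = parse_attrs(
    LOG, "/af/@type", "/af/@timestamp", "/af/minimum/@requested_bytes"
)
```

As in the queries above, each result is keyed by the XPath itself, since no separate field names are supplied.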


XPath subset limitations

The full XPath 1.0 specification is not supported. To improve performance, Sumo Logic supports a subset of the specification, with the following limitations:

Forward only

The XML operator only allows XML paths to go deeper into the tree. For example, this expression is not allowed:

/af/nursery/../@type

Full location paths

You must specify the full path to the elements you want to parse. This means that "descendant-or-self" expressions (//) are not supported. For example, the following paths are not allowed:

//af

/af//nursery

No expanded syntax axis specifiers

Expanded syntax is not supported. For example, the following expressions cannot be used:

/child::af

/descendant-or-self::af
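The three limitations above can be screened for before a query is run. The following is a minimal sketch of such a pre-check in Python. The function unsupported_constructs is hypothetical and is not how Sumo Logic validates XPaths. It relies on plain string checks, so it would misjudge a path with a trailing slash or a predicate value that contains a slash.

```python
def unsupported_constructs(xpath):
    """Return the reasons an XPath falls outside the documented subset."""
    problems = []
    steps = xpath.split("/")
    if ".." in steps:
        problems.append("parent step (..)")
    # An empty step after the leading slash means the path contains "//".
    if "" in steps[1:]:
        problems.append("descendant shorthand (//)")
    if "::" in xpath:
        problems.append("expanded axis specifier (::)")
    return problems


examples = [
    "/af/nursery/../@type",
    "//af",
    "/af//nursery",
    "/child::af",
    "/descendant-or-self::af",
    "/af/@type",
]
report = {xpath: unsupported_constructs(xpath) for xpath in examples}
```

An empty list in report means none of the three documented limitations was detected. It does not guarantee that the operator accepts the expression.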