
Sumo Logic Kubernetes Helm Chart Log Collection

By default, log collection is enabled. This includes both container logs and systemd logs.

Container logs are read and parsed directly from the Node filesystem, where the kubelet writes them under the /var/log/pods directory.

Systemd logs are read and parsed directly from the Node journal.

They are then sent to a metadata enrichment service which takes care of adding Kubernetes metadata, custom processing, filtering, and finally sending the data to Sumo Logic. Both the collection and the metadata enrichment are done by the OpenTelemetry Collector.

See the Solution Overview diagram for a visualization.


High level configuration for logs is located in values.yaml under the sumologic.logs key.

Configuration specific to the log collector DaemonSet can be found under the otellogs key.

Finally, configuration specific to the metadata enrichment StatefulSet can be found under the metadata.logs key.

Container Logs

Configuration specific to container logs is located under the sumologic.logs.container key.

Multiline log parsing

By default, each line output by an application is treated as a separate log record. However, some applications can actually output logs split into multiple lines - this is often the case for stack traces, for example. If we want such a multiline log to appear in Sumo Logic as a single record, we need to tell the collector how to distinguish between lines which start a new record and ones which continue an existing record.

Multiline log parsing can be configured using the sumologic.logs.multiline section in user-values.yaml.

sumologic:
  logs:
    multiline:
      enabled: true
      first_line_regex: "^\\[?\\d{4}-\\d{1,2}-\\d{1,2}.\\d{2}:\\d{2}:\\d{2}"

where first_line_regex is a regular expression used to detect the first line of a multiline log.

This feature is enabled by default, and the default regex matches logs that start with an ISO 8601 datetime. For example:

2007-03-01T13:00:00Z this is the first line of a log record
this is the second line
and this is the third line
2007-03-01T13:00:01Z this is a new log record

In rare cases, this feature can cause problems by merging lines that are supposed to be separate. If that happens, feel free to disable it.
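The grouping behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the collector's implementation: the function name `group_multiline` is hypothetical, and Python's `re` module stands in for the collector's regex engine. The pattern is the default `first_line_regex` with the YAML escaping removed.

```python
import re

# Default first_line_regex from the chart, with YAML escaping removed.
FIRST_LINE = re.compile(r"^\[?\d{4}-\d{1,2}-\d{1,2}.\d{2}:\d{2}:\d{2}")


def group_multiline(lines, first_line=FIRST_LINE):
    """Group raw lines into records (hypothetical helper, for illustration).

    A line that matches first_line starts a new record; any other line
    is appended to the record currently being built.
    """
    records = []
    for line in lines:
        if not records or first_line.search(line):
            records.append(line)
        else:
            records[-1] = records[-1] + "\n" + line
    return records


lines = [
    "2007-03-01T13:00:00Z this is the first line of a log record",
    "this is the second line",
    "and this is the third line",
    "2007-03-01T13:00:01Z this is a new log record",
]
records = group_multiline(lines)
for record in records:
    print(repr(record))
```

The four input lines collapse into two records. A regex that is too permissive or too strict changes where records are split, which is how unrelated lines can end up merged.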

Conditional multiline log parsing

Multiline log parsing can also be configured for specific conditions:

enabled: true
first_line_regex: "^\\[?\\d{4}-\\d{1,2}-\\d{1,2}.\\d{2}:\\d{2}:\\d{2}"
- first_line_regex: <regex 1>
condition: <condition 1>
- first_line_regex: <regex 2>
condition: <condition 2>
# ...

In that case, the first_line_regex of the first matching condition is applied, and sumologic.logs.multiline.first_line_regex is used for logs that do not match any of the conditions.

Each condition has to be a valid OpenTelemetry Expression.

The following variables may be used in the condition:

  • body: body of a log
  • attributes[""]
  • attributes[""]
  • attributes[""]
  • attributes["log.file.path"]: log path on the node (/var/log/pods/...)
  • attributes["stream"]: may be either stdout or stderr

Consider the following example:

enabled: true
first_line_regex: "^\\[?\\d{4}-\\d{1,2}-\\d{1,2}.\\d{2}:\\d{2}:\\d{2}"
- first_line_regex: "^@@@@ First Line"
condition: 'attributes[""] == "foo"'
- first_line_regex: "^--- First Line"
condition: 'attributes[""] matches "^bar-.*"'
- first_line_regex: "^Error"
condition: 'attributes["stream"] == "stderr" and attributes[""] != "lorem"'

It is going to:

  • Use the ^@@@@ First Line expression for all logs from the foo namespace
  • Use the ^--- First Line expression for all remaining logs from containers whose names start with bar-
  • Use the ^Error expression for all remaining stderr logs that are not in the lorem namespace
  • Use the ^\\[?\\d{4}-\\d{1,2}-\\d{1,2}.\\d{2}:\\d{2}:\\d{2} expression for all remaining logs

Logs which match multiple conditions are processed only by the first match.
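The first-match rule can be pictured as an ordered list of (condition, regex) pairs with a fallback. The sketch below is a simplification under stated assumptions: conditions are written as plain Python callables rather than OpenTelemetry expressions, the names `RULES` and `select_first_line_regex` are hypothetical, and the `/var/log/pods/foo_` path prefix is only an illustrative way to single out some logs.

```python
# Fallback, corresponding to sumologic.logs.multiline.first_line_regex.
DEFAULT_REGEX = r"^\[?\d{4}-\d{1,2}-\d{1,2}.\d{2}:\d{2}:\d{2}"

# Ordered rules; Python callables stand in for the real condition expressions.
RULES = [
    # Hypothetical condition on the log path.
    (lambda body, attributes: attributes["log.file.path"].startswith("/var/log/pods/foo_"),
     r"^@@@@ First Line"),
    # Condition on the stream attribute.
    (lambda body, attributes: attributes["stream"] == "stderr",
     r"^Error"),
]


def select_first_line_regex(body, attributes, rules=RULES, default=DEFAULT_REGEX):
    """Return the regex of the first rule whose condition holds, else the default."""
    for condition, regex in rules:
        if condition(body, attributes):
            return regex
    return default


both = {"log.file.path": "/var/log/pods/foo_app_123/app/0.log", "stream": "stderr"}
only_stderr = {"log.file.path": "/var/log/pods/other_app_123/app/0.log", "stream": "stderr"}
neither = {"log.file.path": "/var/log/pods/other_app_123/app/0.log", "stream": "stdout"}

print(select_first_line_regex("some log", both))
print(select_first_line_regex("some log", only_stderr))
print(select_first_line_regex("some log", neither))
```

The `both` record satisfies two conditions but only the first rule's regex is used, so the order of the entries matters.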

Container Log format

There are three log formats available: fields, json_merge and text. fields is the default.

You can change it by setting:

sumologic:
  logs:
    container:
      format: fields

We're going to demonstrate the differences between them on two example log lines:

  • A plain text log
    2007-03-01T13:00:00Z I am a log line
  • A JSON log
    { "log_property": "value", "text": "I am a json log" }

json log format

json log format is an alias for the fields log format.

fields log format

Logs formatted as fields are wrapped in a JSON object with additional properties, with the log body residing under the log key.

For example, log line 1 will show up in Sumo Logic as:

{
  log: "2007-03-01T13:00:00Z I am a log line",
  stream: "stdout",
  timestamp: 1673627100045
}

If the log line contains JSON, as log line 2 does, it will be displayed as a nested object inside the log key:

{
  log: {
    log_property: "value",
    text: "I am a json log"
  },
  stream: "stdout",
  timestamp: 1673627100045
}

json_merge log format

json_merge is identical to fields for non-JSON logs, but behaves differently for JSON logs. If the log is JSON, it gets merged into the top-level object.

Log line 1 will show up the same way as it did for fields:

{
  log: "2007-03-01T13:00:00Z I am a log line",
  stream: "stdout",
  timestamp: 1673627100045
}

However, the attributes from log line 2 will show up at the top level:

{
  log: {
    log_property: "value",
    text: "I am a json log"
  },
  stream: "stdout",
  timestamp: 1673627100045,
  log_property: "value",
  text: "I am a json log"
}

text log format

The text log format sends the log line as-is without any additional wrappers.

Log line 1 will therefore show up as plain text:

2007-03-01T13:00:00Z I am a log line

Whereas log line 2 will be displayed as JSON:

{
  log_property: "value",
  text: "I am a json log"
}

If you want to send metadata along with an unstructured log record, you have to use resource-level attributes, because record-level attributes are removed before the log is sent to Sumo Logic. See Mapping OpenTelemetry concepts to Sumo Logic for more details.
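To compare the container log formats side by side, the following Python sketch reproduces the shapes shown in the examples of this section. It is a rough model only: the `render` function is hypothetical, the `stream` and `timestamp` defaults are the sample values used above, and the real collector may differ in details such as key ordering or how key collisions are handled.

```python
import json


def render(line, fmt, stream="stdout", timestamp=1673627100045):
    """Model how one log line is shaped by each format (illustration only)."""
    if fmt == "text":
        # text: the line is sent as-is, without a wrapper.
        return line
    if fmt not in ("fields", "json", "json_merge"):
        raise ValueError(f"unknown format: {fmt}")
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        parsed = None
    is_object = isinstance(parsed, dict)
    # fields (and its alias json): body goes under the log key.
    record = {"log": parsed if is_object else line, "stream": stream, "timestamp": timestamp}
    if fmt == "json_merge" and is_object:
        # json_merge: JSON properties are also merged into the top level.
        record.update(parsed)
    return record


plain = "2007-03-01T13:00:00Z I am a log line"
structured = '{ "log_property": "value", "text": "I am a json log" }'

for fmt in ("fields", "json_merge", "text"):
    print(fmt, render(plain, fmt))
    print(fmt, render(structured, fmt))
```

For the plain text line, fields and json_merge produce the same wrapper. The formats only diverge once the body parses as a JSON object.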

Systemd Logs

Configuration specific to systemd logs is located under the sumologic.logs.systemd key. Most configuration options for systemd logs are shared with container logs, and are documented under Modification and filtering.

Selecting systemd units to collect logs from

You can control which systemd units to collect logs from by setting:

- docker.service

The default is:

- addon-config.service
- addon-run.service
- cfn-etcd-environment.service
- cfn-signal.service
- clean-ca-certificates.service
- containerd.service
- coreos-metadata.service
- coreos-setup-environment.service
- coreos-tmpfiles.service
- dbus.service
- docker.service
- efs.service
- etcd-member.service
- etcd.service
- etcd2.service
- etcd3.service
- etcdadm-check.service
- etcdadm-reconfigure.service
- etcdadm-save.service
- etcdadm-update-status.service
- flanneld.service
- format-etcd2-volume.service
- kube-node-taint-and-uncordon.service
- kubelet.service
- ldconfig.service
- locksmithd.service
- logrotate.service
- lvm2-monitor.service
- mdmon.service
- nfs-idmapd.service
- nfs-mountd.service
- nfs-server.service
- nfs-utils.service
- node-problem-detector.service
- ntp.service
- oem-cloudinit.service
- rkt-gc.service
- rkt-metadata.service
- rpc-idmapd.service
- rpc-mountd.service
- rpc-statd.service
- rpcbind.service
- set-aws-environment.service
- system-cloudinit.service
- systemd-timesyncd.service
- update-ca-certificates.service
- user-cloudinit.service
- var-lib-etcd2.service

Kubelet Logs

Kubelet logs are a subset of systemd logs, but can be configured separately due to their particular significance for Kubernetes observability. Configuration specific to kubelet logs is located under the sumologic.logs.kubelet key. Most configuration options for kubelet logs are shared with container logs, and are documented under Modification and filtering.

Modification and filtering

These settings are identical for container, systemd and kubelet logs. The only difference is the top-level section name.

Setting source name and other built-in metadata

It's possible to customize the built-in Sumo Logic metadata (like source name for example) for both container and systemd logs:

## Set the _sourceHost metadata field in Sumo Logic.
sourceHost: ""
## Set the _sourceName metadata field in Sumo Logic.
sourceName: "%{namespace}.%{pod}.%{container}"
## Set the _sourceCategory metadata field in Sumo Logic.
sourceCategory: "%{namespace}/%{pod_name}"
## Set the prefix, for _sourceCategory metadata.
sourceCategoryPrefix: "kubernetes/"
## Used to replace - with another character.
sourceCategoryReplaceDash: "/"

As can be seen in the above example, these fields can contain templates of the form %{field_name}, where field_name is the name of a resource attribute. Available resource attributes include the values of sumologic.logs.fields, which by default are:

  • cluster
  • container
  • daemonset
  • deployment
  • host
  • namespace
  • node
  • pod
  • service
  • statefulset

in addition to the following:

  • _collector
  • pod_labels_* where * is the Pod label name
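The %{field_name} templates above can be thought of as simple string substitution over resource attributes. The Python sketch below illustrates that idea under several assumptions that are not confirmed by this document: missing attributes expand to an empty string, the prefix is prepended before dashes are replaced, and the function names `expand` and `source_category` as well as the attribute values are hypothetical.

```python
import re


def expand(template, attributes):
    """Replace each %{name} with the matching attribute value.

    Assumption: a missing attribute expands to an empty string.
    """
    return re.sub(
        r"%\{([^}]+)\}",
        lambda match: str(attributes.get(match.group(1), "")),
        template,
    )


def source_category(template, attributes, prefix="kubernetes/", replace_dash="/"):
    """Build a _sourceCategory value.

    Assumption: the prefix is added first, then every "-" is replaced.
    """
    return (prefix + expand(template, attributes)).replace("-", replace_dash)


# Hypothetical resource attributes for one container.
attributes = {
    "namespace": "my-app",
    "pod": "web-0",
    "pod_name": "web",
    "container": "nginx",
}

print(expand("%{namespace}.%{pod}.%{container}", attributes))
print(source_category("%{namespace}/%{pod_name}", attributes))
```

With these sample values, the source name template keeps the dashes, while the source category has them replaced by the sourceCategoryReplaceDash character.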


Please see the doc about filtering data.

Modifying log records

To modify log records, use OpenTelemetry processors. Add them to sumologic.logs.container.(otelcol|systemd).extraProcessors.

Here are some examples.

To modify log body, use the Transform processor:

- transform/mask-card-numbers:
    log_statements:
      - context: log
        statements:
          - replace_pattern(body, "card=\\d+", "card=***")
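To preview what this masking statement does to a log body, the same substitution can be tried locally in Python. This is only an approximation that uses Python's `re` module in place of the collector's regex engine, and the helper name `mask_card_numbers` and the sample body are made up for the example.

```python
import re


def mask_card_numbers(body):
    """Replace every card=<digits> occurrence with card=*** (illustration only)."""
    return re.sub(r"card=\d+", "card=***", body)


print(mask_card_numbers("payment accepted card=4111111111111111 amount=10"))
```

Trying the pattern on a few representative log lines before deploying it helps confirm that it masks what you expect and nothing else.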

To modify record attributes, use the Attributes processor:

- attributes/delete-record-attribute:
    actions:
      - action: delete
        key: unwanted.attribute
# To rename old.attribute to new.attribute, first create new.attribute and then delete old.attribute.
- attributes/rename-old-to-new:
    actions:
      - action: insert
        key: new.attribute
        from_attribute: old.attribute
      - action: delete
        key: old.attribute

To modify resource attributes, use the Resource processor:

- resource/add-resource-attribute:
    attributes:
      - action: insert
        key: environment
        value: staging
- resource/remove:
    attributes:
      - action: delete
        key: redundant-attribute

Adding custom fields

To add a custom field named static-field with value hardcoded-value to logs, use the following configuration:

- resource/add-static-field:
    attributes:
      - action: insert
        key: static-field
        value: hardcoded-value

To add a custom field named k8s_app with a value that comes from e.g. the pod label, use the following configuration:

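- resource/add-k8s_app-field:
    attributes:
      - action: insert
        key: k8s_app
        # Placeholder: replace <label name> with the name of the Pod label to copy.
        from_attribute: pod_labels_<label name>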

Make sure the field is added in Sumo Logic.


By default, the metadata enrichment service provisions and uses a Kubernetes PersistentVolume as an on-disk queue that guarantees durability across Pod restarts and buffering in case of exporting problems.

This feature only works if your cluster has a correctly configured default storageClass. Cloud providers typically set one up when provisioning the cluster. The only alternative is disabling persistence altogether.

Persistence can be customized via the metadata.logs.persistence section:

enabled: true
# storageClass: ""
accessMode: ReadWriteOnce
size: 10Gi
## Add custom labels to all otelcol statefulset PVC (logs and metrics)
pvcLabels: {}

These settings affect persistence for metrics as well.

Advanced Configuration

This section covers more advanced ways of configuring logging. Knowledge of the OpenTelemetry Collector configuration format and concepts is required.

Direct configuration

There are two ways of directly configuring OpenTelemetry Collector for both log collection and metadata enrichment. These are both advanced features requiring a good understanding of this chart's architecture and OpenTelemetry Collector configuration.

The metadata.logs.config.merge and otellogs.config.merge keys can be used to provide configuration that will be merged with the Helm Chart's default configuration. Note that these fields are not subject to the normal backwards compatibility guarantees: the default configuration can change even in minor versions while preserving the same end-to-end behavior. Use of these fields is discouraged. Ideally, the necessary customizations can be achieved without touching the OpenTelemetry Collector configuration directly. Please open an issue if your use case requires the use of these fields.

The metadata.logs.config.override and otellogs.config.override keys can be used to provide configuration that will completely replace the default configuration. As above, take care not to depend on implementation details that may change between minor releases of this Chart.
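The difference between merge and override can be illustrated with a small Python model. This is a sketch under assumptions, not the chart's actual templating logic: it assumes that merging is a recursive (deep) merge of mappings in which the user's values win, and the function names and the sample configuration keys are hypothetical.

```python
def deep_merge(base, extra):
    """Recursively merge extra into a copy of base; values from extra win."""
    result = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def effective_config(default, merge=None, override=None):
    """Model of config.merge versus config.override (illustration only)."""
    if override is not None:
        # override: the default configuration is discarded entirely.
        return override
    # merge: the default configuration is kept and selectively changed.
    return deep_merge(default, merge or {})


# Hypothetical default configuration fragment.
default = {"processors": {"batch": {"timeout": "1s", "send_batch_size": 1024}}}

merged = effective_config(default, merge={"processors": {"batch": {"timeout": "5s"}}})
overridden = effective_config(default, override={"processors": {}})

print(merged)
print(overridden)
```

In this model, a merge leaves unrelated defaults such as send_batch_size in place, whereas an override has to restate everything it still needs.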

See Sumologic OpenTelemetry Collector configuration for more information.

Disabling container logs

Container logs are collected by default. This can be disabled by setting:

sumologic:
  logs:
    container:
      enabled: false

Multiline unstructured logs with HTTP sources

By default, the Helm Chart sends data to Sumo using the OpenTelemetry Protocol (OTLP), and therefore uses the OTLP Source. However, if you've chosen to use a plain HTTP Source by setting sumologic.logs.source_type to http, be aware that this source does not support client-side multiline parsing for logs in text format. You'll need to do multiline detection in the source itself. This can be set up in the Helm Chart configuration the following way:

## Disable automatic multiline detection on collector side
use_autoline_matching: false
## Set the following multiline detection regexes on collector side:
## - \{".* - in order to match json lines
## - \[?\d{4}-\d{1,2}-\d{1,2}.\d{2}:\d{2}:\d{2}.*
## Note: `\` is translated to `\\` and `"` to `\"` as we pass to terraform script
manual_prefix_regexp: (\\{\".*|\\[?\\d{4}-\\d{1,2}-\\d{1,2}.\\d{2}:\\d{2}:\\d{2}.*)
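Before applying a boundary regex like the one above, it can help to check locally which lines it treats as the start of a record. The Python sketch below uses the unescaped form of the pattern from the comments. It is an approximation only: Python's `re` module is not the engine used by the source, matching from the start of the line is an assumption, and the helper name `starts_new_record` is hypothetical.

```python
import re

# Unescaped form of manual_prefix_regexp: a JSON line or a line starting with a datetime.
MANUAL_PREFIX = re.compile(r'(\{".*|\[?\d{4}-\d{1,2}-\d{1,2}.\d{2}:\d{2}:\d{2}.*)')


def starts_new_record(line):
    """Return True if the line would begin a new record (assumes start-of-line matching)."""
    return MANUAL_PREFIX.match(line) is not None


samples = [
    '{"log_property": "value", "text": "I am a json log"}',
    "2007-03-01T13:00:00Z I am a log line",
    "    at com.example.Main.main(Main.java:10)",
]
for sample in samples:
    print(starts_new_record(sample), sample)
```

Lines for which this returns False, such as the indented stack trace frame, would be treated as continuations of the previous record.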

Copyright © 2024 by Sumo Logic, Inc.