
Set up collection for Kubernetes

This page provides an overview of the collection process for Kubernetes environments and then walks you through configuring log and metric collection. The Sumo Logic Kubernetes App, which provides services for managing and monitoring Kubernetes worker nodes, works in conjunction with the Kubernetes Control Plane App, which monitors the master node control plane, including the API server, etcd, and kube-system, as well as worker nodes. You will set up both of these apps in the configuration process.

Collection overview

Sumo Logic collects logs, events, metrics, and security data with Fluent Bit, Fluentd, Prometheus, and Falco, all open-source collectors maintained by the Cloud Native Computing Foundation (CNCF). The collected data streams through a centralized Fluentd pipeline for metadata enrichment. Sumo Logic tags each record with the container, pod, node, and cluster, and also identifies the service, namespace, and deployment.

[Diagram: centralized collection architecture for Kubernetes]

Log and metric types

The Kubernetes Control Plane App uses logs and metrics.

Log sources

The Sumo Logic Kubernetes App uses Fluent Bit and Fluentd to collect logs.

Metric sources
  • Kubernetes API server metrics
  • Scheduler metrics
  • Controller Manager metrics
  • Node exporter metrics
  • kube-state-metrics

Metrics are collected using Prometheus with Fluentd. For additional information on the metrics options you can configure for collection, see this document.

Configuring log and metric collection

The Sumo Logic Kubernetes Control Plane App works in conjunction with the Kubernetes App to monitor the master node control plane, including the API server, etcd, and kube-system, as well as worker nodes. You configure log and metric collection when you install the Kubernetes App, as described in this section.

Step 1. Set up and install the Kubernetes App

The Sumo Logic Kubernetes App provides the services for managing and monitoring Kubernetes worker nodes. You must set up collection and install the Kubernetes App before you install the Kubernetes Control Plane App. You configure log and metric collection during this process.

To set up and install the Kubernetes app, review our best practices and follow the instructions in this document.
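
As a quick orientation, a typical install of the collection chart looks roughly like the following sketch. The repository URL, release name, namespace, and placeholder credentials here are illustrative assumptions; the linked instructions are the authoritative reference.

# Add the Sumo Logic Helm repository (assumed public repo location)
helm repo add sumologic https://sumologic.github.io/sumologic-kubernetes-collection
helm repo update

# Install the collection chart; replace the placeholders with your own values
helm upgrade --install collection sumologic/sumologic \
  --namespace sumologic \
  --set sumologic.accessId=<ACCESS_ID> \
  --set sumologic.accessKey=<ACCESS_KEY> \
  --set sumologic.clusterName=<CLUSTER_NAME>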

Step 2. Install the Kubernetes Control Plane App 

The process for installing the Kubernetes Control Plane App varies depending on the platform of your cluster. This section provides information on the available Kubernetes platforms. Choose the procedure that matches your cluster environment.

Custom Kubernetes cluster

If you built your own Kubernetes cluster, you should follow the steps recommended in this section. You configured log and metric collection when you installed the Kubernetes App, and are now ready to install the Kubernetes Control Plane App.

To install the Kubernetes Control Plane App, follow the instructions on this page.

Managed service provider

If you are using a managed service provider, you should follow the steps recommended in this section for your managed service. You configured log and metric collection when you installed the Kubernetes App, and are now ready to install the appropriate control plane app for your platform.

Best practices

Setting the scrape interval globally in Prometheus

During installation you can set the scrape interval globally for Prometheus to reduce or increase the frequency at which metrics are collected. For example, the following flag passed into the helm install command will set the scrape interval to every 2 minutes instead of the default 30 seconds.

--set prometheus-operator.prometheus.prometheusSpec.scrapeInterval="2m"

You can also set the scrape interval in the values.yaml file, as shown below. See the Prometheus documentation for reference.
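
For example, a minimal values.yaml override mirroring the --set flag above might look like the following; the key path follows that flag, so adjust it if your chart version nests the Prometheus settings differently.

prometheus-operator:
  prometheus:
    prometheusSpec:
      ## Scrape every 2 minutes instead of the 30-second default
      scrapeInterval: "2m"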

Considerations:

  • Reducing the scrape interval works best for metrics sources that generate low volumes of metrics.
    • Examples of sources that generate low volumes of metrics:
      • kube-controller-manager-metrics
      • kube-scheduler-metrics
    • Examples of sources that generate higher volumes of metrics:
      • apiserver-metrics
      • kube-state-metrics
  • Increasing the scrape interval may affect some dashboard panels, preventing them from rendering properly.

Fluentd collection performance

We have benchmarked Fluentd collection performance for logs and metrics so you can determine the number of Fluentd replicas needed for your workload. You can use the following observations when sizing your Fluentd deployment.

Using Kubernetes version 1.13 and Sumo Logic Helm chart versions 0.6.0-0.8.0, each Fluentd replica could handle the following:

Log volume    Metrics data points per minute (DPM)
1.3 MB/s      0 DPM
750 KB/s      20K DPM
250 KB/s      40K DPM
0 bytes       50K DPM

This benchmarking was performed on an AWS EC2 m4.large instance with 2 vCPUs and 8 GB of RAM.
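
As an illustrative sizing example based on these figures, a cluster producing roughly 2.5 MB/s of logs and no metrics would need at least two Fluentd replicas (2.5 MB/s ÷ 1.3 MB/s per replica ≈ 2), plus headroom for traffic spikes. Actual capacity will vary with your instance type and chart version.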

Benchmark Configuration

Fluentd log collection was tested with an internal log generator capable of producing a production-level load at varying rates.

Our metrics workload was generated with Avalanche, a load generator for producing Prometheus metrics. Avalanche was configured with the following parameters:

--metric-count=200
--series-count=100
--port=9006
--series-interval=60000
--metric-interval=60000
--value-interval=60

Metrics generated by Avalanche were collected by an instance of Prometheus and then forwarded to a single replica of Fluentd, before being sent to a Sumo Logic HTTP Source endpoint.
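
For reference, a minimal Prometheus scrape configuration for an Avalanche instance exposing metrics on port 9006 (as set with --port above) might look like the following sketch; the job name and target host are illustrative assumptions:

scrape_configs:
  - job_name: avalanche
    static_configs:
      ## Avalanche serves its generated metrics on the port set with --port
      - targets: ['avalanche:9006']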

Multiline Log Support

By default, we use a regex that matches the first line of multiline logs that start with dates in the following format: 2019-11-17 07:14:12.

If your logs have a different date format you can provide a custom regex to detect the first line of multiline logs. See collecting multiline logs for details on configuring a boundary regex.

New parsers can be defined under the parsers key of the fluent-bit configuration section in the values.yaml file as follows:

parsers:
  enabled: true
  regex:
    - name: multi_line
      regex: (?<log>^{"log":"\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2}.*)
    - name: new_parser_name
      ## This parser matches lines that start with a time of the format: 07:14:12
      regex: (?<log>^{"log":"\d{2}:\d{2}:\d{2}.*)

The regex used for Parser_Firstline needs to have at least one named capture group.

To use the newly defined parser to detect the first line of multiline logs, change the Parser_Firstline parameter in the Input plugin configuration of fluent-bit:

Parser_Firstline new_parser_name

You can also specify optional extra parsers (Parser_N) to interpret and structure multiline entries. When Multiline is On, if the first line matches Parser_Firstline, the remaining lines are matched against Parser_N.

Parser_Firstline multi_line
Parser_1 optional_parser
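
For context, here is a sketch of how these parameters sit inside a Fluent Bit tail input section; the path and surrounding settings are illustrative assumptions rather than the chart's exact defaults:

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Multiline         On
    Parser_Firstline  multi_line
    Parser_1          optional_parser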

Fluentd autoscaling

We have provided an option to enable autoscaling for Fluentd deployments. This is disabled by default.

To enable autoscaling for Fluentd:

  • Enable the metrics-server dependency. Note: If metrics-server is already installed, this step is not required.

## Configure metrics-server
## ref: https://github.com/helm/charts/blob/...er/values.yaml
metrics-server:
  enabled: true

  • Enable autoscaling for Fluentd

fluentd:
  ## Option to turn autoscaling on for fluentd and specify metrics for HPA.
  autoscaling:
    enabled: true
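
After applying the change (for example, with the helm upgrade command shown later on this page), you can verify that a HorizontalPodAutoscaler was created for Fluentd. The namespace below is an example and should match your installation:

kubectl get hpa --namespace sumologic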

Fluentd File-based buffer

By default, Fluentd uses an in-memory buffer. For production environments, we recommend that you use the file-based buffer instead.

The buffer configuration can be set in the values.yaml file under the fluentd key as follows:

fluentd:
  ## Option to specify the Fluentd buffer as file/memory.
  buffer: "file"

We have defined several file paths where the buffer chunks are stored.

Ensure that you have enough space in the path directory. Running out of disk space is a common problem.
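
As a quick check, you can inspect free space from inside the Fluentd pods; the namespace and pod name here are examples and will differ in your installation:

kubectl --namespace sumologic exec <fluentd-pod-name> -- df -h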

Once the configuration has been modified in the values.yaml file, run the helm upgrade command to apply the changes.

$ helm upgrade collection sumologic/sumologic --reuse-values -f values.yaml

For more details, see the official Fluentd buffer documentation.