Sumo Logic

Install the Kubernetes App and View the Dashboards

Installation instructions and dashboard descriptions for the Sumo Logic App for Kubernetes.

This page shows you how to install the Kubernetes App, and provides examples and descriptions of the app's predefined dashboards.

Installation overview

Now that you have set up collection for Kubernetes, you can install the Sumo Logic App for Kubernetes by performing the following tasks:

Step 1: Determine custom data filters for source categories

When you install the app, you supply custom data filters that match the source categories that the Fluentd plugin generated for your Kubernetes logs and metrics. The plugin generates the source categories dynamically, and they can vary by environment. For this reason, you need to run a query in Sumo to determine what source categories the plugin created, and form your custom data filters accordingly.

Run the following query in a log search tab in the Sumo web app:

_collector="<Collector Name>"
| count by _sourceCategory

where <Collector Name> is the name of the hosted collector that you configured when you performed the Collect Logs for Kubernetes procedure.

The query returns one row for each source category, with a count of messages for each.

The table below lists, in the left column, the sources created by the plugin.

  • For the first four sources, the Custom Data Filter column shows a filter that matches the source categories the plugin typically generates for that source. Examine the results of the query you ran above to determine whether the filters shown below match the source categories created in your environment. When you supply the values in the following step, tailor the filters as necessary.

  • For the last two sources, the Source Category column contains the source category you should configure for the sources in the following step.

| Source | Custom Data Filter | Source Category |
|---|---|---|
| Kube Scheduler Log Source | `_sourceCategory=*kube/scheduler*` | |
| Kube API Server Log Source | `_sourceCategory=*kube/apiserver*` | |
| Kube Control Manager Log Source | `_sourceCategory=*kube/controller/manager*` | |
| Kube-System Namespace Log Source | `_sourceCategory=*kube/system/*` | |
| Kubernetes API Log Source | | `k8s/api` |
| Kubernetes Metrics Source | | `kubernetes/metrics` |
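To check the filters in the table above against your own environment, you can paste the `_sourceCategory` values returned by the Step 1 query into a short script. The following Python sketch approximates Sumo's `*` wildcard with `fnmatch.fnmatchcase` on lower-cased strings. This is an assumption: matching is treated here as case-insensitive, and `fnmatch` also interprets `?` and `[...]`, which these filters do not use. The sample source categories are hypothetical.

```python
from fnmatch import fnmatchcase

# Custom Data Filters from the table above, keyed by app data source.
# Only the part after "_sourceCategory=" is used as the wildcard pattern.
FILTERS = {
    "Kube Scheduler Log Source": "*kube/scheduler*",
    "Kube API Server Log Source": "*kube/apiserver*",
    "Kube Control Manager Log Source": "*kube/controller/manager*",
    "Kube-System Namespace Log Source": "*kube/system/*",
}


def matches(pattern: str, source_category: str) -> bool:
    """Approximate a _sourceCategory wildcard match (assumed case-insensitive)."""
    return fnmatchcase(source_category.lower(), pattern.lower())


def unmatched_filters(source_categories: list[str]) -> list[str]:
    """Return the data sources whose filter matches none of the given categories."""
    return [
        name
        for name, pattern in FILTERS.items()
        if not any(matches(pattern, sc) for sc in source_categories)
    ]


if __name__ == "__main__":
    # Hypothetical values copied from the _sourceCategory column of the results.
    results = [
        "kubernetes/kube/system/kube/scheduler",
        "kubernetes/kube/system/kube/apiserver",
        "kubernetes/default/nginx",
    ]
    print(unmatched_filters(results))  # ['Kube Control Manager Log Source']
```

Any data source the script reports is one whose filter you would need to tailor before entering it in the next step.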

Metrics exports

Heapster exports the following metrics to its backend. All custom (aka application) metrics are prefixed with 'custom/'.

| Metric Name | Description |
|---|---|
| `cpu/limit` | CPU hard limit in millicores. |
| `cpu/node_capacity` | CPU capacity of a node. |
| `cpu/node_allocatable` | CPU allocatable of a node. |
| `cpu/node_reservation` | Share of CPU that is reserved on the node allocatable. |
| `cpu/node_utilization` | CPU utilization as a share of node allocatable. |
| `cpu/request` | CPU request (the guaranteed amount of resources) in millicores. |
| `cpu/usage` | Cumulative amount of consumed CPU time on all cores in nanoseconds. |
| `cpu/usage_rate` | CPU usage on all cores in millicores. |
| `cpu/load` | CPU load in milliloads, that is, the number of runnable threads multiplied by 1000. |
| `ephemeral_storage/limit` | Local ephemeral storage hard limit in bytes. |
| `ephemeral_storage/request` | Local ephemeral storage request (the guaranteed amount of resources) in bytes. |
| `ephemeral_storage/usage` | Total local ephemeral storage usage. |
| `ephemeral_storage/node_capacity` | Local ephemeral storage capacity of a node. |
| `ephemeral_storage/node_allocatable` | Local ephemeral storage allocatable of a node. |
| `ephemeral_storage/node_reservation` | Share of local ephemeral storage that is reserved on the node allocatable. |
| `ephemeral_storage/node_utilization` | Local ephemeral storage utilization as a share of ephemeral storage allocatable. |
| `filesystem/usage` | Total number of bytes consumed on a filesystem. |
| `filesystem/limit` | The total size of the filesystem in bytes. |
| `filesystem/available` | The number of available bytes remaining in the filesystem. |
| `filesystem/inodes` | The number of available inodes in the filesystem. |
| `filesystem/inodes_free` | The number of free inodes remaining in the filesystem. |
| `disk/io_read_bytes` | Number of bytes read from a disk partition. |
| `disk/io_write_bytes` | Number of bytes written to a disk partition. |
| `disk/io_read_bytes_rate` | Number of bytes read from a disk partition per second. |
| `disk/io_write_bytes_rate` | Number of bytes written to a disk partition per second. |
| `memory/limit` | Memory hard limit in bytes. |
| `memory/major_page_faults` | Number of major page faults. |
| `memory/major_page_faults_rate` | Number of major page faults per second. |
| `memory/node_capacity` | Memory capacity of a node. |
| `memory/node_allocatable` | Memory allocatable of a node. |
| `memory/node_reservation` | Share of memory that is reserved on the node allocatable. |
| `memory/node_utilization` | Memory utilization as a share of memory allocatable. |
| `memory/page_faults` | Number of page faults. |
| `memory/page_faults_rate` | Number of page faults per second. |
| `memory/request` | Memory request (the guaranteed amount of resources) in bytes. |
| `memory/usage` | Total memory usage. |
| `memory/cache` | Cache memory usage. |
| `memory/rss` | RSS memory usage. |
| `memory/working_set` | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
| `accelerator/memory_total` | Memory capacity of an accelerator. |
| `accelerator/memory_used` | Memory used of an accelerator. |
| `accelerator/duty_cycle` | Duty cycle of an accelerator. |
| `accelerator/request` | Number of accelerator devices requested by container. |
| `network/rx` | Cumulative number of bytes received over the network. |
| `network/rx_errors` | Cumulative number of errors while receiving over the network. |
| `network/rx_errors_rate` | Number of errors while receiving over the network per second. |
| `network/rx_rate` | Number of bytes received over the network per second. |
| `network/tx` | Cumulative number of bytes sent over the network. |
| `network/tx_errors` | Cumulative number of errors while sending over the network. |
| `network/tx_errors_rate` | Number of errors while sending over the network per second. |
| `network/tx_rate` | Number of bytes sent over the network per second. |
| `uptime` | Number of milliseconds since the container was started. |
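Several of these metrics come in pairs: a cumulative counter such as `cpu/usage` (nanoseconds of CPU time) and a per-second rate such as `cpu/usage_rate` (millicores). As a worked illustration of how the units relate: one fully busy core accumulates 10^9 ns of CPU time per second, which is 1000 millicores, so millicores = Δns / Δseconds / 10^6. The Python sketch below applies that conversion to two cumulative samples. It illustrates the unit arithmetic only and is not Heapster's own rate code; the function name and sample values are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000
MILLICORES_PER_CORE = 1000


def cpu_usage_rate_millicores(
    usage_ns_prev: int, usage_ns_curr: int, t_prev_s: float, t_curr_s: float
) -> float:
    """Hypothetical helper: two cumulative cpu/usage samples (ns) -> millicores."""
    elapsed_s = t_curr_s - t_prev_s
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    if usage_ns_curr < usage_ns_prev:
        # A counter reset (for example, a container restart) makes the delta meaningless.
        raise ValueError("cumulative counter decreased; likely a reset")
    cores = (usage_ns_curr - usage_ns_prev) / elapsed_s / NS_PER_SECOND
    return cores * MILLICORES_PER_CORE


# Hypothetical samples 60 s apart: 30 s of CPU time consumed is half a core.
print(cpu_usage_rate_millicores(0, 30 * NS_PER_SECOND, 0.0, 60.0))  # 500.0
```

The same delta-over-time relationship connects `network/rx` to `network/rx_rate` and `disk/io_read_bytes` to `disk/io_read_bytes_rate`, without the millicore scaling.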

Step 2: Install the app

Locate and install the app you need from the App Catalog. If you want to see a preview of the dashboards included with the app before installing, click Preview Dashboards.

To install the app, do the following:

  1. From the App Catalog, search for and select the app.

  2. Click Add to Library and complete the following fields:

  • App Name. You can retain the existing name, or enter a name of your choice for the app.

  • Data Source. For each of the sources listed, enter a Custom Data Filter or Source Category, as described below:
    1. For “Kube Scheduler Log Source”, select Enter a Custom Data Filter and enter a filter, either _sourceCategory=*kube/scheduler* or one that matches the source categories in your environment.

    2. For “Kube API Server Log Source”, select Enter a Custom Data Filter and enter a filter, either _sourceCategory=*kube/apiserver* or one that matches the source categories in your environment.

    3. For “Kube Control Manager Log Source”, select Enter a Custom Data Filter and enter a filter, either _sourceCategory=*kube/controller/manager* or one that matches the source categories in your environment.

    4. For “Kube-System Namespace Log Source”, select Enter a Custom Data Filter and enter a filter, either _sourceCategory=*kube/system/* or one that matches the source categories in your environment.

    5. For “Kubernetes API Log Source”, leave Source Category selected, and enter the following source category:
      k8s/api, or one that matches the source categories in your environment.

    6. For “Kubernetes Metrics Source”, leave Source Category selected, and enter the following source category:
      kubernetes/metrics

  • Advanced. Select the Location in Library (the default is the Personal folder in the library), or click New Folder to add a new folder.
  3. Click Add to Library.

Step 3: Identify the logged data

This section shows how to run a query that identifies your logged data, which is the data that populates the Kubernetes App's preconfigured dashboards.

To identify the data that's been logged, do the following:

  1. On the Sumo Logic Home page, click Log Search.


  2. Enter one of the following queries in the search field.
  • For the default, out-of-the-box source category settings, use this query, which lists all the source categories:
_sourceCategory=kubernetes* | count by _sourceCategory, _collector, _source | sort by _count
  • For customized settings, use this query:
_source=<http_source_name> _collector=<hosted_collector_name> | count by _sourceCategory, _collector, _source | sort by _count

This query requires the name of the hosted Collector and the name of the HTTP Source whose endpoint is used in the Fluentd configuration. For more information, see this Sumo Knowledge Base article.

The main Fluentd plugin environment variables that determine source categories are as follows:

  • `SOURCE_CATEGORY_PREFIX` - prepended to the beginning of the source category. Defaults to `kubernetes/`.
  • `SOURCE_CATEGORY` - the source category template, written with the `%{...}` curly brace syntax. Defaults to `"%{namespace}/%{pod_name}"`.
  • `SOURCE_CATEGORY_REPLACE_DASH` - the character that replaces any `-` in the SOURCE_CATEGORY value. Defaults to `/`.
  3. Select a time interval (default, Last 15 Minutes), and click Start.

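To see why the Custom Data Filters in Step 1 look for paths such as `kube/system/` and `kube/scheduler`, here is a Python sketch of how the three environment variables could combine. It assumes that the `%{...}` placeholders are filled in first, that every `-` in the result is then replaced with `SOURCE_CATEGORY_REPLACE_DASH`, and that `SOURCE_CATEGORY_PREFIX` is prepended last. The helper name and sample pod are hypothetical, and the actual plugin may differ in details such as how it derives `pod_name`.

```python
import re

# Defaults described in the list above.
SOURCE_CATEGORY_PREFIX = "kubernetes/"
SOURCE_CATEGORY = "%{namespace}/%{pod_name}"
SOURCE_CATEGORY_REPLACE_DASH = "/"


def build_source_category(
    fields: dict[str, str],
    template: str = SOURCE_CATEGORY,
    prefix: str = SOURCE_CATEGORY_PREFIX,
    replace_dash: str = SOURCE_CATEGORY_REPLACE_DASH,
) -> str:
    """Hypothetical helper: expand %{key} placeholders, replace dashes, add prefix."""
    # Substitute each %{key} with the matching field value (KeyError if missing).
    expanded = re.sub(r"%\{(\w+)\}", lambda m: fields[m.group(1)], template)
    return prefix + expanded.replace("-", replace_dash)


print(build_source_category({"namespace": "kube-system", "pod_name": "kube-scheduler"}))
# kubernetes/kube/system/kube/scheduler
```

Under these assumptions, a scheduler pod in the `kube-system` namespace ends up with a source category that matches both `_sourceCategory=*kube/scheduler*` and `_sourceCategory=*kube/system/*` from the Step 1 table.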

Kubernetes App Dashboards

This section provides examples and explanations for each of the Kubernetes App dashboards.

Kubernetes - Overview


Masters. The number of masters.

Nodes. The number of nodes, of any status.

Nodes Ready. The number of nodes whose status is ready.

Pods Scheduled. The number of pods scheduled.

Nodes Not Ready. The number of nodes whose status is not ready.

Scheduled By Node. The number of pods scheduled by node.

Pods Scheduled By Namespace. The number of pods scheduled by namespace.

Container by State. Lists containers and their state.

Pods by Phase. Lists Pods and their status, which can be Running, Pending, Succeeded, or Failed.

Pods Scheduled By Namespace. Shows Pods scheduled, and the associated namespace.

Pods Scheduled By Node. Shows Pods scheduled, and the associated Node.

Errors. Shows the count of scheduling errors.

Scheduling Details. Shows Pods scheduling details such as Time, Namespace, Name, Reason, and Type.

Kubernetes - Pods Level Metrics


CPU Usage by Pods and Namespace. Shows CPU usage by Pod and Namespace.

Top 10 Pods by Memory Usage. Shows memory usage for the 10 Pods that have used the most memory.

CPU Limit by Pods and Namespace. See the top 10 namespace/pod combinations with the highest CPU Limit.

Top 10 Pods by Memory Limit. See the 10 Pods with the highest Memory Limit.

CPU Request by Pods and Namespace. See the top 10 namespace/pod combinations with the highest configured CPU Request.

Top 10 Pods by Memory Request. See the 10 Pods with the highest Memory Request.

Top 10 Pods by File System Usage. Shows file system usage for the 10 Pods that have used the most space.

Top 10 Pods by File System Limit. Shows the 10 Pods with the highest File System Limit.

Network (TX) Usage by Pod and Namespace. Shows Transmit network traffic by Pod and Namespace.

Network (RX) Usage by Pod and Namespace. Shows Receive network traffic by Pod and Namespace.

Kubernetes - Namespace Level Metrics


File System Usage. Shows file system usage by Namespace.

Memory Usage. Shows memory usage by Namespace.

CPU Usage. Shows CPU usage by Namespace.

Network (Tx, Rx) Usage. Shows network traffic, both Transmit and Receive, by Namespace.

Kubernetes - Cluster Level Metrics


Overall Cluster CPU Usage. See an area chart that shows CPU metrics for your cluster: avg(limit), avg(request), and avg(usage).

Overall Cluster Memory Usage. See an area chart that shows memory metrics for your cluster: avg(limit), avg(request), and avg(usage).

Overall Cluster Network Usage. See an area chart that shows network usage metrics for your cluster: sum(rx), sum(rx_errors), sum(tx), and sum(tx_errors).

Overall Cluster File System Usage. See an area chart that shows file system usage metrics for your cluster: sum(limit) and sum(usage).

Kubernetes - API Server


Status Code Trend this Period. Shows a breakdown of status codes returned for API calls.

Top 10 URLS with Problem Status Codes. For the 10 API URLs that returned the most non-2xx status codes, shows the error count and percentage, and the status code.

Top URLS with Non-200. For the 10 API URLs that returned the most error codes, shows the error count.

Non-200 Total. Shows the total count of non-200 status codes.

Autoscaler Non-200 Status Outlier. See timeslices where the count of non-200 status codes from Autoscaler exceeds the moving average by a statistically significant amount: more than three standard deviations.

Autoscaler URLs with Problem Status Codes. See a donut chart that shows which Autoscaler URLs returned non-200 status codes, and for each URL the count of problem status codes and its percentage of total problem status codes.

Autoscaler Status Code Trend this Period. See the count of status codes by type over time.
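The Autoscaler Non-200 Status Outlier panel flags timeslices whose non-200 count exceeds the moving average by more than three standard deviations. The Python sketch below illustrates that rule over a list of per-timeslice counts. The trailing window of 10 timeslices, the use of the population standard deviation, and the function name are assumptions for illustration, not a description of how the dashboard query computes its bands.

```python
from statistics import mean, pstdev


def outlier_timeslices(
    counts: list[int], window: int = 10, threshold: float = 3.0
) -> list[int]:
    """Return indexes of timeslices whose count exceeds the trailing moving
    average by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window : i]  # the preceding `window` timeslices
        avg = mean(history)
        sd = pstdev(history)
        if counts[i] > avg + threshold * sd:
            flagged.append(i)
    return flagged


# Hypothetical non-200 counts per timeslice; the spike at index 12 is flagged.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 4, 5, 6, 4, 40, 5]
print(outlier_timeslices(counts))  # [12]
```

Only upward deviations are flagged here, because the panel is concerned with unusually high error counts; note also that the spike inflates the band for the following timeslices, which is why index 13 is not flagged.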

Kubernetes - Kube System


System Message Breakdown. See a donut chart that shows the breakdown of system messages by resource.

Message Breakdown by Container. See a donut chart that shows the breakdown of system messages by container.

Pod and Container Running in Kube System. See a table that lists the Pods running in the kube-system namespace and the containers running in each.

Error Messages. See a table that shows error messages, and the severity and resource associated with each.

Kubernetes - Controller Manager


Scale Up Operations. See a line chart that shows scale-up operations that have occurred by replica set, and the count of replicas added.

Top 10 Scale-Ups.  

Full Scale Downs. See a donut chart that shows replica sets that were scaled down to zero replicas, and how many times the scale-downs occurred.

Event Severity Trend. See a line chart that shows the count of messages by severity over time.

Event Severity by Resource. See a horizontal bar chart that shows the message count by severity and resource.

Error Message Counts. See the count of messages of severity level “E” by resource.

Event Severity by Resource. See a horizontal bar chart that shows messages by severity by resource.
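The Scale Up Operations and Full Scale Downs panels are built from controller manager messages about replica sets. As a hedged sketch, assume each scaling message contains text of the form `Scaled up replica set <name> to <n>` or `Scaled down replica set <name> to <n>` (some Kubernetes versions append further text, such as a `from <n>` suffix, and the surrounding log format varies by version). The Python below counts full scale-downs, that is, scale-downs to zero replicas, per replica set. The pattern, function name, and sample lines are hypothetical and are not the app's actual query.

```python
import re
from collections import Counter

# Assumed message text; any log prefix or trailing text is ignored by search().
SCALE_RE = re.compile(r"Scaled (up|down) replica set (\S+) to (\d+)")


def full_scale_downs(lines: list[str]) -> Counter:
    """Count, per replica set, how many times it was scaled down to zero."""
    counts: Counter = Counter()
    for line in lines:
        m = SCALE_RE.search(line)
        if m and m.group(1) == "down" and int(m.group(3)) == 0:
            counts[m.group(2)] += 1
    return counts


# Hypothetical log lines.
sample = [
    "Scaled up replica set web-5d9c to 3",
    "Scaled down replica set web-7f6b to 0",
    "Scaled down replica set api-6c4d to 1",
]
print(full_scale_downs(sample))  # Counter({'web-7f6b': 1})
```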

Kubernetes - Node Overview


Node Info. See a table of information about nodes in your cluster.

Node Capacity. See a table of information about the resource capacity of nodes in your cluster, including cpu_capacity, memory_capacity, and pods_capacity.

Node Allocatable Resources. See a table of information about allocatable resources for nodes in your cluster, including cpu_allocatable, memory_allocatable, and pods_allocatable.

Pods Scheduled. See a count of Pods scheduled in your cluster.

Scheduling Details. See a table of scheduling details for scheduled Pods.

CPU Usage by Node. See CPU metrics for each node in your cluster, including avg(allocatable), avg(utilization), avg(capacity), and avg(reservation).

Memory by Node. See memory metrics for each node in your cluster, including avg(allocatable), avg(utilization), avg(capacity), and avg(reservation).

File System Usage by Node. See file system metrics for each node in your cluster, including avg(limit) and avg(usage).

Network Usage by Node. See network usage metrics for each node in your cluster, including avg(rx), avg(rx_errors), avg(tx), and avg(tx_errors).
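The utilization and reservation values in the CPU and memory panels presumably correspond to the `node_utilization` and `node_reservation` metrics in the Metrics exports table, both of which are shares of the node's allocatable amount. A minimal worked sketch in Python, assuming utilization is usage divided by allocatable and reservation is the sum of the pods' requests divided by allocatable; the helper name and all numbers are hypothetical:

```python
def share_of_allocatable(amount: float, allocatable: float) -> float:
    """Express an amount (usage or total requests) as a share of node allocatable."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return amount / allocatable


# Hypothetical node with 3900 millicores of allocatable CPU.
allocatable_m = 3900
pod_requests_m = [500, 250, 1200]  # cpu/request of each pod on the node
usage_rate_m = 1560  # cpu/usage_rate summed over the node

utilization = share_of_allocatable(usage_rate_m, allocatable_m)
reservation = share_of_allocatable(sum(pod_requests_m), allocatable_m)
print(f"utilization={utilization:.2f} reservation={reservation:.2f}")
# utilization=0.40 reservation=0.50
```

Because the Kubernetes scheduler places Pods according to their requests, a reservation close to 1.0 means little room remains for new Pods on the node, even when utilization is low.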

Kubernetes - Scheduler


Message Severity Trend. See the count of messages of each severity level per timeslice, where severity can be “i”, “e”, or “w”.

Message Source Trend. See the count of messages of each severity level by source.

Sources by Warnings Trend. See the count of messages of severity level “w” by source.

Warnings.

Top 10 Warnings this Period. See a table that shows the 10 warning messages that have occurred most frequently, and the number of times each message occurred.

Scheduling Trend. See a line chart that shows the number of Pods successfully scheduled and the number of Pods for which scheduling failed.

Successful Scheduling Outlier. Shows outliers in the number of Pods successfully scheduled.

Pods Failing Scheduling. Shows trend lines of Pods that failed scheduling, by Pod name.

Top Reasons for Scheduling Fail. See a table of the top 10 reasons that pod scheduling has failed, and a count of failures for each reason.
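The severity letters in the Scheduler panels (“i”, “e”, “w”), and the “E” level in the Controller Manager panels, correspond to the header that Kubernetes components write at the start of each log line in the glog/klog format `Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`, where `L` is `I`, `W`, `E`, or `F`. The Python sketch below extracts that letter and counts messages per severity. It illustrates the log format only and is not the app's query; the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# klog header: severity letter, MMDD, time with microseconds, padded thread id, file:line, "]".
KLOG_HEADER = re.compile(r"^([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^\s:]+:\d+\] ")

SEVERITY_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def severity_counts(lines: list[str]) -> Counter:
    """Count log lines by klog severity; lines without a header count as 'unknown'."""
    counts: Counter = Counter()
    for line in lines:
        m = KLOG_HEADER.match(line)
        counts[SEVERITY_NAMES[m.group(1)] if m else "unknown"] += 1
    return counts


# Hypothetical kube-scheduler lines.
sample = [
    "I0612 10:15:02.123456       1 scheduler.go:604] pod default/web-1 bound",
    "W0612 10:15:03.000001       1 reflector.go:324] watch ended",
    "E0612 10:15:04.500000       1 framework.go:1046] failed to bind",
    "not a klog line",
]
print(severity_counts(sample))
```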