Collect Metrics for Host and Processes

We use the Telegraf agent to collect host and process metrics. Telegraf runs on the host being monitored, uses input plugins to obtain host and process metrics, and uses the Sumo Logic output plugin to send those metrics to Sumo Logic.

Sumo Logic uses the following input plugins:

For Linux:

net
Diskio
Disk
Netstat
Mem
CPU
System
Procstat
Process

For Windows:

net
Disk
Netstat
Mem
CPU
System
Procstat
Win_Perf_counters

Configuring Metrics Collection

This section provides instructions for configuring metrics collection for the Sumo Logic App for Host and Process Metrics. Follow the steps below to set up metrics collection on a given machine:

  1. Configure a Hosted Collector
  2. Configure an HTTP Logs and Metrics Source
  3. Install Telegraf
  4. Configure and start Telegraf

1. Configure a Hosted Collector

To create a new Sumo Logic hosted collector, perform the steps in the Configure a Hosted Collector section of the Sumo Logic documentation.

2. Configure an HTTP Logs and Metrics Source

Create a new HTTP Logs and Metrics Source in the hosted collector created above by following these instructions. Suggestions for setting your source category:

  1. For identifying a specific cluster or a group of hosts: <clustername>/metrics

  2. For identifying a group of hosts within a given deployment: <environment name>/<clustername>/metrics

Make a note of the HTTP Source URL and the source category.

3. Install Telegraf

Use the following steps to install Telegraf on each host machine.

4. Configure and start Telegraf

To collect metrics, Telegraf uses the input plugins (described earlier) to gather data from the host and the Sumo Logic output plugin to send that data to Sumo Logic.
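
As a rough illustration of that flow, a stripped-down telegraf.conf could pair a single input plugin with the Sumo Logic output plugin. This is only a sketch, not the contents of the sample files: the data_format value and the source category string are assumptions chosen for illustration, and CHANGE_ME stands in for your own HTTP Source URL.

```toml
# Illustrative sketch only; the Sumo Logic sample files remain the source of truth.

[[inputs.cpu]]
  percpu = false     # skip per-core series
  totalcpu = true    # report the aggregate cpu-total series

[[outputs.sumologic]]
  url = "CHANGE_ME"                           # HTTP Source URL noted in step 2
  data_format = "prometheus"                  # assumption: serializer chosen for this sketch
  source_category = "<clustername>/metrics"   # hypothetical value following the step 2 suggestion
```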

Create or modify telegraf.conf (on Linux it is located in /etc/telegraf/telegraf.d/, and on Windows in C:\Program Files\InfluxData\Telegraf\). Copy and paste the inputs, outputs, and processors sections from one of the files below:

For Linux: linux_sample_telegraf.conf

For Windows: windows_sample_telegraf.conf

Enter values for the parameters marked with CHANGE_ME in the downloaded file.

The sample configuration also sets additional values. Please do not modify them, as doing so will cause the Sumo Logic apps to not function correctly.

For other optional parameters, refer to the respective plugin documentation for configuring the Telegraf input plugins.

For properties that can be configured globally in the Telegraf agent, see this doc.

Once you have finalized your telegraf.conf file, start or reload the Telegraf service using the instructions from the doc.

At this point, host and process metrics should start flowing into Sumo Logic.

Monitoring processes that match a pattern

exe: Selects processes whose process name matches the string that you specify.

pattern: Selects processes whose command line (including parameters and options used with the command) matches the specified string, using regular expression matching rules.

For Linux

On Linux servers, the strings that you specify in an exe or pattern section are evaluated as regular expressions. 

Example

# Filter by executable name containing nginx (i.e., pgrep <exe>)

[[inputs.procstat]]
    pid_tag = false
    exe = "nginx"

# Filter by command line containing config (i.e., pgrep -f <pattern>)

[[inputs.procstat]]
    pid_tag = false
    pattern = "config"
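
Because these strings are treated as regular expressions on Linux, anchors can narrow a match. A minimal sketch, assuming a process whose name is exactly nginx and the pgrep-style matching shown in the comments above:

```toml
# Match only processes named exactly "nginx" (not, for example, "nginx-debug")
[[inputs.procstat]]
    pid_tag = false
    exe = "^nginx$"
```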

For Windows

On servers running Windows Server, these strings are evaluated as WMI queries (for example, pattern = "%apache%"). For more information, see LIKE Operator.

Example

# Filter by executable name containing nginx

[[inputs.procstat]]
    pid_finder = "native"
    pid_tag = false
    exe = "%nginx%"

# Filter by command line containing config

[[inputs.procstat]]
    pid_finder = "native"
    pid_tag = false
    pattern = "%config%"

To monitor multiple processes with different patterns, declare the plugin multiple times:

[[inputs.procstat]]
    pid_tag = false
    exe = "nginx"

[[inputs.procstat]]
    pid_tag = false
    exe = "tomcat"

Troubleshooting

  1. To identify the operating system name and version:

    1. On Windows, run the following commands in PowerShell:

[System.Environment]::OSVersion.Version
(Get-WmiObject -class Win32_OperatingSystem).Caption

    2. On Linux, run the following commands in a terminal:

uname -a
lsb_release -a

  2. To enable debug logs, set the "debug = true" flag in telegraf.conf and run the following command, which prints errors to stdout:

telegraf --config telegraf.conf --test

  3. If changes to telegraf.conf are not taking effect, restart Telegraf:

    1. Windows: ./telegraf.exe --service restart

    2. Linux: sudo service telegraf restart

  4. If certain metrics are missing, you may need to run the Telegraf agent as root. Check the respective plugin documentation for more information.
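
For the debug-log step above, the flag belongs to the agent table of telegraf.conf. A minimal sketch, assuming the rest of your agent settings stay unchanged:

```toml
[agent]
  debug = true   # log at debug level; set back to false once troubleshooting is done
```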

Sample Queries

CPU Utilization by Host Panel in Host Metrics - CPU Dashboard:

host.name=* cpu=cpu-total metric=(host_cpu_usage_user OR host_cpu_usage_system OR host_cpu_usage_iowait OR host_cpu_usage_steal OR host_cpu_usage_softirq OR host_cpu_usage_irq OR host_cpu_usage_nice) | sum by host.name

CPU Usage Panel in Process Metrics Details Dashboard:

metric=procstat_cpu_usage host.name=* process.executable.name=* | avg by host.name, process.executable.name | outlier