Elasticsearch - OpenTelemetry Collector
The Elasticsearch app is a unified logs and metrics app that helps you monitor the availability, performance, health, and resource utilization of your Elasticsearch clusters. Preconfigured dashboards provide insight into cluster health, resource utilization, sharding, garbage collection, and search, index, and cache performance.
We use the OpenTelemetry collector to collect Elasticsearch metrics and logs.
The diagram below illustrates the components of the Elasticsearch collection for each database server. The OpenTelemetry Collector runs on the same host as Elasticsearch. It uses the Elasticsearch Receiver to obtain Elasticsearch metrics and the Sumo Logic OpenTelemetry Exporter to send the metrics to Sumo Logic. Elasticsearch logs are sent to Sumo Logic through a filelog receiver.
This app includes built-in monitors. For details on creating custom monitors, refer to Create monitors for Elasticsearch app.
Fields created in Sumo Logic for Elasticsearch
The following fields will be created as part of Elasticsearch app installation, if not already present:
- db.cluster.name. User configured. Enter a name to identify this Elasticsearch cluster. This cluster name will be shown in the Sumo Logic dashboards.
- db.system. Has a fixed value of elasticsearch.
- sumo.datasource. Has a fixed value of elasticsearch.
- db.node.name. Has the value of the host name of the machine being monitored.
Prerequisites
For metrics collection
This receiver supports Elasticsearch versions 7.9+.
If Elasticsearch security features are enabled, you must have either the monitor or manage cluster privilege. See the Elasticsearch docs for more information on authorization and Security privileges.
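One way to check the version requirement above before configuring the receiver is to compare the cluster's reported version against 7.9. The sketch below is illustrative only. The function name `meets_minimum_version` is hypothetical, and it assumes you already have the version string at hand (for example, the `version.number` value that Elasticsearch returns from a GET request to its root endpoint).

```python
import re

# From the prerequisite above: the receiver supports Elasticsearch 7.9+.
MINIMUM_VERSION = (7, 9)


def meets_minimum_version(version: str, minimum: tuple = MINIMUM_VERSION) -> bool:
    """Return True if a version string such as '7.17.3' is at least `minimum`.

    Only the major and minor components are compared; suffixes such as
    '-SNAPSHOT' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


if __name__ == "__main__":
    for candidate in ("7.8.1", "7.9.0", "8.11.3"):
        print(candidate, meets_minimum_version(candidate))
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating 7.10 as older than 7.9.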
For logs collection
Elasticsearch supports logging via local text log files. Elasticsearch uses Log4j 2 for logging, and the verbosity of each logger is controlled by its log level. The commonly used levels are:
- trace. The most detailed output, useful for development/testing.
- debug. Includes information not often needed, but logs less than trace.
- info (default value). Moderately verbose, ideal for production environments.
- warn. Only important messages are logged.
- error. Only error messages are logged.
Logging settings are located in log4j2.properties. By default, Elasticsearch logs are stored in /var/log/elasticsearch/ELK-<Clustername>.log. The directory for log files is set by path.logs in elasticsearch.yml.
For Linux systems with ACL support, the otelcol install process should have created the ACL grants necessary for the otelcol system user to access default log locations. You can verify the active ACL grants using the getfacl command. Install ACL support in your Linux environment if it is not already installed.
In rare cases the required ACL may not be supported, for example on a Linux distribution that is not officially supported by Sumo Logic. In this case, you can run the following command to explicitly grant the permissions.
sudo setfacl -R -m d:u:otelcol-sumo:r-x,d:g:otelcol-sumo:r-x,u:otelcol-sumo:r-x,g:otelcol-sumo:r-x <PATH_TO_LOG_FILE>
Run the above command for every log file that needs to be ingested and does not reside in the default location.
If Linux ACL support is not available, traditional Unix-style user and group permissions must be modified. It should be sufficient to add the otelcol system user to the specific group that has access to the log files.
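To decide which group the otelcol system user would need to join, it can help to look at the group ownership and mode bits of a log file. The following is a minimal sketch using only the Python standard library. `group_read_access` is a hypothetical helper name, and the sketch inspects only the file itself, not the directories above it.

```python
import os
import stat


def group_read_access(path: str) -> tuple:
    """Return (gid, readable) for the file at `path`.

    gid is the numeric ID of the group that owns the file. readable is True
    when the file mode grants read permission to that group.
    """
    info = os.stat(path)
    readable = bool(info.st_mode & stat.S_IRGRP)
    return info.st_gid, readable


# Example call (the path is a placeholder, as in the setfacl command above):
#   gid, readable = group_read_access("<PATH_TO_LOG_FILE>")
```

If `readable` is False, adding the user to the owning group will not be enough on its own, and the file mode would also need to change.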
For Windows systems, log files which are collected should be accessible by the SYSTEM group. Use the following set of PowerShell commands if the SYSTEM group does not have access.
$NewAcl = Get-Acl -Path "<PATH_TO_LOG_FILE>"
# Set properties
$identity = "NT AUTHORITY\SYSTEM"
$fileSystemRights = "ReadAndExecute"
$type = "Allow"
# Create new rule
$fileSystemAccessRuleArgumentList = $identity, $fileSystemRights, $type
$fileSystemAccessRule = New-Object -TypeName System.Security.AccessControl.FileSystemAccessRule -ArgumentList $fileSystemAccessRuleArgumentList
# Apply new rule
$NewAcl.SetAccessRule($fileSystemAccessRule)
Set-Acl -Path "<PATH_TO_LOG_FILE>" -AclObject $NewAcl
Collection configuration and app installation
As part of data collection setup and app installation, you can select the App from App Catalog and click on Install App. Follow the steps below.
Step 1: Set up Collector
If you want to use an existing OpenTelemetry Collector, you can skip this step by selecting the Use an existing Collector option.
To create a new Collector:
- Select the Add a new Collector option.
- Select the platform where you want to install the Sumo Logic OpenTelemetry Collector.
This will generate a command that you can execute in the machine environment you need to monitor. Once executed, it will install the Sumo Logic OpenTelemetry Collector.
Step 2: Configure integration
In this step, you will configure the YAML required for Elasticsearch collection.
Below are the inputs required:
- Endpoint. Enter the URL of the server you need to monitor. Example: http://localhost:9200.
- User Name. Enter the Elasticsearch username.
- Elasticsearch cluster log path. By default, Elasticsearch logs are stored in /var/log/elasticsearch/ELK-<Clustername>.log.
- Tags. db.cluster.name.
You can add any custom fields that you want to tag along with the data ingested in Sumo Logic. Click the Download YAML File button to get the YAML file.
For the Linux platform, click the Download Environment Variables File button to get the file containing the password, which must be set as an environment variable.
Step 3: Send logs and metrics to Sumo Logic
Once you have downloaded the YAML file as described in the previous step, follow the steps below for your platform.
- Linux
- Windows
- macOS
- Chef
- Ansible
- Puppet
Linux
- Copy the YAML file to the /etc/otelcol-sumo/conf.d/ folder in the Elasticsearch instance which needs to be monitored.
- Place the Env file in the following directory:
/etc/otelcol-sumo/env/
- Restart the collector using:
sudo systemctl restart otelcol-sumo
Windows
- Copy the YAML file to the C:\ProgramData\Sumo Logic\OpenTelemetry Collector\config\conf.d folder in the machine which needs to be monitored.
- Restart the collector using:
Restart-Service -Name OtelcolSumo
macOS
- Copy the YAML file to the /etc/otelcol-sumo/conf.d/ folder in the Elasticsearch instance which needs to be monitored.
- Restart the otelcol-sumo process using:
otelcol-sumo --config /etc/otelcol-sumo/sumologic.yaml --config "glob:/etc/otelcol-sumo/conf.d/*.yaml"
Chef
- Copy the YAML file into your Chef cookbook files directory files/<downloaded_yaml_file>.
- Use a Chef file resource in a recipe to manage it.
cookbook_file '/etc/otelcol-sumo/conf.d/<downloaded_yaml_file>' do
  mode 0644
  notifies :restart, 'service[otelcol-sumo]', :delayed
end
- Use a Chef file resource in a recipe to manage the Env file.
cookbook_file '/etc/otelcol-sumo/env/<downloaded_env_file>' do
  mode 0600
  notifies :restart, 'service[otelcol-sumo]', :delayed
end
- Add the recipe to your collector setup to start collecting. Every team typically has its own established way of applying a Chef recipe. The resulting Chef recipe should look something like:
cookbook_file '/etc/otelcol-sumo/conf.d/<downloaded_yaml_file>' do
  mode 0644
  notifies :restart, 'service[otelcol-sumo]', :delayed
end
cookbook_file '/etc/otelcol-sumo/env/<downloaded_env_file>' do
  mode 0600
  notifies :restart, 'service[otelcol-sumo]', :delayed
end
Ansible
- Place the file into your Ansible playbook files directory.
- Run the Ansible playbook using:
ansible-playbook -i inventory install_sumologic_otel_collector.yaml \
  -e '{"installation_token": "<YOUR_TOKEN>", "collector_tags": {<YOUR_TAGS>}, "src_config_path": "files/conf.d", "src_env_path": "files/env"}'
Puppet
- Place the file into your Puppet module files directory modules/install_otel_collector/files/<downloaded_yaml>.
- Use a Puppet file resource to manage it.
file { '/etc/otelcol-sumo/conf.d/<downloaded_yaml_file>':
  ensure => present,
  source => 'puppet:///modules/install_otel_collector/<downloaded_yaml_file>',
  mode => '0644',
  notify => Service[otelcol-sumo],
}
- Use a Puppet file resource to manage the Env file.
file { '/etc/otelcol-sumo/env/<downloaded_env_file>':
  ensure => present,
  source => 'puppet:///modules/install_otel_collector/<downloaded_env_file>',
  mode => '0600',
  notify => Service[otelcol-sumo],
}
- Apply the Puppet manifest. Every team typically has its own established way of applying a Puppet manifest. The resulting Puppet manifest should look something like:
node 'default' {
  class { 'install_otel_collector':
    installation_token => '<YOUR_TOKEN>',
    collector_tags => { <YOUR_TAGS> },
  }
  service { 'otelcol-sumo':
    provider => 'systemd',
    ensure => running,
    enable => true,
    require => Class['install_otel_collector'],
  }
  file { '/etc/otelcol-sumo/conf.d/<downloaded_yaml_file>':
    ensure => present,
    source => 'puppet:///modules/install_otel_collector/<downloaded_yaml_file>',
    mode => '0644',
    notify => Service[otelcol-sumo],
  }
  file { '/etc/otelcol-sumo/env/<downloaded_env_file>':
    ensure => present,
    source => 'puppet:///modules/install_otel_collector/<downloaded_env_file>',
    mode => '0600',
    notify => Service[otelcol-sumo],
  }
}
After successfully executing the above command, Sumo Logic will start receiving data from your host machine.
Click Next. This will install the app (dashboards and monitors) to your Sumo Logic Org.
Dashboard panels will start to fill automatically. It's important to note that each panel fills with data matching the time range query and received since the panel was created. Results won't immediately be available, but within 20 minutes, you'll see full graphs and maps.
Sample log messages
{
"type":"server",
"timestamp":"2021-07-12T11:42:25,862+07:00",
"level":"INFO",
"component":"o.e.x.s.a.s.FileRolesStore",
"cluster.name":"elasticsearch",
"node.name":"v103-157-218-134.3stech.vn",
"message":"parsed [0] roles from file [/etc/elasticsearch/roles.yml]"
}
Sample metrics
{
"queryId":"A",
"_source":"sumo_hosted_collector_otel_elasticsearch",
"state":"completed",
"thread_pool_name":"analyze",
"elasticsearch.node.name":"ip-172-31-86-95",
"elasticsearch.cluster.name":"elasticsearch",
"metric":"elasticsearch.node.thread_pool.tasks.finished",
"db.cluster.name":"elastic_otel_cluster",
"_collectorId":"000000000C5B7100",
"deployment.environment":"otel_elastic_dev",
"_sourceId":"0000000000000000",
"unit":"{tasks}",
"db.system":"elasticsearch",
"_sourceHost":"sumoOtelelasticsearch",
"_collector":"sumo_hosted_collector_otel_elasticsearch",
"max":0,
"min":0,
"avg":0,
"sum":0,
"latest":0,
"count":2
}
Sample queries
Sample logs query
This is a sample log query from the Errors panel.
db.system=elasticsearch %"deployment.environment"={{deployment.environment}} db.cluster.name={{db.cluster.name}} ERROR | json "log" as _rawlog nodrop
| if (isEmpty(_rawlog), _raw, _rawlog) as _raw
| json field=_raw "timestamp" as timestamp
| json field=_raw "level" as level
| json field=_raw "component" as es_component
| json field=_raw "message" as message
| where level = "ERROR"
| count
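To make the parsing steps in the query above easier to follow, here is a rough Python sketch of the same logic: unwrap an optional "log" envelope, read the "level" field, and count the ERROR records. It is not how Sumo Logic evaluates the query, and it ignores the scope filters on the first line. The function names and the sample input lines are invented for illustration.

```python
import json


def _parse_object(text):
    """Return the JSON object encoded in `text`, or None if it is not one."""
    try:
        value = json.loads(text)
    except (TypeError, json.JSONDecodeError):
        return None
    return value if isinstance(value, dict) else None


def count_errors(raw_lines):
    """Count records whose "level" field equals "ERROR"."""
    errors = 0
    for raw in raw_lines:
        record = _parse_object(raw)
        if record is None:
            continue
        # Mirrors `json "log" as _rawlog nodrop` plus the isEmpty() check:
        # when a non-empty "log" field wraps the message, parse that instead.
        wrapped = record.get("log")
        if isinstance(wrapped, str) and wrapped:
            record = _parse_object(wrapped)
            if record is None:
                continue
        # Mirrors `where level = "ERROR"` followed by `count`.
        if record.get("level") == "ERROR":
            errors += 1
    return errors


# Hypothetical input: two plain records and one wrapped in a "log" field.
sample = [
    '{"type":"server","level":"INFO","message":"started"}',
    '{"type":"server","level":"ERROR","message":"disk failure"}',
    json.dumps({"log": '{"level":"ERROR","message":"wrapped error"}'}),
]
print(count_errors(sample))  # prints 2
```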
Sample metrics query
This is a sample metrics query from the JVM Memory Used (MB) panel.
deployment.environment=* metric=jvm.memory.heap.used db.cluster.name=* db.node.name=* | sum by db.cluster.name, db.node.name
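The `sum by db.cluster.name, db.node.name` clause groups the matching time series by cluster and node and adds their values together. A simplified Python sketch of that grouping follows. It assumes each dictionary holds the value of one time series at a single point in time. The function name and the sample values are made up for illustration.

```python
from collections import defaultdict


def sum_by_cluster_and_node(points):
    """Sum `value` for each (db.cluster.name, db.node.name) pair."""
    totals = defaultdict(float)
    for point in points:
        key = (point["db.cluster.name"], point["db.node.name"])
        totals[key] += point["value"]
    return dict(totals)


# Hypothetical values for three time series at one timestamp.
points = [
    {"db.cluster.name": "c1", "db.node.name": "node-a", "value": 512.0},
    {"db.cluster.name": "c1", "db.node.name": "node-a", "value": 256.0},
    {"db.cluster.name": "c1", "db.node.name": "node-b", "value": 128.0},
]
totals = sum_by_cluster_and_node(points)
```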
Viewing Elasticsearch dashboards
All dashboards have a set of filters that you can apply to the entire dashboard. Use these filters to drill down and examine the data to a granular level.
- You can change the time range for a dashboard or panel by selecting a predefined interval from a drop-down list, choosing a recently used time range, or specifying custom dates and times. Learn more.
- You can use template variables to drill down and examine the data on a granular level. For more information, see Filtering Dashboards with Template Variables.
Overview
The Elasticsearch - Overview dashboard provides the health of Elasticsearch clusters, shard analysis, resource utilization of Elasticsearch hosts and clusters, and search and indexing performance.
Total Operations Stats
The Elasticsearch - Total Operations Stats dashboard provides information on the operations of the Elasticsearch system.
Thread Pool
The Elasticsearch - Thread Pool dashboard analyzes thread pool operations to help manage the memory consumption of nodes in the cluster.
Resource
The Elasticsearch - Resource dashboard monitors JVM memory, network, disk, and CPU of each Elasticsearch node.
Performance Stats
The Elasticsearch - Performance Stats dashboard provides performance statistics such as latency and translog operations and size.
Indices
The Elasticsearch - Indices dashboard monitors Index operations, size and latency. It also provides analytics on doc values, fields, fixed bitsets, and terms memory.
Documents
The Elasticsearch - Documents dashboard provides analytics and monitoring on Elasticsearch documents.
Caches
The Elasticsearch - Caches dashboard allows you to monitor query cache size, evictions and field data memory size.
Errors And Warnings
The Elasticsearch - Errors And Warnings dashboard shows errors and warnings by Elasticsearch components.
Garbage Collection
The Elasticsearch - Garbage Collector dashboard provides information on the garbage collection of the Java Virtual Machine.
Login And Connections
The Elasticsearch - Login And Connections dashboard shows geo location of client connection requests, failed connection logins and count of failed login attempts.
Operations
The Elasticsearch - Operations dashboard allows you to monitor server stats and events such as node up/down, index creation/deletion. It also provides disk usage and cluster health status.
Queries
The Elasticsearch - Queries dashboard provides analytics on slow queries and query shards.
Create monitors for Elasticsearch app
From your App Catalog:
- From the Sumo Logic navigation, select App Catalog.
- In the Search Apps field, search for and then select your app.
- Make sure the app is installed.
- Navigate to What's Included tab and scroll down to the Monitors section.
- Click Create next to the pre-configured monitors. In the create monitors window, adjust the trigger conditions and notifications settings based on your requirements.
- Scroll down to Monitor Details.
- Under Location click on New Folder.
Note: By default, monitors are saved in the root folder. To make maintenance easier, create a new folder in the location of your choice.
- Enter Folder Name. Folder Description is optional.
Tip: Including the app version in the folder name helps you keep track of versions across future updates.
- Click Create. Once the folder is created, click on Save.
Elasticsearch alerts
Alert Name | Alert Description | Alert Condition | Recover Condition |
---|---|---|---|
Elasticsearch - Cluster Red Alert | This alert gets triggered when the Elasticsearch cluster health status is red. | Count >= 1 | Count < 1 |
Elasticsearch - Cluster Yellow Alert | This alert gets triggered when the Elasticsearch cluster health status is yellow. | Count > 1 | Count <= 1 |
Elasticsearch - Disk Out of Space Alert | This alert gets triggered when disk usage is over 90%. | Count > 90 | Count <= 90 |
Elasticsearch - Error Log Too Many Alert | This alert gets triggered when the number of error logs exceeds the threshold. | Count >= 1000 | Count < 1000 |
Elasticsearch - Healthy Data Nodes Alert | This alert gets triggered when a data node is missing from the Elasticsearch cluster. | Count <= 1 | Count > 1 |
Elasticsearch - Heap Usage Too High Alert | This alert gets triggered when heap usage is over 90%. | Count > 90 | Count <= 90 |
Elasticsearch - Initializing Shards Too Long Alert | This alert gets triggered when shard initialization takes more than 5 min. | Count >= 5 | Count < 5 |
Elasticsearch - Pending Tasks Alert | This alert gets triggered when Elasticsearch has pending tasks. | Count >= 5 | Count < 5 |
Elasticsearch - Query Time Slow Alert | This alert gets triggered when slow query time is greater than 5 ms. | Count >= 1 | Count < 1 |
Elasticsearch - Query Time Too Slow Alert | This alert gets triggered when slow query time is greater than 10 ms. | Count >= 1 | Count < 1 |
Elasticsearch - Relocating Shards Too Long Alert | This alert gets triggered when shard relocation takes more than 5 min. | Count >= 5 | Count < 5 |
Elasticsearch - Too Many Slow Query Alert | This alert gets triggered when too many slow queries are found in a 5-minute window. | Count >= 10 | Count < 10 |
Elasticsearch - Unassigned Shards Alert | This alert gets triggered when Elasticsearch has unassigned shards. | Count > 5 | Count <= 5 |
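
In each row of the table, the alert and recover conditions are complements of each other, so any count falls on exactly one side of the threshold. As a hedged illustration, the sketch below models three rows of the table in Python. The `Monitor` class and the dictionary keys are hypothetical names, and this is not how Sumo Logic monitors are implemented.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Monitor:
    name: str
    alert_op: Callable[[float, float], bool]  # comparison used by the alert condition
    threshold: float

    def state(self, count: float) -> str:
        """Return "alert" when the alert condition holds, else "recovered"."""
        return "alert" if self.alert_op(count, self.threshold) else "recovered"


# Three rows taken from the table above.
MONITORS = {
    "cluster_red": Monitor("Elasticsearch - Cluster Red Alert", operator.ge, 1),
    "disk_out_of_space": Monitor("Elasticsearch - Disk Out of Space Alert", operator.gt, 90),
    "healthy_data_nodes": Monitor("Elasticsearch - Healthy Data Nodes Alert", operator.le, 1),
}

if __name__ == "__main__":
    for key, count in (("cluster_red", 0), ("disk_out_of_space", 91), ("healthy_data_nodes", 1)):
        print(key, count, MONITORS[key].state(count))
```

Note the boundary values: a disk usage count of exactly 90 counts as recovered, while a cluster red count of exactly 1 triggers the alert.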