Sumo Logic

About Installed Collectors

Overview

A Sumo installed collector is a Java agent that receives logs and metrics from its sources and then compresses, encrypts, and sends the data to the Sumo service. As its name implies, an installed collector is installed in your environment, as opposed to a hosted collector, which resides on the Sumo service. After installing a collector, you add sources, which gather the data that the collector sends to the Sumo service.

A Sumo source is an object, configured for a specific collector, that scans a particular target periodically and sends newly available data to the collector. There are a number of source types in Sumo that work with installed collectors. Examples include:

  • File sources—Local and remote file sources collect logs from selected directories on the collector host, or a remote one. 
  • Windows event log sources—Local and remote Windows event log sources collect Windows events from the collector host, or a remote one. 
  • Windows performance monitor log sources—Local and remote Windows performance monitor log sources collect Windows performance data from the collector host, or a remote one.
  • Docker sources—Docker sources collect Docker container logs, events, and stats.
  • Host metrics sources—Available for Linux, MacOS, and Windows, host metric sources collect CPU, memory, and other OS metrics.

For a list of all sources supported by installed collectors, see Sources for Installed Collectors.

Deployment options

You can install collectors and configure sources on any mix of MacOS, Windows, and Linux hosts in your environment. When deciding where to install collectors, consider your network topology, available bandwidth, and domains or user groups.

Single installed collector

An installed collector can be installed on any standard server that you use for log aggregation or other network services. For example, you might decide to centralize collection with just one collector installed on a dedicated machine, especially if all of your data can be accessed from a single network location.

Multiple installed collectors

If you have a distributed network topology, you can install multiple collectors on multiple machines and set up any combination of sources to collect from your infrastructure.

 

Cloud or data center deployment

Installed collectors can be deployed across a cloud or data center configuration. The collector on each machine reports to Sumo Logic independently, sending distinct log data so that you can query against any virtual machine or server in your deployment.

Installed collector message volume limitations 

In planning your Sumo deployment, keep in mind the message volume an installed collector can handle. The following tables list the recommended maximum messages per second for collectors with a MaxHeap of 128 MB.

MacOS and Linux (64-bit) Collectors

Source type           Messages per second (recommended limit)
Local file source     5,000
Remote file source    3,100
Syslog source         5,000

Windows 2008 (64-bit) Collectors

Source type           Messages per second (recommended limit)
Local file source     4,250
Remote file source    250
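As a rough planning aid, the limits above can be turned into a collector count: divide the expected peak message rate for a source type by the per-collector limit and round up. The Python sketch below assumes the load can be split evenly across collectors and that per-collector limits add up linearly, which the tables do not state; the dictionary keys and the `collectors_needed` function are hypothetical names, and no headroom is reserved.

```python
import math

# Recommended messages-per-second limits from the tables above (MaxHeap of 128 MB).
# The platform and source-type keys are hypothetical labels chosen for this sketch.
LIMITS = {
    ("macos_linux", "local_file"): 5_000,
    ("macos_linux", "remote_file"): 3_100,
    ("macos_linux", "syslog"): 5_000,
    ("windows_2008", "local_file"): 4_250,
    ("windows_2008", "remote_file"): 250,
}


def collectors_needed(platform: str, source_type: str, peak_msgs_per_sec: int) -> int:
    """Smallest number of collectors whose combined recommended limit covers the peak rate."""
    limit = LIMITS[(platform, source_type)]
    return math.ceil(peak_msgs_per_sec / limit)


# 12,000 syslog messages per second: 12,000 / 5,000 = 2.4, rounded up to 3 collectors.
print(collectors_needed("macos_linux", "syslog", 12_000))  # prints 3
```

Note how low the Windows remote file source limit is: 1,000 messages per second already calls for four collectors under these assumptions.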

About collector and source installation and configuration

This section is an overview of the multiple methods Sumo provides for installing and configuring collectors and sources.

Collector installation and configuration

Sumo provides multiple methods for installing a collector:

  • UI installers—You provide configuration settings during the installation dialog. The installer writes these settings to user.properties in the collector’s /config directory. 
  • Command-line installer—You supply configuration settings on the command line or in a varfile. The installer writes these settings to user.properties in the collector’s /config directory.
  • RPM—For Linux. You supply configuration settings in a user.properties file that you create.
  • Binary package—For Linux. The binary package can also be used on MacOS.

For details on collector installation, see Install a Collector on Linux, Install a Collector on MacOS, and Install a Collector on Windows.  

After a collector is up and running, you can change some installed collector configuration settings by editing user.properties and restarting the collector. For more information, see user.properties parameters.
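user.properties is a plain key=value file, so it can be generated by a script, for example when preparing an RPM install. The Python sketch below writes such a file; the parameter names `name`, `accessid`, and `accesskey` reflect our reading of the user.properties parameters topic and should be checked there, and every value shown is a placeholder. Special characters are not escaped, so keep values simple. Remember to restart the collector after changing the file on a running installation.

```python
from pathlib import Path


def render_user_properties(settings: dict[str, str]) -> str:
    """Render settings as key=value lines. Values are written verbatim (no .properties escaping)."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_user_properties(config_dir: Path, settings: dict[str, str]) -> Path:
    """Write user.properties into the collector's config directory and return its path."""
    path = config_dir / "user.properties"
    path.write_text(render_user_properties(settings), encoding="utf-8")
    return path


# Hypothetical values; check the parameter names against the
# user.properties parameters topic before using them.
settings = {
    "name": "web-01-collector",
    "accessid": "<your-access-id>",
    "accesskey": "<your-access-key>",
}
print(render_user_properties(settings), end="")
```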

A few installed collector behaviors, such as caching, are configured in the collector.properties file in the collector’s /config directory.

You can update the configuration of an installed collector using the Collector Management API. For more information, see Collector API Methods and Examples.
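The Python sketch below builds, but does not send, an update request for one collector using only the standard library. Several details are assumptions based on our understanding of the Collector API Methods and Examples topic and should be verified there: the US1 API host `https://api.sumologic.com/api/v1`, HTTP Basic authentication with an access ID and access key, a request body that wraps the collector object in a `collector` key, and an `If-Match` header that carries the ETag returned by an earlier GET of the same collector. The collector ID, name, ETag, and credentials are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: the US1 API endpoint. Organizations in other deployments use a
# different API host; check the Collector API topic for yours.
API_BASE = "https://api.sumologic.com/api/v1"


def build_update_request(collector: dict, etag: str,
                         access_id: str, access_key: str) -> urllib.request.Request:
    """Build, but do not send, a PUT request that replaces one collector's configuration.

    `collector` is the collector object from an earlier GET, edited as needed;
    `etag` is the ETag header value returned by that GET.
    """
    token = base64.b64encode(f"{access_id}:{access_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{API_BASE}/collectors/{collector['id']}",
        data=json.dumps({"collector": collector}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "If-Match": etag,
        },
    )


# Hypothetical collector ID, ETag, and credentials.
req = build_update_request({"id": 42, "name": "web-01", "description": "updated"},
                           '"0123abcd"', "<access-id>", "<access-key>")
# urllib.request.urlopen(req) would send the request; that step is omitted here.
print(req.get_method(), req.full_url)
```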

Source configuration

You can set up as many sources as needed for a given collector. Each source should be configured to collect data of a similar type. For example, you might set up three local file sources to collect router activity logs from three locations, and another local file source to collect logs from a web application.

Each source is tagged with its own metadata, as described in Metadata Naming Conventions. The more sources you set up, the easier it is to isolate specific data in a search, since each source can be identified by its metadata.

When you configure sources that read from log files, you specify a path expression that defines what files to scan. You can optionally configure a blacklist of files to exclude from collection.  
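To illustrate how a path expression and a blacklist combine, the Python sketch below keeps a file only if it matches the path expression and matches none of the blacklist entries. It uses `pathlib.PurePosixPath.match` as a stand-in matcher: with an absolute pattern the whole path must match, and `*` does not cross directory separators. Sumo's own path-expression rules are documented with each source type and may differ, so treat this as an approximation; the paths and the `is_collected` function are hypothetical.

```python
from pathlib import PurePosixPath


def is_collected(path: str, path_expression: str, blacklist: list[str]) -> bool:
    """True if `path` matches the path expression and none of the blacklist expressions.

    PurePosixPath.match stands in for Sumo's matcher; the real rules may differ.
    """
    candidate = PurePosixPath(path)
    if not candidate.match(path_expression):
        return False
    return not any(candidate.match(pattern) for pattern in blacklist)


blacklist = ["/var/log/webapp/debug.log"]
for path in ["/var/log/webapp/access.log", "/var/log/webapp/debug.log", "/var/log/webapp/notes.txt"]:
    print(path, is_collected(path, "/var/log/webapp/*.log", blacklist))
```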

You can create sources using the Sumo web app at any time after collector installation. For source-specific instructions, see the topics under Sources for Installed Collectors.

Alternatively, you can define sources for an installed collector in a JSON file, in which case you must provide the file when starting the collector for the first time. For more information, see Use JSON to Configure Sources. Note that if you provide the sources configuration in a JSON file, you can no longer manage the sources through the Sumo web app or the Collector Management API.
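A JSON sources file can be generated rather than written by hand. The Python sketch below produces one for the router and web application example from the Source configuration section. The field names (`api.version`, `sources`, `sourceType` with the value `LocalFile`, `name`, `pathExpression`, `category`, `blacklist`) reflect our understanding of Use JSON to Configure Sources and should be confirmed there; the source names, paths, and categories are hypothetical.

```python
import json
from typing import Optional


def local_file_source(name: str, path_expression: str, category: str,
                      blacklist: Optional[list[str]] = None) -> dict:
    """One local file source definition for a JSON sources file."""
    source = {
        "sourceType": "LocalFile",
        "name": name,
        "pathExpression": path_expression,
        "category": category,
    }
    if blacklist:
        source["blacklist"] = blacklist
    return source


def sources_json(sources: list[dict]) -> str:
    """Wrap source definitions in the top-level object the collector reads at first startup."""
    return json.dumps({"api.version": "v1", "sources": sources}, indent=2)


config = sources_json([
    local_file_source("router-logs", "/var/log/router/*.log", "network/router"),
    local_file_source("webapp-logs", "/var/log/webapp/*.log", "prod/webapp",
                      blacklist=["/var/log/webapp/debug.log"]),
])
print(config)
```

Because a JSON-defined configuration can no longer be managed from the web app or the API, generating the file from code also gives you a single place to keep it under version control.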

Installed collectors and sources in action

This section is an overview of how installed collectors and their sources operate.

Installed collector startup

When you start up an installed collector for the first time, it registers with Sumo and creates any sources that you have defined in a JSON source configuration file.

At startup, a collector tries to connect to the service in the US1 deployment. If your organization is in another deployment, for example US2, Dub, or Syd, Sumo redirects the collector to the collector service URL for that deployment.

Sources scan source data

Sources scan their target directory or data structure periodically. A file source scans target directories every two seconds. For Windows performance monitor sources, you configure the scan interval when you define the source. 
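The Python sketch below illustrates periodic scanning with the two-second interval mentioned above: each scan records file sizes and reports files that are new or have grown since the previous scan. Size-based change detection is a simplification chosen for this sketch; the collector itself tracks files by fingerprint and read pointer, as described further on. The function names and the target directory are hypothetical.

```python
import os
import time


def file_sizes(directory: str) -> dict[str, int]:
    """Current size in bytes of each regular file directly inside `directory`."""
    return {entry.path: entry.stat().st_size
            for entry in os.scandir(directory) if entry.is_file()}


def grown_files(current: dict[str, int], previous: dict[str, int]) -> list[str]:
    """Files that are new or larger than at the previous scan, i.e. have data to read."""
    return sorted(path for path, size in current.items() if size > previous.get(path, 0))


def poll(directory: str, interval_seconds: float = 2.0, scans: int = 5) -> None:
    """Scan `directory` every `interval_seconds` and report files with new data."""
    previous: dict[str, int] = {}
    for _ in range(scans):
        current = file_sizes(directory)
        for path in grown_files(current, previous):
            print("new data in", path)
        previous = current
        time.sleep(interval_seconds)
```

Calling `poll("/var/log/webapp")` (a hypothetical directory) runs five scans two seconds apart. A file that shrinks, for example after truncation, is not reported by this sketch.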

How an installed collector sends data to the Sumo service

An installed collector starts sending data to the Sumo service as soon as the data is available from the sources configured on the collector. Before sending the data, a collector compresses it (by a factor of about 10) and encrypts it. A collector sends data to the Sumo service over HTTPS.
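The Python sketch below shows the compress-before-send step on a batch of repetitive log lines and prints the resulting ratio. gzip is a stand-in, since the collector's compression format is not named here, and encryption is not shown; an HTTPS client would add TLS in transit. The sample lines are invented, and the ratio you see depends heavily on how repetitive the data is.

```python
import gzip


def compress_batch(lines: list[str]) -> bytes:
    """Join log lines with newlines and gzip-compress the result."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


lines = [f"2024-05-01T12:00:{i % 60:02d}Z INFO GET /api/items status=200 user={i % 7}"
         for i in range(1_000)]
raw = "\n".join(lines).encode("utf-8")
compressed = compress_batch(lines)
print(f"{len(raw)} bytes -> {len(compressed)} bytes, ratio {len(raw) / len(compressed):.1f}")
```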

To keep track of what it has already sent to the Sumo service, the collector tracks each file by its fingerprint (the first 2048 bytes of the file) and by a read pointer that indicates the last line the collector read. The collector updates this information approximately every second. The collector retains a file's fingerprint while the file exists and for some period of time after the file is deleted.
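The Python sketch below models the fingerprint and read pointer. Files are keyed by fingerprint rather than by path, so a renamed file resumes where it left off while a file with different leading bytes is read from the start. Three choices belong to the sketch, not to the collector: the fingerprint is a SHA-256 hash of the first 2048 bytes, the read pointer is a byte offset just past the last complete line, and fingerprint expiry after deletion is not modeled. As the docstring notes, a file shorter than 2048 bytes gets a new fingerprint each time it grows, which this sketch does not handle.

```python
import hashlib
from dataclasses import dataclass

FINGERPRINT_BYTES = 2048  # the first 2048 bytes identify a file


@dataclass
class ReadState:
    offset: int = 0  # byte position just after the last complete line read


def fingerprint(content: bytes) -> str:
    """Hash of the first 2048 bytes. Hashing is this sketch's choice, not documented behavior."""
    return hashlib.sha256(content[:FINGERPRINT_BYTES]).hexdigest()


def read_new_lines(content: bytes, states: dict[str, ReadState]) -> list[str]:
    """Return complete lines added since the last read and advance the read pointer.

    Caveat: a file shorter than 2048 bytes gets a new fingerprint as it grows.
    """
    state = states.setdefault(fingerprint(content), ReadState())
    end = content.rfind(b"\n") + 1  # stop after the last complete line
    if end <= state.offset:
        return []
    new_lines = content[state.offset:end].decode("utf-8").splitlines()
    state.offset = end
    return new_lines


header = b"#" * 2100 + b"\n"  # long enough that the fingerprint stays stable
states: dict[str, ReadState] = {}
print(len(read_new_lines(header + b"a\nb\n", states)))        # prints 3
print(read_new_lines(header + b"a\nb\nc\npartial", states))  # prints ['c']
```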

Throttling, caching, and flushing

Ordinarily, a collector sends data to the Sumo service as fast as its connection allows. Under some circumstances, the Sumo service may instruct a collector to throttle itself, that is, to slow the rate at which it sends data to the service.

To determine whether throttling is required, Sumo measures the amount of data already committed to uploading against the number of previous requests and the available resources (quota) in an account. In other words, Sumo Logic compares the current ingestion rate with a per-minute rate derived from the account's contracted daily ingestion volume (GB/day).
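The per-minute rate is the contracted daily volume divided by the 1,440 minutes in a day; for example, 144 GB/day corresponds to 0.1 GB per minute. The Python sketch below performs that arithmetic and a simplified comparison. Sumo's actual decision also weighs committed data, previous requests, and quota, and any tolerance or burst allowance is not described here, so `over_rate` is a hypothetical illustration rather than the service's rule.

```python
MINUTES_PER_DAY = 24 * 60


def per_minute_rate_gb(contracted_gb_per_day: float) -> float:
    """Per-minute ingest rate implied by a contracted daily volume."""
    return contracted_gb_per_day / MINUTES_PER_DAY


def over_rate(ingested_gb_last_minute: float, contracted_gb_per_day: float) -> bool:
    """Simplified check: is the last minute's ingestion above the per-minute rate?"""
    return ingested_gb_last_minute > per_minute_rate_gb(contracted_gb_per_day)


print(per_minute_rate_gb(144))  # 144 GB/day / 1,440 minutes
print(over_rate(0.25, 144))
```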

The Sumo service tells the collector it can speed up when throttling is no longer necessary.

For more information, see Manage Ingestion.

Caching

Installed collectors cache outbound data when they are throttled or paused, or when the connection to the Sumo service is lost. Data is cached first in memory and then on disk. By default, a collector supports caching the following amounts:

Up to 4GB total disk space, including:

  • Up to 3GB for log data
  • Up to 1GB for metric data

You can raise or lower the disk limits for collector caching. For more information, see Configure Limits for Collector Caching.
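The Python sketch below models a size-limited cache per data type using the default limits listed above (3 GB for logs, 1 GB for metrics). It evicts the oldest batches once the limit is exceeded, matching the fixed-size cache behavior described under Flushing mode. The class and method names are hypothetical, and the memory-then-disk tiering is not modeled.

```python
from collections import deque

GB = 1024 ** 3
# Default disk limits described above: 3 GB for logs plus 1 GB for metrics (4 GB total).
DEFAULT_LIMITS = {"logs": 3 * GB, "metrics": 1 * GB}


class BoundedCache:
    """FIFO cache that evicts the oldest batches when total size exceeds its limit."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.batches: deque = deque()
        self.size = 0

    def add(self, batch: bytes) -> int:
        """Cache a batch and return how many bytes were evicted to stay within the limit."""
        self.batches.append(batch)
        self.size += len(batch)
        evicted = 0
        while self.size > self.limit_bytes:
            oldest = self.batches.popleft()
            self.size -= len(oldest)
            evicted += len(oldest)
        return evicted

    def drain(self) -> list:
        """Remove and return all cached batches, oldest first, for sending."""
        drained = list(self.batches)
        self.batches.clear()
        self.size = 0
        return drained


caches = {kind: BoundedCache(limit) for kind, limit in DEFAULT_LIMITS.items()}
```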

Flushing mode

Unlike the fixed-size cache, which evicts old data to make room for new data, flushing mode stops collection of new data and focuses only on sending existing data (flushing the cache).

A collector enters flushing mode when less than 10% of the disk space remains free on the disk where the collector is installed. For more information, see Flushing Mode.
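The 10% rule can be checked with the standard library. In the Python sketch below, `should_flush` compares free space against the disk's total capacity using integer arithmetic to avoid rounding at the boundary, and `check_install_disk` applies it to the disk holding a directory you pass in. Both function names are hypothetical, and the collector's own check may measure free space differently.

```python
import shutil


def should_flush(free_bytes: int, total_bytes: int) -> bool:
    """True when free space is less than 10% of the disk's capacity (free / total < 0.10)."""
    return free_bytes * 10 < total_bytes


def check_install_disk(install_dir: str) -> bool:
    """Apply the 10% threshold to the disk that holds `install_dir`."""
    usage = shutil.disk_usage(install_dir)
    return should_flush(usage.free, usage.total)
```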

Collector monitoring and logging

An installed collector sends a heartbeat to the Sumo service every 15 seconds. If the Sumo service does not receive a heartbeat for 30 minutes, it considers the collector to be offline, and shows its status as red in the Collection page of the Sumo web app.
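The Python sketch below expresses the service-side status rule: with a heartbeat every 15 seconds, a 30-minute gap corresponds to 120 missed heartbeats. Treating a gap of exactly 30 minutes as offline is this sketch's assumption, and `collector_status` is a hypothetical name, not part of any Sumo API.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(seconds=15)
OFFLINE_AFTER = timedelta(minutes=30)


def collector_status(last_heartbeat: datetime, now: datetime) -> str:
    """'offline' once 30 minutes have passed without a heartbeat, else 'online'."""
    return "offline" if now - last_heartbeat >= OFFLINE_AFTER else "online"


# 30 minutes at one heartbeat every 15 seconds is 120 missed heartbeats.
print(OFFLINE_AFTER // HEARTBEAT_INTERVAL)  # prints 120
```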

The collector uses the log4j framework. You can tailor log rotation behavior for collector.log by editing the log4j.xml file in the collector’s /config directory. For more information, see Manage Log Rotation Settings for collector.log.