
What if I don't want to send all the tracing data to Sumo Logic?

If you want to selectively filter tracing data before sending it to Sumo Logic, for example for scaling, privacy, or cost-optimization purposes in large environments, you can run an instance of the OpenTelemetry Collector on each node as an agent. These agents forward data to OpenTelemetry Collectors in an aggregation layer, which can apply smart filtering before forwarding the data to Sumo Logic.

[Diagram: node-level OpenTelemetry agents forwarding trace data through an aggregating OpenTelemetry Collector layer to Sumo Logic]

In this scenario, we use separate configurations for the node-level OpenTelemetry Collector agent and for the aggregating OpenTelemetry Collector. We will perform the installation as described in:

Prepare config file for a setup with separate agent and aggregation layers

config.yaml for aggregating OpenTelemetry Collector

Use the following as a template for config.yaml and apply these changes:

  • ENDPOINT_URL needs to be replaced with the value retrieved in Step 1, point 5.


receivers:
  otlp:
    protocols:
      grpc: 
        endpoint: 0.0.0.0:55680
processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 1000
    spike_limit_mib: 500
  batch:
    send_batch_size: 256
    send_batch_max_size: 512
    timeout: 5s
  queued_retry:
    num_workers: 16
    queue_size: 5000
    retry_on_failure: true
extensions:
  health_check: {}
exporters:
  zipkin:
    endpoint: ENDPOINT_URL
  logging:
    loglevel: debug
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      # To enable verbose debugging, add "logging" to the list of exporters
      exporters: [zipkin]
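Before restarting the aggregating collector, it can help to sanity-check that every component referenced in the service pipeline is actually defined at the top level; a mistyped name there is a common cause of startup failures. Below is a minimal, stdlib-only sketch of such a check (to use it on your file, first load it into a dict, e.g. with PyYAML's yaml.safe_load, which is an extra dependency not shown here):

```python
def undefined_components(config: dict) -> list:
    """Return pipeline component references with no matching top-level definition."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = set(config.get(section, {}))
            for ref in pipeline.get(section, []):
                if ref not in defined:
                    missing.append(f"{pipeline_name}/{section}/{ref}")
    return missing

# Minimal demo: "logging" is listed in the pipeline but not defined under exporters
demo = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}, "queued_retry": {}},
    "exporters": {"zipkin": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch", "queued_retry"],
                "exporters": ["zipkin", "logging"],
            }
        }
    },
}
print(undefined_components(demo))  # ['traces/exporters/logging']
```

An empty list means every referenced component is defined; anything else points at the exact pipeline entry to fix.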


config.yaml for OpenTelemetry Collector/Agent

Use the following as a template for config.yaml and apply these changes:

  • COLLECTOR_HOSTNAME needs to be replaced with the hostname of the aggregating collector (or, for example, of a load balancer in front of it) that was set up above.
  • Refer to the processors lines in the traces pipeline below and keep only the variant that matches your environment.

receivers:
  zipkin:
    endpoint: 0.0.0.0:9411
  otlp:
    protocols:
      grpc: 
        endpoint: 0.0.0.0:55680
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268
processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 1000
    spike_limit_mib: 500
  batch:
    send_batch_size: 256
    send_batch_max_size: 512
    timeout: 5s
  resourcedetection/aws:
    detectors: [ec2]
    timeout: 5s
    override: false
  resource/aws:
    attributes:
    - action: upsert
      key: cloud.namespace
      value: ec2
  resourcedetection/gcp:
    detectors: [gcp]
    timeout: 5s
    override: false
  resource/gcp:
    attributes:
    - action: upsert
      key: cloud.namespace
      value: gce
  queued_retry:
    num_workers: 16
    queue_size: 5000
    retry_on_failure: true
extensions:
  health_check: {}
exporters:
  otlp:
    endpoint: "COLLECTOR_HOSTNAME:55680"
    insecure: true
  logging:
    loglevel: debug
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [jaeger, zipkin, otlp]
      # For AWS EC2 environments, uncomment this line instead:
      # processors: [memory_limiter, batch, resourcedetection/aws, resource/aws, queued_retry]
      # For GCP Compute Engine environments, uncomment this line instead:
      # processors: [memory_limiter, batch, resourcedetection/gcp, resource/gcp, queued_retry]
      # For other environments, use:
      processors: [memory_limiter, batch, queued_retry]
      # To enable verbose debugging, add "logging" to the list of exporters
      exporters: [otlp]
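Before restarting the agent, you can verify that it can actually reach the aggregating collector on the OTLP gRPC port. A small stdlib-only connectivity probe (the hostname is the COLLECTOR_HOSTNAME value you configured above; it is not resolvable as written):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (replace with your aggregating collector's hostname):
# print(port_open("COLLECTOR_HOSTNAME", 55680))
```

A successful TCP connection does not prove the OTLP service is healthy, but a failure immediately rules out DNS, routing, and firewall problems.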

Filtering data at the output of the OpenTelemetry Collector in aggregation mode

Sumo Logic’s OpenTelemetry Collector can shape trace data at the output according to user-defined rules. You can define rules in a cascading fashion, assign a different pool size to each rule, and give rules different priorities. This ensures that the backend always receives valuable, useful, and cost-optimized data for analysis.

See configuration and examples here: https://github.com/SumoLogic/opentelemetry-collector/tree/master/processor/samplingprocessor/tailsamplingprocessor
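As an illustration of such cascading rules, the upstream tail-sampling processor (which the linked Sumo Logic fork builds on) evaluates a prioritized list of policies per trace; the exact keys in the fork may differ, so treat this as a sketch and consult the repository above for the authoritative schema:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # wait for late spans before deciding on a trace
    num_traces: 50000         # traces kept in memory while awaiting a decision
    policies:
      # Highest priority: always keep traces that contain errors
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Lower priority: cap the remaining traffic at a spans-per-second budget
      - name: rate-limit-rest
        type: rate_limiting
        rate_limiting: {spans_per_second: 100}
```

Because decisions are made on whole traces after decision_wait elapses, this processor belongs in the aggregating collector, where all spans of a trace converge, rather than in the node-level agents.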