Set up traces collection for Kubernetes environments

After installing or upgrading the Sumo Logic Kubernetes collector, you can send traces directly to its endpoint in Jaeger, Zipkin, and OTLP formats.

Traces are enriched with Kubernetes metadata, just like the logs and metrics gathered by the collector. Installation instructions follow below.

Prerequisites:

  • Kubernetes 1.10+
  • Helm 2.12+
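
You can confirm both versions before proceeding; a minimal check using the standard version commands of each tool:

kubectl version --short
helm version --short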

Installation steps for Sumo Logic Tracing on Kubernetes

Installation is almost the same as for the official Sumo Logic Kubernetes Collection, except that a tracing flag needs to be enabled and the tracing endpoint must be provided.

Tracing requires Sumo Logic Kubernetes collection 1.2.1 or higher. Refer to install/upgrade instructions for the current version.

Tracing is disabled by default. After you have successfully installed sumologic-kubernetes-collection v1.2.1 or higher, enable tracing with the following reconfiguration command:

helm upgrade collection sumologic/sumologic \
  --namespace sumologic \
  --reuse-values \
  --set sumologic.traces.enabled=true

Desired Kubernetes installation state

After tracing is enabled and installed, the following additional Kubernetes resources should be present (they can be checked with the command shown after this list):

  • Deployment: collection-sumologic-otelcol
  • Pod: collection-sumologic-otelcol-<hash>-<hash>
  • ReplicaSet: collection-sumologic-otelcol-<hash>
  • Service: collection-sumologic-otelcol
  • ConfigMap: collection-sumologic-otelcol
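
A quick way to verify is to list the resources in the namespace and filter for the otelcol name; this sketch assumes the release name collection and namespace sumologic used in the installation step above:

kubectl get deployment,replicaset,pod,service,configmap -n sumologic | grep otelcol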

Pointing the tracing client to the collector

Spans can be sent either directly to the collector or via an intermediate agent.

The collector supports receiving spans in Zipkin, OTLP, and Jaeger formats. The following are the endpoints for each of them:

  • Jaeger gRPC: collection-sumologic-otelcol.sumologic:14250
  • Jaeger Thrift HTTP: collection-sumologic-otelcol.sumologic:14268
  • Jaeger Thrift Compact (UDP): collection-sumologic-otelcol.sumologic:6831
  • Zipkin: collection-sumologic-otelcol.sumologic:9411
  • OTLP: collection-sumologic-otelcol.sumologic:55680
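
For example, a client built on the Jaeger libraries could be pointed at the Thrift HTTP endpoint through the standard Jaeger environment variable; note that the /api/traces path is the conventional Jaeger collector route and may vary by client library:

export JAEGER_ENDPOINT="http://collection-sumologic-otelcol.sumologic:14268/api/traces"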

Alternative: using agents between clients and the collector

If Jaeger Agent or the OpenTelemetry Collector in agent mode is used, it must be replaced with the OpenTelemetry Collector (in agent mode) version provided by Sumo Logic so that the source IP address, which identifies the source pod during metadata tagging, is assigned accurately. Please refer to Appendix A for an example DaemonSet agent setup.

If the OpenTelemetry Collector is already used in agent mode, it can be adjusted with the following changes:

  1. Switch to the image provided by Sumo Logic: 

    image: "sumologic/opentelemetry-collector:0.8.0.1"

  2. Replace the otelcol command with otelcontribcol:

    - command:
        - "/otelcontribcol"

  3. Add a k8s_tagger section to the processors with passthrough mode enabled, so that the source IP address from the connection context is retained:

    processors:
      k8s_tagger:
        passthrough: true

  4. Add an OTLP exporter, which will send data to the central collector:

    exporters:
      otlp:
        endpoint: "collection-sumologic-otelcol.sumologic:55680"
        insecure: true

  5. Update the pipeline configuration so that the k8s_tagger processor and the otlp exporter are included. It is recommended to place the tagger right after sampling and before batching:

    traces/2:
      receivers: [...]
      processors: [memory_limiter, probabilistic_sampler, k8s_tagger, batch, queued_retry]
      exporters: [otlp]
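
These fragments are assembled into a complete agent configuration in Appendix A below.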

How to verify that tracing is installed and working

  • There are no Kubernetes errors in the namespace sumologic
  • There is a running pod collection-sumologic-otelcol-<hash>
  • There are no errors in collection-sumologic-otelcol-<hash> pods like the following:

2020-03-12 14:12:52 +0000 [warn]: #0 undefined method `request_uri' for #<URI::Generic >

If this error appears, the sumologic.traces.endpoint value is not set.
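
A sketch of the fix, reusing the helm upgrade pattern from the installation step; the actual endpoint URL comes from your Sumo Logic account and is left as a placeholder here:

helm upgrade collection sumologic/sumologic \
  --namespace sumologic \
  --reuse-values \
  --set sumologic.traces.endpoint=<ENDPOINT_URL>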

  • There are no errors in collection-sumologic-otelcol-<hash> pods like the following:

{"level":"warn","ts":1590754311.890028,"caller":"queuedprocessor/queued_processor.go:191","msg":"Sender failed","component_kind":"processor","component_type":"queued_retry","component_name":"queued_retry","processor":"queued_retry","error":"Post \"ENDPOINT_URL\": unsupported protocol scheme \"\""}

If this error appears, ENDPOINT_URL was not replaced with the actual URL.

  • Kubernetes metadata tags (pod, replicaset, etc.) should be applied to all spans. If there are no metadata tags and an intermediate agent or collector is in use, make sure it has passthrough mode set (see above).
    If the metadata tags describe a pod named "otel-collector-...", then most probably an intermediate pod is acting as an agent or collector without passthrough mode set.
  • The OpenTelemetry Collector can have the logging exporter enabled, which prints the contents of spans to the output (sampled above a certain rate). To enable it, apply the following flags when installing or upgrading the collector, appending logging to the list of exporters:

--set otelcol.config.exporters.logging.logLevel=debug \
--set otelcol.config.service.pipelines.traces.exporters="{zipkin,logging}"


With this enabled, kubectl logs -n sumologic collection-sumologic-otelcol-<ENTER ACTUAL POD ID> might yield the following output:

2020-03-09T10:47:28.861Z TraceData with 1 spans
Node service name: carpogonial
Node attributes:
2020-03-09T10:47:28.861Z Span #0
  Trace ID    : 00000000000000004abaf4a8688cee33
  ID          : 1aad0bc2b44e8219
  Parent ID   :
  Name        : Carpoidea
  Kind        : CLIENT
  Start time  : seconds:1583750845 nanos:799855000
  End time    : seconds:1583751016 nanos:332705000
  Span attributes:
        -> zipkin.remoteEndpoint.ipv6: 5ab8:31e6:a7b:6205:13cb:a3fe:c180:ca26
        -> ip: 10.1.1.1
        -> zipkin.remoteEndpoint.port: 49088
        -> zipkin.remoteEndpoint.serviceName: carpogonial
        -> ipv4: 36.110.13.238
        -> ipv6: 5ab8:31e6:a7b:6205:13cb:a3fe:c180:ca26
        -> port: 49088
        -> zipkin.remoteEndpoint.ipv4: 36.110.13.238

Appendix A: Example OpenTelemetry Collector Agent mode setup

If you are using Jaeger or OpenTelemetry agents to receive trace data via UDP from tracing clients, we recommend replacing them with the Sumo Logic version of the OpenTelemetry Collector in agent mode. This ensures that Kubernetes metadata tagging works for traces as well as for metrics and logs.

The following setup creates a DaemonSet running the OpenTelemetry Collector in agent mode, which tracing clients can reach via the "otel-agent" hostname. It replaces Jaeger or OpenTelemetry agents that do not currently support Kubernetes tagging.
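
For instance, a Jaeger-based client running in the cluster could target the agent through the standard Jaeger client environment variables; the exact variable names depend on the client library and are an assumption here:

export JAEGER_AGENT_HOST="otel-agent"
export JAEGER_AGENT_PORT="6831"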

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      zipkin:
        endpoint: 0.0.0.0:9411
      otlp: 
        protocols:
          grpc:
            endpoint: 0.0.0.0:55680
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_compact:
            endpoint: 0.0.0.0:6831
          thrift_http:
            endpoint: 0.0.0.0:14268
    exporters:
      logging:
        loglevel: debug
      otlp:
        endpoint: "collection-sumologic-otelcol.sumologic:55680"
        insecure: true
    processors:
      batch:
      memory_limiter:
        # Same as --mem-ballast-size-mib CLI argument
        ballast_size_mib: 165
        # 80% of maximum memory up to 2G
        limit_mib: 400
        # 25% of limit up to 2G
        spike_limit_mib: 100
        check_interval: 5s
      queued_retry:
        num_workers: 4
        queue_size: 100
        retry_on_failure: true
      k8s_tagger:
        passthrough: true
    extensions:
      health_check: {}
      zpages: {}
    service:
      extensions: [health_check, zpages]
      pipelines:
        traces:
          receivers: [otlp, jaeger, zipkin]
          processors: [memory_limiter, k8s_tagger, batch, queued_retry]
          exporters: [otlp] 
          # Alternatively, to debug, replace the above with this:
          # exporters: [otlp, logging] 
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
      - command:
          - "/otelcontribcol"
          - "--config=/conf/otel-agent-config.yaml"
          # Memory Ballast size should be max 1/3 to 1/2 of memory.
          - "--mem-ballast-size-mib=165"
        image: sumologic/opentelemetry-collector:0.8.0.1
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6831 # Jaeger Thrift Compact
          protocol: UDP
        - containerPort: 8888 # Metrics
        - containerPort: 9411 # Default endpoint for Zipkin receiver.
        - containerPort: 14250 # Default endpoint for Jaeger gRPC receiver.
        - containerPort: 14268 # Default endpoint for Jaeger HTTP receiver.
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 55680 # Default OpenTelemetry receiver port.
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: /conf
        livenessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
        readinessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
---          
kind: Service
apiVersion: v1
metadata:
  name: otel-agent
spec:
  selector:
    app: opentelemetry
    component: otel-agent
  ports:
  - name: jaeger-thrift-compact
    port: 6831
    protocol: UDP
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  - name: zipkin # Default endpoint for Zipkin receiver.
    port: 9411
  - name: jaeger-grpc  # Default endpoint for Jaeger gRPC
    port: 14250
  - name: jaeger-thrift-http # Default endpoint for Jaeger HTTP receiver.
    port: 14268
  - name: zpages # Default endpoint for zpages
    port: 55679
  - name: otlp # Default endpoint for OTLP receiver.
    port: 55680
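
As a quick smoke test, a hand-built span can be POSTed to the agent's Zipkin receiver from a pod inside the cluster; the JSON below is a minimal valid Zipkin v2 span with arbitrary example IDs:

curl -X POST http://otel-agent:9411/api/v2/spans \
  -H "Content-Type: application/json" \
  -d '[{"id":"352bff9a74ca9ad2","traceId":"5af7183fb1d4cf5f","name":"smoke-test","timestamp":1596025669000000,"duration":1000,"localEndpoint":{"serviceName":"smoke-test-service"}}]'

If the logging exporter is enabled, the span should appear shortly afterwards in the collector pod logs.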