Set up traces collection for Kubernetes environments

After installing or upgrading your Sumo Logic Kubernetes Collection, you will be able to send your traces directly to its endpoint using OpenTelemetry (as well as older formats like Jaeger or Zipkin).

Traces are enhanced with Kubernetes metadata, just like the logs and metrics collected by the collector. See below for installation instructions.

Prerequisites:

  • Kubernetes 1.19+
  • Helm 3.5+

Installation process for Sumo Logic Tracing on Kubernetes

Installation is almost the same as for the official Sumo Logic Kubernetes Collection, except that the tracing flag (sumologic.traces.enabled=true) needs to be enabled. The process uses a Helm chart to set up all required components. It automatically downloads and configures the OpenTelemetry Collector, which collects, processes, and exports telemetry data to Sumo Logic.

In the following installation steps, we use the release name collection and the namespace name sumologic. You can use any names you want; however, you'll need to adjust the installation commands accordingly, since these names determine the OpenTelemetry Collector endpoint name.

Collection architecture

Tracing data from your services is first sent to local OpenTelemetry Collector agents, deployed as a DaemonSet on each node, which buffer the data and forward it to an OpenTelemetry Collector gateway. The gateway then passes the data to the OpenTelemetry Collector, which shapes and trims the traffic before exporting it to Sumo Logic; both the gateway and the collector run as Deployments.

(Diagram: Kubernetes tracing collection architecture)

Setting up the most recent Sumo Logic Kubernetes Collection 

Refer to the install/upgrade instructions for the current version. To enable tracing, the sumologic.traces.enabled=true flag must be included.

Using command line

helm upgrade --install collection sumologic/sumologic \
  --namespace sumologic \
  --create-namespace \
  --set sumologic.accessId=<SUMO_ACCESS_ID> \
  --set sumologic.accessKey=<SUMO_ACCESS_KEY> \
  --set sumologic.clusterName="<MY_CLUSTER_NAME>" \
  --set sumologic.traces.enabled=true

We recommend creating a new values.yaml file for each Kubernetes cluster on which you wish to install the collection, setting only the properties you wish to override. For example:

sumologic:
  accessId: <SUMO_ACCESS_ID>
  accessKey: <SUMO_ACCESS_KEY>
  clusterName: <MY_CLUSTER_NAME>
  traces:
    enabled: true


Once you have customized the config, you can use the following command to install or upgrade:

helm upgrade --install collection sumologic/sumologic \
  --namespace sumologic \
  --create-namespace \
  -f values.yaml

Tracing is disabled by default. If you previously installed sumologic-kubernetes-collection 2.0 or higher without enabling tracing, it can be enabled with sumologic.traces.enabled=true.

Using command line

helm upgrade collection sumologic/sumologic \
  --namespace sumologic \
  ... \
  --set sumologic.traces.enabled=true

Using configuration file

The values.yaml file needs to have the relevant section enabled, such as:

sumologic:
  ...
  traces:
    enabled: true

After updating the configuration file, the changes can be applied with the following:

helm upgrade --install collection sumologic/sumologic \
  --namespace sumologic \
  -f values.yaml

Using OTLP HTTP is recommended:

  • OTLP HTTP: <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:4318

Alternatively, if required, you can use other supported formats as well:

  • Jaeger gRPC: <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:14250
  • Jaeger Thrift HTTP: <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:14268
  • Jaeger Thrift Compact (UDP): <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:6831
  • Zipkin: <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:9411/api/v2/spans
  • OTLP gRPC: <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:4317
  • OTLP HTTP (deprecated): <CHART_NAME>-sumologic-otelagent.<NAMESPACE>:55681

For example, when the default chart name (collection) and namespace (sumologic) are used, the endpoints are as follows:

  • OTLP HTTP: collection-sumologic-otelagent.sumologic:4318
  • Jaeger gRPC: collection-sumologic-otelagent.sumologic:14250
  • Jaeger Thrift HTTP: collection-sumologic-otelagent.sumologic:14268
  • Jaeger Thrift Compact (UDP): collection-sumologic-otelagent.sumologic:6831
  • Zipkin: collection-sumologic-otelagent.sumologic:9411/api/v2/spans
  • OTLP gRPC: collection-sumologic-otelagent.sumologic:4317
  • OTLP HTTP (deprecated): collection-sumologic-otelagent.sumologic:55681
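
An instrumented application can then be pointed at one of these endpoints. The snippet below is a minimal sketch for an application instrumented with an OpenTelemetry SDK, using the standard OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME environment variables; the application name and image are placeholders, and the default release and namespace names are assumed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-instrumented-app          # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-instrumented-app
  template:
    metadata:
      labels:
        app: my-instrumented-app
    spec:
      containers:
      - name: my-instrumented-app
        image: example/my-instrumented-app:latest   # placeholder image
        env:
        # Standard OpenTelemetry SDK setting: export spans over OTLP HTTP to the agent DaemonSet service.
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://collection-sumologic-otelagent.sumologic:4318"
        # Service name attached to spans in Sumo Logic.
        - name: OTEL_SERVICE_NAME
          value: "my-instrumented-app"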

Troubleshooting

Desired Kubernetes installation state

After enabling tracing and installing the collection, the following additional Kubernetes resources should be present:

  • otelcol - collector responsible for forwarding data to Sumo Receiver
    • Deployment: collection-sumologic-otelcol
    • Pod: collection-sumologic-otelcol-<hash>-<hash>
    • Replica Set: collection-sumologic-otelcol-<hash>
    • Service: collection-sumologic-otelcol
    • Service: collection-sumologic-otelcol-instr-metrics
    • Service: collection-sumologic-otelcol-headless
    • Config Map: collection-sumologic-otelcol
  • otelagent - collector responsible for data collection and tagging
    • Daemonset: collection-sumologic-otelagent
    • Pod on every node: collection-sumologic-otelagent-<hash>
    • Service: collection-sumologic-otelagent
    • Config Map: collection-sumologic-otelagent
  • otelgateway - collector responsible for traces load balancing
    • Deployment: collection-sumologic-otelgateway
    • Pod: collection-sumologic-otelgateway-<hash>-<hash>
    • Replica Set: collection-sumologic-otelgateway-<hash>
    • Service: collection-sumologic-otelgateway
    • Config Map: collection-sumologic-otelgateway
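
To quickly confirm that these objects exist, you can list them with kubectl. This is a minimal sketch assuming the default release name (collection) and namespace (sumologic):

# List the tracing-related workloads and services created by the chart
kubectl get deployments,daemonsets,replicasets,pods,services,configmaps -n sumologic | grep otel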

How to verify traces are installed and working?

  • There are no Kubernetes errors in the namespace sumologic.
  • There are running pods <CHART_NAME>-sumologic-otelcol-<hash>, <CHART_NAME>-sumologic-otelgateway-<hash>, and <CHART_NAME>-sumologic-otelagent-<hash>.
  • Kubernetes metadata tags (pod, replicaset, etc.) should be applied to all spans.
  • The OpenTelemetry Collector can export metrics, which include information such as the number of spans exported. To enable, apply the otelcol.metrics.enabled=true flag when installing or upgrading the Collector, for example:

    helm upgrade collection sumologic/sumologic \
      --namespace sumologic \
      ... \
      --set otelcol.metrics.enabled=true 


    After enabling, several metrics starting with otelcol_ will become available, such as otelcol_exporter_sent_spans and otelcol_receiver_accepted_spans (a quick way to inspect them is sketched after this list).
     
  • The OpenTelemetry Collector can also have the logging exporter enabled. This outputs the contents of spans to the collector logs (with some sampling above a certain rate). To enable it, apply the following flags when installing or upgrading the collector (appending logging to the list of exporters):

helm upgrade collection sumologic/sumologic \
  --namespace sumologic \
  ... \
  --set otelcol.config.exporters.logging.logLevel=debug \
  --set otelcol.config.service.pipelines.traces.exporters="{otlphttp,logging}" 

With this enabled, kubectl logs -n sumologic collection-sumologic-otelcol-<ENTER ACTUAL POD ID> might yield output like the following:

2020-03-09T10:47:28.861Z TraceData with 1 spans
Node service name: carpogonial
Node attributes:
2020-03-09T10:47:28.861Z Span #0
  Trace ID    : 00000000000000004abaf4a8688cee33
  ID          : 1aad0bc2b44e8219
  Parent ID   :
  Name        : Carpoidea
  Kind        : CLIENT
  Start time  : seconds:1583750845 nanos:799855000
  End time    : seconds:1583751016 nanos:332705000
  Span attributes:
        -> zipkin.remoteEndpoint.ipv6: 5ab8:31e6:a7b:6205:13cb:a3fe:c180:ca26
        -> ip: 10.1.1.1
        -> zipkin.remoteEndpoint.port: 49088
        -> zipkin.remoteEndpoint.serviceName: carpogonial
        -> ipv4: 36.110.13.238
        -> ipv6: 5ab8:31e6:a7b:6205:13cb:a3fe:c180:ca26
        -> port: 49088
        -> zipkin.remoteEndpoint.ipv4: 36.110.13.238
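
If you enabled the collector metrics described earlier, one quick way to inspect them is to port-forward to the otelcol Deployment and query its Prometheus endpoint. This is a minimal sketch that assumes the default release and namespace names, and that the collector exposes its internal metrics on the default port 8888:

kubectl port-forward -n sumologic deployment/collection-sumologic-otelcol 8888:8888
# In a second terminal, filter for the span counters mentioned above:
curl -s http://localhost:8888/metrics | grep otelcol_exporter_sent_spans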

 

For reference, the manifests below show an example standalone OpenTelemetry Collector agent deployment (ConfigMap, DaemonSet, and Service), similar to what the Helm chart sets up:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      zipkin:
        endpoint: 0.0.0.0:9411
      otlp: 
        protocols:
          grpc:
            endpoint: 0.0.0.0:55680
          http:
            endpoint: 0.0.0.0:55681
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_compact:
            endpoint: 0.0.0.0:6831
          thrift_http:
            endpoint: 0.0.0.0:14268
    exporters:
      logging:
        loglevel: debug
      otlp:
        endpoint: "collection-sumologic-otelcol.sumologic:55680"
        insecure: true
    processors:
      batch:
      memory_limiter:
        # Same as --mem-ballast-size-mib CLI argument
        ballast_size_mib: 165
        # 80% of maximum memory up to 2G
        limit_mib: 400
        # 25% of limit up to 2G
        spike_limit_mib: 100
        check_interval: 5s
      queued_retry:
        num_workers: 4
        queue_size: 100
        retry_on_failure: true
      k8s_tagger:
        passthrough: true
    extensions:
      health_check: {}
      zpages: {}
    service:
      extensions: [health_check, zpages]
      pipelines:
        traces:
          receivers: [otlp, jaeger, zipkin]
          processors: [memory_limiter, k8s_tagger, batch, queued_retry]
          exporters: [otlp] 
          # Alternatively, to debug, replace the above with this:
          # exporters: [otlp, logging] 
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
      - command:
          - "/otelcontribcol"
          - "--config=/conf/otel-agent-config.yaml"
          # Memory Ballast size should be max 1/3 to 1/2 of memory.
          - "--mem-ballast-size-mib=165"
        image: sumologic/opentelemetry-collector:0.12.0
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6831 # Jaeger Thrift Compact
          protocol: UDP
        - containerPort: 8888 # Metrics
        - containerPort: 9411 # Default endpoint for Zipkin receiver.
        - containerPort: 14250 # Default endpoint for Jaeger gRPC receiver.
        - containerPort: 14268 # Default endpoint for Jaeger HTTP receiver.
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 55680 # Default OpenTelemetry gRPC receiver port.
        - containerPort: 55681 # Default OpenTelemetry HTTP receiver port.
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: /conf
        livenessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
        readinessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
---          
kind: Service
apiVersion: v1
metadata:
  name: otel-agent
spec:
  selector:
    app: opentelemetry
    component: otel-agent
  ports:
  - name: jaeger-thrift-compact
    port: 6831
    protocol: UDP
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  - name: zipkin # Default endpoint for Zipkin receiver.
    port: 9411
  - name: jaeger-grpc  # Default endpoint for Jaeger gRPC
    port: 14250
  - name: jaeger-thrift-http # Default endpoint for Jaeger HTTP receiver.
    port: 14268
  - name: zpages # Default endpoint for zpages
    port: 55679
  - name: otlp-grpc # Default endpoint for OTLP gRPC receiver.
    port: 55680
  - name: otlp-http # Default endpoint for OTLP HTTP receiver.
    port: 55681
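
If you use these reference manifests on their own rather than through the Helm chart, they can be applied with kubectl; the file name below is only an assumption:

# Save the manifests above to a file, e.g. otel-agent.yaml, then apply them:
kubectl apply -n sumologic -f otel-agent.yaml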