Kubernetes Collection v3.0.0 - How to Upgrade
Requirements
helm3
kubectl
jq
docker
Set the following environment variables, which the commands below make use of:
export NAMESPACE=...
export HELM_RELEASE_NAME=...
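For example, if the chart was installed into the sumologic namespace under the release name collection (substitute your own values):
export NAMESPACE=sumologic
export HELM_RELEASE_NAME=collection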
Migrating the configuration
We've made some breaking changes to our configuration file format, but most of them will be handled automatically by our migration tool.
You can get your current configuration from the cluster by running:
helm get values --namespace "${NAMESPACE}" --output yaml "${HELM_RELEASE_NAME}" > user-values.yaml
Afterwards, you can download the migration tool and run it directly. Set the OS and ARCH variables to your operating system and architecture.
OS=linux ARCH=amd64; curl -L "https://github.com/SumoLogic/sumologic-kubernetes-tools/releases/download/v2.22.0/update-collection-v3-sumo-${OS}_${ARCH}" -o update-collection-v3
chmod +x update-collection-v3
./update-collection-v3 -in user-values.yaml -out new-values.yaml
You can also run it with Docker:
docker run \
--rm \
-v $(pwd):/values \
-i sumologic/kubernetes-tools:2.22.0 \
update-collection-v3 -in /values/user-values.yaml -out /values/new-values.yaml
You should now have a new-values.yaml file in your working directory, which will be used for the upgrade. Pay attention to the migration tool's output - it may notify you of additional manual steps you need to carry out.
Before you run the upgrade command, review the manual steps below and carry out the ones relevant to your use case.
Metrics migration
If you do not have metrics collection enabled, skip straight to the next major section.
The metrics migration involves one major manual step that everyone needs to do, which is upgrading kube-prometheus-stack.
Upgrade kube-prometheus-stack
When?: If you have metrics enabled at all.
Carry out the following:
- Upgrade the Prometheus CRDs:
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml
Otherwise, the upgrade will fail with the following error:
Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(Prometheus.spec): unknown field "shards" in com.coreos.monitoring.v1.Prometheus.spec
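One way to check that the new CRDs are in place is to confirm that the previously unknown shards field is now part of the Prometheus schema:
# prints the field documentation instead of an error once the CRDs are upgraded
kubectl explain prometheus.spec.shards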
- Patch the kube-state-metrics Deployment with new labels:
kubectl get deployment \
--namespace="${NAMESPACE}" \
--selector 'app.kubernetes.io/name=kube-state-metrics' \
-o json | \
jq ". | .items[].spec.selector.matchLabels[\"app.kubernetes.io/instance\"] |= \"${HELM_RELEASE_NAME}\"" | \
kubectl apply \
--namespace="${NAMESPACE}" \
--force \
--filename -
Otherwise, the upgrade will fail with the following error:
Error: UPGRADE FAILED: cannot patch "collection-kube-state-metrics" with kind Deployment: Deployment.apps "collection-kube-state-metrics" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"collection", "app.kubernetes.io/name":"kube-state-metrics"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
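You can confirm the patch was applied by inspecting the Deployment's selector, which should now include the app.kubernetes.io/instance label:
kubectl get deployment \
  --namespace="${NAMESPACE}" \
  --selector 'app.kubernetes.io/name=kube-state-metrics' \
  -o jsonpath='{.items[*].spec.selector.matchLabels}'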
- Patch the prometheus-node-exporter DaemonSet with new labels:
kubectl get daemonset \
--namespace "${NAMESPACE}" \
--selector "app=prometheus-node-exporter,release=${HELM_RELEASE_NAME}" \
-o json | \
jq ". | .items[].spec.selector.matchLabels[\"app.kubernetes.io/instance\"] |= \"${HELM_RELEASE_NAME}\"" | \
jq ". | .items[].spec.template.metadata.labels[\"app.kubernetes.io/instance\"] |= \"${HELM_RELEASE_NAME}\"" | \
jq ". | .items[].spec.selector.matchLabels[\"app.kubernetes.io/name\"] |= \"prometheus-node-exporter\"" | \
jq ". | .items[].spec.template.metadata.labels[\"app.kubernetes.io/name\"] |= \"prometheus-node-exporter\"" | \
jq '. | del(.items[].spec.selector.matchLabels["release"])' | \
jq '. | del(.items[].spec.template.metadata.labels["release"])' | \
jq '. | del(.items[].spec.selector.matchLabels["app"])' | \
jq '. | del(.items[].spec.template.metadata.labels["app"])' | \
kubectl apply \
--namespace="${NAMESPACE}" \
--force \
--filename -
Otherwise, the upgrade will fail with the following error:
Error: UPGRADE FAILED: cannot patch "collection-prometheus-node-exporter" with kind DaemonSet: DaemonSet.apps "collection-prometheus-node-exporter" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"collection", "app.kubernetes.io/name":"prometheus-node-exporter"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
- If you overrode any of the repository keys under the kube-prometheus-stack key, follow the kube-prometheus-stack migration doc for those changes.
Otelcol StatefulSet
When?: If you're already using otelcol as the metrics metadata provider.
Run the following command to manually delete otelcol StatefulSets:
kubectl delete sts --namespace=${NAMESPACE} --cascade=orphan -lapp=${HELM_RELEASE_NAME}-sumologic-otelcol-metrics
The reason this is necessary is that the Service name for this StatefulSet has changed, and Kubernetes forbids modification of this value on existing StatefulSets.
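Because --cascade=orphan removes only the StatefulSet object and leaves its Pods running, you can confirm the Pods are still present (assuming they carry the same app label used in the selector above):
kubectl get pods --namespace="${NAMESPACE}" -l app="${HELM_RELEASE_NAME}-sumologic-otelcol-metrics"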
Additional Service Monitors
When?: If you're using kube-prometheus-stack.prometheus.additionalServiceMonitors.
If you're using kube-prometheus-stack.prometheus.additionalServiceMonitors, remove all Sumo Logic related ServiceMonitors from the list. They are now covered by the sumologic.metrics.serviceMonitors configuration, which keeps your configuration cleaner.
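For example, after the cleanup the list would contain only monitors you define yourself; the entry below is a hypothetical custom ServiceMonitor:
kube-prometheus-stack:
  prometheus:
    additionalServiceMonitors:
      ## keep only your own monitors; drop any Sumo Logic provided entries
      - name: my-app                     ## hypothetical custom ServiceMonitor
        selector:
          matchLabels:
            app: my-app
        endpoints:
          - port: metrics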
Remote Write Migration
When?: If you're using kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite.
If you're using kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite, move all non-default configurations to kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite and leave kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite unchanged.
In addition, please ensure that the url for all remoteWrite configurations starts with http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888 instead of http://$(FLUENTD_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888.
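As a sketch, a custom remote write entry would move to additionalRemoteWrite and use the new service variable; the endpoint path and relabel config below are hypothetical:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      ## leave remoteWrite at the chart defaults
      additionalRemoteWrite:
        - url: http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888/prometheus.metrics.custom   ## hypothetical endpoint path
          writeRelabelConfigs:
            - action: keep
              sourceLabels: [__name__]
              regex: my_app_.*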
Custom metrics filtering and modification
When?: If you added extra configuration to Fluentd metrics.
If you added extra configuration to Fluentd metrics, you will likely want to make analogous modifications in OpenTelemetry Collector.
Logs migration
If you do not have log collection enabled, skip straight to the next major section.
Replacing Fluent Bit with OpenTelemetry Collector
When?: If you're using fluent-bit as the log collector, which is the default.
Migration with (small) data loss
On upgrade, the Fluent Bit DaemonSet will be deleted, and a new OpenTelemetry Collector Daemonset will be created. If a log file were to be rotated between the Fluent Bit Pod disappearing and the OpenTelemetry Collector Pod starting, logs added to that file after Fluent Bit was deleted will not be ingested. If you're ok with this minor loss of data, you can proceed without any manual intervention.
Migration with data duplication
If you'd prefer to ingest duplicated data for a period of time instead, with OpenTelemetry Collector and Fluent Bit running side by side, enable the following settings:
sumologic:
logs:
collector:
allowSideBySide: true
fluent-bit:
enabled: true
After the upgrade, once OpenTelemetry Collector is running, you can disable Fluent Bit again and proceed without any data loss.
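A minimal sketch of that follow-up change, keeping the collector setting from above:
sumologic:
  logs:
    collector:
      allowSideBySide: true
fluent-bit:
  enabled: false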
Migration without data duplication and data loss (advanced)
If you want to migrate without losing data or ingesting duplicates, you can go with a more complex solution. The idea is to have two separate groups of nodes: one for Fluent Bit and one for OpenTelemetry Collector.
The node group for Fluent Bit should contain all existing nodes; the second group consists of new nodes, which will run OpenTelemetry Collector and receive all newly scheduled Pods.
Let's consider an example. We added the workerGroup: old-worker-group label to all existing nodes, and then applied the following configuration:
sumologic:
logs:
collector:
allowSideBySide: true
fluent-bit:
enabled: true
nodeSelector:
## run Fluent-Bit on all nodes matching label: workerGroup=old-worker-group
workerGroup: old-worker-group
otellogs:
daemonset:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
## run OpenTelemetry Logs Collector on all nodes which do not match label: workerGroup=old-worker-group
- key: workerGroup
operator: NotIn
values:
- old-worker-group
After upgrading the Helm Chart, we drained all nodes labeled workerGroup: old-worker-group, and then made OpenTelemetry Collector the only log collector in the cluster by removing the now unnecessary fluent-bit and otellogs configuration sections:
sumologic:
logs:
collector:
allowSideBySide: true
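The node labeling (done before the upgrade) and the draining (done after it) can be carried out with standard kubectl commands; the node names below are hypothetical:
# before the upgrade: label every existing node
kubectl label nodes node-1 node-2 workerGroup=old-worker-group

# after the upgrade: drain the old nodes so workloads move to nodes running OpenTelemetry Collector
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data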
Otelcol StatefulSet
When?: If you're already using otelcol as the logs metadata provider.
Run the following command to manually delete otelcol StatefulSets:
kubectl delete sts --namespace=${NAMESPACE} --cascade=orphan -lapp=${HELM_RELEASE_NAME}-sumologic-otelcol-logs
The reason this is necessary is that the Service name for this StatefulSet has changed, and Kubernetes forbids modification of this value on existing StatefulSets.
Custom logs filtering and processing
When?: If you added extra configuration to Fluentd logs.
If you added extra configuration to Fluentd logs, you will likely want to make analogous modifications in OpenTelemetry Collector.
See the Logs modifications doc.
Tracing migration
Trace collection is now enabled by default. If you'd like to keep it disabled, set:
sumologic:
traces:
enabled: false
If you do not have tracing collection enabled, you can skip straight to the end and upgrade using Helm.
Replace special configuration values marked by the 'replace' suffix
When?: If you used any configuration values for traces with the *.replace suffix.
The mechanism that replaced special configuration values for traces marked with the 'replace' suffix has been removed. The following special values are no longer automatically replaced and need to be changed:
processors.source.collector.replace
processors.source.name.replace
processors.source.category.replace
processors.source.category_prefix.replace
processors.source.category_replace_dash.replace
processors.source.exclude_namespace_regex.replace
processors.source.exclude_pod_regex.replace
processors.source.exclude_container_regex.replace
processors.source.exclude_host_regex.replace
processors.resource.cluster.replace
The above special configuration values can be replaced either with direct values or with references to other parameters from values.yaml.
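For illustration only - assuming your configuration previously used, for example, the processors.source.collector.replace and processors.source.category.replace special values, you would now set those keys to explicit values; the exact location of these keys depends on your traces configuration:
processors:
  source:
    collector: my-cluster           ## was the special value processors.source.collector.replace
    category: kubernetes/traces     ## was the special value processors.source.category.replace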
OpenTelemetry Collector Deployments
- Otelcol Deployment
When?: If tracing is enabled.
Warning: the otelcol Deployment was moved to the tracesSampler Deployment.
Run the following commands to manually delete the otelcol Deployment and its ConfigMap:
kubectl delete deployment --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelcol
kubectl delete cm --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelcol
- Otelagent DaemonSet
When?: If you're using otelagent (otelagent.enabled=true).
Warning: the otelagent DaemonSet was replaced by the otelcolInstrumentation StatefulSet.
Run the following commands to manually delete the otelagent DaemonSet and its ConfigMap:
kubectl delete ds --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelagent
kubectl delete cm --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelagent
- Otelgateway Deployment
When?: If you're using otelgateway (otelgateway.enabled=true).
Warning: the otelgateway Deployment was moved to the tracesGateway Deployment.
Run the following commands to manually delete the otelgateway Deployment and its ConfigMap:
kubectl delete deployment --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelgateway
kubectl delete cm --namespace=${NAMESPACE} --cascade=orphan ${HELM_RELEASE_NAME}-sumologic-otelgateway
Running the helm upgrade
Once you've taken care of any manual steps necessary for your configuration, run the helm upgrade:
helm upgrade --namespace "${NAMESPACE}" "${HELM_RELEASE_NAME}" sumologic/sumologic --version=3.0.0 -f new-values.yaml
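For example, you can sanity-check the release and its Pods after the upgrade completes:
helm status --namespace "${NAMESPACE}" "${HELM_RELEASE_NAME}"
kubectl get pods --namespace "${NAMESPACE}"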
After you're done, please review the full list of changes, as some of them may impact you even if they do not require additional action.
Known issues
Cannot delete pod if using Tailing Sidecar Operator
If you are using the Tailing Sidecar Operator and see the following error when deleting a pod:
Error from server: admission webhook "tailing-sidecar.sumologic.com" denied the request: there is no content to decode
Please try to delete the pod again later.
OpenTelemetry Collector doesn't read logs from the beginning of files
This is by design: logs written before the collection was installed are not read.
To keep the old behavior (which can result in log duplication in some cases), use the following configuration:
metadata:
logs:
config:
merge:
receivers:
filelog/containers:
start_at: beginning