Collect logs and metrics for the Istio App
This page provides instructions for collecting logs and metrics for the Sumo App for Istio. Logs and metrics are collected with the Sumo Logic Helm chart. Istio sample metrics and sample log messages are also provided, along with a query sample.
Log and Metrics Types
Collection overview
Configure log and metrics collection with the Sumo Logic Helm chart, using one of the following options:
Kubernetes collection is already set up
Log Collection:
- Enable Access Logging to write logs to stdout. The Sumologic-Kubernetes-Collection will automatically capture the logs from stdout and send them to Sumo Logic.
Metrics Collection:
If you installed using the Sumo Logic Helm chart:
- Update the Helm chart values file with the following config:
  - Add the following additionalScrapeConfigs section to the prometheusSpec field of sumologic-istio.yaml. These configs scrape Istio endpoints for metrics. You can read more about these scrape configs here.
    - job_name: 'istiod'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - istio-system
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: istiod;http-monitoring
    - job_name: 'envoy-stats'
      metrics_path: /stats/prometheus
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
  - Add the following rules to the remoteWrite section of sumologic-istio.yaml. These remote write configs ensure that only the metrics used by the Sumo Logic Istio App are forwarded to Sumo Logic by the Sumo Logic Helm chart.
    - url: http://$(FLUENTD_METRICS_SVC).$(NAMESPACE).svc.cluster.local:9888/prometheus.metrics.istio
      remoteTimeout: 5s
      writeRelabelConfigs:
      - action: keep
        regex: (?:galley_validation_(passed|failed|config_updates|config_update_error))
        sourceLabels: [__name__]
    - url: http://$(FLUENTD_METRICS_SVC).$(NAMESPACE).svc.cluster.local:9888/prometheus.metrics.istio
      remoteTimeout: 5s
      writeRelabelConfigs:
      - action: keep
        regex: (?:istio_(response_(bytes_sum|bytes_bucket)|requests_total|request_(duration_milliseconds_sum|duration_milliseconds_bucket|bytes_sum|bytes_bucket)|build|agent_(process_virtual_memory_bytes|process_max_fds|pilot_xds_pushes|pilot_xds_expired_nonce|pilot_xds|pilot_virt_services|pilot_proxy_queue_time_sum|pilot_endpoint_not_ready|num_outgoing_requests|num_failed_outgoing_requests|go_threads|go_memstats_heap_inuse_bytes|go_memstats_heap_alloc_bytes|go_memstats_gc_cpu_fraction|go_memstats_alloc_bytes_total|go_memstats_alloc_bytes|go_gc_duration_seconds)))
        sourceLabels: [__name__]
- Upgrade the Sumo Logic Helm chart by running the following command:
  helm upgrade --install <my-release-name> sumologic/sumologic -f sumologic-istio.yaml
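The `keep` rules in the scrape configs above can be illustrated with a short Python sketch. It approximates how Prometheus evaluates such a rule: the values of the source labels are joined with `;` (the default separator), and the target is kept only if the regex matches the entire joined string. The `keeps` helper and the sample label values (for example `grpc-xds` and `http-envoy-prom`) are hypothetical and exist only for illustration; they are not part of Prometheus or the Helm chart.

```python
import re


def keeps(regex: str, label_values: list[str], separator: str = ";") -> bool:
    """Approximate a Prometheus `action: keep` relabel rule.

    Prometheus joins the source label values with the separator and
    anchors the regex at both ends, which re.fullmatch emulates here.
    """
    joined = separator.join(label_values)
    return re.fullmatch(regex, joined) is not None


# istiod job: service name and endpoint port name must both match.
ISTIOD_REGEX = "istiod;http-monitoring"
# envoy-stats job: any container port whose name ends in "-envoy-prom".
ENVOY_REGEX = ".*-envoy-prom"

if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    print(keeps(ISTIOD_REGEX, ["istiod", "http-monitoring"]))
    print(keeps(ISTIOD_REGEX, ["istiod", "grpc-xds"]))
    print(keeps(ENVOY_REGEX, ["http-envoy-prom"]))
    print(keeps(ENVOY_REGEX, ["http"]))
```

Because the match is anchored, a service named `istiod` is scraped only through its `http-monitoring` port, and a pod is scraped by the `envoy-stats` job only if one of its container ports has a name ending in `-envoy-prom`.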
Kubernetes collection has not been set up
Log Collection:
- Enable Access Logging to write logs to stdout. The Sumologic-Kubernetes-Collection will automatically capture the logs from stdout and send them to Sumo Logic.
Metrics Collection:
- Deploy using Helm.
- Add additionalScrapeConfigs and remoteWrite rules to values.yaml:
  - Add this additionalScrapeConfigs section to the prometheusSpec field of values.yaml. These configs scrape Istio endpoints for metrics. You can read more about these scrape configs here.
    - job_name: 'istiod'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - istio-system
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: istiod;http-monitoring
    - job_name: 'envoy-stats'
      metrics_path: /stats/prometheus
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
  - Add these rules to the remoteWrite section of values.yaml. This sends the scraped metrics to Sumo Logic.
    - url: http://$(FLUENTD_METRICS_SVC).$(NAMESPACE).svc.cluster.local:9888/prometheus.metrics.istio
      remoteTimeout: 5s
      writeRelabelConfigs:
      - action: keep
        regex: (?:galley_validation_(passed|failed|config_updates|config_update_error))
        sourceLabels: [__name__]
    - url: http://$(FLUENTD_METRICS_SVC).$(NAMESPACE).svc.cluster.local:9888/prometheus.metrics.istio
      remoteTimeout: 5s
      writeRelabelConfigs:
      - action: keep
        regex: (?:istio_(response_(bytes_sum|bytes_bucket)|requests_total|request_(duration_milliseconds_sum|duration_milliseconds_bucket|bytes_sum|bytes_bucket)|build|agent_(process_virtual_memory_bytes|process_max_fds|pilot_xds_pushes|pilot_xds_expired_nonce|pilot_xds|pilot_virt_services|pilot_proxy_queue_time_sum|pilot_endpoint_not_ready|num_outgoing_requests|num_failed_outgoing_requests|go_threads|go_memstats_heap_inuse_bytes|go_memstats_heap_alloc_bytes|go_memstats_gc_cpu_fraction|go_memstats_alloc_bytes_total|go_memstats_alloc_bytes|go_gc_duration_seconds)))
        sourceLabels: [__name__]
- Upgrade the Sumo Logic Helm chart by running the following command:
  helm upgrade --install <my-release-name> sumologic/sumologic -f sumologic-istio.yaml
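To see which metric names the two writeRelabelConfigs rules let through, the following Python sketch applies the same regexes to a few names. It assumes, as Prometheus does for relabel rules, that each regex must match the whole `__name__` value. The `forwarded` helper is hypothetical, and the metric names that are rejected in the example (such as `envoy_cluster_upstream_cx_total`) are only illustrative.

```python
import re

# Regexes copied from the two remoteWrite rules above.
GALLEY_REGEX = r"(?:galley_validation_(passed|failed|config_updates|config_update_error))"
ISTIO_REGEX = (
    r"(?:istio_(response_(bytes_sum|bytes_bucket)|requests_total"
    r"|request_(duration_milliseconds_sum|duration_milliseconds_bucket|bytes_sum|bytes_bucket)"
    r"|build"
    r"|agent_(process_virtual_memory_bytes|process_max_fds|pilot_xds_pushes"
    r"|pilot_xds_expired_nonce|pilot_xds|pilot_virt_services|pilot_proxy_queue_time_sum"
    r"|pilot_endpoint_not_ready|num_outgoing_requests|num_failed_outgoing_requests"
    r"|go_threads|go_memstats_heap_inuse_bytes|go_memstats_heap_alloc_bytes"
    r"|go_memstats_gc_cpu_fraction|go_memstats_alloc_bytes_total|go_memstats_alloc_bytes"
    r"|go_gc_duration_seconds)))"
)


def forwarded(metric_name: str) -> bool:
    """Return True if either remote write rule would keep this metric name."""
    return any(
        re.fullmatch(regex, metric_name) is not None
        for regex in (GALLEY_REGEX, ISTIO_REGEX)
    )


if __name__ == "__main__":
    for name in (
        "istio_requests_total",
        "istio_request_duration_milliseconds_bucket",
        "istio_request_duration_milliseconds_count",
        "galley_validation_passed",
        "envoy_cluster_upstream_cx_total",
    ):
        print(f"{name}: {'forwarded' if forwarded(name) else 'dropped'}")
```

A check like this can be useful before adding a dashboard panel on a new metric: if the name is not matched by either regex, the metric will be scraped by Prometheus but not sent on to Sumo Logic.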
Validation Steps:
- Set up port forwarding from your terminal:
  kubectl port-forward prometheus-my-release-kube-prometheus-prometheus-0 9090
- Open http://127.0.0.1:9090/config in a web browser and make sure the following remote write configs are present:
    - url: http://my-release-sumologic-fluentd-metrics.default.svc.cluster.local:9888/prometheus.metrics.istio
      remote_timeout: 5s
      write_relabel_configs:
      - source_labels: [__name__]
        separator: ;
        regex: (?:galley_validation_(passed|failed|config_updates|config_update_error))
        replacement: $1
        action: keep
      queue_config:
        capacity: 2500
        max_shards: 200
        min_shards: 1
        max_samples_per_send: 500
        batch_send_deadline: 5s
        min_backoff: 30ms
        max_backoff: 100ms
    - url: http://my-release-sumologic-fluentd-metrics.default.svc.cluster.local:9888/prometheus.metrics.istio
      remote_timeout: 5s
      write_relabel_configs:
      - source_labels: [__name__]
        separator: ;
        regex: (?:istio_(response_(bytes_sum|bytes_bucket)|requests_total|request_(duration_milliseconds_sum|duration_milliseconds_bucket|bytes_sum|bytes_bucket)|build|agent_(process_virtual_memory_bytes|process_max_fds|pilot_xds_pushes|pilot_xds_expired_nonce|pilot_xds|pilot_virt_services|pilot_proxy_queue_time_sum|pilot_endpoint_not_ready|num_outgoing_requests|num_failed_outgoing_requests|go_threads|go_memstats_heap_inuse_bytes|go_memstats_heap_alloc_bytes|go_memstats_gc_cpu_fraction|go_memstats_alloc_bytes_total|go_memstats_alloc_bytes|go_gc_duration_seconds)))
        replacement: $1
        action: keep
      queue_config:
        capacity: 2500
        max_shards: 200
        min_shards: 1
        max_samples_per_send: 500
        batch_send_deadline: 5s
        min_backoff: 30ms
        max_backoff: 100ms
  The remote write configs above ensure that only the metrics used by the Sumo Logic Istio App are forwarded to Sumo Logic by the Sumo Logic Helm chart.
- Open http://127.0.0.1:9090/config in a web browser and make sure the following scrape configs are present:
    - job_name: istiod
      honor_timestamps: true
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: istiod;http-monitoring
        replacement: $1
        action: keep
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - istio-system
    - job_name: envoy-stats
      honor_timestamps: true
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /stats/prometheus
      scheme: http
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        separator: ;
        regex: .*-envoy-prom
        replacement: $1
        action: keep
      kubernetes_sd_configs:
      - role: pod
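The browser check above can also be scripted. The following Python sketch reads the loaded configuration from the Prometheus HTTP API (`/api/v1/status/config`) through the same port forward and reports which expected entries are missing. The helper names are hypothetical, and the service name pattern `<release>-sumologic-fluentd-metrics` is an assumption inferred from the example URL above; adjust it if your release names its services differently.

```python
import json
import urllib.request


def expected_snippets(release: str = "my-release", namespace: str = "default") -> list[str]:
    """Strings that should appear in the rendered Prometheus config.

    The URL pattern is inferred from the example above (an assumption).
    """
    url = (
        f"http://{release}-sumologic-fluentd-metrics.{namespace}"
        ".svc.cluster.local:9888/prometheus.metrics.istio"
    )
    return [f"url: {url}", "job_name: istiod", "job_name: envoy-stats"]


def missing_snippets(config_yaml: str, expected: list[str]) -> list[str]:
    """Return the expected snippets that do not occur in the config text."""
    return [snippet for snippet in expected if snippet not in config_yaml]


def fetch_config(base_url: str = "http://127.0.0.1:9090") -> str:
    """Fetch the currently loaded config as YAML text from Prometheus."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/config", timeout=10) as resp:
        return json.load(resp)["data"]["yaml"]
```

With the port forward running, `missing_snippets(fetch_config(), expected_snippets())` should return an empty list when both the remote write and scrape configs were applied. This is a plain substring check, so it confirms the entries exist but does not verify every field.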
Query Sample
Query sample from the "Istio - Logs" dashboard, "Non 200 Response Codes" panel:
namespace=istio-system cluster={{cluster}}
| json field=_raw "log" as log_message
| parse regex field=log_message "\[(?<start_time>.+)\] \"(?<req>.+?)\" (?<response_code>.+?) (?<response_flags>.+?) (?<response_code_details>.+?) (?<con_term_details>.+?) \"(?<upstream_fail_reason>.+?)\" (?<bytes_recvd>.+?) (?<bytessent>.+?) (?<duration>.+?) (?<resp>.+?) \"(?<req_fwd_for>.+?)\" \"(?<user_agent>.+?)\" \"(?<request_id>.+?)\" \"(?<request_authority>.+?)\" \"(?<upstream_host>.+?)\" (?<upstream_cluster>.+?) (?<upstream_loacl_address>.+?) (?<downstream_local_address>.+?) (?<downstream_remote_address>.+?) (?<requested_server_name>.+?) (?<route_name>.+?)"
| where response_code != "200"
| timeslice 1h
| count by response_code, _timeslice
| transpose row _timeslice column response_code
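The logic of this query can be sketched outside Sumo Logic in a few lines of Python: extract the response code from each access log line, drop the 200s, bucket by hour, and count per response code. This is only an approximation under stated assumptions: the regex is a shortened version covering the first three fields, the hour bucket uses the `start_time` in the log line rather than the message time that `timeslice` uses, and the sample lines are hypothetical, truncated examples rather than real Istio output.

```python
import re
from collections import Counter
from datetime import datetime

# Shortened version of the query's regex: only the first three fields.
# Python spells named groups (?P<name>...) rather than (?<name>...).
LINE_RE = re.compile(
    r'\[(?P<start_time>.+?)\] "(?P<req>.+?)" (?P<response_code>\S+)'
)


def non_200_counts(lines):
    """Count non-200 responses per (hour, response_code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None or match["response_code"] == "200":
            continue
        started = datetime.strptime(match["start_time"], "%Y-%m-%dT%H:%M:%S.%fZ")
        hour = started.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), match["response_code"])] += 1
    return counts


# Hypothetical, truncated access log lines for illustration only.
SAMPLE = [
    '[2021-03-01T10:05:12.123Z] "GET /a HTTP/1.1" 200 - via_upstream',
    '[2021-03-01T10:17:40.001Z] "GET /b HTTP/1.1" 503 UF upstream_reset',
    '[2021-03-01T10:48:03.500Z] "POST /c HTTP/1.1" 503 UF upstream_reset',
    '[2021-03-01T11:02:59.999Z] "GET /d HTTP/1.1" 404 NR route_not_found',
]

if __name__ == "__main__":
    for (hour, code), count in sorted(non_200_counts(SAMPLE).items()):
        print(hour, code, count)
```

The final `transpose` step in the query only reshapes this result so that each response code becomes its own column per time slice, which is what the panel charts.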