Google Cloud TPU
Tensor Processing Units (TPUs) are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. For more details, refer to the GCP documentation.
Log and metric types
Setup
You can collect logs and metrics for Sumo Logic's Google Cloud TPU integration by following the steps below.
Configure logs collection
- Collect Audit Logs using the Google Cloud Platform source. Access to these Audit Logs depends on your permissions and roles. To enable logging for Google TPU, refer to the Google documentation. For more details on which TPU operations are audited, refer to audited operations. While creating the sink in GCP, in the Choose logs to include in sink section, you can use the following query:
(resource.type=audited_resource resource.labels.service=tpu.googleapis.com)
- Collect Platform Logs using the Google Cloud Platform source. Cloud TPU Worker logs contain information about a specific Cloud TPU worker in a specific zone, for example, the amount of memory available on the Cloud TPU worker (system_available_memory_GiB). While creating the sink in GCP, in the Choose logs to include in sink section, you can use the following query:
(resource.type=tpu_worker)
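If you prefer to create the sink programmatically rather than in the GCP console, the two queries above can be combined into one inclusion filter. The following is a minimal sketch, not an official procedure: it assumes the `google-cloud-logging` Python client is installed, that the sink routes to a Pub/Sub topic, and that `project_id`, `sink_name`, and `topic_id` are placeholder values you supply. The helper names `combine_filters` and `create_sink` are hypothetical.

```python
# Filters taken from the two queries above, written with explicit AND and quotes.
AUDIT_FILTER = (
    'resource.type="audited_resource" AND '
    'resource.labels.service="tpu.googleapis.com"'
)
WORKER_FILTER = 'resource.type="tpu_worker"'


def combine_filters(*filters):
    """Join several Cloud Logging filters so a log entry matching any one is included."""
    return " OR ".join(f"({f})" for f in filters)


def create_sink(project_id, sink_name, topic_id, log_filter):
    """Create a log sink that routes matching entries to a Pub/Sub topic.

    Assumption: the topic already exists and is the one your Sumo Logic
    Google Cloud Platform source receives from.
    """
    # Imported here so the filter helpers work without the client library.
    from google.cloud import logging

    client = logging.Client(project=project_id)
    destination = f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"
    sink = client.sink(sink_name, filter_=log_filter, destination=destination)
    sink.create()
    return sink


if __name__ == "__main__":
    # Print the combined filter; pass it to create_sink() with your own values.
    print(combine_filters(AUDIT_FILTER, WORKER_FILTER))
```

A single combined sink keeps Audit Logs and Platform Logs flowing through one topic. Two separate sinks, one per filter, work equally well if you want to route them independently.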
Configure metrics collection
- Collect GCP Metrics using the GCP Metric source. Under the Services dropdown, select Cloud TPU. For Google Cloud TPU metrics and dimensions, refer to Google Cloud TPU metrics.