Google Cloud Dataproc
Dataproc is a managed Spark and Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning. For more details, refer to the GCP documentation.
Log and metric types
Setup
You can collect the logs and metrics for Sumo Logic's Google Cloud Dataproc integration by following the steps below.
Configure logs collection
- Collect Audit Logs using the Google Cloud Platform source. Access to these Audit Logs depends on your permissions and roles. To enable logging for Google Dataproc, refer to the Google documentation. For more detail on which Dataproc operations are audited, refer to audited operations. While creating the sink in GCP, in the Choose logs to include in sink section, you can use the following query:
(resource.type=audited_resource resource.labels.service=dataproc.googleapis.com)
- Collect Platform Logs using the Google Cloud Platform source. Dataproc platform logs include job logs and cluster logs. Refer to the permissions required to access job and cluster logs. While creating the sink in GCP, in the Choose logs to include in sink section, you can use the following query:
(resource.type=(cloud_dataproc_cluster OR cloud_dataproc_job))
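If you prefer to script sink creation rather than use the console, the two queries above can be passed to `gcloud logging sinks create` through its `--log-filter` flag. The following is a minimal sketch, not a prescribed setup: it only builds and prints the commands and does not run them. It assumes the sinks route to a Pub/Sub topic, and the sink names, project ID, and topic name are hypothetical placeholders.

```python
import shlex

# Inclusion filters copied from the queries above.
AUDIT_FILTER = (
    "(resource.type=audited_resource "
    "resource.labels.service=dataproc.googleapis.com)"
)
PLATFORM_FILTER = "(resource.type=(cloud_dataproc_cluster OR cloud_dataproc_job))"


def sink_create_command(sink_name, project_id, topic, log_filter):
    """Build the argument list for `gcloud logging sinks create`.

    Assumption: the sink destination is a Pub/Sub topic in the same project.
    """
    destination = f"pubsub.googleapis.com/projects/{project_id}/topics/{topic}"
    return [
        "gcloud", "logging", "sinks", "create",
        sink_name,
        destination,
        f"--log-filter={log_filter}",
        f"--project={project_id}",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own values.
    sinks = {
        "dataproc-audit-sink": AUDIT_FILTER,
        "dataproc-platform-sink": PLATFORM_FILTER,
    }
    for name, log_filter in sinks.items():
        cmd = sink_create_command(name, "my-project", "sumo-logs", log_filter)
        # shlex.join quotes the filter so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Building the command as an argument list and quoting it with `shlex.join` keeps the parentheses and spaces in the filter from being interpreted by the shell.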
Configure metrics collection
- Collect GCP Metrics using the GCP Metric source. Under the Services dropdown, select Cloud Dataproc. For Google Dataproc metrics and dimensions, refer to Google Dataproc metrics.