Azure Machine Learning

Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle, including training, deploying, and monitoring models. This integration helps you monitor training runs, model deployments, and the resource utilization of the nodes in your workspace.
Log and metric types
For Azure Machine Learning, you can collect the following logs and metrics:
- Resource logs. These logs contain information about clusters, nodes, jobs, deployments, models, and pipeline events. To learn more about the different log category types and schemas collected for Azure Machine Learning, refer to the documentation.
- Platform metrics for Azure Machine Learning. These metrics are available in the Microsoft.MachineLearningServices/workspaces namespace. For more information on supported metrics and dimensions, refer to the Azure documentation.
Setup
Azure services send monitoring data to Azure Monitor, which can then stream the data to an event hub. Sumo Logic supports:
- Log collection from Azure Monitor using our Azure Event Hubs Source.
- Metric collection using our Azure Metrics Source.
You must explicitly enable diagnostic settings for each Machine Learning workspace you want to monitor. You can forward logs to the same event hub, provided they satisfy the limitations and permissions described here.
When you configure the Azure Event Hubs source or HTTP source, plan your source category to ease the querying process. A hierarchical approach allows you to make use of wildcards. For example: Azure/MachineLearning/Logs, Azure/MachineLearning/Metrics.
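To illustrate why the hierarchy helps, the sketch below approximates wildcard matching over source categories with Python's glob-style fnmatchcase. It is a minimal sketch under stated assumptions: the category list is hypothetical, and Sumo Logic's own wildcard rules (for example, case handling) may differ from fnmatch.

```python
from fnmatch import fnmatchcase

# Hypothetical source categories that follow the hierarchical naming convention.
SOURCE_CATEGORIES = [
    "Azure/MachineLearning/Logs",
    "Azure/MachineLearning/Metrics",
    "Azure/Storage/Logs",
]


def matching_categories(pattern: str, categories: list[str]) -> list[str]:
    """Return the categories selected by a glob-style wildcard pattern.

    fnmatchcase is case-sensitive and its '*' matches any run of characters,
    including '/'. Sumo Logic's own matching rules may differ in details.
    """
    return [c for c in categories if fnmatchcase(c, pattern)]


if __name__ == "__main__":
    # One wildcard selects both the log and metric categories for this app.
    print(matching_categories("Azure/MachineLearning/*", SOURCE_CATEGORIES))
    # A wildcard in the middle selects log categories across services.
    print(matching_categories("Azure/*/Logs", SOURCE_CATEGORIES))
```

With a flat naming scheme, the same selections would require listing each category explicitly.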
Configure collector
Create a hosted collector if not already configured and tag the tenant_name field. You can get the tenant name using the instructions here. Make sure you create the required sources in this collector.
Configure metrics collection
To set up the Azure Metrics source in Sumo Logic, refer to Azure Metrics Source.
The Sumo Logic Metrics source is currently in beta. To participate, contact your Sumo Logic account executive.
Configure logs collection
In this section, you will configure a pipeline for shipping diagnostic logs from Azure Monitor to an Event Hub.
Diagnostic logs
- To set up the Azure Event Hubs source in Sumo Logic, refer to Azure Event Hubs Source for Logs.
- To create the diagnostic settings in the Azure portal, refer to the Azure documentation. Perform the steps below for each Azure Machine Learning workspace that you want to monitor.
  - Choose Stream to an event hub as the destination.
  - Select allLogs.
  - In the destination details section, use the Event Hub namespace and Event Hub name configured in the previous step. You can use the default policy RootManageSharedAccessKey as the policy name.
- Tag the location field in the source with the right location value.
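If you script the diagnostic setting instead of using the portal, the Azure Monitor diagnostic settings REST API (a PUT to the workspace's providers/Microsoft.Insights/diagnosticSettings/{name} endpoint) accepts a JSON body along the lines below. This is a minimal sketch based on our reading of that API: the function name and placeholder values are hypothetical, the code only builds the body and does not call Azure, and you should confirm the property names and supported api-version in the Azure documentation before use.

```python
import json


def diagnostic_setting_body(
    eh_subscription_id: str,
    eh_resource_group: str,
    eh_namespace: str,
    eh_name: str,
    policy: str = "RootManageSharedAccessKey",
) -> dict:
    """Build a diagnostic-setting body that streams allLogs to an event hub (hypothetical helper)."""
    # The authorization rule belongs to the Event Hubs namespace, not to the individual hub.
    rule_id = (
        f"/subscriptions/{eh_subscription_id}"
        f"/resourceGroups/{eh_resource_group}"
        f"/providers/Microsoft.EventHub/namespaces/{eh_namespace}"
        f"/authorizationRules/{policy}"
    )
    return {
        "properties": {
            "eventHubAuthorizationRuleId": rule_id,
            "eventHubName": eh_name,
            # categoryGroup "allLogs" mirrors selecting allLogs in the portal.
            "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        }
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own Event Hubs details.
    body = diagnostic_setting_body("<subscription-id>", "<resource-group>", "<eh-namespace>", "<eh-name>")
    print(json.dumps(body, indent=2))
```

Keeping the policy as a parameter with RootManageSharedAccessKey as the default matches the portal guidance while letting you substitute a narrower policy.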
Activity logs (optional)
To collect activity logs, follow the instructions here. If you are already collecting activity logs for a subscription, you can skip this step.
Since this source includes logs from multiple regions, do not tag it with the location tag.
Installing the Azure Machine Learning app
To install the app, do the following:
Next-Gen App: To install or update the app, you must be an account administrator or a user with Manage Apps, Manage Monitors, Manage Fields, Manage Metric Rules, and Manage Collectors capabilities, depending on the content types included in the app.
- Select App Catalog.
- In the 🔎 Search Apps field, run a search for your desired app, then select it.
- Click Install App.
note
Sometimes this button says Add Integration.
- Click Next in the Setup Data section.
- In the Configure App section of your respective app, complete the following field.
  - Index. Specify a value for _index if the collection is configured with a custom partition. Learn more. The default value is sumologic_default (default partition).
- Click Next. You will be redirected to the Preview & Done section.
Post-installation
Once your app is installed, it will appear in your Installed Apps folder, and dashboard panels will start to fill automatically.
Each panel slowly fills with data matching the time range query received since the panel was created. Results will not immediately be available but will be updated with full graphs and charts over time.
As part of the app installation process, the following fields will be created by default:
- tenant_name. This field is tagged at the collector level. You can get the tenant name using the instructions here.
- location. The region to which the resource belongs.
- subscription_id. ID of the subscription where the resource is present.
- resource_group. The resource group name where the Azure resource is present.
- provider_name. Azure resource provider name (for example, Microsoft.Network).
- resource_type. Azure resource type (for example, storage accounts).
- resource_name. The name of the resource (for example, storage account name).
- service_type. Type of the service that can be accessed with an Azure resource.
- service_name. Services that can be accessed with an Azure resource (for example, in Azure Container Instances the service is Subscriptions).
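Several of these fields correspond to segments of an Azure resource ID. As a hypothetical illustration (not Sumo Logic's actual field extraction logic), the sketch below splits a resource ID into subscription_id, resource_group, provider_name, resource_type, and resource_name. The location, service_type, and service_name fields are not part of a resource ID and are not derived here.

```python
def parse_resource_id(resource_id: str) -> dict[str, str]:
    """Split an Azure resource ID into subscription, resource group, provider, type, and name.

    Expected shape (segment names are matched case-insensitively):
    /subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>[/...]
    Only the top-level type/name pair is returned; child resources are ignored.
    """
    parts = [p for p in resource_id.split("/") if p]
    lowered = [p.lower() for p in parts]
    try:
        sub_idx = lowered.index("subscriptions")
        rg_idx = lowered.index("resourcegroups")
        prov_idx = lowered.index("providers")
        return {
            "subscription_id": parts[sub_idx + 1],
            "resource_group": parts[rg_idx + 1],
            "provider_name": parts[prov_idx + 1],
            "resource_type": parts[prov_idx + 2],
            "resource_name": parts[prov_idx + 3],
        }
    except (ValueError, IndexError) as exc:
        raise ValueError(f"not a recognizable Azure resource ID: {resource_id!r}") from exc


if __name__ == "__main__":
    # Hypothetical workspace resource ID.
    print(parse_resource_id(
        "/subscriptions/0000-1111/resourceGroups/ml-rg"
        "/providers/Microsoft.MachineLearningServices/workspaces/my-ws"
    ))
```

Segment names are matched case-insensitively because Azure resource IDs sometimes appear upper-cased in log records.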
Viewing the Azure Machine Learning dashboards
All dashboards have a set of filters that you can apply to the entire dashboard. Use these filters to drill down and examine the data to a granular level.
- You can change the time range for a dashboard or panel by selecting a predefined interval from a drop-down list, choosing a recently used time range, or specifying custom dates and times. Learn more.
- You can use template variables to drill down and examine the data on a granular level. For more information, see Filtering Dashboards with Template Variables.
- Many of the Next-Gen apps allow you to provide the Index at the installation time and a default value for this key (sumologic_default). Based on your input, the app dashboards will be parameterized with a dashboard variable, allowing you to change the data partition queried by all panels. This restricts the query scope of all the dashboard queries to a specific data partition.
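Conceptually, restricting a panel to one partition amounts to adding an _index term to the scope of its query. The sketch below is a hypothetical illustration of that idea, not how Sumo Logic implements dashboard variables; it relies on Sumo Logic combining space-separated scope terms with an implicit AND, and the partition-name check is an assumption of this example.

```python
import re

# Hypothetical check: allow only letters, digits, and underscores in the partition name.
_INDEX_NAME = re.compile(r"[A-Za-z0-9_]+")


def scope_to_index(panel_query: str, index: str = "sumologic_default") -> str:
    """Prepend an _index term so the panel query searches a single partition.

    Space-separated terms before the first '|' are combined with an implicit AND,
    so the scoped query matches only data in that partition that also satisfies
    the original scope.
    """
    if not _INDEX_NAME.fullmatch(index):
        raise ValueError(f"unexpected partition name: {index!r}")
    return f"_index={index} {panel_query.strip()}"


if __name__ == "__main__":
    print(scope_to_index("_sourceCategory=Azure/MachineLearning/Logs | count"))
```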
Overview
The Azure Machine Learning - Overview dashboard provides details on events and operations, such as the overall number of errors and failures, model usage and quota utilization, operation types, and network ingress and egress.

Model
The Azure Machine Learning - Model dashboard provides information on models and model inference operations in your Azure Machine Learning workspaces.

Compute
The Azure Machine Learning - Compute dashboard provides details on compute operations, events, and resource usage (CPU, GPU, disk, and memory) in your Azure Machine Learning workspaces.

Data Events
The Azure Machine Learning - Data Events dashboard provides details on data events, such as DataSet, DataStore, and DataLabel read and change events with their results, as well as environment data events in your Azure Machine Learning workspaces.

Administrative Operations
The Azure Machine Learning - Administrative Operations dashboard provides details on the operational activities and status of your Azure Machine Learning resources.
Use this dashboard to:
- Monitor the distribution of operation types and their success rates to ensure your Azure Machine Learning resources are functioning properly.
- Identify potential issues by analyzing the top operations causing errors and correlating them with specific users or applications.
- Track recent write and delete operations to maintain an audit trail of changes made to your Azure Machine Learning resources.

Policy and Recommendations
The Azure Machine Learning - Policy and Recommendations dashboard provides details on policy events and recommendations for your Azure Machine Learning resources.
Use this dashboard to:
- Monitor the success and failure rates of policy events to ensure proper configuration and compliance.
- Track and analyze recent recommendations to improve the performance and security of your Azure Machine Learning resources.
- Identify trends in policy events and recommendations over time to proactively address potential issues.

Jobs and Pipelines
The Azure Machine Learning - Jobs and Pipelines dashboard provides details on operations, events, and failures in your Azure Machine Learning jobs and pipelines.

Quota
The Azure Machine Learning - Quota dashboard provides details on quota usage in your Azure Machine Learning workspaces, such as quota utilization, active nodes, active cores, and idle cores.

Run
The Azure Machine Learning - Run dashboard provides details on experiment runs, such as failed runs, run errors, and completed, in-progress, or started runs.

Create monitors for Azure Machine Learning app
From your App Catalog:
- From the Sumo Logic navigation, select App Catalog.
- In the Search Apps field, search for and then select your app.
- Make sure the app is installed.
- Navigate to What's Included tab and scroll down to the Monitors section.
- Click Create next to the pre-configured monitors. In the create monitors window, adjust the trigger conditions and notification settings based on your requirements.
- Scroll down to Monitor Details.
- Under Location, click New Folder.
note
By default, the monitor is saved in the root folder. To make maintenance easier, create a new folder in the location of your choice.
- Enter Folder Name. Folder Description is optional.
tip
Including the app version in the folder name helps you track versions for future updates.
- Click Create. Once the folder is created, click Save.
Azure Machine Learning alerts
These alerts are metric-based and work for all Azure Machine Learning workspaces.
Alert Name | Description | Alert Condition | Recover Condition |
---|---|---|---|
Azure Machine Learning - Failed Runs | This alert is triggered when failed runs are detected in a Machine Learning workspace. | Count >= 1 | Count < 1 |
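The alert and recover conditions in the table form a simple two-state rule. The toy model below is a hypothetical sketch, not Sumo Logic's monitor implementation: it assumes each input integer is the failed-run count for one evaluation window and shows when the alert fires and when it recovers.

```python
from dataclasses import dataclass


@dataclass
class FailedRunsAlert:
    """Toy model of the Failed Runs alert and recover conditions."""

    triggered: bool = False

    def evaluate(self, failed_runs: int) -> str:
        """Update state from the failed-run count of one evaluation window."""
        if not self.triggered and failed_runs >= 1:  # Alert condition: Count >= 1
            self.triggered = True
            return "triggered"
        if self.triggered and failed_runs < 1:  # Recover condition: Count < 1
            self.triggered = False
            return "recovered"
        return "no change"


if __name__ == "__main__":
    alert = FailedRunsAlert()
    # Hypothetical per-window failed-run counts.
    for count in [0, 2, 3, 0, 0]:
        print(count, alert.evaluate(count))
```

Because the recover condition is the exact complement of the alert condition, a single window with no failed runs is enough to resolve the alert.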
Upgrade/Downgrade the Azure Machine Learning app (optional)
To update the app, do the following:
Next-Gen App: To install or update the app, you must be an account administrator or a user with Manage Apps, Manage Monitors, Manage Fields, Manage Metric Rules, and Manage Collectors capabilities, depending on the content types included in the app.
- Select App Catalog.
- In the Search Apps field, search for and then select your app.
  Optionally, you can identify apps that can be upgraded in the Upgrade available section.
- To upgrade the app, select Upgrade from the Manage dropdown.
- If the upgrade does not have any configuration or property changes, you will be redirected to the Preview & Done section.
- If the upgrade has any configuration or property changes, you will be redirected to the Setup Data page.
- In the Configure section of your respective app, complete the following fields.
  - Field Name. If you already have collectors and sources set up, select the configured metadata field name (for example, _sourcecategory) or specify other custom metadata (for example, _collector) along with its metadata Field Value.
- Click Next. You will be redirected to the Preview & Done section.
Post-update
Your upgraded app will be installed in the Installed Apps folder and dashboard panels will start to fill automatically.
See our Release Notes changelog for new updates in the app.
To revert the app to a previous version, do the following:
- Select App Catalog.
- In the Search Apps field, search for and then select your app.
- To version down the app, select Revert to < previous version of your app > from the Manage dropdown.
Uninstalling the Azure Machine Learning app (optional)
To uninstall the app, do the following:
- Select App Catalog.
- In the 🔎 Search Apps field, run a search for your desired app, then select it.
- Click Uninstall.
Troubleshooting
HTTP Logs and Metrics Source used by Azure Functions
To troubleshoot metrics collection, follow the instructions in Collect Metrics from Azure Monitor > Troubleshooting metrics collection.