How to setup an Azure Machine Learning Workspace securely🛡️🔒🔑
TLDR
In this post, I'm covering how to set up an Azure ML Workspace in a secure way. The focus is on networking and on integrating the service with other Azure services. If you're not interested in the details and are just looking for a blueprint or template, you can check out this GitHub repo.
Table of contents
- Motivation
- Objective
- Architecture
- Deployment
- Outlook
Motivation
In a recent project of mine, I set up an Azure Machine Learning Workspace (AMLW). While researching, I realized that there is no easy blueprint combining the setup of a Workspace with secured networking and integration with other Azure services. Additionally, the available documentation was not always 100% clear to me.
Usually, in these cases, I'd just go ahead and set up the resources in the Azure Portal to understand how they interact and can be integrated. However, in the case of the AMLW, I was not able to achieve the target architecture with the options available in the portal.
These difficulties led me to write about the secure setup of this resource here. Perhaps some large language models (LLMs) will process this blog post and provide answers to desperate Cloud Engineers.
Objective
The idea of this post is to generalize my learnings with the AMLW and create a blueprint of a secure and integrated setup for others to use. The focus will be on the following topics:
- Networking integration with other Azure services.
- Configuration of a Managed Identity for the networking setup to function properly.
Architecture
The architecture of this blueprint, shown below, contains:
- Azure Machine Learning Workspace (the point of the post)
- Storage Account (essential resource to AMLW)
- Container Registry (essential resource to AMLW)
- Key Vault (essential resource to AMLW)
- Solution VNet (to secure all Azure services)
- OpenAI Workspace (the Azure service to integrate with the AMLW)
- Application Insights (supporting resource for observability)
- Jumphost VM (supporting resource for network access via Azure Bastion)
The Storage Account, Container Registry, and Key Vault are essential resources required for the deployment of the AMLW. These resources are integrated into both the Solution Virtual Network (VNet) and the Workspace Managed VNet. The OpenAI Workspace is utilized in this architecture to demonstrate how the AMLW can be integrated with various other Platform as a Service (PaaS) offerings on Azure. Additionally, the Application Insights instance and the Jumphost Virtual Machine (VM) serve as supporting resources to facilitate network access and enhance observability of the solution. The Jumphost VM is accessible via Azure Bastion. Aside from the VM and the AMLW, all networking integrations are configured using Private Endpoints.
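To illustrate what these integrations into the Solution VNet look like at the resource level, here is a minimal Bicep sketch of a Private Endpoint for the Key Vault. The parameter names, the API version, and the subnet reference are assumptions for illustration and not taken from the blueprint repository.
@description('The main location.')
param location string

@description('The resource id of the Key Vault to expose privately.')
param keyVaultId string

@description('The resource id of the Solution VNet subnet that hosts the Private Endpoints.')
param privateEndpointSubnetId string

// Private Endpoint that places the Key Vault inside the Solution VNet.
resource keyVaultPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pe-keyvault'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'pe-keyvault-connection'
        properties: {
          privateLinkServiceId: keyVaultId
          groupIds: [
            'vault' // subresource target of the Key Vault
          ]
        }
      }
    ]
  }
}
The same pattern applies to the Storage Account, the Container Registry, and the OpenAI Workspace, each with their respective subresource target.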
Deployment
The deployment of the resources is done using a Bicep Resource Group deployment. For that, a Resource Group needs to be set up first.
Deploy Resource Group
Setting up the Resource Group using the Azure CLI:
$rgName = "ml-secure-blueprint"
$location = "germanywestcentral"
az group create --name $rgName --location $location
The Details of the Azure Machine Learning Workspace
In the Bicep module of the AMLW, along with standard properties such as name, location, identity, and the resource IDs of essential services like Key Vault, Storage Account, and Container Registry, the most intriguing component is the managedNetwork object. This object primarily contains the configuration of the AMLW's networking options. In this instance, I have set the isolation mode to AllowInternetOutbound to ensure that outbound traffic is not restricted.
For integration with other Azure services, you can define custom outbound rule objects. In my case, I named the rule allowOpenAi, but you can choose any name you prefer. The type must be set to PrivateEndpoint, and within the destination object you can specify your target Resource ID and the subresource target, which you can find on this site.
resource machineLearningWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-07-01-preview' = {
  name: machineLearningWorkspaceName
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${managedIdentityId}': {}
    }
  }
  properties: {
    applicationInsights: applicationInsightsId
    storageAccount: storageAccountId
    containerRegistry: containerRegistryId
    keyVault: keyVaultId
    imageBuildCompute: 'cpu-compute'
    primaryUserAssignedIdentity: managedIdentityId
    publicNetworkAccess: 'Disabled'
    managedNetwork: {
      isolationMode: 'AllowInternetOutbound'
      outboundRules: {
        allowOpenAi: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: openAiWorkspaceId
            sparkEnabled: true
            subresourceTarget: 'account'
          }
        }
      }
    }
  }
  sku: {
    name: 'Basic'
    tier: 'Basic'
  }
}
Configuration of the Managed Identity
The configuration of the Managed Identity of the AMLW that is needed for the network setup to work correctly is shown in the snippet below. The important part is that the identity has the Azure AI Enterprise Network Connection Approver role assigned at a scope that covers not only the essential resources of the AMLW but also all resources that you want to integrate the AMLW with. I chose to assign this role on the entire resource group because only these resources are contained in it.
@description('The managed identity name.')
param managedIdentityName string

@description('The main location.')
param location string

@description('The network connection approver role definition id.')
var networkConnectionApproverRoleDefinitionId = 'b556d68e-0be0-4f35-a333-ad7ee1ce17ea'

resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: managedIdentityName
  location: location
}

resource connectionApproverAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, networkConnectionApproverRoleDefinitionId, managedIdentity.name)
  properties: {
    principalId: managedIdentity.properties.principalId
    roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions', networkConnectionApproverRoleDefinitionId)
    principalType: 'ServicePrincipal'
  }
}
Deploy Bicep Script Resources
The deployment is executed using the Azure CLI. Besides setting the password for the Jumphost VM, you can configure several parameters in the app-parameters.json file.
$vmPassword = "<your password here>"
az deployment group create `
--resource-group $rgName `
--template-file app-infrastructure.bicep `
--parameters @app-parameters.json `
--parameters "vmAdminPassword=$vmPassword" `
--name $rgName
The parameters file might look like this:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "instance": {
      "value": "dev"
    },
    "prefix": {
      "value": "blueprint"
    },
    "location": {
      "value": "westeurope"
    }
  }
}
Deploy the ML Workspace Network
To complete the deployment, you need to initiate the network deployment for the AMLW. This step is necessary because, as explained here, the managed virtual network of an AMLW is not automatically created during its initial deployment; it is provisioned only when required. To ensure that the managed network is deployed, simply execute the script shown below to force its creation.
$subscriptionId = "<your Subscription ID>"
$mlWorkspaceName = "<name of your ML Workspace>"
az ml workspace provision-network `
--subscription $subscriptionId `
--resource-group $rgName `
--name $mlWorkspaceName
After this script finishes, the Private Endpoints will be automatically set up for:
- Key Vault
- Container Registry
- Storage Account
- OpenAI Workspace
You can verify their status in the Azure Portal. From the AMLW, the OpenAI Workspace Private Endpoint should appear like this:
And on the OpenAI Workspace you should find the following configuration:
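If you prefer the command line over the portal, a quick check like the one sketched below should also work; it lists the Private Endpoint connections on the OpenAI account using az network private-endpoint-connection list. The account name is a placeholder you need to fill in.
$openAiAccountName = "<name of your OpenAI Workspace>"
az network private-endpoint-connection list `
--resource-group $rgName `
--name $openAiAccountName `
--type Microsoft.CognitiveServices/accounts
The connections created by the managed network should show up with the status Approved.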
Outlook
With your baseline networking setup established, you can now enhance your project by adding additional capabilities.
One example could be implementing an Ingress solution. Depending on your use case and whether you wish to provide public access, this can be achieved using Azure Front Door or Application Gateway in conjunction with API Management. If you intend to grant access to users within your organization, you may consider setting up an ExpressRoute or a Site-to-Site VPN connection to your Solution VNet.
Another capability you might need is a deployment agent. For more information on this, you can refer to one of my previous posts here.
PS: I've written this post without any AI tools, aside from spell checking and improving my wording.