How to setup an Azure Machine Learning Workspace securely🛡️🔒🔑
TLDR
In this post, I'm covering how to set up an Azure ML Workspace in a secure way. The focus is on networking and on integrating the service with other Azure services. If you're not interested in the details and are just looking for a blueprint or template, you can check out this GitHub repo.
Table of contents
- Motivation
- Objective
- Architecture
- Deployment
- Outlook
Motivation
In a recent project of mine, I set up an Azure Machine Learning Workspace (AMLW). While researching, I realized that there is no easy blueprint combining the setup of a Workspace with secured networking and integration with other Azure services. Additionally, the available documentation was not always 100% clear to me.
Usually, in these cases, I'd just go ahead and set up the resources in the Azure Portal to understand how they interact and can be integrated. However, in the case of the AMLW, I was not able to achieve the target architecture with the options available in the portal.
These difficulties led me to write about the secure setup of this resource here. Perhaps some large language models (LLMs) will process this blog post and provide answers to desperate Cloud Engineers.
Objective
The idea of this post is to generalize my learnings with the AMLW and create a blueprint of a secure and integrated setup for others to use. The focus will be on the following topics:
- Networking integration with other Azure services.
- Configuration of a Managed Identity for the networking setup to function properly.
Architecture
The architecture of this blueprint, shown below, contains:
- Azure Machine Learning Workspace (the point of the post)
- Storage Account (essential resource to AMLW)
- Container Registry (essential resource to AMLW)
- Key Vault (essential resource to AMLW)
- Solution VNet (to secure all Azure services)
- OpenAI Workspace (the Azure service to integrate with the AMLW)
- Application Insights (supporting resource for observability)
- Jumphost VM (supporting resource for network access via Azure Bastion)
The Storage Account, Container Registry, and Key Vault are essential resources required for the deployment of the AMLW. These resources are integrated into both the Solution Virtual Network (VNet) and the Workspace Managed VNet. The OpenAI Workspace is utilized in this architecture to demonstrate how the AMLW can be integrated with various other Platform as a Service (PaaS) offerings on Azure. Additionally, the Application Insights instance and the Jumphost Virtual Machine (VM) serve as supporting resources to facilitate network access and enhance observability of the solution. The Jumphost VM is accessible via Azure Bastion. Aside from the VM and the AMLW, all networking integrations are configured using Private Endpoints.
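To illustrate what these integrations into the Solution VNet look like at the resource level, here is a minimal Bicep sketch of a Private Endpoint for the Key Vault. The parameter names, the API version, and the subnet reference are assumptions for illustration and not taken from the blueprint repository.
@description('The main location.')
param location string

@description('The resource id of the Key Vault to expose privately.')
param keyVaultId string

@description('The resource id of the Solution VNet subnet that hosts the Private Endpoints.')
param privateEndpointSubnetId string

// Private Endpoint that places the Key Vault inside the Solution VNet.
resource keyVaultPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pe-keyvault'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'pe-keyvault-connection'
        properties: {
          privateLinkServiceId: keyVaultId
          groupIds: [
            'vault' // subresource target of the Key Vault
          ]
        }
      }
    ]
  }
}
The same pattern applies to the Storage Account, the Container Registry, and the OpenAI Workspace, each with their respective subresource target.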
Deployment
The deployment of the resources is done using a Bicep Resource Group deployment. For that, a Resource Group needs to be set up first.
Deploy Resource Group
Setting up the Resource Group using the Azure CLI:
$rgName = "ml-secure-blueprint"
$location = "germanywestcentral"
az group create --name $rgName --location $location
The Details of the Azure Machine Learning Workspace
In the Bicep module of the AMLW, along with standard properties such as name, location, identity, and the resource IDs of essential services like Key Vault, Storage Account, and Container Registry, the most intriguing component is the managedNetwork object. This object primarily contains the configuration of the AMLW's networking options. In this instance, I have set the isolation mode to AllowInternetOutbound to ensure that outbound traffic is not restricted.
For integration with other Azure services, you can define custom outbound rule objects. In my case, I named the rule allowOpenAi, but you can choose any name you prefer. The type must be set to PrivateEndpoint, and within the destination object you can specify your target Resource ID and the subresource target, which you can find on this site.
resource machineLearningWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-07-01-preview' = {
  name: machineLearningWorkspaceName
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${managedIdentityId}': {}
    }
  }
  properties: {
    applicationInsights: applicationInsightsId
    storageAccount: storageAccountId
    containerRegistry: containerRegistryId
    keyVault: keyVaultId
    imageBuildCompute: 'cpu-compute'
    primaryUserAssignedIdentity: managedIdentityId
    publicNetworkAccess: 'Disabled'
    managedNetwork: {
      isolationMode: 'AllowInternetOutbound'
      outboundRules: {
        allowOpenAi: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: openAiWorkspaceId
            sparkEnabled: true
            subresourceTarget: 'account'
          }
        }
      }
    }
  }
  sku: {
    name: 'Basic'
    tier: 'Basic'
  }
}
Configuration of the Managed Identity
The configuration of the Managed Identity of the AMLW that is needed for the network setup to work correctly is shown in the snippet below. The important part is that the identity has the Azure AI Enterprise Network Connection Approver role assigned at a scope that covers not only the essential resources of the AMLW but also all resources that you want to integrate the AMLW with. I chose to assign this role on the entire resource group because only these resources are contained in it.
@description('The managed identity name.')
param managedIdentityName string

@description('The main location.')
param location string

@description('The network connection approver role definition id.')
var networkConnectionApproverRoleDefinitionId = 'b556d68e-0be0-4f35-a333-ad7ee1ce17ea'

resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: managedIdentityName
  location: location
}

resource connectionApproverAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, networkConnectionApproverRoleDefinitionId, managedIdentity.name)
  properties: {
    principalId: managedIdentity.properties.principalId
    roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions', networkConnectionApproverRoleDefinitionId)
    principalType: 'ServicePrincipal'
  }
}
Deploy Bicep Script Resources
The deployment is executed using the Azure CLI. Besides setting the password for the Jumphost VM, you can configure several parameters in the app-parameters.json file.
$vmPassword = "<your password here>"
az deployment group create `
--resource-group $rgName `
--template-file app-infrastructure.bicep `
--parameters @app-parameters.json `
--parameters "vmAdminPassword=$vmPassword" `
--name $rgName
The parameters file might look like this:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "instance": {
      "value": "dev"
    },
    "prefix": {
      "value": "blueprint"
    },
    "location": {
      "value": "westeurope"
    }
  }
}
Deploy the ML Workspace Network
To complete the deployment, you need to initiate the network deployment for the AMLW. This step is necessary because, as explained here, the managed virtual network of an AMLW is not automatically created during its initial deployment; it is provisioned only when required. To ensure that the managed network is deployed, simply execute the script shown below to force its creation.
$subscriptionId = "<your Subscription ID>"
$mlWorkspaceName = "<name of your ML Workspace>"
az ml workspace provision-network `
--subscription $subscriptionId `
--resource-group $rgName `
--name $mlWorkspaceName
After this script finishes, the Private Endpoints will be automatically set up for:
- Key Vault
- Container Registry
- Storage Account
- OpenAI Workspace
You can verify their status in the Azure Portal. From the AMLW, the OpenAI Workspace Private Endpoint should appear like this:
And on the OpenAI Workspace you should find the following configuration:
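If you prefer the command line over the portal, a quick check like the one sketched below should also work; it lists the Private Endpoint connections on the OpenAI account using az network private-endpoint-connection list. The account name is a placeholder you need to fill in.
$openAiAccountName = "<name of your OpenAI Workspace>"
az network private-endpoint-connection list `
--resource-group $rgName `
--name $openAiAccountName `
--type Microsoft.CognitiveServices/accounts
The connections created by the managed network should show up with the status Approved.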
Outlook
With your baseline networking setup established, you can now enhance your project by adding additional capabilities.
One example could be implementing an Ingress solution. Depending on your use case and whether you wish to provide public access, this can be achieved using Azure Front Door or Application Gateway in conjunction with API Management. If you intend to grant access to users within your organization, you may consider setting up an ExpressRoute or a Site-to-Site VPN connection to your Solution VNet.
Another capability you might need is a deployment agent. For more information on this, you can refer to one of my previous posts here.
PS: I've written this post without any AI tools, aside from spell checking and improving my wording.