dev-resources.site
for different kinds of informations.
Investigating az-cli performance on the hosted Azure Pipelines and GitHub Runners
I've been building a few more workflows and pipelines over the past few days and had been experimenting with the az-cli. And I've been running into all kinds of performance issues.
Azure CLI is a great nifty tool to chat to Azure as well as Azure DevOps and there's a AzureCLI@v2
task in Azure DevOps that preconfigures your Azure subscription and all.
While testing I got increasingly frustrated by how slow az
is on GitHub Actions and Azure Pipelines hosted runners. But only on Windows. So, I've been digging into why.
Over the past week I've found quite a few issues, files a few issues and submitted a couple of pull requests. I'll walk you through my findings, they may help improve the performance of your workflows too!
Case matters
It turns out that the casing of your commands matters when it comes to az
. You wouldn't expect it, az version
will take about 12 seconds to run on the hosted runner including the overhead of starting a fresh powershell session.
But az Version
will take about one minute. Why? Well, it turns out that az
caches all the commands that are available from its many plugins, and that it looks up the command by key to see if it recognises what you've passed in. It performs that lookup case sensitive.
Issue filed:
command index lookup is case sensitive #27497
Describe the bug
When you run
az vErSiOn
a command index rebuild is triggered.
az version
does not trigger a command index rebuild
Related command
any command
Errors
cli.azure.cli.core: No module found from index for '['VeRSIoN', '--debug']'
cli.azure.cli.core: Loading all modules and extensions
Issue script & Debug output
Command
C:\Users\JesseHouwing>az VeRSIoN --debug
Output:
cli.knack.cli: Command arguments: ['VeRSIoN', '--debug']
cli.knack.cli: __init__ debug log:
Enable color in terminal.
Enable VT mode.
cli.knack.cli: Event: Cli.PreExecute []
cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x01C3A4F0>, <function OutputProducer.on_global_arguments at 0x01E3B730>, <function CLIQuery.on_global_arguments at 0x01E58388>]
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
cli.azure.cli.core: No module found from index for '['VeRSIoN', '--debug']'
cli.azure.cli.core: Loading all modules and extensions
cli.azure.cli.core: Discovered command modules: ['acr', 'acs', 'advisor', 'ams', 'apim', 'appconfig', 'appservice', 'aro', 'backup', 'batch', 'batchai', 'billing', 'botservice', 'cdn', 'cloud', 'cognitiveservices', 'config', 'configure', 'consumption', 'container', 'containerapp', 'cosmosdb', 'databoxedge', 'dla', 'dls', 'dms', 'eventgrid', 'eventhubs', 'extension', 'feedback', 'find', 'hdinsight', 'identity', 'interactive', 'iot', 'keyvault', 'kusto', 'lab', 'managedservices', 'maps', 'marketplaceordering', 'monitor', 'mysql', 'netappfiles', 'network', 'policyinsights', 'privatedns', 'profile', 'rdbms', 'redis', 'relay', 'resource', 'role', 'search', 'security', 'servicebus', 'serviceconnector', 'servicefabric', 'signalr', 'sql', 'sqlvm', 'storage', 'synapse', 'util', 'vm']
cli.azure.cli.core: Loading command modules:
cli.azure.cli.core: Name Load Time Groups Commands
cli.azure.cli.core: acr 0.142 34 144
cli.azure.cli.core: acs 0.027 7 54
cli.azure.cli.core: advisor 0.003 3 6
cli.azure.cli.core: ams 0.019 22 100
cli.azure.cli.core: apim 0.008 14 68
cli.azure.cli.core: appconfig 0.004 9 47
cli.azure.cli.core: appservice 0.079 73 260
cli.azure.cli.core: aro 0.014 1 10
cli.azure.cli.core: backup 0.004 16 59
cli.azure.cli.core: batch 0.032 34 102
cli.azure.cli.core: batchai 0.004 10 30
cli.azure.cli.core: billing 0.009 19 52
cli.azure.cli.core: botservice 0.004 12 42
cli.azure.cli.core: cdn 0.006 39 133
cli.azure.cli.core: cloud 0.002 1 7
cli.azure.cli.core: cognitiveservices 0.003 10 33
cli.azure.cli.core: config 0.003 2 7
cli.azure.cli.core: configure 0.002 2 5
cli.azure.cli.core: consumption 0.026 8 9
cli.azure.cli.core: container 0.014 1 11
cli.azure.cli.core: containerapp 0.130 36 115
cli.azure.cli.core: cosmosdb 0.016 58 192
cli.azure.cli.core: databoxedge 0.007 5 27
cli.azure.cli.core: dla 0.004 23 62
cli.azure.cli.core: dls 0.004 7 41
cli.azure.cli.core: dms 0.003 3 22
cli.azure.cli.core: eventgrid 0.005 25 96
cli.azure.cli.core: eventhubs 0.018 13 20
cli.azure.cli.core: extension 0.002 1 7
cli.azure.cli.core: feedback 0.002 1 2
cli.azure.cli.core: find 0.002 1 1
cli.azure.cli.core: hdinsight 0.009 8 39
cli.azure.cli.core: identity 0.004 2 11
cli.azure.cli.core: interactive 0.001 1 1
cli.azure.cli.core: iot 0.096 19 82
cli.azure.cli.core: keyvault 0.009 22 133
cli.azure.cli.core: kusto 0.004 3 14
cli.azure.cli.core: lab 0.004 11 34
cli.azure.cli.core: managedservices 0.003 3 8
cli.azure.cli.core: maps 0.002 5 13
cli.azure.cli.core: marketplaceordering 0.006 1 2
cli.azure.cli.core: monitor 0.307 21 67
cli.azure.cli.core: mysql 0.104 14 49
cli.azure.cli.core: netappfiles 0.006 17 96
cli.azure.cli.core: network 0.130 103 337
cli.azure.cli.core: policyinsights 0.016 9 17
cli.azure.cli.core: privatedns 0.023 14 60
cli.azure.cli.core: profile 0.002 2 8
cli.azure.cli.core: rdbms 0.024 44 185
cli.azure.cli.core: redis 0.003 5 27
cli.azure.cli.core: relay 0.034 7 8
cli.azure.cli.core: resource 0.015 51 227
cli.azure.cli.core: role 0.003 17 61
cli.azure.cli.core: search 0.003 7 22
cli.azure.cli.core: security 0.006 48 104
cli.azure.cli.core: servicebus 0.019 12 17
cli.azure.cli.core: serviceconnector 0.054 16 235
cli.azure.cli.core: servicefabric 0.018 27 76
cli.azure.cli.core: signalr 0.003 8 30
cli.azure.cli.core: sql 0.017 56 215
cli.azure.cli.core: sqlvm 0.024 4 20
cli.azure.cli.core: storage 0.039 58 272
cli.azure.cli.core: synapse 0.015 54 246
cli.azure.cli.core: util 0.002 3 7
cli.azure.cli.core: vm 0.049 57 265
cli.azure.cli.core: Total (65) 1.653 1219 4752
cli.azure.cli.core: Loading extensions:
cli.azure.cli.core: Name Load Time Groups Commands Directory
cli.azure.cli.core: azure-devops 0.150 60 192 C:\Users\JesseHouwing\.azure\cliextensions\azure-devops
cli.azure.cli.core: containerapp 0.015 31 95 C:\Users\JesseHouwing\.azure\cliextensions\containerapp
cli.azure.cli.core: init 0.003 1 1 C:\Users\JesseHouwing\.azure\cliextensions\init
cli.azure.cli.core: Total (3) 0.169 92 288
cli.azure.cli.core: Loaded 1266 groups, 4947 commands.
cli.azure.cli.core: Updated command index in 0.005 seconds.
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x040095C8>]
cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to 'C:\Users\JesseHouwing\.azure\commands\2023-09-28.20-42-39.VeRSIoN.11404.log'.
az_command_data_logger: command args: version --debug
cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.<locals>.add_subscription_parameter at 0x04036730>]
cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad []
cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument.<locals>.add_ids_arguments at 0x040446A0>, <function register_cache_arguments.<locals>.add_cache_arguments at 0x04044898>]
cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded []
cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [<function _documentdb_deprecate at 0x05179F58>]
cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x01E3B778>, <function CLIQuery.handle_query_parameter at 0x01E583D0>, <function register_ids_argument.<locals>.parse_ids_arguments at 0x04044850>, <function handler at 0x052DA4F0>, <function DevCommandsLoader.post_parse_args at 0x07439D60>]
cli.knack.cli: Event: CommandInvoker.OnTransformResult [<function _resource_group_transform at 0x04032DF0>, <function _x509_from_base64_to_hex_transform at 0x04032E38>]
cli.knack.cli: Event: CommandInvoker.OnFilterResult []
{
"azure-cli": "2.53.0",
"azure-cli-core": "2.53.0",
"azure-cli-telemetry": "1.1.0",
"extensions": {
"azure-devops": "0.26.0",
"containerapp": "0.3.29",
"init": "0.1.0"
}
}
cli.knack.cli: Event: Cli.SuccessfulExecute []
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x040096E8>]
az_command_data_logger: exit code: 0
cli.__main__: Command ran in 2.197 seconds (init: 0.315, invoke: 1.882)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 3292 in cache
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\telemetry\__init__.pyc C:\Users\JesseHouwing\.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.
Expected behavior
The command is compared to the index case insensitive and the command index is not invalidated.
Environment Summary
C:\Users\JesseHouwing>az version { "azure-cli": "2.53.0", "azure-cli-core": "2.53.0", "azure-cli-telemetry": "1.1.0", "extensions": { "azure-devops": "0.26.0", "containerapp": "0.3.29", "init": "0.1.0" } }
Additional context
No response
For now, make sure you call az devops
and not az DevOps
.
Darn you autocorrect for correcting devops to DevOps all the time!
az devops login
vs AZURE_DEVOPS_EXT_PAT
To authenticate to Azure DevOps, you have a few options to chose from. You can use AzureCLI@2
with an Azure service connection or add az devops login
to your script or pass a token through the AZURE_DEVOPS_EXT_PAT
environment variable.
Configuring your Azure Service Connection is the slowest of the bunch. Followed by az devops login
where as the environment variable should be the fastest of the bunch.
But it turns out to be much more complex. That's because az devops login
will try to install python's keyring
package the first time it runs, and on the Hosted runners, the command has never been called before, thus adding 10-20s to the first call.
Collaborated on a pull request to force keyring
installation during image creation:
[windows] warmup "az devops" for the first run #8410
as Jesse Houwing has found "az devops" warms up during first run which takes ~40 sec.
warm up is done by running "az devops --help" with config redirected to persistent "C:" location, also keyring installation requires "fake" login
original PR: https://github.com/actions/runner-images/pull/8294
Description
warming up "az devops" first invocation. due to security reasons we cannot accept original PR: https://github.com/actions/runner-images/pull/8294
Related issue:
https://github.com/actions/runner-images/issues/8296
Check list
- [x] Related issue / work item is attached
- [ ] Tests are written (if applicable)
- [ ] Documentation is updated (if applicable)
- [ ] Changes are tested and related VM images are successfully generated
And filed an issue to make sure this package is included in az devops
by default:
[Feature Request] Install keyring by default or offer a way to install keyring without causing error messages #1367
Is your feature request related to a problem? Please describe.
When calling az devops login
for the very first time, it installs pythons keyring package. For some reason that's not included in the default setup. This adds needless time to a first invocation of az devops login
on the GitHub Hosted Runners and Azure Pipelines.
You can force the installation of keyring using
echo "dummy" | az devops login
az devops logout
On the GitHub/Azure Pipelines hosted runner the provisioning of the image happens on a non-interactive session and az devops login
fails with an error along those lines.
ERROR: (1312, 'CredRead', 'A specified logon session does not exist. It may already have been terminated')
Unfortunatly, az devops
doesn't provide a clean way to install keyring during installation or through a command other than catching the failed login.
Describe the solution you'd like Either azdevops should preinstall keyring during normal installation of the extension. It's unclear why it's not included in the first place.
Or there should be a clean way to run az devops install-dependencies
or something to have it download its dependencies in a provisioning scenario.
I grabbed the following lines from the code:
$pythonPath = resolve-path (join-path ([System.IO.Path]::GetDirectoryName((Get-Command az).Path)) "..\python.exe")
Invoke-ValidateCommand -Command "& '$pythonPath' -m pip install keyring~=17.1.1 --target '$env:AZURE_EXTENSION_DIR\azure-devops' -vv --disable-pip-version-check --no-cache-dir"
and it works. But since there is no way to grab the correct version of keyring from anywhere other than the sourcecode of az devops, it's a very brittle contract.
For now we're looking into just swallowing any message printed by az devops login
. But that's of course also a brittle contract.
cli.azext_devops.dev.common.pip_helper: installing keyring~=17.1.1
cli.azure.cli.core.extension.operations: Running: ['C:\\Program Files\\Microsoft SDKs\\Azure\\CLI2\\python.exe', '-m', 'pip', 'install', 'keyring~=17.1.1', '--target', 'C:\\Users\\WDAGUtilityAccount\\.azure\\cliextensions\\azure-devops', '-vv', '--disable-pip-version-check', '--no-cache-dir']
Additional context
I know I could also set an environment variable with an access token instead of calling az devops login
, unfortunately that also adds a 10 second delay on the GitHub Hosted Runner / Azure Pipelines runner as it tries to run az login
, sees it fail and only then reverts to using the token supplied on the commandline.
So you might think that setting AZURE_DEVOPS_EXT_PAT
will be the fastest, but what happens under the hood might be surprising: az devops
will first try to run az login
and az devops login
before attempting to use the environment variable, even if you explicitly pass it in. There should be a way to force the configuration used.
Filed an issue:
az cli seems to spend 10 seconds looking for account config, even if it doesn't need them #27439
Describe the bug
While looking into what causes the az-cli to load slowly on the hosted agents of GitHub Actions and Azure pipelines I noticed that az tries to look for accounts and throws a couple of exceptions even if the account isn't needed.
In this case the access token to access Azure DevOps is supplied through the environment, yet az is trying to load accounts and spends unneeded time there.
It's hard to tell from the logs, bet between the last logged line and the error handling there is a jump of about 10 seconds in the Azure pipelines log. Would be lovely if that time waste could be prevented.
Related command
Az DevOps pipelines show ...
Errors
2023-09-15T18:37:12.7002991Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x0000025BD8AEC430>] 2023-09-15T18:37:12.7003914Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument..add_subscription_parameter at 0x0000025BD8AEF0A0>] 2023-09-15T18:37:22.0011676Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] 2023-09-15T18:37:22.0012102Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument..add_ids_arguments at 0x0000025BD8B51090>, <function register_cache_arguments..add_cache_arguments at 0x0000025BD8B511B0>] 2023-09-15T18:37:22.0026227Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] 2023-09-15T18:37:22.0026678Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [] 2023-09-15T18:37:22.3327775Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000025BD63AFF40>, <function CLIQuery.handle_query_parameter at 0x0000025BD63F92D0>, <function register_ids_argument..parse_ids_arguments at 0x0000025BD8B51120>, <function DevCommandsLoader.post_parse_args at 0x0000025BD8B72B90>] 2023-09-15T18:37:22.3348603Z DEBUG: cli.azext_devops.dev.common.services: PAT is present which can be used against this instance 2023-09-15T18:37:22.3375557Z DEBUG: cli.azure.cli.core: Current cloud config: 2023-09-15T18:37:22.3375819Z AzureCloud 2023-09-15T18:37:22.3376906Z DEBUG: cli.azext_devops.dev.common.services: az login is not present 2023-09-15T18:37:22.3382543Z DEBUG: cli.azext_devops.dev.common.services: There are no active accounts. 2023-09-15T18:37:22.3382797Z Traceback (most recent call last): 2023-09-15T18:37:22.3383541Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 537, in get_current_account_user 2023-09-15T18:37:22.3384147Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 546, in get_subscription 2023-09-15T18:37:22.3384535Z knack.util.CLIError: Please run 'az login' to setup account. 2023-09-15T18:37:22.3384768Z 2023-09-15T18:37:22.3385009Z During handling of the above exception, another exception occurred: 2023-09-15T18:37:22.3385200Z 2023-09-15T18:37:22.3385466Z Traceback (most recent call last): 2023-09-15T18:37:22.3385884Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 61, in _get_credentials 2023-09-15T18:37:22.3386261Z token_from_az_login = get_token_from_az_logins(organization, pat_token_present) 2023-09-15T18:37:22.3386725Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 118, in get_token_from_az_logins 2023-09-15T18:37:22.3387095Z dummy_user = profile.get_current_account_user() # noqa: F841 2023-09-15T18:37:22.3387524Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 539, in get_current_account_user 2023-09-15T18:37:22.3387878Z knack.util.CLIError: There are no active accounts. 2023-09-15T18:37:22.3388195Z INFO: cli.azext_devops.dev.common.services: received PAT from environment variable 2023-09-15T18:37:22.3388587Z INFO: cli.azext_devops.dev.common.services: Creating connection with personal access token.
Issue script & Debug output
2023-09-15T18:37:12.7002991Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x0000025BD8AEC430>] 2023-09-15T18:37:12.7003914Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument..add_subscription_parameter at 0x0000025BD8AEF0A0>] 2023-09-15T18:37:22.0011676Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] 2023-09-15T18:37:22.0012102Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument..add_ids_arguments at 0x0000025BD8B51090>, <function register_cache_arguments..add_cache_arguments at 0x0000025BD8B511B0>] 2023-09-15T18:37:22.0026227Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] 2023-09-15T18:37:22.0026678Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [] 2023-09-15T18:37:22.3327775Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000025BD63AFF40>, <function CLIQuery.handle_query_parameter at 0x0000025BD63F92D0>, <function register_ids_argument..parse_ids_arguments at 0x0000025BD8B51120>, <function DevCommandsLoader.post_parse_args at 0x0000025BD8B72B90>] 2023-09-15T18:37:22.3348603Z DEBUG: cli.azext_devops.dev.common.services: PAT is present which can be used against this instance 2023-09-15T18:37:22.3375557Z DEBUG: cli.azure.cli.core: Current cloud config: 2023-09-15T18:37:22.3375819Z AzureCloud 2023-09-15T18:37:22.3376906Z DEBUG: cli.azext_devops.dev.common.services: az login is not present 2023-09-15T18:37:22.3382543Z DEBUG: cli.azext_devops.dev.common.services: There are no active accounts. 2023-09-15T18:37:22.3382797Z Traceback (most recent call last): 2023-09-15T18:37:22.3383541Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 537, in get_current_account_user 2023-09-15T18:37:22.3384147Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 546, in get_subscription 2023-09-15T18:37:22.3384535Z knack.util.CLIError: Please run 'az login' to setup account. 2023-09-15T18:37:22.3384768Z 2023-09-15T18:37:22.3385009Z During handling of the above exception, another exception occurred: 2023-09-15T18:37:22.3385200Z 2023-09-15T18:37:22.3385466Z Traceback (most recent call last): 2023-09-15T18:37:22.3385884Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 61, in _get_credentials 2023-09-15T18:37:22.3386261Z token_from_az_login = get_token_from_az_logins(organization, pat_token_present) 2023-09-15T18:37:22.3386725Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 118, in get_token_from_az_logins 2023-09-15T18:37:22.3387095Z dummy_user = profile.get_current_account_user() # noqa: F841 2023-09-15T18:37:22.3387524Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 539, in get_current_account_user 2023-09-15T18:37:22.3387878Z knack.util.CLIError: There are no active accounts. 2023-09-15T18:37:22.3388195Z INFO: cli.azext_devops.dev.common.services: received PAT from environment variable 2023-09-15T18:37:22.3388587Z INFO: cli.azext_devops.dev.common.services: Creating connection with personal access token.
Expected behavior
Account info is only loaded when needed. Lack of account data doesn't result in a swallowed error.
Environment Summary
azure-cli 2.52.0
core 2.52.0 telemetry 1.1.0
Extensions: azure-devops 0.26.0
Dependencies: msal 1.24.0b1 azure-mgmt-resource 23.1.0b2
Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe' Extensions directory 'C:\Program Files\Common Files\AzureCliExtensionDirectory'
Python (Windows) 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Additional context
First run of az on the hosted runner always causes a rebuild of the command index
Similar to the case matters issue above, the first time az
is invoked on the hosted runners a command index rebuild is triggered. Why? Because the command index doesn't exist in the user profile of the user that runs your workflow. These files are normally stored in your profile directory under ~/.azure
and ~/.azure-devops
, but since your workflow runs in a pristine user profile, these cached files are gone.
Submitted a pull request to move these folders to a persistent location:
Runs az and az-devops during warmup #8294
Description
Speed up performance of az
and az devops
. In my own experiments savings are sometimes more than 60 seconds.
-
This is done by changing the default settings of
az
based on the recommended settings fromaz init
for automation workflows:- Enables the settings recommended by
az init
for automation workflows. - Disables auto-upgrade
- Disables colors
- Disables surveys
- Disables progress bars
- Disables recommendations
- Enables the settings recommended by
-
It invokes
az
andaz devops
during the image build to ensure Python pre-caches its objects. -
It moves the
.azure
and.azure-devops
and.azure-devops/cache
folders into the CommonFiles directory similarly to how the runner already moved installed extensions to the CommonFiles directory.
First call performance, especially of az devops
commands will still be slower than subsequent calls due to the fact that az devops
fetches and caches information about the Azure DevOps Organization that it connects to. These calls still add about a 15s overhead. Users that want to speed up az devops
further could consider caching the $env:AZURE_DEVOPS_CACHE_DIR
.
Links to sources used to make this pull request:
-
az init
automation recommended settings. - I left out the one that only outputs errors as it is hard to see what the workflow/pipeline's doing. Ideally that setting would tigger on the diagnostics flag of the workflow. -
az-cli
configuration directory environment variable is defined here -
az devops
configuration environment variable is defined here -
az devops
cache environment variable is defined in multiple places, it sometimes keys off of the configuration environment variable but sometimes it's hardcoded against the home directory if the environment variable isn't defined. Filed: https://github.com/Azure/azure-devops-cli-extension/issues/1366
Technically changing any default configuration settings can be breaking. My personal view on these settings:
-
az config set auto-upgrade.enable=false
- should not break anyone. The hosted image is updated regularly enough that it should not be needed to auto-upgrade az-cli during a workflow/pipeline run. -
az config set core.error_recommendation=off
- should not break anyone. Worst case it makes debugging scripts a little harder, though running the same script locally should still provide recommendations. -
az config set core.disable_progress_bar=true
- progress bars don't work in the hosted runner anyway. -
az config set core.no_color=true
- will make the logs less pretty, but also fixes issues where certain ascii colors aren't coming through correctly. Should not really break anyone. -
az config set core.survey_message=false
- This setting suppresses the survey links that the CLI sometimes outputs, they have no place in a pipeline or workflow anyway.
I had already removed the settings from the suggestions made by az init
and left them on default:
-
az config set logging.enable_log_file=no
- I personally don't rely on the log file, I suppose someone could. This might be the most breaking change in the list.. There is anecdotal evidence in other github issues that suggests file logging slows down az-cli. I'd prefer to make it opt in. The less the agent does by default the better. -
az config set only_show_errors=true
- technically not a breaking change and making the log less chatty. I removed this cause I feel it makes it harder to see what the cli is and isn't doing. Ideally this flag would be set based on the diagnostic settings of the pipeline itself. -
az config set collect_telemetry=false
- though there is a lot of anecdotal evidence that telemetry slows downaz
or some of its extensions, I imagine the cli team would like to get telemetry from the hosted runners.
People who want to save a little more time can opt to add these settings to their pipeline/workflow.
Related issue:
Fixes: #8296?
Check list
- [x] Related issue / work item is attached
- [ ] Tests are written (if applicable)
- [ ] Documentation is updated (if applicable)
- [x] Changes are tested and related VM images are successfully generated
First run of az on hosted runner causes python compilation
Because the image generation code never actually runs az
and az devops
and because the ~/.azure
directory and ~/.azure-devops
aren't made available to the user that runs the workflow; the first run compiles several python modules and caches the result.
This is also fixed by setting environment variables and calling az --help
and az devops --help
during image generation.
Submitted a pull request to move these folders to a persistent location and warm-up az
and az devops
:
%{ embed https://github.com/actions/runner-images/pull/8294 %}
Some of the data that az devops
needs isn't stored under ~/.azure
, but has its own config path under ~/.azure-devops
. While trying to redirect that folder and the underlying ./python/cache
folder I found out that not all the ways you can redirect these folders have the exact same effect, because the cache folder location is computed in different ways.
So, I filed another issue:
Bug: the azure-devops cache folder location is computed differently in different places #1366
The Azure DevOps CLI has a way to redirect the config and the cache directory to a different location.
In most code paths it's enough to only redirect the CONFIG directory and in that case the cache folder will automatically be placed underneath the config folder.
But there are a few places in which the cache folder is placed under the home directory instead.
I'd expect a single consistent behavior.
https://github.com/search?q=repo%3AAzure%2Fazure-devops-cli-extension+os.getenv&type=code
Default configuration of az
The default configuration of az
leaves several settings enabled or unconfigured:
- Telemetry
- Disk logging
- Progress bars
- Console colors
- Survey prompts
- Auto-upgrade
There is a pretty unknown az init
command which will prompt whether you want to run in an interactive or an automation environment and it forces a number of these settings to off.
So, I filed a pull request to do the same thing on the Hosted Runner:
Runs az and az-devops during warmup #8294
Description
Speed up performance of az
and az devops
. In my own experiments savings are sometimes more than 60 seconds.
-
This is done by changing the default settings of
az
based on the recommended settings fromaz init
for automation workflows:- Enables the settings recommended by
az init
for automation workflows. - Disables auto-upgrade
- Disables colors
- Disables surveys
- Disables progress bars
- Disables recommendations
- Enables the settings recommended by
-
It invokes
az
andaz devops
during the image build to ensure Python pre-caches its objects. -
It moves the
.azure
and.azure-devops
and.azure-devops/cache
folders into the CommonFiles directory similarly to how the runner already moved installed extensions to the CommonFiles directory.
First call performance, especially of az devops
commands will still be slower than subsequent calls due to the fact that az devops
fetches and caches information about the Azure DevOps Organization that it connects to. These calls still add about a 15s overhead. Users that want to speed up az devops
further could consider caching the $env:AZURE_DEVOPS_CACHE_DIR
.
Links to sources used to make this pull request:
-
az init
automation recommended settings. - I left out the one that only outputs errors as it is hard to see what the workflow/pipeline's doing. Ideally that setting would tigger on the diagnostics flag of the workflow. -
az-cli
configuration directory environment variable is defined here -
az devops
configuration environment variable is defined here -
az devops
cache environment variable is defined in multiple places, it sometimes keys off of the configuration environment variable but sometimes it's hardcoded against the home directory if the environment variable isn't defined. Filed: https://github.com/Azure/azure-devops-cli-extension/issues/1366
Technically changing any default configuration settings can be breaking. My personal view on these settings:
-
az config set auto-upgrade.enable=false
- should not break anyone. The hosted image is updated regularly enough that it should not be needed to auto-upgrade az-cli during a workflow/pipeline run. -
az config set core.error_recommendation=off
- should not break anyone. Worst case it makes debugging scripts a little harder, though running the same script locally should still provide recommendations. -
az config set core.disable_progress_bar=true
- progress bars don't work in the hosted runner anyway. -
az config set core.no_color=true
- will make the logs less pretty, but also fixes issues where certain ascii colors aren't coming through correctly. Should not really break anyone. -
az config set core.survey_message=false
- This setting suppresses the survey links that the CLI sometimes outputs, they have no place in a pipeline or workflow anyway.
I had already removed the settings from the suggestions made by az init
and left them on default:
-
az config set logging.enable_log_file=no
- I personally don't rely on the log file, I suppose someone could. This might be the most breaking change in the list.. There is anecdotal evidence in other github issues that suggests file logging slows down az-cli. I'd prefer to make it opt in. The less the agent does by default the better. -
az config set only_show_errors=true
- technically not a breaking change and making the log less chatty. I removed this cause I feel it makes it harder to see what the cli is and isn't doing. Ideally this flag would be set based on the diagnostic settings of the pipeline itself. -
az config set collect_telemetry=false
- though there is a lot of anecdotal evidence that telemetry slows downaz
or some of its extensions, I imagine the cli team would like to get telemetry from the hosted runners.
People who want to save a little more time can opt to add these settings to their pipeline/workflow.
Related issue:
Fixes: #8296?
Check list
- [x] Related issue / work item is attached
- [ ] Tests are written (if applicable)
- [ ] Documentation is updated (if applicable)
- [x] Changes are tested and related VM images are successfully generated
These changes are currently under consideration as they don't have the same impact as the other changes mentioned above ( some could be breaking ). You could set these in your own workflows if you'd like.
And filed a pull request to turn off auto-upgrade in az init
as well:
configures survey messages and telemetry #6793
This checklist is used to make sure that common guidelines for a pull request are followed.
Related command
Adds settings for [core.survey_message](https://github.com/Azure/azure-cli/issues/25242)
and core.collect_telemetry
to az init
.
Adds the following defaults:
- Manual Mode
- Survey=on
- Telemetry=on
- Auto-update=on
- Automation Mode
- Survey=off
- Telemetry=off
- Auto-update=off
General Guidelines
- [ ] Have you run
azdev style <YOUR_EXT>
locally? (pip install azdev
required) - [x] Have you run
python scripts/ci/test_index.py -q
locally?
For new extensions:
- [x] My extension description/summary conforms to the Extension Summary Guidelines.
About Extension Publish
There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update src/index.json
automatically.
You only need to update the version information in file setup.py and historical information in file HISTORY.rst in your PR but do not modify src/index.json
.
AzureCLI@2
does a version update check
When you use the AzureCLI@2
task, the first thing it does is running
az --version
Which in turn performs and update check and caches the results, but as you can guess by now, that cached result isn't available in the pristine profile of the user running your workflow.
I feel that AzureCLI@2
has no business doing an update check anyway and it turns out that there is a second command az version
(note the lack of --
). It mostly does the same thing but skips the update check.
Filed another pull request:
Replaces `az --version` with `az version` #19039
az version
doesn't perform an update check and is faster.
Fixes: #19006
Task name: AzureCLI
Description: Replaces the call to az --version
with az version
.
Documentation changes required: N
Added unit tests: N
Attached related issue: #19006
Checklist:
- [x] Task version was bumped - please check instruction how to do it
- [x] Checked that applied changes work as expected
And while it's nice that the alternative command exists, it would seem better to me to make the version-check configurable:
`az --version` performs an update check #27437
Describe the bug
When running az --version
an update check is performed against az-cli and the extensions.
Azure Pipelines' AzureCLI task runs az --version
to capture the version details in the logs.
It would save quite a bit of CPU if az-cli would not perform an update check in a CI/CD scenario.
There already is an core.auto-upgrade=false option, would it not make sense to piggy back on that or to add another config setting to turn off update checking.
Related command
az --version
Errors
N/a
Issue script & Debug output
N/a
Expected behavior
No update check is performed in CI.
Environment Summary
2.52.0
Additional context
No response
Auto-detecting your Azure DevOps collection and Project is slower
Many az devops
commands need to know the location of your Azure DevOps organization and the name of the project you're connecting to. If you don't explicitly pass these in, az devops
will try to run git remote --verbose
to look up the details. This adds another little bit of overhead.
Since Azure Pipelines already knows the organization and the project and sets environment variables for you, you might as well pass them in explicitly:
- pwsh: |
az pipelines runs show --id $env:BUILD_BUILDID --query "definition.id" `
--organization $env:SYSTEM_COLLECTIONURI `
--project $env:SYSTEM_TEAMPROJECT
It would be nice if auto-detect
would use these environment variables if they are available. Haven't filed an issue for that yet.
On Azure Pipelines everything runs inside PowerShell 5.1
In Azure Pipelines everything except Node based tasks run inside PowerShell 5.1. Even a Windows Shell task (CmdLine@2
) runs inside of PowerShell 5.1:
Starting: CmdLine
==============================================================================
Task : Command line
Description : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version : 2.212.0
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
##[debug]VstsTaskSdk 0.9.0 commit 6c48b16164b9a1c9548776ad2062dad5cd543352
##[debug]Entering D:\a\_tasks\CmdLine_d9bafed4-0b18-4f58-968d-86655b4d2ce9\2.212.0\cmdline.ps1.
========================== Starting Command Output ===========================
##[debug]Entering Invoke-VstsTool.
##[debug] Arguments: '/D /E:ON /V:OFF /S /C "CALL "D:\a\_temp\5cf768ae-b47d-41b4-a176-02a9957e0e28.cmd""'
##[debug] FileName: 'C:\Windows\system32\cmd.exe'
##[debug] WorkingDirectory: 'D:\a\1\s'
Unlike GitHub Actions, even pwsh
runs inside of PowerShell 5.1 in Azure Pipelines. This makes script:
and pwsh:
and a tiny bit slower on Azure Pipelines than using powershell:
But under certain conditions (like the one below), this tiny bit slower can add a quite noticeable 6s to 10s additional delay.
I requested that a proper PowerShell Core Execution handler is added to the Azure Pipelines Agent.
[enhancement]: Add PowerShell Core 7 execution handler #4455
Describe your feature request here
All tasks that use the PowerShell and PowerShell3 execution handler rely on PowerShell 5.1 as the main execution engine of the task. Any task that relies on PowerShell Core must call PowerShell Core either from PowerShell 5.1 or from Node.
Back in 2015 when the agent was first introduced and PowerShell Core was mostly a gimmick, it made sense, maintaining 2 different but mostly equal execution handlers only add maintenance overhead.
Fast-forward 8 years and PowerShell Core is now the gold standard. It's faster, cross platform and most new features come to it, including fancy language features. Most of all its cross platform and it doesn't depend on 4MB of random node modules.
It's time to introduce a proper PowerShellCore
execution handler that doesn't rely on PowerShell 5.1, that has its own VstsTaskSdk and that runs cross-platform.
Why?
- It's faster. In the past days I've been looking at hosted agent performance and
pwsh:
as well asAzureCLI
andAzurePowerShell
andPowerShell
tasks configured to use PowerShell Core add a few seconds overhead to each task execution. - It's cross platform. I'm no star in Bash nor a hero in Typescript, but I know my way around the CLR and have written quite a bit of PowerShell, let me take my skills to Linux and Mac as well.
- It's more secure than Node based Tasks. Node based tasks easily carry 100s of additional Node Packages to do things that are built-in to PowerShell Core. XML parsing? Json parsing? Zipping archives, XSLT, JsonPath, REST API calls, you name it. In my research I can't find any PowerShell based extension with security vulns, yet at the moment there is no Node based tasks that doesn't trigger at least 2 critical warnings.
- It's where the current state of the market is at. Even PowerShell itself tries to tell me every time I accidentally open the wrong shell:
Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows
. - GitHub Actions already has support for PowerShell core as default script handler and you can use PowerShell Core in composite actions to build your tasks. Having the same option in Azure Pipelines would make it easier to port tasks between the wo.
Windows might be trying to be helpful
While chatting with an engineer on the actions/runner-images team, we talked about a few issues that have been plaguing the hosted runner recently. It turns out that Windows Server has added a few more ways to ensure that Windows Update always runs and to run a few jobs to keep Windows performing at its best.
The hosted runner images are updated very frequently and when the image runs, it only needs to run one job before it's deleted. As such, these services are disabled during the generation of the Windows runner images.
Until recently that is.
A change in Windows has caused the "Storage Service" and the "Windows Update Service" to be re-enabled and restarted as soon as the image boots up. And if you're unlucky they slow down the hosted runner while doing their intended job.
A new image will be rolling out shortly, one that again turns off these services , this time a little more forceful:
Agents very slow #8380
Description
Since September 21 agents are very slow. Even simple tasks that usually take 1-2 seconds now take 25-30 seconds. A comparison of some tasks from two runs can be seen below: On the left side we have a run before September 21 (simple tasks taking 1-2 seconds to complete), on the right side we have the same tasks running on September 25 (same tasks taking 25-30 seconds to complete.
Platforms affected
- [X] Azure DevOps
- [ ] GitHub Actions - Standard Runners
- [ ] GitHub Actions - Larger Runners
Runner images affected
- [ ] Ubuntu 20.04
- [ ] Ubuntu 22.04
- [ ] macOS 11
- [ ] macOS 12
- [ ] macOS 13
- [X] Windows Server 2019
- [X] Windows Server 2022
Image version and build link
Issue started from version 20230918.1.0
Is it regression?
20230912.1.0
Expected behavior
Tasks are fast to execute
Actual behavior
Task execution is very slow
Repro steps
[windows] disable StorSvc for runners #8388
Description
StorSvc service wakes up in 700-800 sec and scans all files, which affects runner a lot. Let us try to disable it
Related issue:
Check list
- [ ] Related issue / work item is attached
- [ ] Tests are written (if applicable)
- [ ] Documentation is updated (if applicable)
- [ ] Changes are tested and related VM images are successfully generated
In the meantime, you can use this snippet in your workflow to turn these services back off:
Azure Pipelines:
# 🔁 Azure Pipelines
steps:
- pwsh: |
$services = @("WaasMedicSvc", "StorSvc", "wuauserv")
$services | %{
$action = "''/30000/''/30000/''/30000"
$output = sc.exe failure $_ actions=$action reset=4000
stop-service $_ -force
if ($_ -ne "WaasMedicSvc")
{
Set-Service -StartupType Disabled $_
}
if ((get-service $_).Status -ne "Stopped")
{
$id = Get-CimInstance -Class Win32_Service -Filter "Name LIKE '$_'" |
Select-Object -ExpandProperty ProcessId
$process = Get-Process -Id $id
$process.Kill()
}
}
$services | %{
write-host $_ $((get-service $_).Status)
}
- checkout: self
GitHub Actions:
# ▶️ GitHub Actions
steps:
- script: |
$services = @("WaasMedicSvc", "StorSvc", "wuauserv")
$services | %{
$action = "''/30000/''/30000/''/30000"
$output = sc.exe failure $_ actions=$action reset=4000
stop-service $_ -force
if ($_ -ne "WaasMedicSvc")
{
Set-Service -StartupType Disabled $_
}
if ((get-service $_).Status -ne "Stopped")
{
$id = Get-CimInstance -Class Win32_Service -Filter "Name LIKE '$_'" |
Select-Object -ExpandProperty ProcessId
$process = Get-Process -Id $id
$process.Kill()
}
}
$services | %{
write-host $_ $((get-service $_).Status)
}
- uses: actions/checkout@v4
Some people might blame Microsoft in general for these issues, but of course this is an interaction caused by changes made by 2 teams with hugely different goals. Where the Windows team is trying to make Windows as secure and up to date as possible, the Runners team is trying to create an image of Windows that is as lean as it can be, up-to-date, secure, yet can run 100s of different tools ranging from .NET 4 to .NET 8 and from Python to Rust as well as Node and PHP. Strange interactions are almost impossible to prevent.
az devops
caches instance level data
Whenever you connect to an Azure DevOps Organization, az devops
will make 2 calls:
OPTIONS /{{organization}/_apis
GET /{organization}/_apis/ResourceAreas
These calls return all the endpoints and their calling convention. But these are unlikely to change often. I've seen these calls adding 10-30s to the first call to az devops
that make an API call. So, I went looking for ways to cache the data.
I haven't found the ideal solution yet, but this is a working prototype that will make az devops
reuse the data from a previous workflow run:
# ⚠️ WARNING work in progress
steps:
- task: Cache@2
inputs:
key: '".az-devops"'
path: 'C:\Users\VssAdministrator\.azure-devops'
displayName: "Restore .azure-devops"
- pwsh: |
(dir C:\Users\VssAdministrator\.azure-devops\python-sdk\cache\*.json) | %{$_.LastWriteTime = Get-Date}
displayName: "Reset last modified on .azure-devops/cache"
Unfortunately, it doesn't work under all circumstances yet as the files may not always be there. And the pull request that will change the location of the .azure-devops
folder, will change the paths in the snippet above.
It might be worth creating a custom task that caches these files. Unfortunately, custom tasks can't seem to access the cache in Azure Pipelines (or at least not through a mechanism in the agent or task-lib)
I've filed an issue to allow tasks to cache data:
[enhancement]: Add support for caching from an arbitrary task #4440
Describe your feature request here
For my azure pipelines extension, I'd like to be able to cache a few files to speed up future executions.
Right now, that's only possible if the user adds a Cache@2
task to their pipeline where needed preceded by a script step to calculate cache keys and decide whether to run the step or not.
This logic is MUCH easier to do inside my own task, as I'd be able to handle everything from there.
My proposal is to add a tasklib function similar to the ones used by GitHub Action's setup-node step.
Ideally it would use the same efficient mechanism as the current Cache steps, but would allow me to have influence over whether to overwrite the cache at the end of the job or not.
I'd use this to also cache files that can't compute the cache key based on the contents of the repo. Instead, I'd use the return value of an API call or check whether the files to cache have changed since the last run.
Conclusion
You can see results of all these changes below:
_Compare the execution time with the cached files (-on
) and without them (-off
). _
The combined set of changes reduces the time it takes to run az
for the first time in a workflow by about 1 minute. Even with the added overhead of setting up the user profile the whole workflow is faster.
Some of these issues aren't exclusive to Windows either. The case sensitivity issue, keyring installation, Azure CLI's default settings and auto-detection of Azure DevOps Organization apply to Ubuntu and MacOS runners too.
If you want to tinker a bit with these things yourself, here is an Azure Pipelines file that I used to test most of these scenarios.
The Hosted Runner is a complex thing. It's built for convenience and has 100s of tools preinstalled on it. Many of these tools have funny interactions with each other. All of those are running an operating system that is also constantly changing.
Many of the tools we rely on to develop and deploy our code are flexible, extensible, and configurable and aren't always designed to run on a temporary server that executes the tool just once.
At the same time, az
must run thousands, if not millions of times each day on the Windows Hosted runners and the issues above cause it to regularly run more than a minute longer than needed. I'm hoping that the work I've put in this week will not only speed up your workflows, but will save us all money. And - above all - it will hopefully reduce carbon emissions and water usage.
This investigation dug mostly into a single tool and its extensions. There may be an unknown number of similar issues lurking in the set of tools you rely on. Please do take the time to investigate and file issues, submit pull requests, and help improve these systems for all of us.
A huge shoutout to ilia-shipitsin for looking into these issues and testing them on actual hosted runners.
If these fixes are making your pipelines a little faster, consider sponsoring me to allow me to do these kinds of things more often.
Featured ones: