Azure DevOps Self-Hosted VMSS Agents Part 1

Creating an Ubuntu 20.04 Azure DevOps Microsoft agent image in an Azure Compute Gallery.

TLDR

Running the commands in the Create sections of each post will create the following:

  • An Ubuntu 20.04 Linux VM image stored in an Azure Compute Gallery which will be the same image as used in Microsoft hosted agents i.e. ubuntu-latest OR ubuntu-20.04.
  • Azure Virtual Network to host a Virtual Machine Scale Set.
  • Azure Virtual Machine Scale Set based on the Ubuntu 20.04 Linux VM image, assigned with an Azure AD Managed Identity.
  • A new Azure DevOps project.
  • A new Azure DevOps AzureRM service connection.
  • A new Azure virtual machine scale set agents pool using the VMSS created above.
  • New Azure DevOps Pipelines:
    1. Basic VMSS test pipeline.
    2. Example basic Terraform pipeline configured to use Azure Managed Identity.
    3. Scheduled pipeline to run the Ubuntu 20.04 Linux VM image process.

Overview

A while back I came across a very interesting blog post: Use Azure DevOps to create self-hosted Azure DevOps build agents, just like Microsoft does. This introduced me to GitHub Actions Virtual Environments (Update 2022-09-17: these are now known as runner-images), which is the open-sourced image creation process used for GitHub Actions hosted runners and Azure DevOps Pipelines Microsoft-hosted agents. Note that the Azure DevOps builds of the runner-images repo are publicly accessible here.

Similar to the goal of the above blog post, I wanted to automate the process of consuming and building the runner-images images, with minimal changes to the source code, and store the images in an Azure Compute Gallery. I also wanted to automate as much of the rest as possible, including the creation of an Azure Virtual Machine Scale Set and an Azure virtual machine scale set agent pool.

There are a few reasons why self-hosting agents in this fashion might be a requirement:

  • Limited build minutes for Microsoft hosted agents.
  • Maximum pipeline execution time exceeds that allowed by Microsoft hosted agents.
  • Line of sight to resources that are only accessible internally.
  • You want to run some Infrastructure as Code tool e.g. Terraform and for compliance reasons you want the remote state stored in Azure storage so that it is restricted to internal access only.
  • A VM feature capability requirement not available in Microsoft hosted agents.
  • Autoscaling of self-hosted agents based on Azure DevOps Pipelines demand.
  • Scaling includes scaling to 0 when there is no demand. This is especially handy if you are using your own Azure account and don't want to have a VM hanging around being paid for (like this blog series).
  • The option of using your own VM images to meet your CI/CD requirements.
  • Containerized agents don't give you exactly what you are looking for.

This series of related posts aims to demonstrate the above goal with an example of building an Ubuntu 20.04 Azure DevOps self-hosted VMSS agent pool.

The series of posts will consist of:

Requirements

The base requirements are defined in the instructions for building runner-images i.e.:

Build Agent requirements

I am assuming that, since you are reading this, you can handle setting up these prerequisites, so I won't detail that here. To make things easier, though, I have included a devcontainer in the azdo-vmss-virtual-environments repository that points to a container image which includes everything needed to work through the examples.

Connectivity requirements

There is also a requirement that the system running the script is able to communicate outbound on 22/tcp (SSH) to the VM that is created in Azure to run the packer build.
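As a rough pre-flight check, outbound SSH connectivity can be probed from the system that will run the script. The sketch below is not part of the repository: the function name can_reach_tcp is hypothetical, and it assumes that bash (for its /dev/tcp redirection) and the coreutils timeout command are available. Because the Packer build VM does not exist yet, you would have to point it at some other host that you know accepts SSH connections.

```shell
# Hypothetical helper (not from the repository): succeeds when a TCP
# connection to HOST:PORT can be opened within TIMEOUT seconds.
# Assumes bash (for /dev/tcp) and coreutils timeout are installed.
can_reach_tcp() {
  # $1 = host, $2 = port (defaults to 22), $3 = timeout in seconds (defaults to 5)
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "${2:-22}" 2>/dev/null
}

# Example (the host name is a placeholder):
#   can_reach_tcp my-ssh-host.example.com 22 && echo "outbound 22/tcp looks open"
```

A successful probe only shows that outbound 22/tcp is not blocked in general; it says nothing about the NSG that Packer creates later.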

Azure Requirements

As mentioned in the runner-images Service principal section, you will need an Azure subscription with full read-write permissions. To keep things simple I would suggest targeting a subscription in which you are able to create a service principal and assign it the Owner role (or alternatively Contributor plus User Access Administrator), for example:

az ad sp create-for-rbac -n "sp-virtual-environments-images" --role Owner --scopes /subscriptions/00000000-0000-0000-0000-000000000000

Once the service principal has been created you will use the outputs from the az ad sp create-for-rbac command to set environment variables prior to executing the scripts that construct the various Azure resources:

 export ARM_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
 export ARM_TENANT_ID=00000000-0000-0000-0000-000000000000
 export ARM_CLIENT_ID=00000000-0000-0000-0000-000000000000
 export ARM_CLIENT_SECRET=AAABjkwhs7862782626_BsGGjkskj_MaGv

Note: Prefixing the command with a space keeps it out of the bash history, provided HISTCONTROL includes ignorespace or ignoreboth (as it does in Ubuntu's default .bashrc).
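Before launching anything it can be worth confirming that all four variables really are exported, since a typo here only shows up once the scripts try to authenticate. The following is a minimal sketch and not something taken from the repository; the helper name missing_arm_vars is hypothetical. It prints only the variable names, never their values.

```shell
# The variables the scripts expect, as listed in this post.
REQUIRED_ARM_VARS="ARM_SUBSCRIPTION_ID ARM_TENANT_ID ARM_CLIENT_ID ARM_CLIENT_SECRET"

# Hypothetical helper: prints the name of each variable that is not
# exported (or is empty) and returns non-zero if any are missing.
missing_arm_vars() {
  missing_count=0
  for name in $REQUIRED_ARM_VARS; do
    # printenv only sees exported variables, which is what the
    # scripts will receive when they run as child processes.
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      missing_count=$((missing_count + 1))
    fi
  done
  [ "$missing_count" -eq 0 ]
}

# Example:
#   missing_arm_vars || echo "export the variables listed above first"
```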

Created Azure Resources

By default, the following resources will be created in the target Azure subscription as part of the scripted execution:

| Resource | Resource name | Description |
|---|---|---|
| Resource Group | pkr-Resource-Group-1vyb5m4alw (example) | Temporary resource group created by runner-images script for Packer build resources |
| Public IP Address | pkrip1vyb5m4alw (example) | Temporary Public IP address to allow access to the build VM (restricted by source IP) by the NSG below |
| Regular Network Interface | pkrni1vyb5m4alw (example) | Temporary NIC for the packer build VM |
| Network security group | pkrsg1vyb5m4alw (example) | Temporary NSG to restrict access to the packer build VM only from the source IP address of where the script is run from |
| Virtual machine | pkrvm1vyb5m4alw (example) | Temporary VM used for the packer image build |
| Virtual network | pkrvn1vyb5m4alw (example) | Temporary VNet and subnet to host the packer build VM |
| Resource Group | rg-ve-images | Resource group to store the storage account that contains the packer generated .VHD |
| Storage account | rgveimages001 | Storage account mentioned above. Note that this is automatically generated based on the naming of the parent resource group |
| Resource Group | rg-ve-acg-01 | Resource group to store the Azure Compute Gallery |
| Azure compute gallery | acg_01 | Azure Compute Gallery resource |
| VM image definition | ubuntu20 | VM image definition |
| VM image version | 1.0.0 | VM image version. This is an automatically incrementing version in semantic version format, starting by default as 1.0.0 |
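To illustrate the automatically incrementing image version, here is a minimal sketch of how the next version could be derived from the versions already in the gallery. This is not the repository's actual implementation: the helper name next_image_version is hypothetical, and I am assuming that it is the patch component that gets incremented, which the description above does not spell out.

```shell
# Hypothetical helper: read existing versions on stdin (one per line)
# and print the next one. With no existing versions, print the
# starting version given as $1 (1.0.0 if omitted).
# Assumption: only the patch component is incremented.
next_image_version() {
  start_version="${1:-1.0.0}"
  # Sort numerically on each dotted field so 1.0.10 ranks above 1.0.9.
  latest="$(sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)"
  if [ -z "$latest" ]; then
    echo "$start_version"
    return
  fi
  major="${latest%%.*}"
  rest="${latest#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  echo "${major}.${minor}.$((patch + 1))"
}

# Example:
#   printf '1.0.0\n1.0.1\n' | next_image_version   # prints 1.0.2
```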

Many of the above values can be changed as well as other settings by exporting environment variables prior to running the script.

| Environment variable | Default value | Description |
|---|---|---|
| AZ_RESOURCE_GROUP_NAME | rg-ve-images | Resource group where generated packer .VHD images will be created |
| AZ_LOCATION | uksouth | Location for all created resources |
| AZ_ACG_RESOURCE_GROUP_NAME | rg-ve-acg-01 | Name of the resource group where the Azure Compute Gallery will be created |
| AZ_ACG_NAME | acg_01 | Azure Compute Gallery name |
| AZ_VMSS_RESOURCE_GROUP_NAME | rg-vmss-azdo-agents-01 | Resource group where the VM Scale Set will be created |
| AZ_VMSS_NAME | vmss-azdo-agents-01 | The name of the target VMSS. This will be queried to check the current VM Image version |
| VE_REPO | https://github.com/actions/virtual-environments.git | The source repository of the runner-images code |
| VE_IMAGE_PUBLISHER | actions | Publisher name assigned to the VM image definition |
| VE_IMAGE_OFFER | virtual-environments | Offer name assigned to the VM image definition |
| VE_IMAGE_SKU | Ubuntu2004 | SKU name assigned to the VM image definition |
| VE_IMAGE_DEF | ubuntu20 | The VM image definition name |
| VE_IMAGES_TO_KEEP | 2 | How many versions of the image should be kept, older versions will be automatically deleted |
| VE_IMAGES_VERSION_START | 1.0.0 | The starting version number to be used if no existing image versions exist |
| VE_RELEASE | ubuntu20/latest | A specific release, commit or latest tagged version of the runner-images repo |

Note: If changing the default values for one script the change must be made consistently to all scripts.
Note: There are certain Azure Compute Gallery naming requirements described in Create a gallery.
Note: Tagged releases can be obtained from https://github.com/actions/virtual-environments/tags.
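The way exported values take precedence over the defaults in the table can be sketched with the shell's default-assignment expansion. This is only an illustration of the pattern, assuming the scripts behave this way rather than quoting them: both helper names are hypothetical, only a subset of the variables is shown, and the retention helper is my own guess at how VE_IMAGES_TO_KEEP could be applied.

```shell
# Hypothetical helper: fall back to the documented default for any
# variable that is unset or empty. Only a subset of the table is shown.
apply_image_defaults() {
  : "${AZ_LOCATION:=uksouth}"
  : "${AZ_ACG_RESOURCE_GROUP_NAME:=rg-ve-acg-01}"
  : "${AZ_ACG_NAME:=acg_01}"
  : "${VE_IMAGE_DEF:=ubuntu20}"
  : "${VE_IMAGES_TO_KEEP:=2}"
  : "${VE_IMAGES_VERSION_START:=1.0.0}"
}

# Hypothetical helper: read image versions on stdin (one per line) and
# print those that fall outside the newest VE_IMAGES_TO_KEEP versions.
versions_to_delete() {
  # Newest first, then skip the first VE_IMAGES_TO_KEEP lines.
  sort -t. -k1,1nr -k2,2nr -k3,3nr | tail -n "+$((VE_IMAGES_TO_KEEP + 1))"
}

apply_image_defaults
```

With the defaults in place, feeding versions 1.0.0 to 1.0.3 into versions_to_delete would list 1.0.1 and 1.0.0 as the candidates for removal.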

Security Considerations

There are some security considerations when running self-hosted agents. On the positive side, self-hosted agents can help with compliance: access to Azure resources can be kept internal, and common compliance restrictions can be applied to those resources, for example making storage accounts accessible only via private endpoints.

Also consider the potentially negative side. Placing the agents internally means that they can access internal resources, which is a benefit but also an obvious risk. Think very carefully before using self-hosted agents for public repositories; in fact, don't do it unless there is a very good reason. The potential for abuse is particularly clear if you consider what could be exploited by running pull request automation from a public repository on self-hosted agents. Don't do this!

Another aspect worth considering when self-hosting the runner-images images is that they contain a lot of packages and, as we know, software packages can contain vulnerabilities. You might want to define levels of trust for your self-hosted images and only expose internal resources based on that level of trust. For example, runner-images images might only be allowed internal access to the storage accounts needed for Terraform remote state or build artifacts, while access to other internal systems is reserved for images that have a much smaller footprint and are regularly scanned for vulnerabilities. How far you take this depends on your compliance and security requirements; I just wanted to highlight these considerations at this point.

Now that you are aware of all of the above we can go ahead and build our image. As mentioned in the Build Agent requirements, there are a few things that need to be in place for this all to work. You might already have the prerequisites installed on your target system, or be able to use Docker Desktop, Visual Studio Code and Remote Containers with the .devcontainer approach. To keep things simple I will describe the process below running within a Docker container, so you can follow along if you are able to run Linux-based containers, but feel free to use any method you are comfortable with.

Note: in later posts in this series we will show how this can be done in Azure DevOps Pipelines.

First run our deployment environment:

docker run -it --rm ghcr.io/tonyskidmore/azure-tools:latest /bin/bash

Within the container, clone the repository that contains the scripts that will generate our Azure resources:

git clone https://github.com/tonyskidmore/azdo-vmss-virtual-environments.git
cd azdo-vmss-virtual-environments

Export the necessary environment variables that the scripts use to authenticate to Azure, substituting the values below with your Azure service principal details (as per: Azure Requirements):

 export ARM_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
 export ARM_TENANT_ID=00000000-0000-0000-0000-000000000000
 export ARM_CLIENT_ID=00000000-0000-0000-0000-000000000000
 export ARM_CLIENT_SECRET=AAABjkwhs7862782626_BsGGjkskj_MaGv

Launch the script to build the image and create the Azure resources:

scripts/ve-image-create.sh

Update 2022-09-17:

The runner-images repo now also supports Ubuntu 22.04, which can be built either by exporting the following environment variables or by supplying them inline (as shown below):

VE_IMAGE_SKU=Ubuntu2204 VE_IMAGE_DEF=Ubuntu2204 VE_RELEASE=ubuntu22/latest scripts/ve-image-create.sh
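The difference between the two approaches is scope: an inline assignment applies to that one command only, whereas an exported variable stays in effect for everything run later in the same shell session. The short demonstration below uses a stand-in child process instead of the real script, so it can be run safely anywhere; the variable name report_release is just for illustration.

```shell
# Stand-in for scripts/ve-image-create.sh: a child process that reports
# which release it would build, falling back to the documented default.
report_release='echo "${VE_RELEASE:-ubuntu20/latest}"'

unset VE_RELEASE

# Inline assignment: applies to this one command only.
inline_run="$(VE_RELEASE=ubuntu22/latest sh -c "$report_release")"

# The next command is back on the default, nothing was left behind.
next_run="$(sh -c "$report_release")"

# Exported: every later command in this shell session sees the value.
export VE_RELEASE=ubuntu22/latest
exported_run="$(sh -c "$report_release")"

echo "inline=$inline_run next=$next_run exported=$exported_run"
# prints: inline=ubuntu22/latest next=ubuntu20/latest exported=ubuntu22/latest
```

The inline form is handy for a one-off Ubuntu 22.04 build without disturbing the defaults used by later runs.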

At the time of writing this update there was an issue preventing a successful build of the Ubuntu 20.04 image: "/imagegeneration/SoftwareReport/SoftwareReport.Generator.ps1 : The term 'zstd' is not recognized as a name of a cmdlet". Building the Ubuntu 22.04 image is a temporary workaround to get a working image build until that issue is fully resolved.

Always check the runner-images Azure DevOps project if you get errors when trying to build the images yourself, as there could be a current problem. Also, check the runner-images issues to see if the error that you have encountered is already known about or raise an issue if it is not.

After the script has successfully completed, which normally takes 2-3 hours, the Azure Compute Gallery and other resources should have been created as per the Created Azure Resources section.

You are now ready to move on to the creation of the VM Scale Set!