Azure Automation with Hybrid Workers and Runbooks

Today’s Weather: A brisk 32 F in Dallas, TX @ 3:30PM. Sunny, but I’m glad to be inside.

Background

I’ve been spending a lot of time with Terraform recently and wanted to build something with some real-world use. I had done some light reading on Azure Automation Runbooks and Hybrid Workers and decided to give them a try.

All built with Terraform, NO GUI clicking required.

Here’s a link to the GitHub repo in case you want to check it out. I’ll be breaking it down block by block here.

Architecture & Design


Azure Automation Accounts and Key Vaults are PaaS services that run in Azure’s managed infra. If you wanted to execute code and authenticate with certificates in Key Vault, you wouldn’t even need to use a VM. However, I wanted to disable public access to my Key Vault.

Once a Key Vault is set to private, you will need to connect it to your subnet with a Private Endpoint, and use a VM with the Hybrid Worker Extension to run your code.

The Automation Account holds the runbooks (in this case containing PowerShell and Azure CLI scripts) and passes them to the VM for execution. The Automation Account also defines my Hybrid Worker Groups and Hybrid Workers. I only have one VM in this example, but in theory you could have dozens of VMs in dozens of Worker Groups to distribute the load.

At runtime, the Automation Account first authenticates with the Key Vault via RBAC linked to the Automation Account’s Managed Identity. The Automation Account then passes the script to the VM for actual execution. The VM reaches out to the Key Vault via the Private Endpoint to retrieve any secrets or certificates requested in the runbook. From there the VM runs whatever script it was sent, and returns any outputs to the Automation Account job log.

One interesting thing to note here is how authentication to the Key Vault works. When I was testing this, I gave the VM’s Managed Identity the “Key Vault Secrets User” role, but authentication kept failing. It seems that even though the script runs on the VM, and the VM is what actually reaches out to the Key Vault for secrets, the Automation Account’s Managed Identity is the one that needs the “Key Vault Secrets User” or “Key Vault Certificate User” role.
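To make the runtime flow above concrete, here is a minimal sketch of a runbook that reads the test secret. It is an illustration, not code from the repo: the resource label read_secret and the runbook name are assumptions, and it relies on the Key Vault, secret, and Automation Account resources defined later in this post. It would need to run on the Hybrid Worker VM, since only the subnet can reach the vault through the Private Endpoint.

```hcl
# Hypothetical runbook (not from the repo): reads the test secret
# from the private Key Vault using a managed identity.
resource "azurerm_automation_runbook" "read_secret" {
  name                    = "read-secret" # assumed name
  location                = azurerm_resource_group.rg.location
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  log_verbose             = true
  log_progress            = true
  description             = "Reads a test secret from Key Vault"
  runbook_type            = "PowerShell"

  # Terraform fills in the vault and secret names via interpolation;
  # PowerShell variables such as $secret are left untouched.
  content = <<-EOT
    # Sign in with a managed identity, so no credentials are stored.
    Connect-AzAccount -Identity | Out-Null

    # Requires the Az.KeyVault module to be installed on the worker VM.
    $secret = Get-AzKeyVaultSecret -VaultName "${azurerm_key_vault.kv.name}" -Name "${azurerm_key_vault_secret.secret.name}" -AsPlainText

    # Log only the length, never the secret value itself.
    Write-Output "Retrieved a secret of length $($secret.Length)"
  EOT
}
```

Writing the length rather than the value keeps the secret out of the Automation Account job log.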

Costs

These services are fairly low cost. If you use the serverless version of the automation runbooks, you get 500 free minutes of runtime each month. So your compute would be totally free if you kept it under that amount. However, we are using an Azure VM for compute, so we can’t take advantage of that.

I defined several schedules to keep costs as low as possible. Let’s say I want my script to run once a week at 3pm. I have a runbook that starts the VM a few minutes before 3pm. The script runs at 3pm, and then another runbook deallocates the VM a few minutes later. While this may not be ideal for larger workloads, deploying it this way keeps the VM bill to a minimum.
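As a rough sketch of the "deallocate afterwards" half of that pattern, a weekly schedule could look something like the following. The names stopvm, stop_vm_weekly, and var.stop_vm_time, as well as the choice of Friday, are assumptions for illustration; the resource group and Automation Account are the ones defined later in this post.

```hcl
# Hypothetical variable: RFC 3339 timestamp for the first run,
# e.g. a few minutes after the main script is expected to finish.
variable "stop_vm_time" {
  type        = string
  description = "First run of the deallocate schedule (RFC 3339)"
}

# Runbook that deallocates the VM so compute charges stop.
resource "azurerm_automation_runbook" "stopvm" {
  name                    = "stopvm"
  location                = azurerm_resource_group.rg.location
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  log_verbose             = true
  log_progress            = true
  description             = "Deallocates VM after script runtime"
  runbook_type            = "PowerShell"

  content = <<-EOT
    Connect-AzAccount -Identity
    $vmName = "hybridworkertest-vm"
    $resourceGroup = "hybrid-worker-test-rg"

    Write-Output "Deallocating VM"
    # -Force skips the confirmation prompt; without -StayProvisioned
    # the VM is deallocated rather than just powered off.
    Stop-AzVM -ResourceGroupName $resourceGroup -Name $vmName -Force
  EOT
}

# Weekly recurrence (assumed: every Friday).
resource "azurerm_automation_schedule" "stop_vm_weekly" {
  name                    = "stop-vm-weekly"
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  frequency               = "Week"
  interval                = 1
  week_days               = ["Friday"]
  start_time              = var.stop_vm_time
  timezone                = "America/Chicago"
}

# Link the schedule to the runbook. No run_on is set, so the job runs
# in the Azure sandbox rather than on the VM it is shutting down.
resource "azurerm_automation_job_schedule" "stop_vm_job" {
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  runbook_name            = azurerm_automation_runbook.stopvm.name
  schedule_name           = azurerm_automation_schedule.stop_vm_weekly.name
}
```

The start and stop runbooks are short enough that their sandbox runtime should fit comfortably inside the free monthly minutes mentioned above.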

Code Breakdown

First, our opening Terraform block. It’s mostly standard. We will use data "azurerm_client_config" "current" {} later to reference our Azure connection context.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
  required_version = ">= 1.1.0"
}

data "azurerm_client_config" "current" {}

provider "azurerm" {
  features {}
  subscription_id = "e7cc5b12-3e04-4af0-a26f-30657aa9395f"
}

provider "random" {}

The following block creates our Resource Group, Virtual Network, and Subnet. We also use the “random” provider to generate a random UUID to use later.

#----------------------------------------------------
# Core
#----------------------------------------------------
resource "azurerm_resource_group" "rg" {
  name     = "hybrid-worker-test-rg"
  location = var.location
}

resource "azurerm_virtual_network" "vnet" {
  name                = "hybrid-worker-test-vnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.0.0.0/24"]
}

resource "azurerm_subnet" "subnet" {
  name                              = "hybrid-worker-test-subnet1"
  resource_group_name               = azurerm_resource_group.rg.name
  virtual_network_name              = azurerm_virtual_network.vnet.name
  address_prefixes                  = ["10.0.0.0/26"]
  private_endpoint_network_policies = "Disabled"  
}

resource "random_uuid" "random" {}
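The blocks in this post reference several input variables (var.location, var.vm_zone, var.vm_size, and so on) whose declarations are not shown. A minimal variables file might look like the sketch below. Every default here is an assumption chosen for illustration, not a value taken from the repo.

```hcl
# Hypothetical variables.tf. Defaults are illustrative assumptions.
variable "location" {
  type        = string
  description = "Azure region for all resources"
  default     = "southcentralus" # assumed region
}

variable "vm_zone" {
  type        = string
  description = "Availability zone for the worker VM"
  default     = "1" # assumed; zone support varies by region and size
}

variable "vm_size" {
  type        = string
  description = "Size of the worker VM"
  default     = "Standard_B2s" # assumed small, low-cost size
}

variable "vm_admin_un" {
  type        = string
  description = "Local admin username for the worker VM"
}

variable "vm_admin_pw" {
  type        = string
  description = "Local admin password for the worker VM"
  sensitive   = true # keeps the value out of plan and apply output
}

variable "automation_account_sku" {
  type        = string
  description = "SKU of the Automation Account"
  default     = "Basic"
}

variable "start_vm_time" {
  type        = string
  description = "RFC 3339 timestamp for the start-VM schedule"
}
```

Leaving the username, password, and start time without defaults forces them to be supplied at plan time, for example through a .tfvars file or TF_VAR_ environment variables.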

Now we create our Key Vault, add a test secret to it, and create a few role assignments. We give the Automation Account the “Virtual Machine Contributor” role so it can start and deallocate the VM. We use the azurerm_client_config data source mentioned earlier to reference the account used to authenticate to Azure, so we can give that account the “Key Vault Secrets Officer” role. This allows us to actually add secrets to the Key Vault.

Lastly, we give the Automation Account the “Key Vault Secrets User” role so our runbooks can read Key Vault secrets.

#----------------------------------------------------
# Key Vault and IAM
#----------------------------------------------------
resource "azurerm_key_vault" "kv" {
  name                          = "hybrid-worker-test-kv"
  location                      = azurerm_resource_group.rg.location
  resource_group_name           = azurerm_resource_group.rg.name
  tenant_id                     = data.azurerm_client_config.current.tenant_id
  sku_name                      = "standard"
  rbac_authorization_enabled    = true
  public_network_access_enabled = false
}

resource "azurerm_key_vault_secret" "secret" { 
  name         = "test-secret"
  value        = "Hello there! "
  key_vault_id = azurerm_key_vault.kv.id
}

resource "azurerm_role_assignment" "aa_vm" {
  scope                = azurerm_windows_virtual_machine.vm.id
  role_definition_name = "Virtual Machine Contributor"
  principal_id         = azurerm_automation_account.aa.identity[0].principal_id
}

resource "azurerm_role_assignment" "kv_self_admin" {
  scope                = azurerm_key_vault.kv.id 
  role_definition_name = "Key Vault Secrets Officer"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_role_assignment" "aa_secrets_reader" {
  scope                = azurerm_key_vault.kv.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_automation_account.aa.identity[0].principal_id
}

Now we create the VM. The important parts here are the “powershell_modules” and “hybridworkerextension” blocks. Because we are running the scripts on an actual VM, that VM must have the PowerShell modules that you plan to reference in your runbooks. Instead of connecting to the VM after creation and manually installing the modules, the “powershell_modules” extension passes a PowerShell command to the VM that downloads the modules I need for this test case.

The “hybridworkerextension” block installs the Hybrid Worker Extension on the VM, which allows us to add our VM as a Hybrid Worker in the Automation Account.

#----------------------------------------------------
# Compute
#----------------------------------------------------
resource "azurerm_windows_virtual_machine" "vm" {
  name                = "hybridworkertest-vm"
  computer_name       = "workervm"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  zone                = var.vm_zone
  size                = var.vm_size
  admin_username      = var.vm_admin_un 
  admin_password      = var.vm_admin_pw
  network_interface_ids = [
    azurerm_network_interface.vnic.id
  ]

  os_disk {
    caching              = "None"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "windowsserver"
    sku       = "2022-datacenter-azure-edition"
    version   = "latest"
  }
  identity {
    type = "SystemAssigned"
  }


}

resource "azurerm_virtual_machine_extension" "powershell_modules" {
  name                 = "PowerShellModules"
  virtual_machine_id   = azurerm_windows_virtual_machine.vm.id
  publisher            = "Microsoft.Compute"
  type                 = "CustomScriptExtension"
  type_handler_version = "1.10"

  settings = jsonencode({
    commandToExecute = "powershell -ExecutionPolicy Unrestricted -Command \"Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force; Install-Module Az.Accounts -Scope AllUsers -Force -AllowClobber; Install-Module Az.KeyVault -Scope AllUsers -Force -AllowClobber\""

  })

}

resource "azurerm_virtual_machine_extension" "hybridworkerextension" {
  name                 = "HybridWorkerExtension"
  virtual_machine_id   = azurerm_windows_virtual_machine.vm.id
  publisher            = "Microsoft.Azure.Automation.HybridWorker"
  type                 = "HybridWorkerForWindows"
  type_handler_version = "1.1"

  settings = jsonencode({
    "AutomationAccountURL" = azurerm_automation_account.aa.hybrid_service_url
  })
}

Next we create a Virtual NIC for the VM and create a Private Endpoint linking our Key Vault to our subnet.

#----------------------------------------------------
# Networking
#----------------------------------------------------
resource "azurerm_network_interface" "vnic" {
  name                = "hybrid-worker-test-vnic"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.subnet.id
    private_ip_address_allocation = "Dynamic"
  }

}

resource "azurerm_private_endpoint" "pe" {
  name                = "hybrid-worker-test-pe"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  subnet_id           = azurerm_subnet.subnet.id

  private_service_connection {
    name                           = "connection-to-kv"
    is_manual_connection           = false
    private_connection_resource_id = azurerm_key_vault.kv.id
    subresource_names              = ["vault"]
  }
}
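One piece not shown above is DNS. For the VM to resolve the vault’s hostname to the Private Endpoint’s private IP, a private DNS zone is typically linked to the VNet. The post does not say how the repo handles this, so treat the following as a sketch under that assumption. The resource labels and link name are hypothetical.

```hcl
# Assumed addition: private DNS zone used by Key Vault private endpoints.
resource "azurerm_private_dns_zone" "kv" {
  name                = "privatelink.vaultcore.azure.net"
  resource_group_name = azurerm_resource_group.rg.name
}

# Link the zone to the VNet so VMs in it resolve the vault privately.
resource "azurerm_private_dns_zone_virtual_network_link" "kv" {
  name                  = "hybrid-worker-test-kv-dns-link" # assumed name
  resource_group_name   = azurerm_resource_group.rg.name
  private_dns_zone_name = azurerm_private_dns_zone.kv.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
}

# To have the endpoint register its A record automatically, a block
# like this would go inside azurerm_private_endpoint.pe:
#
#   private_dns_zone_group {
#     name                 = "kv-dns"
#     private_dns_zone_ids = [azurerm_private_dns_zone.kv.id]
#   }
```

If name resolution is already handled some other way, such as custom DNS servers on the VNet, this block is unnecessary.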

Lastly, we have the automation and scheduling blocks. Here we create the Automation Account, create runbooks within it, define schedules, and apply those schedules to the runbooks. Instead of showing the entire block, which you surely won’t read, I will just include the creation of the Automation Account, a runbook, and a schedule.

To apply a schedule to a runbook, you first create the schedule and then use azurerm_automation_job_schedule to link it to the runbook it should trigger.

resource "azurerm_automation_account" "aa" {
  name                = "hybrid-worker-test-aa"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku_name            = var.automation_account_sku
  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_automation_runbook" "startvm" {
  name                    = "startvm"
  location                = azurerm_resource_group.rg.location
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  log_verbose             = true
  log_progress            = true
  description             = "Allocates VM before script runtime"
  runbook_type            = "PowerShell"

  content = <<-EOT
  Connect-AzAccount -Identity
  $vmName = "hybridworkertest-vm"
  $resourceGroup = "hybrid-worker-test-rg"

  Write-Output "Starting VM"
  Start-AzVM -ResourceGroupName $resourceGroup -Name $vmName
  EOT
}

resource "azurerm_automation_schedule" "start_vm_schedule" {
  name                    = "Start-vm-schedule"
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  frequency               = "OneTime"
  start_time              = var.start_vm_time
  timezone                = "America/Chicago"
}

resource "azurerm_automation_job_schedule" "start_vm_job" {
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  runbook_name            = azurerm_automation_runbook.startvm.name
  schedule_name           = azurerm_automation_schedule.start_vm_schedule.name
}
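The excerpt above leaves out the Hybrid Worker Group and Hybrid Worker definitions mentioned in the design section. A sketch of how they might be declared follows. The resource labels and group name are assumptions, and using random_uuid.random as the worker ID is a guess at why that UUID was generated earlier.

```hcl
# Assumed sketch: a worker group inside the Automation Account.
resource "azurerm_automation_hybrid_runbook_worker_group" "workers" {
  name                    = "hybrid-worker-test-group" # assumed name
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
}

# Register the VM as a worker in that group.
resource "azurerm_automation_hybrid_runbook_worker" "vm" {
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.aa.name
  worker_group_name       = azurerm_automation_hybrid_runbook_worker_group.workers.name
  vm_resource_id          = azurerm_windows_virtual_machine.vm.id
  worker_id               = random_uuid.random.result # must be a UUID
}

# To send a runbook to the VM instead of the Azure sandbox, the job
# schedule would set run_on to the group name, for example:
#
#   resource "azurerm_automation_job_schedule" "script_job" {
#     resource_group_name     = azurerm_resource_group.rg.name
#     automation_account_name = azurerm_automation_account.aa.name
#     runbook_name            = "my-script"          # hypothetical runbook
#     schedule_name           = "my-script-schedule" # hypothetical schedule
#     run_on                  = azurerm_automation_hybrid_runbook_worker_group.workers.name
#   }
```

The start and stop runbooks would leave run_on unset, since they have to run while the VM is deallocated.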

Lessons Learned

This was my largest Terraform project so far. I wrote all of it under a single main.tf file and it quickly became a huge mess. I relied on the awesome Terraform Style Guide to help me clean it up. I also split my providers and backend sections into different .tf files to reduce the noise in the main.tf file.

Going forward, I plan to be much more modular with my Terraform projects and to avoid cramming everything into an unreadable main.tf.

Thanks for reading!

Licensed under CC BY-NC-SA 4.0