Write your own custom Azure Policies with Terraform

Previously on this blog we've talked about managing Azure Policy with Infrastructure as Code, sometimes referred to as Policy as Code. If you are interested, you can read more about that here:

While that post talked about using the built-in policy definitions that Microsoft supplies, this post will cover custom policies, for cases where the built-in definition does not really suit our needs or where we just want to make small adjustments that the definition from Microsoft does not cover.
Of course we will do this using IaC and Terraform. We will need to author our own policy definition instead of referencing an existing one with a data block, and we will need to create a policy assignment for it.
We will make use of a key Terraform function called jsonencode(), which transforms our Terraform/HCL values into a JSON string, which is what Azure Policy definitions accept. There is also a function called jsondecode() which does the opposite: it parses a JSON string into a Terraform value.
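As a minimal sketch of the round trip between the two functions, consider the following. The local names and the output name are hypothetical and only there for illustration; they are not part of the project code.

```hcl
locals {
  # A native Terraform object (hypothetical example value).
  effect_parameter = {
    effect = {
      value = "Audit"
    }
  }

  # jsonencode() serializes the object into a compact JSON string:
  # {"effect":{"value":"Audit"}}
  effect_parameter_json = jsonencode(local.effect_parameter)

  # jsondecode() parses a JSON string back into a Terraform value.
  effect_parameter_decoded = jsondecode(local.effect_parameter_json)
}

output "decoded_effect" {
  # Evaluates to "Audit"
  value = local.effect_parameter_decoded.effect.value
}
```

Because jsonencode() takes ordinary Terraform expressions, you can reference variables and locals inside the object and still hand Azure a valid JSON document.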
The scenario
As an organization we have a requirement that storage accounts in our production environments use at least zone-redundant storage, to increase our resilience in Azure. There are already built-in policies for enforcing zone-redundant and geo-redundant storage accounts separately, but no combined one.
We will create a new policy that enforces a list of account replication types covering both. Our list will consist of the following replication types:
"Standard_GRS",
"Standard_RAGRS",
"Standard_GZRS",
"Standard_RAGZRS",
"Standard_ZRS",
"Premium_ZRS"
This will be applied to our production resource group with a custom policy definition, avoiding having to create two separate assignments for enforcing account replication on our storage accounts.
Our starter code:
First, as usual, we need to create our base project. All of the code for this project can be found in the GitHub repo HERE
In our main.tf I will enter the following:
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.14.0"
    }
  }
}
provider "azurerm" {
  features {}
  subscription_id = var.azure_subscription_id
}
resource "azurerm_resource_group" "this" {
  name     = "rg-${var.environment}-${var.location_short}-custompolicy"
  location = var.location
}
Inside my variables.tf:
variable "azure_subscription_id" {
  type        = string
  description = "Azure Subscription ID"
}
variable "environment" {
  type        = string
  description = "The environment for the deployed resources"
}
variable "location" {
  type        = string
  description = "The Azure region where resources will be deployed"
  default     = "swedencentral"
}
variable "location_short" {
  type        = string
  description = "The short name of the Azure region where resources will be deployed"
  default     = "sc"
}
I will also create a file, variables/prod.tfvars, where I fill out some values for the deployment:
azure_subscription_id = "-your-sub-id-"
environment = "prod"
The policy definition
With all the starter code out of the way, we will begin with our azurerm_policy_definition resource. Here we will make use of the jsonencode() function to help us author the code.
A pro tip! I go into the Azure Portal, open any existing policy definition I can find and copy the parts that I need, which saves me from having to remember the syntax. First, however, the base information:
resource "azurerm_policy_definition" "this" {
  name         = "allowed_storage_replication_type"
  policy_type  = "Custom"
  display_name = "Allow only storage accounts of zone or geo redundant replication"
  description  = "This is a custom policy which will combine and allow both zone and geo redundant storage accounts"
  mode         = "Indexed"
}
It is important that policy_type is set to Custom here.
Now we need the beefy parameters part and the policy_rule part. Here I will copy from any existing definition in the portal:

Now select it, scroll down into the JSON definition and copy the parameters section:

Now, inside your azurerm_policy_definition block, add the parameters property and, before you paste in anything, write jsonencode({}) as its value. Place the cursor between the braces, press enter and paste. Replace the parameter names and values with your own, and replace the : separators with = so the object reads as regular Terraform code (HCL accepts both separators inside an object, but = is the convention). The end product in my scenario looks like this:
parameters = jsonencode({
  "effect" = {
    "type" = "String",
    "metadata" = {
      "displayName" = "Effect",
      "description" = "This parameter lets you choose the effect of the policy. If you choose Audit (default), the policy will only audit resources for compliance. If you choose Deny, the policy will deny the creation of non-compliant resources. If you choose Disabled, the policy will not enforce compliance (useful, for example, as a second assignment to ignore a subset of non-compliant resources in a single resource group)."
    },
    "allowedValues" = [
      "Audit",
      "Deny",
      "Disabled"
    ],
    "defaultValue" = "Audit"
  }
})
Now do the same for policy_rule. My entire definition now looks like this:
resource "azurerm_policy_definition" "this" {
  name         = "allowed_storage_replication_type"
  policy_type  = "Custom"
  display_name = "Allow only storage accounts of zone or geo redundant replication"
  description  = "This is a custom policy which will combine and allow both zone and geo redundant storage accounts"
  mode         = "Indexed"
  parameters = jsonencode({
    "effect" = {
      "type" = "String",
      "metadata" = {
        "displayName" = "Effect",
        "description" = "This parameter lets you choose the effect of the policy. If you choose Audit (default), the policy will only audit resources for compliance. If you choose Deny, the policy will deny the creation of non-compliant resources. If you choose Disabled, the policy will not enforce compliance (useful, for example, as a second assignment to ignore a subset of non-compliant resources in a single resource group)."
      },
      "allowedValues" = [
        "Audit",
        "Deny",
        "Disabled"
      ],
      "defaultValue" = "Audit"
    }
  })
  policy_rule = jsonencode({
    "if" = {
      "allOf" = [
        {
          "field"  = "type",
          "equals" = "Microsoft.Storage/storageAccounts"
        },
        {
          "not" = {
            "field" = "Microsoft.Storage/storageAccounts/sku.name",
            "in" = [
              "Standard_GRS",
              "Standard_RAGRS",
              "Standard_GZRS",
              "Standard_RAGZRS",
              "Standard_ZRS",
              "Premium_ZRS"
            ]
          }
        }
      ]
    },
    "then" = {
      "effect" = "[parameters('effect')]"
    }
  })
}
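As a possible variation, the list of SKUs could be passed in as a policy parameter instead of being hardcoded in the rule, so that different assignments can allow different lists. The following is only a sketch under assumptions: the resource label parameterized, the local allowed_storage_skus and the parameter name allowedSkus are hypothetical names chosen for illustration, and this is not the definition used in the rest of this post.

```hcl
locals {
  # Hypothetical local holding the replication SKUs we want to allow.
  allowed_storage_skus = [
    "Standard_GRS",
    "Standard_RAGRS",
    "Standard_GZRS",
    "Standard_RAGZRS",
    "Standard_ZRS",
    "Premium_ZRS"
  ]
}

resource "azurerm_policy_definition" "parameterized" {
  name         = "allowed_storage_replication_type_param"
  policy_type  = "Custom"
  display_name = "Allow only storage accounts with listed replication SKUs"
  mode         = "Indexed"

  parameters = jsonencode({
    "effect" = {
      "type"          = "String"
      "allowedValues" = ["Audit", "Deny", "Disabled"]
      "defaultValue"  = "Audit"
    }
    # Array parameter; the default comes from the local above,
    # which works because jsonencode() accepts any Terraform expression.
    "allowedSkus" = {
      "type"         = "Array"
      "defaultValue" = local.allowed_storage_skus
    }
  })

  policy_rule = jsonencode({
    "if" = {
      "allOf" = [
        {
          "field"  = "type"
          "equals" = "Microsoft.Storage/storageAccounts"
        },
        {
          "not" = {
            "field" = "Microsoft.Storage/storageAccounts/sku.name"
            # Resolved by Azure Policy at evaluation time, not by Terraform.
            "in" = "[parameters('allowedSkus')]"
          }
        }
      ]
    }
    "then" = {
      "effect" = "[parameters('effect')]"
    }
  })
}
```

The trade-off is a slightly longer definition in exchange for being able to override the list per assignment without touching the rule itself.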

Run terraform fmt after you've replaced : with = and check that it does not throw errors, to make sure the formatting and syntax are correct.
Now, finally, we just need to apply this as a policy assignment so the definition can enforce its rules somewhere. I will assign it to the resource group we made earlier:
resource "azurerm_resource_group_policy_assignment" "this" {
  name                 = "allowed_storage_sku"
  display_name         = azurerm_policy_definition.this.display_name
  resource_group_id    = azurerm_resource_group.this.id
  policy_definition_id = azurerm_policy_definition.this.id
  description          = azurerm_policy_definition.this.description
  parameters = jsonencode({
    "effect" = {
      "value" = "Audit"
    }
  })
}
The effect is set to Audit. The definition allows Audit, Deny and Disabled.
The final test
Having applied this policy, I will write the following code to create three storage accounts: two that follow our definition and one with locally redundant storage, which our policy does not like.
locals {
  storage_account_replication_type = [
    "LRS",
    "GRS",
    "ZRS"
  ]
}
resource "azurerm_storage_account" "this" {
  for_each                 = toset(local.storage_account_replication_type)
  name                     = lower("st${var.environment}${var.location_short}${each.key}")
  resource_group_name      = azurerm_resource_group.this.name
  location                 = azurerm_resource_group.this.location
  account_tier             = "Standard"
  account_replication_type = each.key
}
The LRS storage account should be marked non-compliant.

Once deployed, I will wait a while for Azure Policy to scan it for compliance. You can trigger this manually by running the following Azure CLI command:
az policy state trigger-scan -g rg-prod-sc-custompolicy
After a while in the portal, since the policy is in Audit mode, it will just mark the resource as non-compliant:

The LRS storage account is marked as non-compliant.
If I destroy all the storage accounts and update the effect parameter on my policy assignment to Deny instead, I should not even be allowed to create the non-compliant storage account.


Upon re-creating the storage accounts I get hit with an error for the LRS storage account:

Conclusion
Hopefully you can see just how powerful this can be. Any change will be documented in source control, hopefully reviewed by a peer first in a pull request, and then rolled out into production.
This way you could have the same assignment but with effect Audit in a development and/or acceptance environment first and then Deny in production, and so on.
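One way this could be sketched is to drive the effect from a variable, so each environment's tfvars file picks its own value. The variable name policy_effect is an assumption of mine and not part of the project code; the assignment below is a variation of the one shown earlier and would replace it rather than sit next to it.

```hcl
# Hypothetical variable controlling the policy effect per environment.
variable "policy_effect" {
  type        = string
  description = "Effect passed to the policy assignment"
  default     = "Audit"

  validation {
    # Mirrors the allowedValues of the policy definition's effect parameter.
    condition     = contains(["Audit", "Deny", "Disabled"], var.policy_effect)
    error_message = "The policy_effect value must be one of Audit, Deny or Disabled."
  }
}

resource "azurerm_resource_group_policy_assignment" "this" {
  name                 = "allowed_storage_sku"
  display_name         = azurerm_policy_definition.this.display_name
  resource_group_id    = azurerm_resource_group.this.id
  policy_definition_id = azurerm_policy_definition.this.id
  description          = azurerm_policy_definition.this.description

  parameters = jsonencode({
    "effect" = {
      # Audit in dev/acceptance, Deny in prod, depending on the tfvars file used.
      "value" = var.policy_effect
    }
  })
}
```

With this in place, variables/prod.tfvars could set policy_effect = "Deny" while a development tfvars file leaves the default of Audit.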
Finally I wish you happy holidays and hope you get some well deserved rest!
