Advanced · Lesson 9 of 16

AKS Foundation with Bicep

Provision a production-ready AKS cluster — networking, identity, and node pool — as auditable, repeatable Bicep.

Simple Explanation (ELI5)

AKS is a managed Kubernetes service, but it still depends on real Azure infrastructure: a virtual network whose subnet supplies IP addresses to nodes (and, with Azure CNI, to pods), a managed identity so the cluster can create load balancers and disks on your behalf, and a node pool that defines which VM sizes run your workloads. Doing this through the Azure portal means clicking through many screens with no history or repeatability. One Bicep file captures all of that as code, so every environment gets the same cluster topology every time.

Why Do We Need It?

AKS Dependency Chain (resources deployed in order): VNet + Subnet → User-Assigned Identity → AKS Cluster → Node Pool Running

Technical Explanation

AKS requires the following resources in order:

  1. VNet and subnet: the subnet must be large enough for nodes, pods (Azure CNI), and internal load balancers. With Azure CNI at the default 30 pods per node, a /24 handles only small clusters (about 8 nodes); production should use /22 or larger.
  2. User-assigned managed identity (UAI): the cluster acts as this identity. It needs Network Contributor on the subnet to create internal load balancers. Image pulls use the separate kubelet identity, which needs AcrPull on any attached container registry. Avoid service principals: their secrets expire and must be rotated by hand, whereas managed identity credentials are rotated by the platform.
  3. AKS managed cluster: references the subnet ID and UAI resource ID as dependencies. Bicep resolves these from symbolic references automatically.

Key AKS Properties Table

| Property | What it controls | Prod guidance |
| --- | --- | --- |
| agentPoolProfiles.vmSize | Node VM size | Standard_D4s_v3 or larger |
| agentPoolProfiles.count | Initial node count | 3 for HA (one per availability zone) |
| agentPoolProfiles.enableAutoScaling | Cluster autoscaler | true in prod; set minCount and maxCount |
| networkProfile.networkPlugin | kubenet vs Azure CNI | azure for pod-level network policies |
| oidcIssuerProfile.enabled | Workload Identity federation | true (required for Workload Identity) |
| identity.type | Cluster identity model | UserAssigned (explicit and auditable) |
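
To illustrate the prod guidance in the table, a system pool profile might look like the sketch below. The variable name systemPoolProd, the zone list, and the minCount/maxCount values are assumptions chosen for the example; availabilityZones only applies in regions and VM sizes that support zones.

```bicep
// Hypothetical prod-style system pool; values are illustrative, not prescriptive.
var systemPoolProd = {
  name: 'system'
  mode: 'System'
  osType: 'Linux'
  vmSize: 'Standard_D4s_v3'
  availabilityZones: ['1', '2', '3'] // assumes the region offers three zones
  enableAutoScaling: true
  count: 3     // initial node count, one per zone
  minCount: 3  // autoscaler floor (assumed)
  maxCount: 6  // autoscaler ceiling (assumed)
}

// Exposed as an output only so the sketch compiles without an unused-variable warning.
output systemPool object = systemPoolProd
```

In a real template the object would sit inside the cluster's agentPoolProfiles array, together with vnetSubnetID.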

Full Bicep: VNet, Identity, and AKS Cluster

bicep
@description('Environment name used in resource naming')
@allowed(['dev', 'stage', 'prod'])
param environment string = 'dev'

param location string = resourceGroup().location

@description('Number of nodes in the system node pool')
@minValue(1)
@maxValue(30)   // the /22 subnet below fits about 32 nodes at 30 pods per node with Azure CNI
param nodeCount int = 2

@description('VM size for cluster nodes')
param nodeVmSize string = 'Standard_D2s_v3'

// ── Networking ─────────────────────────────────────────────────────────────
resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: 'vnet-aks-${environment}'
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: ['10.1.0.0/16']
    }
    subnets: [
      {
        name: 'snet-aks-nodes'
        properties: {
          addressPrefix: '10.1.0.0/22'   // 1,019 usable IPs (Azure reserves 5); about 32 nodes at Azure CNI's default 30 pods per node
        }
      }
    ]
  }
}

var aksSubnetId = vnet.properties.subnets[0].id

// ── Managed Identity ────────────────────────────────────────────────────────
resource aksIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-aks-${environment}'
  location: location
}

// ── AKS Cluster ─────────────────────────────────────────────────────────────
resource aks 'Microsoft.ContainerService/managedClusters@2024-02-01' = {
  name: 'aks-platform-${environment}'
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${aksIdentity.id}': {}
    }
  }
  properties: {
    dnsPrefix: 'aksplatform${environment}'
    agentPoolProfiles: [
      {
        name: 'system'
        count: nodeCount
        vmSize: nodeVmSize
        mode: 'System'
        vnetSubnetID: aksSubnetId
        enableAutoScaling: false
        osType: 'Linux'
      }
    ]
    networkProfile: {
      networkPlugin: 'azure'
      serviceCidr: '10.2.0.0/24'
      dnsServiceIP: '10.2.0.10'
    }
    oidcIssuerProfile: {
      enabled: true
    }
    securityProfile: {
      workloadIdentity: {
        enabled: true
      }
    }
  }
}

output aksName string = aks.name
output aksId string = aks.id
output kubeletIdentityObjectId string = aks.properties.identityProfile.kubeletidentity.objectId
output oidcIssuerUrl string = aks.properties.oidcIssuerProfile.issuerURL

Hands-on Steps

  1. Create a dedicated resource group: az group create -n rg-aks-dev -l eastus
  2. Build and validate: az bicep build --file main.bicep then az deployment group validate -g rg-aks-dev --template-file main.bicep
  3. Run what-if to preview all 3 resources before deploy: az deployment group what-if -g rg-aks-dev --template-file main.bicep
  4. Deploy: az deployment group create -g rg-aks-dev --template-file main.bicep --name aks-deploy
  5. Connect: az aks get-credentials -g rg-aks-dev -n aks-platform-dev then kubectl get nodes
  6. After the deployment completes, assign the cluster identity Network Contributor on the subnet if internal load balancers are needed.

Identity Rule

Never create AKS clusters with a service principal. Service principal credentials expire. Use a user-assigned managed identity so the cluster credential is managed by Azure and rotated automatically.

Debugging Scenarios

Runbook 1: Deployment Fails with SubnetIsTooSmall

Symptom: ARM error SubnetIsTooSmall: Subnet snet-aks-nodes does not have enough available IP addresses

  1. Check the networkPlugin: Azure CNI assigns one IP per pod, not just per node, so each node consumes maxPods + 1 addresses (31 at the default of 30 pods per node). A /24 (251 usable IPs after Azure reserves 5) supports roughly 8 nodes.
  2. Size the subnet for the planned node count: a /22 (1,019 usable IPs) supports roughly 32 nodes; clusters above 50 nodes need a /21 or larger.
  3. If the subnet already exists and is too small, create a new subnet, update the Bicep, and redeploy. Azure does not allow in-place subnet resizing that would conflict with existing allocations.
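
The sizing check in this runbook can be sketched as a few Bicep expressions. This is a rough estimate under stated assumptions, not an official formula: plannedNodes, maxPodsPerNode, and surgeNodes are hypothetical parameters, and the calculation assumes Azure CNI with pods drawing IPs from the node subnet.

```bicep
@description('Planned maximum number of nodes (hypothetical parameter)')
param plannedNodes int = 30

@description('maxPods per node; 30 is the Azure CNI default')
param maxPodsPerNode int = 30

@description('Extra nodes created temporarily during upgrades (assumed surge of 1)')
param surgeNodes int = 1

// Azure reserves 5 addresses in every subnet.
var azureReservedIps = 5

// Each node takes 1 IP for itself plus 1 per pod slot.
var requiredIps = (plannedNodes + surgeNodes) * (maxPodsPerNode + 1) + azureReservedIps

output requiredSubnetIps int = requiredIps
```

With the defaults shown, the estimate is 966 addresses, which fits a /22 (1,024 addresses) but not a /23 (512).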

Runbook 2: Identity Does Not Have Sufficient Permissions

Symptom: AKS creates successfully, but Services of type LoadBalancer never receive an address (EXTERNAL-IP stays pending indefinitely). Azure portal shows reconcileLoadBalancer: failed to ensure load balancer

  1. Check whether the cluster managed identity has the Network Contributor role on the subnet or resource group.
  2. Assign it to the cluster's user-assigned identity (not the kubelet identity): az role assignment create --assignee <cluster-identity-principal-id> --role "Network Contributor" --scope <subnet-id>
  3. To prevent this in Bicep, add a roleAssignment resource block after the identity declaration so the role is always deployed with the cluster.
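
A minimal sketch of such a role assignment follows. It is written as a standalone file that looks up existing resources, so vnetName and identityName are hypothetical parameters; inside the lesson's main template you would reference the vnet and aksIdentity symbols directly instead. The GUID is the built-in Network Contributor role definition ID.

```bicep
param vnetName string = 'vnet-aks-dev'   // assumed to match the lesson's naming
param identityName string = 'id-aks-dev' // assumed to match the lesson's naming

// Built-in Network Contributor role definition ID.
var networkContributorRoleId = '4d97b98b-1d4f-4787-a291-c67834d212e7'

resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' existing = {
  name: vnetName
}

resource aksSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-09-01' existing = {
  parent: vnet
  name: 'snet-aks-nodes'
}

resource aksIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: identityName
}

// Scoped to the subnet only, so the cluster identity gets no wider network rights.
resource subnetRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(aksSubnet.id, aksIdentity.id, networkContributorRoleId)
  scope: aksSubnet
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', networkContributorRoleId)
    principalId: aksIdentity.properties.principalId
    principalType: 'ServicePrincipal' // managed identities use this principal type
  }
}
```

Deriving the assignment name with guid() keeps it stable across redeployments, so repeated runs do not create duplicate assignments.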

Runbook 3: Regional Quota Exceeded

Symptom: Deployment fails with QuotaExceeded: Cores quota for Standard DSv3 Family in EastUS has been exceeded

  1. Run az vm list-usage --location eastus --query "[?contains(name.value,'DSv3')]" -o table to check quota.
  2. Request quota increase in Azure portal under Subscriptions - Usage + quotas, or change to a region with available quota.
  3. For non-production use, switch nodeVmSize to a size from a VM family that still has free quota, such as Standard_B2s, which counts against the B-series quota instead of the DSv3 quota.

Interview Questions

Beginner

Why do AKS clusters need a virtual network?

Nodes run as VMs inside a subnet. Pods (in Azure CNI mode) consume IPs from the same subnet. The VNet also hosts internal load balancers used by Kubernetes services.

What is a managed identity and why use it for AKS?

A managed identity is a Microsoft Entra ID (formerly Azure AD) identity whose credentials are managed by the platform. AKS uses it to create load balancers and pull images. Unlike a service principal, a managed identity never needs manual credential rotation.

What does dnsPrefix control on an AKS cluster?

It sets the hostname prefix for the Kubernetes API server FQDN. Azure appends a generated suffix to form the full hostname. The prefix can only contain alphanumeric characters and hyphens, and it cannot be changed after the cluster is created.

What is oidcIssuerProfile used for?

It enables the AKS OIDC Issuer endpoint, which is required for Workload Identity federation. This allows pods to authenticate to Azure services using a service account token instead of a stored credential.
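
As a hedged sketch of what that federation can look like in Bicep, the fragment below trusts tokens from one Kubernetes service account. The identity name, clusterName, and the service account namespace and name are hypothetical; the cluster is assumed to exist already with the OIDC issuer enabled.

```bicep
param clusterName string = 'aks-platform-dev'   // assumed to match the lesson's naming
param serviceAccountNamespace string = 'default' // hypothetical
param serviceAccountName string = 'workload-sa'  // hypothetical

resource aks 'Microsoft.ContainerService/managedClusters@2024-02-01' existing = {
  name: clusterName
}

// Identity the workload will act as (separate from the cluster identity).
resource workloadIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-workload-${clusterName}'
  location: resourceGroup().location
}

// Trust tokens issued by the cluster's OIDC issuer for one service account.
resource federation 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: workloadIdentity
  name: 'aks-federation'
  properties: {
    issuer: aks.properties.oidcIssuerProfile.issuerURL
    subject: 'system:serviceaccount:${serviceAccountNamespace}:${serviceAccountName}'
    audiences: ['api://AzureADTokenExchange']
  }
}
```

The Kubernetes service account still has to be annotated with the identity's client ID, which is outside the scope of this template.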

Intermediate

What is the difference between kubenet and Azure CNI network plugins?

With kubenet, nodes get subnet IPs and pods get a private overlay range — simpler but limited for network policies. With Azure CNI, every pod gets a real subnet IP, enabling direct routing and Azure Network Policies, but requiring a larger subnet.

Why does subnet sizing matter more with Azure CNI than kubenet?

Azure CNI allocates one IP per pod. On a 30-node cluster with 30 pods per node, you need 900+ IPs. A /24 subnet would be exhausted; /22 or larger is recommended for production clusters.

What happens if the AKS identity lacks Network Contributor on the subnet?

The cluster creates successfully, but any Kubernetes Service of type LoadBalancer that needs an internal load balancer never receives an address, because the cluster identity lacks permission to attach the load balancer to the subnet.

How do you handle identity role assignments in Bicep?

Declare a Microsoft.Authorization/roleAssignments resource that references the identity principalId and the target scope. This ensures the role is always present after a fresh deployment without manual portal steps.

Scenario-based

You need identical AKS clusters in dev and prod with different node sizes. How do you structure the Bicep?

Use an environment param with @allowed, then drive nodeVmSize and nodeCount through conditional variables or a parameter file. One main.bicep file plus dev.parameters.json and prod.parameters.json handles both environments cleanly.
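
As an alternative to the JSON parameter files named above, the same split can be sketched with a Bicep parameters file. The file name prod.bicepparam and the values are assumptions for illustration; the parameter names come from the lesson's template.

```bicep
// prod.bicepparam (hypothetical file name)
using './main.bicep'

param environment = 'prod'
param nodeCount = 3
param nodeVmSize = 'Standard_D4s_v3'
```

A matching dev.bicepparam would set smaller values, and either file can be passed to az deployment group create through --parameters.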

A deploy team made manual portal changes to the AKS node pool size. How do you reconcile this with Bicep?

Run what-if first to see the delta. Decide whether to accept the manual change by updating the Bicep, or override it by redeploying the Bicep as-is. ARM's incremental mode does not delete unmanaged resources, but it will reconcile the properties Bicep declares.

An auditor asks how you ensure every AKS cluster uses managed identity and Workload Identity. What do you show?

I show the Bicep template where identity.type is set to UserAssigned and both oidcIssuerProfile.enabled and securityProfile.workloadIdentity.enabled are set to true, alongside the CI/CD pipeline that ensures all cluster deployments go through this template. Azure Policy can enforce this further.

Real-world Usage

Enterprise AKS platform teams provision one Bicep template that creates the cluster alongside networking and identity. The template outputs the cluster name, kubelet identity object ID, and OIDC issuer URL. The CI/CD pipeline captures those outputs and uses them to configure Workload Identity bindings, attach container registries, and install Helm charts — all without hard-coded resource names.

Summary

AKS provisioning with Bicep requires understanding three interdependent layers: networking (VNet and subnet sizing), identity (user-assigned managed identity and role assignments), and cluster configuration (node pools, network plugin, OIDC). Getting each layer right in code prevents the most common production failures and makes every cluster an auditable, repeatable artifact.