Advanced Lesson 8 of 14

Security & RBAC

Implement defense-in-depth on AKS — from Azure AD authentication and Kubernetes RBAC to network policies, workload identity, Azure Policy enforcement, and secrets management with Key Vault.

🧒 Simple Explanation (ELI5)

Imagine your AKS cluster is a tall office building. Security doesn't rely on a single lock — it works in layers: a guard checking IDs at the entrance (Azure AD authentication), badge readers that only open the floors you work on (RBAC), locked doors between departments (network policies), rules about what you can bring inside (pod security), and a safe for the valuables (Key Vault).

Every layer can fail on its own, but together they make the building very hard to break into. That's defense-in-depth.

🔧 Technical Explanation

1. Azure AD (Entra ID) Integration

AKS supports two modes of Azure AD integration:

| Feature | Legacy Azure AD | AKS-managed Azure AD |
|---|---|---|
| Setup | Manual app registrations (server + client) | Single flag; Microsoft manages apps |
| Enable command | Deprecated | `az aks update -g myRG -n myAKS --enable-aad` |
| Admin group | Manual ClusterRoleBinding | `--aad-admin-group-object-ids` |
| Conditional Access | Supported | Supported |
bash
# Enable AKS-managed Azure AD integration
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-aad \
  --aad-admin-group-object-ids "$(az ad group show -g AKS-Admins --query id -o tsv)"

# Fetch kubeconfig (interactive login prompt)
az aks get-credentials -g myResourceGroup -n myAKSCluster

# First kubectl command triggers Azure AD device-code login
kubectl get nodes
💡
kubelogin

For non-interactive scenarios (CI/CD, automation), install kubelogin and convert the kubeconfig: kubelogin convert-kubeconfig -l azurecli. This lets pipelines authenticate with Azure CLI credentials instead of device-code flow.
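Beyond the Azure CLI login mode, kubelogin also supports service-principal login, which is handy when the pipeline identity is a service principal rather than an interactive `az login` session. A sketch (the `-l spn` mode and the two environment variables are kubelogin conventions; the IDs are placeholders):

```shell
# Hypothetical CI step: authenticate kubectl with a service principal
export AAD_SERVICE_PRINCIPAL_CLIENT_ID="<sp-app-id>"       # SP that has RBAC on the cluster
export AAD_SERVICE_PRINCIPAL_CLIENT_SECRET="<sp-secret>"

az aks get-credentials -g myResourceGroup -n myAKSCluster
kubelogin convert-kubeconfig -l spn    # service-principal login instead of device code

kubectl get nodes                      # no interactive prompt
```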

Once Azure AD is enabled, every API request to the cluster carries an Azure AD token. The API server validates the token and extracts the user's Object ID and group memberships. You then map these to Kubernetes RBAC objects.
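Before writing bindings, you can check what the authorization layer will allow with `kubectl auth can-i`. A quick sketch (the user and group ID are placeholders; impersonation requires impersonate rights):

```shell
# What can the current identity do in a namespace?
kubectl auth can-i list pods -n team-alpha

# Dry-run a binding by impersonating the subject it targets
kubectl auth can-i delete deployments -n team-alpha \
  --as "someone@example.com" \
  --as-group "e5f6a7b8-dddd-eeee-ffff-abcdef012345"
```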

2. Kubernetes RBAC on AKS

Building on the Kubernetes RBAC concepts you already know, AKS adds Azure AD subjects to bindings:

yaml
# ClusterRoleBinding mapping an Azure AD group to cluster-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aad-admins-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "a1b2c3d4-aaaa-bbbb-cccc-123456789abc"   # Azure AD group Object ID
yaml
# Namespace-scoped Role + RoleBinding for a dev team
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-alpha
  name: developer-role
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-alpha
  name: team-alpha-devs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: developer-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "e5f6a7b8-dddd-eeee-ffff-abcdef012345"   # Azure AD group for Team Alpha
Local Accounts

Disable local accounts on production clusters: az aks update -g myRG -n myAKS --disable-local-accounts. This forces all access through Azure AD — no fallback --admin kubeconfig.
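To confirm the setting took effect, you can query the cluster resource (a sketch; the property name follows the `az aks show` output):

```shell
# Should print "true" once local accounts are disabled
az aks show -g myRG -n myAKS --query "disableLocalAccounts" -o tsv

# And the admin-credential escape hatch should now be rejected:
az aks get-credentials -g myRG -n myAKS --admin
```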

3. Managed Identities

| Identity Type | Scope | Use Case |
|---|---|---|
| System-assigned (cluster) | Control plane | AKS manages node pools, load balancers, disks |
| Kubelet identity | Node pool VMSS | Pull images from ACR, read secrets |
| Workload identity | Individual pod | Pod-level access to Azure resources via federated credentials |

Workload Identity (Recommended)

Workload identity replaces the deprecated AAD Pod Identity. It uses Kubernetes service account token projection and Azure AD federated credentials — no CRDs, no NMI pods, no host network requirements.

bash
# 1. Enable workload identity on the cluster
az aks update -g myRG -n myAKS --enable-oidc-issuer --enable-workload-identity

# 2. Get the OIDC issuer URL
OIDC_ISSUER=$(az aks show -g myRG -n myAKS --query "oidcIssuerProfile.issuerUrl" -o tsv)

# 3. Create a user-assigned managed identity
az identity create -g myRG -n myapp-identity
CLIENT_ID=$(az identity show -g myRG -n myapp-identity --query clientId -o tsv)

# 4. Create a Kubernetes service account annotated with the identity
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: "$CLIENT_ID"
EOF

# 5. Create the federated credential (trust between Azure AD identity and K8s SA)
az identity federated-credential create \
  --name myapp-fedcred \
  --identity-name myapp-identity \
  --resource-group myRG \
  --issuer "$OIDC_ISSUER" \
  --subject system:serviceaccount:default:myapp-sa \
  --audience api://AzureADTokenExchange
Workload Identity Flow: Pod (SA: myapp-sa) → Projected Token → Azure AD Token Exchange → Azure Resource (Key Vault, Storage, ...)
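On the pod side, opting in is a label plus the annotated service account. A sketch (pod name and image are placeholders): the `azure.workload.identity/use: "true"` label triggers the mutating webhook, which injects `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_FEDERATED_TOKEN_FILE` along with the projected token volume, so Azure SDK credential chains pick everything up automatically.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wi-demo                            # hypothetical pod name
  namespace: default
  labels:
    azure.workload.identity/use: "true"    # opt in to the workload identity webhook
spec:
  serviceAccountName: myapp-sa             # the annotated SA created in step 4
  containers:
  - name: app
    image: myacr.azurecr.io/myapp:v1
    # Azure SDKs (e.g. DefaultAzureCredential) read the injected
    # AZURE_CLIENT_ID and AZURE_FEDERATED_TOKEN_FILE env vars.
```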

4. Azure Policy for AKS

Azure Policy extends Gatekeeper (OPA) on AKS. Enable the add-on and assign built-in policy definitions:

bash
# Enable Azure Policy add-on
az aks enable-addons --addons azure-policy -g myRG -n myAKS

# Assign a built-in policy: "Kubernetes cluster should not allow privileged containers"
POLICY_DEF="/providers/Microsoft.Authorization/policyDefinitions/95edb821-ddaf-4404-9732-666045e056b4"
az policy assignment create \
  --name "no-privileged-pods" \
  --policy "$POLICY_DEF" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/myRG/providers/Microsoft.ContainerService/managedClusters/myAKS" \
  --params '{"effect":{"value":"Deny"}}'

Common built-in policy definitions for AKS:

| Policy | Effect | What It Enforces |
|---|---|---|
| No privileged containers | Deny | Blocks `securityContext.privileged: true` |
| Enforce resource limits | Deny | Every container must set CPU/memory limits |
| Allowed container registries | Deny | Only images from approved ACR registries |
| No host networking | Deny | Prevents `hostNetwork: true` |
| Read-only root filesystem | Audit | Warns if `readOnlyRootFilesystem` is not set |
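With the privileged-container policy assigned with a Deny effect, a spec like this sketch is rejected at admission time (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test        # hypothetical test pod
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: true         # violates the policy -> Gatekeeper denies the request
```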

5. Network Policies

AKS supports two network policy engines:

| Engine | Network Plugin | Features |
|---|---|---|
| Azure NPM | Azure CNI | Supports Kubernetes NetworkPolicy; uses iptables/ipset |
| Calico | Azure CNI or kubenet | Kubernetes NetworkPolicy plus extended Calico policies (GlobalNetworkPolicy, DNS rules) |
yaml
# Deny-all ingress baseline for a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

---
# Allow only frontend → backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
⚠️
Start with Deny-All

Always apply a deny-all policy first, then layer explicit allow rules. This ensures no pod can receive unexpected traffic even if a new workload is deployed without a corresponding policy.
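The deny-all example above covers ingress only. A common companion is a default egress deny that still permits DNS resolution; a sketch, assuming CoreDNS pods carry the usual `k8s-app: kube-dns` label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress-allow-dns
  namespace: production
spec:
  podSelector: {}              # all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:                        # only DNS to kube-dns/CoreDNS is allowed
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```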

6. Pod Security Standards

PodSecurityPolicy was removed in Kubernetes 1.25, so AKS relies on the built-in Pod Security Admission (PSA) controller:

bash
# Label a namespace to enforce the "restricted" Pod Security Standard
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
| Level | Allows |
|---|---|
| Privileged | Everything (no restrictions) |
| Baseline | Minimally restrictive — blocks hostNetwork, hostPID, privileged containers |
| Restricted | Hardened — requires non-root, drops ALL capabilities, read-only rootfs recommended |
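A pod that passes the restricted level looks roughly like this sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-ok           # hypothetical pod name
  namespace: production
spec:
  containers:
  - name: app
    image: myacr.azurecr.io/myapp:v1
    securityContext:
      runAsNonRoot: true                 # required by "restricted"
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]                    # drop all Linux capabilities
      seccompProfile:
        type: RuntimeDefault             # required seccomp profile
```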

7. Secrets Management — Azure Key Vault Provider

The Secrets Store CSI Driver with the Azure Key Vault provider mounts Key Vault secrets directly into pods as files, and optionally syncs them as Kubernetes Secrets.

bash
# Enable the add-on
az aks enable-addons --addons azure-keyvault-secrets-provider -g myRG -n myAKS

# Verify driver pods are running
kubectl get pods -n kube-system -l app=secrets-store-csi-driver
yaml
# SecretProviderClass referencing Key Vault
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: kv-secrets
  namespace: default
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: ""                       # client ID of the workload identity
    keyvaultName: "myKeyVault"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
        - |
          objectName: tls-cert
          objectType: secret
    tenantId: ""                       # Azure AD tenant ID
  secretObjects:                          # Optional: sync as K8s Secret
  - secretName: db-secret-k8s
    type: Opaque
    data:
    - objectName: db-connection-string
      key: connectionString
yaml
# Pod mounting Key Vault secrets
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: myapp-sa
  containers:
  - name: app
    image: myacr.azurecr.io/myapp:v1
    volumeMounts:
    - name: secrets-store
      mountPath: "/mnt/secrets"
      readOnly: true
    env:
    - name: DB_CONN
      valueFrom:
        secretKeyRef:
          name: db-secret-k8s
          key: connectionString
  volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "kv-secrets"
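Mounted secrets are not live-updated out of the box; the add-on can poll Key Vault and refresh them. A sketch (flag names follow the azure-keyvault-secrets-provider add-on):

```shell
# Enable rotation: re-fetch secrets from Key Vault every 2 minutes
az aks addon update -g myRG -n myAKS \
  --addon azure-keyvault-secrets-provider \
  --enable-secret-rotation \
  --rotation-poll-interval 2m
```

Note that file mounts refresh in place, but env vars populated from a synced Kubernetes Secret only pick up new values when the pod restarts.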

8. Image Security

AKS Security Layers (Defense-in-Depth): Azure AD / Entra ID → API Server Auth → Kubernetes RBAC → Network Policy → Pod Security → Key Vault Secrets

⌨️ Hands-on

Lab 1: Azure AD Group → ClusterRoleBinding

bash
# Create an Azure AD group for read-only users
az ad group create --display-name "AKS-Viewers" --mail-nickname "aks-viewers"
VIEWER_GROUP_ID=$(az ad group show -g "AKS-Viewers" --query id -o tsv)

# Add a user to the group
az ad group member add --group "AKS-Viewers" --member-id "<USER_OBJECT_ID>"

# Create a ClusterRoleBinding for the group
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aad-viewers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "$VIEWER_GROUP_ID"
EOF

# Test — have the viewer user run:
#   az aks get-credentials -g myRG -n myAKS
#   kubelogin convert-kubeconfig -l azurecli
#   kubectl get pods --all-namespaces        # ✅ works
#   kubectl delete pod nginx -n default      # ❌ Forbidden

Lab 2: Key Vault Secret Mounted in a Pod

bash
# 1. Create Key Vault and store a secret
az keyvault create -n myKV -g myRG --location eastus
az keyvault secret set --vault-name myKV --name db-password --value "S3cure!P@ss"

# 2. Grant the workload identity access to Key Vault secrets
IDENTITY_CLIENT_ID=$(az identity show -g myRG -n myapp-identity --query clientId -o tsv)
az keyvault set-policy --name myKV \
  --secret-permissions get \
  --spn "$IDENTITY_CLIENT_ID"

# 3. Apply the SecretProviderClass (from the YAML above — update clientID, tenantId,
#    keyvaultName, and set objectName to db-password to match the secret created above)
kubectl apply -f secret-provider-class.yaml

# 4. Deploy the pod
kubectl apply -f pod-with-kv.yaml

# 5. Verify the secret is mounted
kubectl exec myapp -- cat /mnt/secrets/db-password
# Output: S3cure!P@ss

Lab 3: Apply a Deny-All Network Policy

bash
# Deploy two test pods
kubectl run frontend --image=nginx -l app=frontend -n production
kubectl run backend --image=nginx -l app=backend -n production
kubectl expose pod backend --port=80 -n production

# Before policy — frontend can reach backend
kubectl exec frontend -n production -- curl -s --max-time 3 backend
# ✅ HTML response

# Apply deny-all
kubectl apply -f deny-all-ingress.yaml

# After policy — frontend is blocked
kubectl exec frontend -n production -- curl -s --max-time 3 backend
# ❌ Timeout

# Apply allow rule for frontend → backend
kubectl apply -f allow-frontend-to-backend.yaml

# Now frontend can reach backend again
kubectl exec frontend -n production -- curl -s --max-time 3 backend
# ✅ HTML response

🐛 Debugging Scenarios

Scenario 1: "Forbidden: User does not have access to the resource"

Symptom: A developer authenticated via Azure AD runs kubectl get pods -n team-alpha and receives a 403 Forbidden error.

bash
# Step 1: Confirm the user's Azure AD identity
az ad signed-in-user show --query '{name:displayName, objectId:id}'

# Step 2: Check which groups the user belongs to
az ad user get-member-groups --id "<USER_OBJECT_ID>"

# Step 3: Look at all RoleBindings in the target namespace
kubectl get rolebindings -n team-alpha -o yaml | grep -A5 "subjects"

# Step 4: Compare group Object IDs — does the binding reference the right group?
# Common mistake: using the group's displayName instead of its Object ID

# Step 5: Check cluster-level bindings too
kubectl get clusterrolebindings -o yaml | grep -B5 "<GROUP_OR_USER_ID>"

# Step 6: Verify Azure AD integration is enabled
az aks show -g myRG -n myAKS --query "aadProfile"

# Fix: Either add the user to the correct Azure AD group,
# or create/update the RoleBinding with the right group Object ID.

Scenario 2: "SecretProviderClass mount failed"

Symptom: Pod is stuck in ContainerCreating with a volume mount error referencing the Secrets Store CSI Driver.

bash
# Step 1: Describe the pod to see the exact error
kubectl describe pod myapp
# Look for events like "FailedMount" or "rpc error"

# Step 2: Check CSI driver pods are running
kubectl get pods -n kube-system -l app=secrets-store-csi-driver
kubectl get pods -n kube-system -l app=secrets-store-provider-azure

# Step 3: Check the SecretProviderClass YAML
kubectl get secretproviderclass kv-secrets -o yaml
# Verify: keyvaultName, tenantId, clientID are correct

# Step 4: If using workload identity — verify federated credential
az identity federated-credential list --identity-name myapp-identity -g myRG -o table
# Check issuer URL and subject match

# Step 5: Check Key Vault access policy
az keyvault show -n myKV --query "properties.accessPolicies"
# Verify the managed identity has GET permission for secrets

# Step 6: Check pod has the workload identity label and service account
kubectl get pod myapp -o yaml | grep -A2 "serviceAccountName"
kubectl get pod myapp -o yaml | grep "azure.workload.identity"

# Fix: Common causes are wrong clientID in SecretProviderClass,
# missing Key Vault access policy, or missing pod label/SA annotation.

Scenario 3: "Pod rejected by policy — admission webhook denied"

Symptom: kubectl apply returns an error like admission webhook "validation.gatekeeper.sh" denied the request or a message from Azure Policy.

bash
# Step 1: Read the full error message — it names the constraint
# e.g., "azurepolicy-k8sazurev2noprivilege-..." or "container-restricted-..."

# Step 2: List Gatekeeper constraints
kubectl get constraints
kubectl get constrainttemplates

# Step 3: Describe the specific constraint that blocked the pod
kubectl describe constraint azurepolicy-k8sazurev2noprivilege-xxxxx
# Shows which policy definition it maps to and the enforcement action

# Step 4: Check Azure Policy assignments on the cluster
az policy assignment list --scope "/subscriptions/$SUB_ID/resourceGroups/myRG" \
  --query "[?contains(policyDefinitionId,'Container')].{name:displayName,effect:parameters.effect.value}" -o table

# Step 5: Review your pod spec against the policy
#   - Is securityContext.privileged set to true?
#   - Are resource limits missing?
#   - Is the image from a disallowed registry?

# Step 6: For PSA rejections, check namespace labels
kubectl get namespace production --show-labels | grep pod-security

# Fix: Update the pod spec to comply with the policy, or
# (temporarily) change the policy effect from Deny to Audit for debugging.

🎯 Interview Questions

Beginner

Q: What is RBAC in Kubernetes and why is it needed on AKS?

RBAC (Role-Based Access Control) is a method of regulating access to Kubernetes resources based on the roles of individual users or groups. On AKS, RBAC is critical because it integrates with Azure AD — Azure AD authenticates users and groups, then Kubernetes RBAC authorizes their actions (get, create, delete) on specific resources in specific namespaces. Without RBAC, anyone with cluster credentials has full access.

Q: How does Azure AD integration work with AKS?

When Azure AD integration is enabled, kubeconfig contains an Azure AD auth provider instead of a static token. When a user runs kubectl, they're prompted to authenticate with Azure AD (or use kubelogin for non-interactive flow). The API server validates the Azure AD token, extracts the user's Object ID and group memberships, and then Kubernetes RBAC rules determine what the user can do. This provides enterprise SSO and conditional access for the cluster.

Q: What are managed identities in AKS and what types exist?

Managed identities eliminate the need for storing credentials. AKS uses three types: (1) System-assigned identity for the cluster control plane to manage Azure resources. (2) Kubelet identity for node pools to pull images from ACR and access node-level resources. (3) Workload identity (user-assigned) for individual pods to access Azure services like Key Vault or Storage with pod-level granularity using federated credentials.

Q: What is a NetworkPolicy in Kubernetes?

A NetworkPolicy is a Kubernetes resource that controls the traffic allowed to and from pods. By default, all pod-to-pod communication is allowed. NetworkPolicies act as a firewall — you select pods via labels and define ingress/egress rules specifying which sources or destinations are permitted. On AKS, you need Azure NPM (with Azure CNI) or Calico as the network policy engine for these to be enforced.

Q: What are Pod Security Standards (PSS)?

Pod Security Standards define three levels of security policies: Privileged (unrestricted), Baseline (minimally restrictive, blocks known privilege escalations), and Restricted (heavily restricted, enforces best practices). Since PodSecurityPolicy was removed in K8s 1.25, the built-in Pod Security Admission (PSA) controller enforces these standards per namespace using labels.

Intermediate

Q: Compare Workload Identity with the deprecated AAD Pod Identity. Why did Microsoft replace it?

AAD Pod Identity required CRDs (AzureIdentity, AzureIdentityBinding), the NMI DaemonSet running on host network, and IMDS interception. It was complex, had race conditions, and host-network access was a security concern. Workload Identity uses standard Kubernetes service account token projection and Azure AD federated credentials — no extra CRDs, no DaemonSet, no host network. It's simpler, more secure, and follows the OIDC federation standard supported by other cloud providers too.

Q: How does the Azure Key Vault Provider for Secrets Store CSI Driver work?

The CSI Driver runs as a DaemonSet on each node. When a pod references a SecretProviderClass volume, the driver calls the Azure provider, which authenticates to Key Vault (via workload identity or managed identity), fetches the specified secrets/keys/certs, and mounts them as files in the pod's filesystem. Optionally, it can also sync them as Kubernetes Secret objects so they can be used as env vars. Secrets are refreshed based on a configurable rotation poll interval.

Q: How does Azure Policy for AKS work under the hood?

Azure Policy for AKS installs Gatekeeper (OPA Constraint Framework) on the cluster via the azure-policy add-on. Policy definitions in Azure are translated into Gatekeeper ConstraintTemplates and Constraints by a sync component. When a resource is created/updated, the Gatekeeper admission webhook evaluates it against the constraints. Violations are either denied (block the request), audited (allow but flag), or mutated (auto-fix). Compliance results are reported back to Azure Policy for centralized visibility.

Q: What's the difference between Azure NPM and Calico for network policies on AKS?

Azure NPM is Microsoft's implementation supporting standard Kubernetes NetworkPolicy API. It works only with Azure CNI and uses IPTables/IPSets. Calico supports both Azure CNI and kubenet, and extends NetworkPolicy with Calico-specific CRDs like GlobalNetworkPolicy and DNS-based rules. Calico is generally preferred for advanced scenarios, while Azure NPM is simpler for basic ingress/egress control with no additional components.

Q: How would you enforce that all images deployed to AKS come only from your organization's ACR?

Use Azure Policy with the built-in definition "Kubernetes cluster containers should only use allowed images" (or "Ensure only allowed container images in Kubernetes cluster"). Assign it to the AKS cluster scope with the allowedContainerImagesRegex parameter set to match your ACR hostname, e.g., ^myacr\.azurecr\.io/.+$. Set the effect to Deny. This installs a Gatekeeper constraint that rejects any pod with an image from a non-matching registry. Combine with ACR content trust for signed-image verification.

Scenario-Based

Q: A developer reports "I can see pods in the default namespace but get Forbidden in the team-alpha namespace." How do you troubleshoot?

1. Confirm their Azure AD identity: az ad signed-in-user show. 2. Check which Azure AD groups they belong to: az ad user get-member-groups. 3. List RoleBindings in team-alpha: kubectl get rolebindings -n team-alpha. 4. Compare the group Object ID in the binding against the user's groups. They likely have a RoleBinding in default (or a cluster-wide view ClusterRoleBinding) but no RoleBinding scoped to team-alpha. Fix: add their Azure AD group's Object ID to a RoleBinding in team-alpha with the appropriate Role.

Q: A pod that needs to read secrets from Key Vault is stuck in ContainerCreating. The events show "failed to mount secrets store objects." What steps do you take?

1. kubectl describe pod — read the exact CSI error message. 2. Verify CSI driver pods are running in kube-system. 3. Check the SecretProviderClass: is keyvaultName, tenantId, clientID correct? 4. If using workload identity: verify the pod has the label azure.workload.identity/use: "true", the SA has the correct annotation, and the federated credential exists with matching issuer/subject. 5. Check Key Vault access policy grants GET to the managed identity. 6. Test Key Vault access directly: az keyvault secret show --vault-name myKV --name db-password using the identity. The most common cause is a mismatch between the clientID in SecretProviderClass and the actual managed identity.

Q: Your deployment is rejected with "Container image myacr.azurecr.io/app:latest is not allowed by policy." But it IS from your ACR. What went wrong?

The Azure Policy for allowed registries uses a regex matcher. Check the policy assignment's allowedContainerImagesRegex parameter. Common issues: (1) The regex requires a specific tag pattern and :latest is not matched. (2) Init containers or sidecar images (e.g., istio-proxy, secrets-store-provider) also need to match. (3) The policy might allow myacr.azurecr.io but the image uses the FQDN myacr.azurecr.io/app vs myacr.azurecr.io/team/app. Fix: update the regex to cover all image paths, tags, and system sidecars. Check existing constraints: kubectl get constraints and describe the offending one.

Q: After applying a deny-all NetworkPolicy, the application's health probes start failing and the pods restart. Why?

Kubelet health probes (liveness, readiness, startup) come from the node's IP, not from another pod. A deny-all ingress NetworkPolicy blocks ALL sources including the node. The fix: add an ingress rule allowing traffic from the node CIDR or the pod's own node IP to the probe port. With Azure CNI, the node IP is in the same VNet, so you need a from: [ipBlock: {cidr: "10.240.0.0/16"}] rule (using your node subnet CIDR) for the probe ports.
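The suggested allow rule can be sketched as follows (the CIDR and port are assumptions — substitute your node subnet and probe port):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-node-probes
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.240.0.0/16    # assumed node subnet CIDR
    ports:
    - protocol: TCP
      port: 8080               # assumed probe port
```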

Q: A production cluster with Azure AD and local accounts disabled has gone into a state where all Azure AD admin group members are deleted. How do you recover?

This is a disaster recovery scenario. Since local accounts are disabled, you can't use --admin kubeconfig. Steps: 1. Re-create the Azure AD admin group (or create a new one) in Azure AD. 2. Use az aks update with --aad-admin-group-object-ids pointing to the new group — this is an ARM-level operation, not a kubectl one, so it doesn't need cluster auth. 3. Add an emergency-break-glass admin user to the group. 4. Fetch credentials and verify access. Prevention: always keep a break-glass admin account in the Azure AD group and protect it with PIM (Privileged Identity Management).
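The recovery steps above can be sketched as (group and user names are hypothetical):

```shell
# 1-2. Re-create an admin group and point the cluster at it.
#      This is an ARM-level operation, so it needs no cluster auth.
NEW_GROUP_ID=$(az ad group create \
  --display-name "AKS-Admins-Recovery" \
  --mail-nickname "aks-admins-recovery" \
  --query id -o tsv)

az aks update -g myRG -n myAKS --aad-admin-group-object-ids "$NEW_GROUP_ID"

# 3. Add the break-glass admin account
az ad group member add --group "AKS-Admins-Recovery" --member-id "<BREAK_GLASS_USER_OBJECT_ID>"

# 4. Fetch credentials and verify access
az aks get-credentials -g myRG -n myAKS --overwrite-existing
kubectl get nodes
```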

🌍 Real-World Use Case

Zero-Trust AKS at a Financial Services Firm

A multinational bank runs its trading and compliance applications on AKS. Regulatory requirements (SOC 2, PCI-DSS) mandate strict access control and audit trails.

Result: zero security incidents in 18 months, passed three external audits with no findings, and developers self-serve namespace access through Azure AD group membership requests (approved via PIM).

📝 Summary
