Hands-on Lesson 14 of 14

AKS Interview Preparation

40+ real interview questions covering AKS fundamentals, architecture, networking, security, scaling, CI/CD, and troubleshooting — with production-grade answers that demonstrate deep understanding.

💡
How to Use This

Don't just memorize answers. For each question, try to answer it out loud before reading the answer. In real interviews, you'll need to explain concepts clearly and confidently. Focus on the "why" behind each answer, not just the "what".

🔰 Section 1 — AKS Fundamentals (Questions 1–10)

Q1: What is AKS and what does "managed" mean?

Answer: AKS is Azure Kubernetes Service — Azure's managed Kubernetes offering. "Managed" means Azure owns the control plane: the API server, etcd, scheduler, and controller manager. They're highly available, auto-patched, and you never see the VMs running them. You only manage the worker nodes (node pools) and your workloads. The control plane is free — you only pay for worker node VMs.

Q2: What's the difference between the control plane and the data plane in AKS?

Answer:

| Component | Control Plane (Azure-managed) | Data Plane (You manage) |
|---|---|---|
| Contains | API server, etcd, scheduler, controller manager | Worker nodes, your pods, volumes, networking |
| Cost | Free (Standard/Premium tiers available for SLA) | You pay for VMs, disks, networking |
| Upgrades | Azure patches; you trigger K8s version upgrades | OS image upgrades (auto or manual) |
| Access | Via kubectl / API — you never SSH into the control plane | SSH possible (but discouraged), exec into pods |

Q3: What is the MC_ resource group?

Answer: When you create an AKS cluster in a resource group like myRG, Azure creates a second resource group called MC_myRG_myCluster_eastus. This contains all the infrastructure resources — VMSS (node pools), load balancers, public IPs, NSGs, route tables, and the VNet (if Azure-managed). You should never manually modify resources in the MC_ group — AKS reconciles them automatically and your changes would be overwritten.
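Rather than guessing the generated name, you can look it up; a quick check, assuming a cluster named myCluster in myRG:

```shell
# Print the auto-generated node resource group of an existing cluster
az aks show --resource-group myRG --name myCluster \
  --query nodeResourceGroup --output tsv
```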

Q4: How does AKS differ from EKS and GKE?

Answer: All three are managed Kubernetes, so core concepts transfer directly. Key differences: the AKS control plane is free on the Free tier (you pay for Standard/Premium only if you need an SLA), while EKS and GKE charge roughly $0.10/hour per cluster for the control plane. AKS integrates natively with Azure AD (Entra ID), ACR, and Azure Monitor; EKS leans on AWS IAM and has the deepest AWS service integration; GKE typically ships new Kubernetes versions first and offers the most automation (Autopilot). Node management is VMSS-based on AKS, node groups on EKS, and node pools on GKE.

Q5: What identity types does AKS use and why?

Answer: AKS uses Managed Identities (system-assigned or user-assigned) instead of service principals. The cluster identity (control plane) needs permissions to manage Azure resources (e.g., create load balancers, read from ACR). The kubelet identity is used by nodes to pull images from ACR. Using managed identities avoids credential rotation headaches — Azure handles the token lifecycle automatically.
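As a sketch (cluster and registry names are placeholders), the identity setup typically looks like:

```shell
# Create a cluster with a system-assigned managed identity
az aks create --resource-group myRG --name myCluster --enable-managed-identity

# Grant the kubelet identity AcrPull on a registry so nodes can pull images
az aks update --resource-group myRG --name myCluster --attach-acr myRegistry
```

The `--attach-acr` flag assigns the AcrPull role to the kubelet identity, so no image pull secrets are needed.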

Q6: Explain AKS pricing tiers (Free, Standard, Premium).

Answer:

| Tier | SLA | Features | Cost |
|---|---|---|---|
| Free | No SLA | Basic features, good for dev/test | $0 |
| Standard | 99.95% (with AZ) / 99.9% | SLA, more API server capacity | ~$73/month |
| Premium | 99.95%+ | Long-term support, advanced networking, AKS Automatic | ~$146/month |

In interviews, emphasize: "Free tier for dev/test, Standard for production where uptime matters, Premium for enterprise compliance and LTS."

Q7: What happens during an AKS upgrade?

Answer: AKS upgrades are rolling. First the control plane is upgraded (API server, etcd — this is seamless). Then nodes are upgraded one at a time: AKS cordons a node, drains pods (respecting PodDisruptionBudgets), upgrades the node OS and kubelet, then uncordons it. The max-surge setting controls how many extra nodes are created during upgrade to maintain capacity. Best practice: set max-surge=1 or 33% and always have PDBs on critical workloads.
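A hedged example of the two commands involved (names and the target version are placeholders; check `az aks get-upgrades` for valid targets):

```shell
# Allow one-third extra surge capacity per upgrade batch on this pool
az aks nodepool update --resource-group myRG --cluster-name myCluster \
  --name nodepool1 --max-surge 33%

# Upgrade the control plane and node pools to a target version
az aks upgrade --resource-group myRG --name myCluster --kubernetes-version 1.29.2
```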

Q8: What is the difference between system and user node pools?

Answer:

| Aspect | System node pool | User node pool |
|---|---|---|
| Purpose | Runs critical system pods (CoreDNS, metrics-server, konnectivity) | Runs your application workloads |
| Minimum nodes | At least 1 (every cluster needs one system pool) | Can scale to 0 |
| OS | Linux only | Linux or Windows |
| Taints | Commonly tainted with CriticalAddonsOnly=true:NoSchedule to keep app pods off | Any custom taints/labels you need |

Best practice: Separate system and user pools. System pool uses reliable VMs (Standard_D2s_v5). User pool can use cost-effective or specialized VMs (GPU, spot).

Q9: Can you scale an AKS node pool to zero?

Answer: User node pools can be scaled to 0 — useful for cost optimization (e.g., GPU pools that only run during ML training). System node pools cannot be scaled to 0 because they must run system pods. Minimum 1 node in the system pool at all times.
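For illustration (pool names are placeholders):

```shell
# Manually scale a user pool (e.g., a GPU pool) down to zero
az aks nodepool scale --resource-group myRG --cluster-name myCluster \
  --name gpupool --node-count 0

# Or let the cluster autoscaler manage it, with 0 as the floor
az aks nodepool update --resource-group myRG --cluster-name myCluster \
  --name gpupool --enable-cluster-autoscaler --min-count 0 --max-count 4
```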

Q10: How do you connect kubectl to your AKS cluster?

Answer: az aks get-credentials --resource-group myRG --name myCluster. This merges the cluster's kubeconfig into your ~/.kube/config. With Azure AD integration, the first kubectl command triggers a browser login for authentication. For CI/CD, use the --admin flag (cluster-admin credentials, available only while local accounts are enabled) or the kubelogin plugin with a service principal for non-interactive Azure AD auth.

🌐 Section 2 — Networking (Questions 11–18)

Q11: Explain kubenet vs Azure CNI.

Answer:

| Feature | Kubenet | Azure CNI |
|---|---|---|
| Pod IPs | Private range, NAT'd to node IP | Real VNet IPs for every pod |
| IP consumption | Low (1 IP per node) | High (1 IP per pod) |
| VNet integration | Limited — pods aren't directly routable | Full — pods get VNet IPs |
| Network Policies | Calico only | Calico or Azure NPM |
| Best for | Small clusters, IP-constrained environments | Production, VNet-peered architectures |

Interview tip: "In most production scenarios, we use Azure CNI because pods need to be directly reachable from other VNet resources like databases, VMs, and private endpoints."

Q12: How does the AKS load balancer work?

Answer: When you create a Kubernetes Service of type LoadBalancer, AKS automatically provisions an Azure Load Balancer in the MC_ resource group. It creates a frontend IP (public or internal), a backend pool (node VMSS instances), and health probes. Traffic flows: Client → Azure LB → Node (NodePort) → kube-proxy/iptables → Pod. For internal services, use service.beta.kubernetes.io/azure-load-balancer-internal: "true" annotation.
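A minimal internal Service manifest illustrating that annotation (names and ports are placeholders):

```yaml
# Internal LoadBalancer Service: the frontend IP comes from the VNet, not the internet
apiVersion: v1
kind: Service
metadata:
  name: myapp-internal
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
```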

Q13: What is an Ingress Controller and why do you need one on AKS?

Answer: An Ingress Controller is a reverse proxy that routes HTTP/HTTPS traffic to backend services based on hostname and path rules. Without it, each service would need its own LoadBalancer (= its own public IP = more cost). With an Ingress Controller (NGINX, Traefik, or Azure Application Gateway), you have one Load Balancer routing to many services. It also handles TLS termination, path-based routing, rate limiting, and authentication.
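A sketch of an NGINX Ingress routing two paths of one host to different backends (the hostname, TLS secret, and service names are made up):

```yaml
# One Ingress resource, one shared load balancer, many backend services
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx
  tls:
    - hosts: [shop.example.com]
      secretName: shop-tls
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-svc
                port:
                  number: 80
```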

Q14: How do network policies work in AKS?

Answer: Network policies are Kubernetes resources that control pod-to-pod traffic at L3/L4. By default, all pods can communicate with all other pods. Network policies act as firewall rules — you define which pods can talk to which pods on which ports. AKS supports two engines: Calico (works with both kubenet and Azure CNI) and Azure NPM (Azure CNI only). Once you create any NetworkPolicy in a namespace, all traffic not explicitly allowed is denied (default-deny behavior).
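For example, a policy that only lets frontend pods reach backend pods on port 8080 (labels and namespace are illustrative):

```yaml
# Once this exists, all other ingress to app=backend pods is denied
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
```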

Q15: What is Azure Private Link / Private Cluster?

Answer: A private AKS cluster disables the public endpoint of the API server. The API server gets a private IP from your VNet instead. Access is only possible from within the VNet or through VPN/ExpressRoute. This is required for strict compliance environments where the control plane must not be internet-accessible. You enable it with --enable-private-cluster during creation.

Q16: Explain DNS resolution inside AKS.

Answer: CoreDNS runs as pods in kube-system and provides DNS for the cluster. All pods have /etc/resolv.conf pointing to the CoreDNS ClusterIP (typically 10.0.0.10). A service myservice in namespace myns resolves as myservice.myns.svc.cluster.local. CoreDNS can be customized with ConfigMaps for forwarding external domains to custom DNS servers.
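On AKS you don't edit the main CoreDNS ConfigMap directly; you drop overrides into a `coredns-custom` ConfigMap that AKS merges in. A sketch forwarding an internal zone to a custom DNS server (zone name and IP are placeholders):

```yaml
# Keys ending in .server are merged into the CoreDNS config as extra server blocks
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  corp.server: |
    corp.internal:53 {
      forward . 10.1.0.4
    }
```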

Q17: How do you expose a service externally on AKS?

Answer: Three ways, from simplest to most production-ready:

  1. Service type LoadBalancer: Azure creates a public LB with a public IP. Quick, but each service gets its own IP.
  2. Ingress Controller (NGINX/Traefik): One LB + Ingress rules for path/host routing. Most common.
  3. Azure Application Gateway Ingress Controller (AGIC): Uses Azure's L7 load balancer natively. Supports WAF, autoscaling, and SSL offloading.

Q18: What is the difference between ClusterIP, NodePort, and LoadBalancer service types?

Answer:

| Type | Accessible From | Use Case |
|---|---|---|
| ClusterIP | Inside the cluster only | Internal microservice communication |
| NodePort | Node IP + high port (30000-32767) | Testing, rarely used in prod on AKS |
| LoadBalancer | External via Azure LB | Production external traffic |

📈 Section 3 — Scaling & Performance (Questions 19–24)

Q19: What is the Cluster Autoscaler and how does it work?

Answer: The Cluster Autoscaler watches for pods that can't be scheduled due to insufficient resources. When it finds pending pods, it tells Azure to add nodes to the VMSS (scale-out). When nodes are underutilized for 10+ minutes, it removes them (scale-in). It respects PodDisruptionBudgets during scale-in. Configure with --enable-cluster-autoscaler --min-count 2 --max-count 10.

Q20: How do HPA and Cluster Autoscaler work together?

Answer: They're complementary, not competing:

  1. Load increases → HPA creates more pod replicas
  2. New pods can't be scheduled (nodes full) → they go Pending
  3. Cluster Autoscaler sees Pending pods → provisions new nodes
  4. New nodes start → Pending pods get scheduled

The key: HPA scales pods (application layer), Cluster Autoscaler scales nodes (infrastructure layer).
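The application-layer half of that loop can be sketched as a standard HPA (the deployment name and thresholds are illustrative):

```yaml
# Keep average CPU around 70% across 3-20 replicas of the "api" Deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```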

Q21: What is KEDA and when would you use it over HPA?

Answer: KEDA (Kubernetes Event-Driven Autoscaling) scales based on external event sources — Azure Service Bus queue depth, Kafka lag, Prometheus metrics, cron schedules. HPA only scales on CPU/memory (or custom metrics with extra setup). KEDA is best for event-driven workloads: "Scale my worker pods to match the number of messages in the Azure Service Bus queue, and scale to 0 when idle."
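A sketch of a KEDA ScaledObject for exactly that Service Bus scenario (the queue name, target deployment, and the `servicebus-auth` TriggerAuthentication are assumptions):

```yaml
# Scale the "worker" Deployment with queue depth; scale to 0 when the queue is empty
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "5"   # target messages per replica
      authenticationRef:
        name: servicebus-auth
```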

Q22: What are Spot node pools?

Answer: Spot node pools use Azure Spot VMs — unused capacity at up to 90% discount. However, Azure can evict them at any time (30-second warning). Use them for: batch processing, CI/CD build agents, dev/test, stateless workers. Never for: stateful workloads, databases, or pods that can't tolerate interruption. Configure with --priority Spot --eviction-policy Delete --spot-max-price -1.
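For example (the pool name is a placeholder):

```shell
# Add a Spot pool; a max price of -1 means "pay up to the current on-demand price"
az aks nodepool add --resource-group myRG --cluster-name myCluster \
  --name spotpool --priority Spot --eviction-policy Delete \
  --spot-max-price -1 --node-count 3
```

Spot pools are automatically tainted with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so only pods carrying a matching toleration land on them.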

Q23: How do you right-size resource requests and limits?

Answer: Measure first, then set. Use kubectl top pods, Container Insights, or the Vertical Pod Autoscaler in recommendation mode to observe real usage over a representative period, then set requests near the observed steady state (e.g., P95) and memory limits close to requests so a leaking pod fails fast instead of destabilizing the node. Remember: requests drive scheduling, bin-packing, and autoscaler decisions. Over-requesting wastes nodes; under-requesting causes CPU throttling and OOMKills. Revisit the numbers periodically as traffic patterns change.

Q24: What are Virtual Nodes / ACI integration?

Answer: Virtual Nodes let AKS burst to Azure Container Instances — serverless containers with no node management. A pod scheduled on a virtual node runs in ACI within seconds (no waiting for VMs to provision). Great for: burst scaling, handling traffic spikes, running short-lived jobs. Limitation: no persistent volumes, limited networking, Linux only.

🔒 Section 4 — Security & RBAC (Questions 25–32)

Q25: How does Azure AD integration with AKS work?

Answer: AKS integrates with Azure AD (Entra ID) for authentication. When a user runs kubectl, they authenticate via Azure AD (browser flow or service principal). The API server validates the Azure AD token and maps the user to Kubernetes RBAC. You can grant Azure AD groups specific ClusterRoles — e.g., the "DevOps" AD group gets the edit ClusterRole in the production namespace.

Q26: Explain Kubernetes RBAC vs Azure RBAC for AKS.

Answer:

| Aspect | Kubernetes RBAC | Azure RBAC |
|---|---|---|
| Scope | Inside the cluster (pods, services, secrets) | Azure resource level (AKS resource, resource group) |
| Managed by | kubectl / K8s manifests | Azure Portal / az CLI |
| Examples | "User X can read pods in namespace Y" | "User X can stop/start the AKS cluster" |
| Best for | Workload-level access control | Infrastructure-level access control |

Production pattern: Use Azure RBAC for who can manage the AKS resource. Use Kubernetes RBAC for what they can do inside the cluster.

Q27: What is Workload Identity?

Answer: Workload Identity lets a Kubernetes pod authenticate to Azure services (Key Vault, Storage, SQL) using a federated Azure AD identity — no secrets stored in the cluster. It replaces the older pod-managed identity (aad-pod-identity). The flow: Kubernetes ServiceAccount → OIDC federation → Azure Managed Identity → Azure resource access. It's the recommended way to access Azure resources from AKS pods.
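The Kubernetes side of that flow is mostly two markers: an annotated ServiceAccount and a label on the pod (the client ID, image, and names below are placeholders):

```yaml
# ServiceAccount federated to a user-assigned managed identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: prod
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
# Pods opt in with the label and reference the ServiceAccount
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: prod
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: myapp-sa
  containers:
    - name: app
      image: myregistry.azurecr.io/myapp:1.0
```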

Q28: How do you manage secrets in AKS?

Answer: Three approaches (from basic to production):

  1. Kubernetes Secrets: base64-encoded (NOT encrypted by default). Fine for non-sensitive config. Enable encryption at rest with Azure Key Vault KMS.
  2. Azure Key Vault + CSI driver: Secrets stored in Key Vault, mounted as files/env vars in pods via the Secrets Store CSI Driver. Centralized management, audit logging, rotation support.
  3. External Secrets Operator: Syncs secrets from Key Vault into Kubernetes Secrets automatically. Best for teams that want K8s-native Secret references but centralized storage.
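Option 2 in manifest form: a SecretProviderClass for the Azure provider (the vault name, tenant ID, client ID, and secret name are placeholders):

```yaml
# Pods mounting this class via the CSI driver get db-password as a file
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
spec:
  provider: azure
  parameters:
    clientID: "<workload-identity-client-id>"
    keyvaultName: my-kv
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: db-password
          objectType: secret
```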

Q29: What is Pod Security Admission (PSA)?

Answer: PSA replaced PodSecurityPolicies (removed in K8s 1.25). It enforces security standards at the namespace level using three profiles: Privileged (no restrictions), Baseline (blocks known escalation paths), and Restricted (hardened — no root, no hostPath, drop all capabilities). You set it with namespace labels: pod-security.kubernetes.io/enforce: restricted.
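Setting it in a manifest rather than with `kubectl label` (the namespace name is an example):

```yaml
# Enforce the restricted profile; warnings surface violations during kubectl apply
apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```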

Q30: How do you restrict egress traffic from AKS pods?

Answer: Two layers:

  1. Pod level (NetworkPolicy): Define egress rules so pods can reach only approved destinations and ports. This is L3/L4 only; the built-in engines don't do FQDN filtering.
  2. Network level (Azure Firewall): Set outboundType: userDefinedRouting and route all cluster egress through Azure Firewall, allowing only required FQDNs, including the ones AKS itself needs (MCR, Azure management endpoints).

Q31: What does Microsoft Defender for Containers do for AKS?

Answer: It provides: runtime threat detection (crypto mining, reverse shells), vulnerability scanning of container images in ACR, security recommendations for cluster configuration, and alerts for suspicious Kubernetes API calls. It's enabled at the subscription level and integrates with Azure Security Center.

Q32: Explain the principle of least privilege for AKS.

Answer: Apply it at every layer:

  1. Azure identities: Scope the cluster and kubelet identities to the minimum (e.g., AcrPull on one registry, network permissions only on the node subnet)
  2. Humans: Map Azure AD groups to namespace-scoped Roles instead of handing out cluster-admin; disable local accounts so every access is auditable through Azure AD
  3. Workloads: One Workload Identity per application with minimal Azure permissions, dedicated ServiceAccounts, and containers running as non-root with capabilities dropped under the Restricted PSA profile

🛠️ Section 5 — Operations & Troubleshooting (Questions 33–40)

Q33: How do you monitor an AKS cluster?

Answer: Three layers:

  1. Infrastructure: Azure Monitor Container Insights for node and pod CPU/memory/disk, kubelet health, and live container logs
  2. Metrics and alerting: Managed Prometheus (or self-hosted Prometheus) scraping cluster and application metrics, dashboards in Managed Grafana, alert rules on error rates and saturation
  3. Logs and audit: Container stdout/stderr plus control plane diagnostic logs (kube-audit, kube-apiserver) sent to a Log Analytics workspace and queried with KQL

Q34: A pod is stuck in Pending — walk me through how you'd debug it.

Answer:

  1. kubectl describe pod <name> → read the Events section
  2. If "Insufficient cpu/memory" → kubectl top nodes to check capacity → scale the node pool or reduce resource requests
  3. If "no nodes match node affinity" → check nodeSelector/affinity rules vs actual node labels
  4. If "0/N nodes available: N had taint" → check pod tolerations vs node taints
  5. If PVC-related → kubectl get pvc to check if the volume is bound

Q35: What's the difference between liveness and readiness probes?

Answer:

| Probe | Purpose | On Failure |
|---|---|---|
| Liveness | Is the container alive? | Kubelet restarts the container |
| Readiness | Is the container ready to serve traffic? | Removed from Service endpoints (no traffic routed) |
| Startup | Has the container finished starting? | Restarts the container; while it runs, liveness/readiness checks are held off for slow-starting apps |

Common mistake: Using the same endpoint for liveness and readiness. If your app temporarily can't reach the database, readiness should fail (stop traffic), but liveness should still pass (don't restart — the DB will come back).
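A container fragment showing that split-endpoint pattern (paths, ports, and image are illustrative; `/ready` is assumed to check dependencies, `/healthz` only process health):

```yaml
# Deployment pod-template fragment: separate liveness and readiness endpoints
containers:
  - name: api
    image: myregistry.azurecr.io/api:1.0
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30   # up to 60s of startup grace before liveness kicks in
      periodSeconds: 2
```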

Q36: How do you do zero-downtime deployments on AKS?

Answer: Combine these practices:

  1. RollingUpdate strategy with maxUnavailable: 0 and maxSurge: 1, so a new pod is Ready before an old one is removed
  2. Readiness probes, so the Service only routes traffic to pods that can actually serve it
  3. PodDisruptionBudgets, so node drains and upgrades never take down all replicas at once
  4. preStop hooks plus an adequate terminationGracePeriodSeconds, so in-flight requests drain before a pod is killed

Q37: What is GitOps and how is it used with AKS?

Answer: GitOps means Git is the single source of truth for cluster state. A GitOps operator running inside the cluster (Flux v2 or ArgoCD) continuously reconciles the desired state (Git repo) with the actual state (cluster). Benefits: audit trail (git log), rollback (git revert), no direct kubectl access needed in production. AKS has built-in Flux v2 support via the microsoft.flux extension.
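A hedged example of enabling that built-in support (the repo URL and paths are placeholders; requires the `k8s-configuration` CLI extension):

```shell
# Install Flux on the cluster and reconcile ./apps from a Git repo
az k8s-configuration flux create \
  --resource-group myRG --cluster-name myCluster --cluster-type managedClusters \
  --name cluster-config --namespace flux-system \
  --url https://github.com/myorg/aks-config --branch main \
  --kustomization name=apps path=./apps prune=true
```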

Q38: How do you handle multi-environment (dev/staging/prod) on AKS?

Answer: Two patterns:

  1. Cluster per environment: Separate AKS clusters for dev, staging, and prod. Strongest isolation and blast-radius control, at the cost of more clusters to operate and pay for.
  2. Shared cluster with namespaces: One cluster with environments isolated by RBAC, ResourceQuotas, and NetworkPolicies. Cheaper and simpler, but weaker isolation. A common compromise: shared cluster for dev/staging, dedicated cluster for prod.

With either pattern, use Helm values files per environment (values-dev.yaml, values-prod.yaml) and GitOps for automated deployment.

Q39: How would you migrate a workload from a VM to AKS?

Answer: Step-by-step approach:

  1. Containerize: Create a Dockerfile, ensure the app can run as a non-root container
  2. Externalize config: Move from files/env on the VM to ConfigMaps and Secrets
  3. Externalize state: Move persistent data to Azure managed services (Azure SQL, Redis Cache, Storage)
  4. CI/CD: Build pipeline to build image → push to ACR → deploy to AKS
  5. Gradual cutover: Run both VM and AKS versions, shift traffic with Azure Traffic Manager or Front Door
  6. Decommission: Once validated, shut down the VM

Q40: What would you check if the AKS API server is slow or unresponsive?

Answer:

  1. Check Azure status: status.azure.com for regional outages
  2. Check SLA tier: Free tier has no SLA — the API server can be slow under load. Upgrade to Standard.
  3. Check authorized IP ranges: Your IP might not be in the allowed list
  4. Check for chatty workloads: Excessive watch/list calls from controllers can overload the API server. Check with kubectl get --raw /metrics | grep apiserver_request_total
  5. Check cluster size: Very large clusters (1000+ nodes) may need Premium tier
  6. Run diagnostics: az aks show to check provisioning state

🎯 Bonus: Scenario-Based Questions

Scenario 1: "Design an AKS architecture for a multi-team e-commerce platform."

Framework for answering:

  1. Isolation: Namespace per team with RBAC (mapped to Azure AD groups), ResourceQuotas, and NetworkPolicies; production on its own cluster
  2. Node pools: Dedicated system pool; user pools per workload profile (general-purpose, Spot for batch, GPU if needed)
  3. Traffic: One Ingress Controller (NGINX, or AGIC with WAF) in front, TLS termination, internal services on ClusterIP
  4. Scaling: HPA plus Cluster Autoscaler; KEDA for queue-driven workers
  5. Security: Workload Identity, Key Vault CSI driver, private cluster or authorized IP ranges, Defender for Containers
  6. Operations: GitOps (Flux/ArgoCD) per team, Container Insights plus Managed Prometheus, cost tracking per namespace

Scenario 2: "Your deployment succeeded but the app returns 500 errors."

Debugging flow:

  1. Check pod logs → kubectl logs <pod> -n <ns>
  2. Is it all pods or just one? → If one, exec in and check
  3. Check dependent services (DB, Redis, external APIs) → kubectl exec + curl
  4. Check if ConfigMaps/Secrets have the right values → kubectl get cm/secret -o yaml
  5. Check if the new image version has a bug → helm rollback to previous version
  6. If rollback fixes it → the issue is in the code, not the infrastructure

📝 Interview Tips

  1. Answer the "why", not just the "what": not just "we use Azure CNI" but why pods need routable VNet IPs.
  2. Anchor answers in production experience; mention PDBs, SLA tiers, and cost trade-offs unprompted.
  3. If you don't know something, reason out loud from Kubernetes fundamentals. Interviewers value a debugging mindset over memorized trivia.

💡
Congratulations!

You've completed the entire AKS — Zero to Hero course! You now have the knowledge to create, manage, secure, scale, and troubleshoot AKS clusters in production. Review the hands-on labs regularly and practice on a real Azure subscription.
