Interview Preparation
Top 25+ Kubernetes interview questions with detailed answers, scenario-based challenges, and architecture explanation practice.
Try answering each question out loud before reading the answer. For scenario questions, think through the debugging steps before checking the solution.
Section 1: Core Concepts (Must-Know)
Q: What is Kubernetes and why would you use it?
Kubernetes is an open-source container orchestration platform originally developed by Google. It automates deployment, scaling, and management of containerized applications. Key reasons to use it: Self-healing (restarts crashed containers), Auto-scaling (horizontal pod autoscaler), Service discovery & load balancing (via Services), Rolling updates & rollbacks (zero-downtime deployments), Secret & config management, Storage orchestration. It solves the problem of running containers reliably in production at scale.
Q: Explain the Kubernetes architecture and its main components.
Control Plane: API Server (central hub, handles all REST requests), etcd (distributed key-value store for cluster state), Scheduler (assigns pods to nodes based on resources/constraints), Controller Manager (runs reconciliation loops: Deployment controller, ReplicaSet controller, etc.). Worker Nodes: kubelet (agent that ensures containers run per pod specs), kube-proxy (manages networking rules for Service traffic), Container Runtime (containerd/CRI-O, which runs the actual containers). All communication goes through the API Server. etcd is the single source of truth.
Q: What is the difference between a Container and a Pod?
A Container is a single runnable image instance. A Pod is the smallest deployable unit in Kubernetes; it wraps one or more containers that share the same network namespace (same IP, localhost communication), storage volumes, and lifecycle. Pods are ephemeral; they don't self-heal. You almost never create pods directly; you use Deployments, which manage ReplicaSets, which manage Pods.
Q: What is the difference between a ReplicaSet and a Deployment?
A ReplicaSet ensures a specified number of identical pod replicas are running. A Deployment manages ReplicaSets and adds rolling updates and rollback capabilities. When you update a Deployment, it creates a new ReplicaSet, scales it up, and scales the old one down (rolling update). You should almost always use Deployments, not standalone ReplicaSets. Hierarchy: Deployment → ReplicaSet → Pods.
Q: What are the different Service types and when do you use each?
ClusterIP (default): Internal-only IP. Used for pod-to-pod communication within the cluster. NodePort: Exposes the service on a static port (30000-32767) on every node's IP. Accessible externally via NodeIP:NodePort. LoadBalancer: Provisions a cloud load balancer (AWS ALB, Azure LB) that routes to the service. Used in production for external traffic. ExternalName: Maps a service to an external DNS name (CNAME). No proxying involved.
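A minimal Service manifest could look like the sketch below; the name, labels, and ports are illustrative, not from a real system:

```yaml
# Hypothetical example: exposes pods labeled app=web-app inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  type: ClusterIP        # change to NodePort or LoadBalancer to expose externally
  selector:
    app: web-app         # traffic is routed to pods carrying this label
  ports:
    - port: 80           # the Service's own port
      targetPort: 8080   # the container port traffic is forwarded to
```

Switching `type` is the only change needed to move between the first three service types; the selector and ports stay the same.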
Q: What are Namespaces and when would you use them?
Namespaces provide logical isolation within a cluster. They partition resources, allowing different teams or environments (dev, staging, prod) to coexist in one cluster with isolation. Use cases: multi-tenancy, resource quotas per team, RBAC per namespace, environment separation. Default namespaces: default, kube-system, kube-public, kube-node-lease. Note: Cluster-scoped resources (nodes, PVs, ClusterRoles) are not namespaced.
Q: What is etcd and why is it critical?
etcd is a distributed, strongly consistent key-value store that stores the entire Kubernetes cluster state: all resource definitions, configurations, secrets, and metadata. If etcd is lost and there's no backup, the cluster is effectively destroyed. It uses the Raft consensus algorithm for distributed consistency. Best practices: run 3+ replicas for HA, regular automated backups (etcdctl snapshot save), encrypt at rest, restrict access to only the API server.
Section 2: Configuration & Security
Q: What is the difference between a ConfigMap and a Secret?
Both store configuration data as key-value pairs. ConfigMap: For non-sensitive data (app settings, feature flags). Stored in plain text. Secret: For sensitive data (passwords, API keys, TLS certs). Base64-encoded by default (not encrypted unless you enable encryption at rest). Both can be consumed as environment variables or mounted as files. In production, use external secret stores (Vault, AWS Secrets Manager) via CSI drivers for true secret security.
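A side-by-side sketch, with hypothetical names and values, might look like this:

```yaml
# Illustrative only: a ConfigMap for plain settings and a Secret for credentials.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  FEATURE_X_ENABLED: "true"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
type: Opaque
stringData:              # stringData accepts plain text; the API server base64-encodes it
  DB_PASSWORD: "change-me"
```

Both objects can then be referenced from a pod via `envFrom` or mounted as volumes.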
Q: How does RBAC work in Kubernetes?
RBAC controls who can do what in the cluster. Four objects: Role (namespace-scoped permissions), ClusterRole (cluster-scoped), RoleBinding (binds Role to users/groups/SAs in a namespace), ClusterRoleBinding (binds ClusterRole cluster-wide). Example: a Role with verbs: [get, list] on resources: [pods] lets the bound subject view pods but not modify them. Follow least privilege: never give cluster-admin to service accounts.
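The pod-viewing Role from the example above could be written as follows; the namespace and service account name are hypothetical:

```yaml
# Hypothetical "pod-reader" Role bound to a service account in the dev namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]          # "" = core API group (pods live here)
    resources: ["pods"]
    verbs: ["get", "list"]   # read-only: no create/update/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: ci-deployer        # hypothetical service account
    namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```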
Q: What is a ServiceAccount?
A ServiceAccount provides an identity for pods to authenticate against the Kubernetes API. Each namespace has a default SA. When a pod needs to interact with the API (e.g., CI/CD tools, operators), it uses a ServiceAccount. The SA's permissions are defined by RBAC. Best practices: don't use the default SA for apps, create dedicated SAs with minimal RBAC; set automountServiceAccountToken: false when API access isn't needed.
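A small sketch of both best practices together (names and image are illustrative):

```yaml
# A dedicated service account, and a pod that opts out of API token mounting.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
---
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: false   # this pod never calls the K8s API
  containers:
    - name: web
      image: nginx:1.25
```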
Q: What are Network Policies?
Network Policies control pod-to-pod and pod-to-external network traffic at L3/L4 (IP/port level). By default, all pods can communicate freely. Network policies restrict this. They use label selectors to match pods, define ingress/egress rules, and require a CNI plugin that supports them (Calico, Cilium, Azure CNI). Best pattern: default-deny everything, then explicitly allow required traffic paths.
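The default-deny-then-allow pattern can be sketched with two policies; the app labels and port are assumptions for illustration:

```yaml
# Deny all ingress to every pod in the namespace...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}            # empty selector = all pods in the namespace
  policyTypes: ["Ingress"]
---
# ...then explicitly allow one path: frontend pods may reach api pods on 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api               # hypothetical labels
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
```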
Section 3: Scaling & Operations
Q: How does the Horizontal Pod Autoscaler (HPA) work?
HPA automatically scales pod replicas based on observed metrics (CPU utilization, memory, or custom metrics). It checks metrics every 15 seconds (configurable), computes the desired replica count using: desiredReplicas = ceil[currentReplicas × (currentMetric / targetMetric)], and scales the Deployment. Requires metrics-server for CPU/memory metrics, or Prometheus adapter for custom metrics. Has cooldown periods to prevent flapping. Configure minReplicas and maxReplicas to set bounds.
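A minimal HPA manifest targeting a hypothetical Deployment named web-app:

```yaml
# Keeps average CPU near 70% across 2-10 replicas of "web-app" (illustrative name).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Applying the formula: with 4 replicas at 90% CPU and a 70% target, desiredReplicas = ceil(4 × 90/70) = 6.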
Q: What is the difference between liveness, readiness, and startup probes?
Liveness probe: "Is the container alive?" If it fails, kubelet kills and restarts the container. Catches deadlocks, hangs. Readiness probe: "Is the container ready to accept traffic?" If it fails, the pod is removed from Service endpoints (no traffic routed). Container continues running. Useful during startup or temporary heavy load. Startup probe: "Has the app finished starting?" Disables liveness/readiness probes until it succeeds. Used for slow-starting applications. Prevents premature liveness kills.
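All three probes together, as a pod-spec fragment; the /healthz and /ready endpoints and port are assumptions about the app:

```yaml
# Pod-spec fragment (not a full manifest): three probes on one container.
containers:
  - name: app
    image: my-app:1.0                     # hypothetical image
    startupProbe:                         # runs first; liveness/readiness wait for it
      httpGet: { path: /healthz, port: 8080 }
      failureThreshold: 30                # up to 30 x 10s = 5 min to start
      periodSeconds: 10
    livenessProbe:                        # restart the container if this fails
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
    readinessProbe:                       # stop routing traffic if this fails
      httpGet: { path: /ready, port: 8080 }
      periodSeconds: 5
```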
Q: What is the difference between a Service and an Ingress?
A Service provides L4 (TCP/UDP) load balancing. An Ingress provides L7 (HTTP/HTTPS) routing: host-based routing, path-based routing, TLS termination. Ingress requires an Ingress Controller (NGINX, Traefik, AWS ALB). One Ingress resource can route traffic to multiple services. Example: api.example.com/users → user-service, api.example.com/orders → order-service. It consolidates multiple services behind a single entry point.
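The path-routing example above could be expressed as this Ingress sketch (controller setup and TLS not shown):

```yaml
# Routes api.example.com/users and /orders to two backend Services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port: { number: 80 }
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port: { number: 80 }
```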
Q: How do rolling updates and rollbacks work?
Rolling Update: The default strategy. Creates new pods with the updated spec, then terminates old pods gradually. Controlled by maxSurge (extra pods during update) and maxUnavailable (pods that can be down). Set maxUnavailable=0 for zero-downtime. Rollback: Kubernetes keeps revision history. kubectl rollout undo deployment/<name> reverts to previous version. kubectl rollout undo --to-revision=2 goes to a specific revision. kubectl rollout history shows all revisions.
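The zero-downtime settings mentioned above, as a Deployment-spec fragment:

```yaml
# Deployment-spec fragment: rolling update tuned for zero downtime.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # one extra pod may be created during the update
      maxUnavailable: 0     # never drop below the desired replica count
```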
Q: What is a DaemonSet?
A DaemonSet ensures that a copy of a pod runs on every node (or specific nodes via nodeSelector). Use cases: log collectors (Fluentd), monitoring agents (Prometheus node-exporter), network plugins (Calico, kube-proxy). When new nodes join the cluster, the DaemonSet automatically schedules pods on them. When nodes are removed, those pods are garbage collected.
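A minimal DaemonSet sketch for the log-collector use case; the name and image tag are illustrative:

```yaml
# One log-collector pod per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: log-collector
  template:
    metadata:
      labels:
        name: log-collector
    spec:
      tolerations:
        - operator: Exists        # tolerate all taints so tainted nodes are covered too
      containers:
        - name: fluentd
          image: fluentd:v1.16    # hypothetical image tag
```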
Q: When would you use a StatefulSet?
StatefulSets manage stateful applications that need: stable network identities (pod-0, pod-1, pod-2, with predictable DNS names), stable persistent storage (each pod gets its own PVC), ordered deployment/scaling (pod-0 starts before pod-1). Used for databases (MySQL, PostgreSQL), message queues (Kafka, RabbitMQ), distributed systems (Elasticsearch, ZooKeeper). Unlike Deployments, pods are not interchangeable.
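A StatefulSet skeleton showing the stable-identity and per-pod-storage pieces; names, image, and storage size are assumptions:

```yaml
# Pods get stable names db-0, db-1, db-2 and one PVC each via volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless      # headless Service that provides per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:         # each pod gets its own PVC named data-db-<ordinal>
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```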
Section 4: Scenario-Based Questions
Scenario: A pod keeps restarting with CrashLoopBackOff. How do you debug it?
Systematic approach:
- kubectl get pods → confirm CrashLoopBackOff, note the RESTARTS count
- kubectl logs <pod> → check current crash output
- kubectl logs <pod> --previous → check the previous crash if there are no current logs
- kubectl describe pod <pod> → check Events, Last State, Exit Code
- Exit code analysis: 1 = app error, 137 = OOMKilled, 126 = permission, 127 = command not found
- If OOMKilled: increase memory limits
- If app error: fix the code, missing config, or missing env vars
- If command not found: check the container image and entrypoint
- Last resort: run the same image interactively with overridden entrypoint to debug
Scenario: A new deployment broke production and users are seeing errors. What do you do?
Immediate response:
- Rollback immediately: kubectl rollout undo deployment/<name> → this restores the previous working version in seconds
- Check rollout status: kubectl rollout status deployment/<name>
- Verify users are restored: check service endpoints, test the endpoint
Then investigate:
- Check the failed pods from the bad version: kubectl logs
- Inspect the bad revision: kubectl rollout history deployment/<name> --revision=<n>
- Root cause: wrong image, bad config, dependency unavailable?
- Fix the issue, test in staging, then redeploy
Prevention: Use readiness probes (bad pods won't receive traffic), staged rollouts (canary), maxUnavailable=0.
Scenario: Design a production-ready Kubernetes setup for a microservices application.
Cluster design:
- HA Control Plane: 3+ master nodes across availability zones. Managed K8s (EKS/AKS/GKE) if possible
- Worker Nodes: Separate node pools β general (mixed workloads), memory-optimized (databases), spot/preemptible (batch jobs)
- Namespaces: Per-team or per-service. ResourceQuotas and LimitRanges on each
Workload config:
- Deployments with resource requests/limits, health probes, pod disruption budgets
- HPA on each service with appropriate metrics
- Anti-affinity rules to spread pods across nodes/zones
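The anti-affinity rule mentioned above, as a pod-template fragment (the app label is illustrative):

```yaml
# Pod-spec fragment: prefer spreading web-app replicas across availability zones.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web-app                       # hypothetical label
          topologyKey: topology.kubernetes.io/zone
```

Use requiredDuringSchedulingIgnoredDuringExecution instead when co-location must be forbidden outright, at the cost of pods staying Pending if no suitable node exists.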
Networking: Ingress controller + cert-manager for TLS. Network policies for pod isolation. Service mesh (Istio/Linkerd) for observability.
Security: RBAC per namespace. Network policies. Pod security standards. External secrets. Image scanning. OPA/Kyverno policies.
Observability: Prometheus + Grafana for metrics. EFK/Loki for logs. Jaeger/Zipkin for tracing.
Scenario: Your application is running but responding slowly. How do you investigate?
- kubectl top pods → check CPU/memory usage. Are pods maxing out?
- kubectl describe pod → check resource limits. Is CPU being throttled?
- kubectl logs → look for slow queries, timeouts, or errors
- Check pod distribution: kubectl get pods -o wide → are all pods on one overloaded node?
- Check HPA: is it scaling? kubectl get hpa
- Check downstream dependencies: is the database slow? Is an external API timing out?
- Fixes: Increase resources, scale up replicas, add pod anti-affinity for distribution, optimize the application code, add caching
Scenario: How do you manage secrets securely in Kubernetes?
K8s built-in (minimum):
- Enable encryption at rest for etcd (EncryptionConfiguration)
- Use RBAC to restrict who can read secrets
- Never commit secrets to Git; use sealed-secrets or SOPS for GitOps
Production (recommended):
- External secret stores: AWS Secrets Manager, Azure Key Vault, HashiCorp Vault
- Integration via: CSI Secret Store Driver, External Secrets Operator, or Vault Agent sidecar
- Rotate secrets automatically
- Audit secret access via K8s audit logs
Never: Store secrets in ConfigMaps, environment variables in CI/CD logs, or unencrypted YAML in Git.
Scenario: How do you achieve zero-downtime deployments?
- Rolling update strategy: Set maxUnavailable: 0 and maxSurge: 1 (or 25%)
- Readiness probes: New pods only receive traffic after they're ready
- Pod Disruption Budgets (PDB): Ensure minimum pods are available during voluntary disruptions (node drains)
- Graceful shutdown: Handle SIGTERM in your app. Set terminationGracePeriodSeconds appropriately. Use preStop hooks if needed
- Connection draining: Readiness probe should fail immediately on SIGTERM so new connections go to other pods while existing ones finish
- Multiple replicas: Always run 2+ replicas spread across nodes/zones
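The PDB item above can be sketched as a short manifest; the app label and threshold are assumptions:

```yaml
# Keep at least 2 web-app pods running during voluntary disruptions (e.g. node drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2          # maxUnavailable is the alternative way to express the budget
  selector:
    matchLabels:
      app: web-app         # hypothetical label
```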
Q: What is the difference between a Deployment and a DaemonSet?
Deployment: Runs N replicas of a pod. Scheduler decides which nodes. Used for application workloads (web servers, APIs, workers). Scales horizontally.
DaemonSet: Runs exactly one pod per node (or per selected nodes). Used for node-level services: monitoring agents, log collectors, network plugins. Automatically adds/removes pods as nodes join/leave.
Key difference: Deployment = "I need N copies somewhere." DaemonSet = "I need exactly one on every node."
Q: How does DNS-based service discovery work in Kubernetes?
CoreDNS runs as a Deployment in kube-system. Every pod's /etc/resolv.conf points to the CoreDNS service IP. DNS records are auto-created for Services:
- <service-name>.<namespace>.svc.cluster.local → Service ClusterIP
- Within the same namespace, just <service-name> works
- Headless services (clusterIP: None) return individual pod IPs
- StatefulSet pods get: <pod-name>.<service-name>.<namespace>.svc.cluster.local
DNS is the backbone of service discovery in Kubernetes. If CoreDNS is down, inter-service communication breaks.
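The headless-service case can be sketched like this; the name, label, and port are illustrative:

```yaml
# Headless Service: no ClusterIP is allocated, so DNS returns the individual
# pod IPs, and StatefulSet pods get names like db-0.db-headless.default.svc.cluster.local.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None        # this is what makes the Service headless
  selector:
    app: db              # hypothetical label
  ports:
    - port: 5432
```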
Section 5: Architecture Explanation Practice
Interviewers often ask you to draw/explain the K8s architecture or a deployment flow on a whiteboard. Practice explaining these:
Practice 1: "Explain what happens when you run 'kubectl apply -f deployment.yaml'"
1) kubectl sends YAML to API Server. 2) API Server authenticates → authorizes (RBAC) → admission controls → validates → stores in etcd. 3) Deployment Controller sees new Deployment, creates ReplicaSet. 4) ReplicaSet Controller creates Pod objects. 5) Scheduler sees unscheduled pods, picks the best node, updates etcd. 6) kubelet on that node sees assigned pods, pulls image via container runtime, starts container. 7) kube-proxy updates iptables/IPVS rules for Service routing.
Practice 2: "How does traffic reach your application?"
Explain: User → DNS → Load Balancer → Ingress Controller Pod → Service (kube-proxy / iptables) → Pod. Cover: L7 routing at Ingress, L4 at Service, pod selection via labels, endpoint slice updates when pods change.
Practice 3: "How do you ensure high availability?"
Cover: Multiple replicas across nodes/zones (pod anti-affinity), Pod Disruption Budgets, readiness probes, rolling updates with maxUnavailable=0, multi-AZ node pools, HA control plane (3+ masters), etcd backups, cluster autoscaler for capacity.
Interview Tips
- Say "I would check…" then list commands: interviewers want to see your thought process, not just the answer
- Draw diagrams: for architecture questions, sketch the components and arrows
- Mention trade-offs: "NodePort is simpler but LoadBalancer is better for production because…"
- Connect to experience: "In my project, we used HPA because…"
- Admit what you don't know: "I haven't used StatefulSets in production, but I understand they provide…"
- Security is always a good answer: mentioning RBAC, least privilege, and secrets management shows maturity
Summary
- Core concepts (Pods, Deployments, Services, Namespaces) are asked in every interview
- Scenario-based questions test your debugging methodology: always describe a systematic approach
- Architecture questions test your understanding of component interactions: practice drawing the K8s architecture
- Security awareness (RBAC, secrets, network policies) separates good answers from great ones
- Real experience matters: relate answers back to your hands-on practice
You've completed the entire Kubernetes Zero to Hero course. Go back and review any topics you're less confident about, practice the hands-on labs, and you'll be interview-ready. Good luck!