> 💡 Note: In managed Kubernetes (AKS, EKS, GKE), you typically don't have SSH access to master nodes — the control plane is managed by the cloud provider. You can still inspect worker nodes and kube-system pods.
Kubernetes Architecture
Understand every component of a Kubernetes cluster — control plane, worker nodes, and how they communicate.
🧒 Simple Explanation (ELI5)
Think of Kubernetes as an airport:
- Control Tower (Control Plane) — decides which planes land where, tracks all flights, reroutes if a runway closes
- Runways & Gates (Worker Nodes) — where planes actually land and passengers board
- Flight Database (etcd) — the single source of truth for all flight schedules and statuses
- Ground Crew (kubelet) — on each runway, reports status and follows tower instructions
- Scheduler — assigns incoming flights to available runways based on capacity
The control tower never runs flights itself — it coordinates. The runways do the actual work.
🤔 Why Understand Architecture?
You cannot debug Kubernetes effectively without understanding its architecture. When a pod won't schedule, you need to know the scheduler's role. When a node goes down, you need to know what the controller manager does. Architecture knowledge is the foundation for every interview question and production incident.
🔧 Technical Explanation
Control Plane Components
The control plane runs on master node(s) and manages the entire cluster:
| Component | Role | What Happens If It Fails |
|---|---|---|
| kube-apiserver | Front door for all cluster operations. REST API that kubectl talks to. Validates and processes requests. | All cluster management stops. kubectl stops working. But running pods continue running. |
| etcd | Distributed key-value store. Stores all cluster state — pod info, configs, secrets, service accounts. | Cluster loses its memory. No new operations possible. Existing workloads keep running but can't be managed. |
| kube-scheduler | Assigns pods to nodes based on resource requirements, affinity rules, taints/tolerations. | New pods stay in Pending state. Existing pods unaffected. |
| kube-controller-manager | Runs controller loops: Deployment, ReplicaSet, Node, Job controllers. Ensures desired state = actual state. | No self-healing. Failed pods aren't replaced. Scaling doesn't work. |
| cloud-controller-manager | Integrates with cloud provider APIs (load balancers, storage, node management). | Cloud-specific features stop (LB provisioning, node auto-repair). |
Worker Node Components
Every worker node runs these components:
| Component | Role |
|---|---|
| kubelet | Agent on each node. Receives pod specs from API server, tells the container runtime to start/stop containers, reports node and pod status. |
| kube-proxy | Network proxy on each node. Maintains network rules (iptables/IPVS) for Service-based routing. Enables ClusterIP, NodePort communication. |
| Container Runtime | Actually runs containers. Usually containerd or CRI-O. Implements the Container Runtime Interface (CRI). |
📊 Visual: Cluster Architecture
⌨️ Hands-on: Explore Your Cluster Architecture
```bash
# View cluster endpoint information
kubectl cluster-info

# List all nodes and their status
kubectl get nodes -o wide

# See control plane pods (in the kube-system namespace)
kubectl get pods -n kube-system

# Describe a node to see capacity, allocatable resources, and conditions
kubectl describe node <node-name>

# Check component statuses (deprecated but educational)
kubectl get componentstatuses

# View all API resources available in the cluster
kubectl api-resources | head -20

# Check etcd endpoints (if you have access)
kubectl get endpoints -n kube-system
```
```bash
# Inspect kubelet on a node (SSH into the node first)
systemctl status kubelet

# View recent kubelet logs
journalctl -u kubelet --no-pager | tail -20

# Check the container runtime
crictl ps
crictl images
```
🐛 Debugging Scenarios
Scenario 1: Master Node Failure
Symptom: kubectl commands hang or return "connection refused".
Impact: Existing pods continue running (kubelet manages them locally). But: no new pods are scheduled, no scaling, no self-healing, no new deployments.
Troubleshooting:
- Check whether the kube-apiserver container is running on the master: `crictl ps | grep apiserver` (or `docker ps | grep apiserver` on older Docker-based clusters)
- Check etcd health: `etcdctl endpoint health`
- Review systemd logs on the master: `journalctl -u kubelet -f`
- In managed K8s (AKS/EKS): check the cloud provider status page; the control plane is the provider's responsibility
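The checks above can be sketched as a shell sequence. This assumes a kubeadm-style cluster where control plane components run as static pods and certificates live under the kubeadm default paths; adjust endpoints and paths for your distribution:

```bash
# Is the API server container running on the master?
sudo crictl ps | grep kube-apiserver

# Is etcd healthy? (kubeadm default certificate paths shown)
sudo ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# kubelet logs often show why a static pod failed to start
sudo journalctl -u kubelet --since "10 min ago" --no-pager | tail -50
```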
Scenario 2: Worker Node Not Ready
Symptom: kubectl get nodes shows a node as NotReady.
Troubleshooting:
```bash
# Check node conditions
kubectl describe node <node-name> | grep -A 10 "Conditions"

# Common causes:
# - kubelet stopped:        systemctl restart kubelet
# - Disk pressure:          df -h on the node
# - Memory pressure:        free -m
# - Network issues:         ping the API server from the node
# - Container runtime down: systemctl status containerd
```
Scenario 3: etcd Data Loss
Impact: Complete cluster state loss. All resource definitions gone.
Prevention: Take regular etcd snapshots. In production, run etcd as a 3- or 5-member cluster for HA, and use `etcdctl snapshot save` for backups.
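A minimal backup-and-verify sketch with etcdctl (assumes kubeadm default certificate paths; adjust endpoints and paths for your cluster, and note that newer etcd releases move snapshot status/restore into the `etcdutl` tool):

```bash
# Take a snapshot of etcd
sudo ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable
sudo ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db --write-out=table
```

In production, ship the snapshot off the node (e.g. to object storage) on a schedule, as the real-world example later in this lesson does.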
🎯 Interview Questions
Beginner
Q: What are the two main parts of a Kubernetes cluster?
A Kubernetes cluster has two parts: the Control Plane (kube-apiserver, etcd, kube-scheduler, kube-controller-manager) and Worker Nodes (kubelet, kube-proxy, container runtime). The control plane makes decisions about scheduling and state management; worker nodes run the actual workloads.
Q: What is etcd and why is it critical?
etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. Every pod definition, service, secret, and config is stored in etcd. If etcd is lost without a backup, the entire cluster state is gone. That's why etcd backups are critical in production.
Q: What does the kubelet do?
The kubelet is an agent that runs on every worker node. It receives pod specifications (PodSpecs) from the API server, ensures the containers described in those specs are running and healthy, and reports node/pod status back to the control plane. It's the bridge between the control plane and the container runtime.
Q: What is the role of the kube-apiserver?
The kube-apiserver is the front-end for the Kubernetes control plane. It exposes the Kubernetes API (REST). All external communication (kubectl, dashboards, CI/CD) and internal communication (scheduler, controller-manager) goes through the API server. It handles authentication, authorization, admission control, and validation.
Q: What does kube-proxy do?
kube-proxy is a network proxy running on each node. It maintains network rules (using iptables or IPVS) that allow network communication to pods from inside and outside the cluster. It implements the Kubernetes Service concept — routing traffic to the right pod backends.
Intermediate
Q: If the control plane goes down, do running applications stop?
No — running pods continue running. The kubelet on each worker node manages containers locally. However, no new scheduling, no self-healing (crashed pods aren't replaced), no new deployments, and no scaling will work. kubectl commands will fail. This is why production clusters run multiple master nodes for high availability.
Q: How does the kube-scheduler decide where to place a pod?
The scheduler follows a two-phase process: Filtering — eliminates nodes that can't run the pod (insufficient resources, taints, node selectors, affinity rules). Scoring — ranks remaining nodes by criteria like resource balance, data locality, and spreading. The highest-scored node wins. You can influence this with nodeSelector, affinity/anti-affinity, taints/tolerations, and priority classes.
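A quick way to see filtering in action (a sketch — node names, labels, and the pod name are illustrative placeholders):

```bash
# Taint a node so pods without a matching toleration are filtered out
kubectl taint nodes node-1 dedicated=batch:NoSchedule

# Label a node and use a nodeSelector to steer a pod onto it
kubectl label nodes node-2 disktype=ssd
kubectl run ssd-test --image=nginx \
  --overrides='{"spec":{"nodeSelector":{"disktype":"ssd"}}}'

# The scheduler records its decision (or failure reason) as pod events
kubectl describe pod ssd-test | grep -A 5 Events

# Clean up: the trailing "-" removes the taint
kubectl taint nodes node-1 dedicated=batch:NoSchedule-
```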
Q: What is the difference between kube-proxy's iptables and IPVS modes?
iptables: Default mode. Creates iptables rules for each Service/endpoint. Simple but performance degrades with thousands of services (O(n) rule matching). IPVS: Uses Linux kernel IPVS (IP Virtual Server). Hash-table based, O(1) lookup. Better performance at scale. Supports more load-balancing algorithms (round-robin, least connections, etc.). Use IPVS for large clusters with 1000+ services.
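To check which mode kube-proxy is actually running in — a sketch assuming default ports and a kubeadm-style cluster:

```bash
# On a node: kube-proxy's metrics server (port 10249) exposes the active mode
curl -s http://localhost:10249/proxyMode
# → typically "iptables" or "ipvs"

# Or inspect the kube-proxy ConfigMap (kubeadm clusters)
kubectl get configmap kube-proxy -n kube-system -o yaml | grep "mode:"
```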
Q: How does etcd stay consistent and highly available?
etcd uses the Raft consensus algorithm to maintain data consistency across multiple replicas. A distributed etcd cluster (3 or 5 members) provides high availability — if one member fails, the cluster continues operating as long as a majority (quorum) is alive. A single etcd instance is a single point of failure and unacceptable for production.
Q: What does the kube-controller-manager do?
The kube-controller-manager runs multiple controllers in a single process. Each controller is a control loop that watches the cluster state via the API server and makes changes to move from current state → desired state. Examples: ReplicaSet controller ensures correct replica count; Node controller marks unreachable nodes; Job controller manages batch jobs. This watch-and-reconcile pattern is fundamental to Kubernetes' declarative model.
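The watch-and-reconcile loop is easy to observe on any cluster: delete a pod owned by a Deployment and the ReplicaSet controller replaces it (a sketch — the deployment name is illustrative):

```bash
# Create a deployment with 3 replicas
kubectl create deployment demo --image=nginx --replicas=3

# Delete one pod by name...
POD=$(kubectl get pods -l app=demo -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD"

# ...and the controller reconciles back to 3: a replacement appears with a new name
kubectl get pods -l app=demo

# Clean up
kubectl delete deployment demo
```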
Scenario-Based
Q: kubectl can't connect to the cluster. How do you troubleshoot?
1) Check network connectivity to the API server: curl -k https://<api-server>:6443/healthz. 2) Verify kubeconfig is correct: kubectl config view. 3) Check if API server is running: kubectl get pods -n kube-system (from another context or on the master). 4) Check if etcd is healthy — if etcd is down, API server can't serve requests. 5) For managed K8s: check cloud provider status. 6) Check if it's a DNS issue — try using the IP directly in kubeconfig.
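The first two checks as commands (a sketch — substitute your API server address for the placeholder):

```bash
# Raw reachability and health, bypassing kubectl entirely
curl -k https://<api-server>:6443/healthz
curl -k "https://<api-server>:6443/livez?verbose"   # finer-grained checks on newer clusters

# Confirm kubectl is talking to the cluster you think it is
kubectl config current-context
kubectl config view --minify
```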
Q: A worker node shows NotReady. How do you investigate?
1) kubectl describe node <name> — check Conditions section (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable, Ready). 2) SSH into the node and check kubelet: systemctl status kubelet, journalctl -u kubelet --since "5 min ago". 3) Check container runtime: systemctl status containerd. 4) Check disk: df -h. 5) Check memory: free -m. 6) Check networking: can the node reach the API server? 7) Check for kernel/OS issues: dmesg | tail.
Q: How would you design a highly available production cluster?
Run 3 master nodes (odd number for etcd quorum). Use a load balancer in front of API servers. Run etcd as a 3-member cluster (or 5 for larger setups). Spread worker nodes across availability zones. Use Pod Disruption Budgets to ensure minimum replicas during node maintenance. For managed K8s (AKS/EKS), the control plane HA is handled by the provider — focus on multi-AZ node pools and PDB configurations.
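Pod Disruption Budgets can be created imperatively; a minimal sketch (the selector and counts are illustrative):

```bash
# Ensure at least 2 replicas of app=web survive voluntary disruptions (drains, upgrades)
kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2

# Verify how many disruptions are currently allowed
kubectl get pdb web-pdb
```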
Q: etcd disk usage keeps growing. What do you do?
etcd compacts revisions automatically, but the space isn't freed until defragmentation. Steps: 1) Check space: etcdctl endpoint status --write-out=table. 2) Compact old revisions: etcdctl compact <revision>. 3) Defrag: etcdctl defrag. 4) Check if too many objects exist (excessive ConfigMaps, Secrets, Events). 5) Set up etcd quota with --quota-backend-bytes. 6) Ensure regular snapshots and lifecycle management.
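The sequence as commands — a sketch that assumes the same certificate flags shown for backups and a local endpoint; run the defrag against one member at a time:

```bash
# 1. Check database size and current revision per member
etcdctl endpoint status --cluster --write-out=table

# 2. Compact history up to the current revision (extracted from JSON output)
REV=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compact "$REV"

# 3. Defragment to return freed space to the filesystem
etcdctl defrag --endpoints=https://127.0.0.1:2379
```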
Q: Pods are stuck in Pending. How do you debug?
1) Check node capacity: kubectl describe nodes | grep -A 5 "Allocated resources". 2) Check if resource requests on the pod exceed any node's allocatable capacity. 3) Check taints on nodes: kubectl get nodes -o json | jq '.items[].spec.taints' — pods without matching tolerations won't schedule. 4) Check nodeSelector or affinity rules on the pod spec. 5) Check if PersistentVolumeClaims are bound (unbound PVCs prevent pod scheduling). 6) Consider adding nodes or adjusting resource requests.
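The checks above, sketched as commands (the pod name is a placeholder):

```bash
# Why is the pod Pending? The Events section usually says
# (e.g. "0/3 nodes are available: 3 Insufficient cpu")
kubectl describe pod <pod-name> | grep -A 10 Events

# Compare pod requests against what each node has allocated
kubectl describe nodes | grep -A 5 "Allocated resources"

# Taints the pod may not tolerate
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'

# Unbound PVCs block scheduling
kubectl get pvc
```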
🌍 Real-World Use Case
A media streaming company runs 200+ microservices on Kubernetes across 3 availability zones. Their architecture decisions:
- 3 master nodes with etcd running in a stacked topology for HA
- Node pools: separate pools for stateless apps (spot/preemptible instances) and stateful workloads (on-demand)
- etcd backups: snapshots every 30 minutes to cloud storage
- Monitoring: Prometheus scraping all control plane metrics; alerts on etcd latency, API server request duration, scheduler binding failures
- When one AZ had an outage, pods automatically rescheduled to the other two zones — zero user impact
📝 Summary
- The Control Plane (API server, etcd, scheduler, controller-manager) makes all cluster decisions
- Worker Nodes (kubelet, kube-proxy, container runtime) run the actual workloads
- etcd is the single source of truth — back it up religiously
- Master failure doesn't stop running pods, but stops all management operations
- Understanding architecture is essential for debugging every K8s issue