> 💡 Note: In managed Kubernetes (AKS, EKS, GKE), you typically don't have SSH access to master nodes — the control plane is managed by the cloud provider. You can still inspect worker nodes and kube-system pods.
Kubernetes Architecture
Understand every component of a Kubernetes cluster — control plane, worker nodes, and how they communicate.
🧒 Simple Explanation (ELI5)
Think of Kubernetes as an airport:
- Control Tower (Control Plane) — decides which planes land where, tracks all flights, reroutes if a runway closes
- Runways & Gates (Worker Nodes) — where planes actually land and passengers board
- Flight Database (etcd) — the single source of truth for all flight schedules and statuses
- Ground Crew (kubelet) — on each runway, reports status and follows tower instructions
- Scheduler — assigns incoming flights to available runways based on capacity
The control tower never runs flights itself — it coordinates. The runways do the actual work.
🤔 Why Understand Architecture?
You cannot debug Kubernetes effectively without understanding its architecture. When a pod won't schedule, you need to know the scheduler's role. When a node goes down, you need to know what the controller manager does. Architecture knowledge is the foundation for every interview question and production incident.
🔧 Technical Explanation
Control Plane Components
The control plane runs on master node(s) and manages the entire cluster:
| Component | Role | What Happens If It Fails |
|---|---|---|
| kube-apiserver | Front door for all cluster operations. REST API that kubectl talks to. Validates and processes requests. | All cluster management stops. kubectl stops working. But running pods continue running. |
| etcd | Distributed key-value store. Stores all cluster state — pod info, configs, secrets, service accounts. | Cluster loses its memory. No new operations possible. Existing workloads keep running but can't be managed. |
| kube-scheduler | Assigns pods to nodes based on resource requirements, affinity rules, taints/tolerations. | New pods stay in Pending state. Existing pods unaffected. |
| kube-controller-manager | Runs controller loops: Deployment, ReplicaSet, Node, Job controllers. Ensures desired state = actual state. | No self-healing. Failed pods aren't replaced. Scaling doesn't work. |
| cloud-controller-manager | Integrates with cloud provider APIs (load balancers, storage, node management). | Cloud-specific features stop (LB provisioning, node auto-repair). |
Worker Node Components
Every worker node runs these components:
| Component | Role |
|---|---|
| kubelet | Agent on each node. Receives pod specs from API server, tells the container runtime to start/stop containers, reports node and pod status. |
| kube-proxy | Network proxy on each node. Maintains network rules (iptables/IPVS) for Service-based routing. Enables ClusterIP, NodePort communication. |
| Container Runtime | Actually runs containers. Usually containerd or CRI-O. Implements the Container Runtime Interface (CRI). |
📊 Visual: Cluster Architecture
⌨️ Hands-on: Explore Your Cluster Architecture
```bash
# View cluster endpoint information
kubectl cluster-info

# List all nodes and their status
kubectl get nodes -o wide

# See control plane pods (in the kube-system namespace)
kubectl get pods -n kube-system

# Describe a node to see capacity, allocatable resources, and conditions
kubectl describe node <node-name>

# Check component statuses (deprecated but educational)
kubectl get componentstatuses

# View all API resources available in the cluster
kubectl api-resources | head -20

# Check etcd endpoints (if you have access)
kubectl get endpoints -n kube-system
```
```bash
# Inspect kubelet on a node (SSH into the node first)
systemctl status kubelet

# View recent kubelet logs
journalctl -u kubelet --no-pager | tail -20

# Check the container runtime
crictl ps
crictl images
```
🐛 Debugging Scenarios
Scenario 1: Master Node Failure
Symptom: kubectl commands hang or return "connection refused".
Impact: Existing pods continue running (kubelet manages them locally). But: no new pods are scheduled, no scaling, no self-healing, no new deployments.
Troubleshooting:
- Check whether the kube-apiserver container is running on the master: `crictl ps | grep apiserver` (or `docker ps | grep apiserver` on older Docker-based clusters)
- Check etcd health: `etcdctl endpoint health`
- Review systemd logs on the master: `journalctl -u kubelet -f`
- In managed K8s (AKS/EKS): check the cloud provider status page; the control plane is the provider's responsibility
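The checks above can be sketched as a shell sequence. This assumes a kubeadm-style cluster where control plane components run as static pods and certificates live under the kubeadm default paths; adjust endpoints and paths for your distribution:

```bash
# Is the API server container running on the master?
sudo crictl ps | grep kube-apiserver

# Is etcd healthy? (kubeadm default certificate paths shown)
sudo ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# kubelet logs often show why a static pod failed to start
sudo journalctl -u kubelet --since "10 min ago" --no-pager | tail -50
```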
Scenario 2: Worker Node Not Ready
Symptom: kubectl get nodes shows a node as NotReady.
Troubleshooting:
```bash
# Check node conditions
kubectl describe node <node-name> | grep -A 10 "Conditions"

# Common causes:
# - kubelet stopped:        systemctl restart kubelet
# - Disk pressure:          df -h on the node
# - Memory pressure:        free -m
# - Network issues:         ping the API server from the node
# - Container runtime down: systemctl status containerd
```
Scenario 3: etcd Data Loss
Impact: Complete cluster state loss. All resource definitions gone.
Prevention: Take regular etcd snapshots. In production, run etcd as a 3- or 5-member cluster for HA, and use `etcdctl snapshot save` for backups.
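A minimal backup-and-verify sketch with etcdctl (assumes kubeadm default certificate paths; adjust endpoints and paths for your cluster, and note that newer etcd releases move snapshot status/restore into the `etcdutl` tool):

```bash
# Take a snapshot of etcd
sudo ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable
sudo ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db --write-out=table
```

In production, ship the snapshot off the node (e.g. to object storage) on a schedule, as the real-world example later in this lesson does.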
🎯 Interview Questions
Beginner
Q: What are the two main parts of a Kubernetes cluster?
A Kubernetes cluster has two parts: the Control Plane (kube-apiserver, etcd, kube-scheduler, kube-controller-manager) and Worker Nodes (kubelet, kube-proxy, container runtime). The control plane makes decisions about scheduling and state management; worker nodes run the actual workloads.
Q: What is etcd and why is it critical?
etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. Every pod definition, service, secret, and config is stored in etcd. If etcd is lost without a backup, the entire cluster state is gone. That's why etcd backups are critical in production.
Q: What does the kubelet do?
The kubelet is an agent that runs on every worker node. It receives pod specifications (PodSpecs) from the API server, ensures the containers described in those specs are running and healthy, and reports node/pod status back to the control plane. It's the bridge between the control plane and the container runtime.
Q: What is the role of the kube-apiserver?
The kube-apiserver is the front-end for the Kubernetes control plane. It exposes the Kubernetes API (REST). All external communication (kubectl, dashboards, CI/CD) and internal communication (scheduler, controller-manager) goes through the API server. It handles authentication, authorization, admission control, and validation.
Q: What does kube-proxy do?
kube-proxy is a network proxy running on each node. It maintains network rules (using iptables or IPVS) that allow network communication to pods from inside and outside the cluster. It implements the Kubernetes Service concept — routing traffic to the right pod backends.
Intermediate
Q: If the control plane goes down, do running applications stop?
No — running pods continue running. The kubelet on each worker node manages containers locally. However, no new scheduling, no self-healing (crashed pods aren't replaced), no new deployments, and no scaling will work. kubectl commands will fail. This is why production clusters run multiple master nodes for high availability.
Q: How does the kube-scheduler decide where to place a pod?
The scheduler follows a two-phase process: Filtering — eliminates nodes that can't run the pod (insufficient resources, taints, node selectors, affinity rules). Scoring — ranks remaining nodes by criteria like resource balance, data locality, and spreading. The highest-scored node wins. You can influence this with nodeSelector, affinity/anti-affinity, taints/tolerations, and priority classes.
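A quick way to see filtering in action (a sketch — node names, labels, and the pod name are illustrative placeholders):

```bash
# Taint a node so pods without a matching toleration are filtered out
kubectl taint nodes node-1 dedicated=batch:NoSchedule

# Label a node and use a nodeSelector to steer a pod onto it
kubectl label nodes node-2 disktype=ssd
kubectl run ssd-test --image=nginx \
  --overrides='{"spec":{"nodeSelector":{"disktype":"ssd"}}}'

# The scheduler records its decision (or failure reason) as pod events
kubectl describe pod ssd-test | grep -A 5 Events

# Clean up: the trailing "-" removes the taint
kubectl taint nodes node-1 dedicated=batch:NoSchedule-
```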
Q: What is the difference between kube-proxy's iptables and IPVS modes?
iptables: Default mode. Creates iptables rules for each Service/endpoint. Simple but performance degrades with thousands of services (O(n) rule matching). IPVS: Uses Linux kernel IPVS (IP Virtual Server). Hash-table based, O(1) lookup. Better performance at scale. Supports more load-balancing algorithms (round-robin, least connections, etc.). Use IPVS for large clusters with 1000+ services.
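To check which mode kube-proxy is actually running in — a sketch assuming default ports and a kubeadm-style cluster:

```bash
# On a node: kube-proxy's metrics server (port 10249) exposes the active mode
curl -s http://localhost:10249/proxyMode
# → typically "iptables" or "ipvs"

# Or inspect the kube-proxy ConfigMap (kubeadm clusters)
kubectl get configmap kube-proxy -n kube-system -o yaml | grep "mode:"
```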
Q: How does etcd stay consistent and highly available?
etcd uses the Raft consensus algorithm to maintain data consistency across multiple replicas. A distributed etcd cluster (3 or 5 members) provides high availability — if one member fails, the cluster continues operating as long as a majority (quorum) is alive. A single etcd instance is a single point of failure and unacceptable for production.
Q: What does the kube-controller-manager do?
The kube-controller-manager runs multiple controllers in a single process. Each controller is a control loop that watches the cluster state via the API server and makes changes to move from current state → desired state. Examples: ReplicaSet controller ensures correct replica count; Node controller marks unreachable nodes; Job controller manages batch jobs. This watch-and-reconcile pattern is fundamental to Kubernetes' declarative model.
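The watch-and-reconcile loop is easy to observe on any cluster: delete a pod owned by a Deployment and the ReplicaSet controller replaces it (a sketch — the deployment name is illustrative):

```bash
# Create a deployment with 3 replicas
kubectl create deployment demo --image=nginx --replicas=3

# Delete one pod by name...
POD=$(kubectl get pods -l app=demo -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD"

# ...and the controller reconciles back to 3: a replacement appears with a new name
kubectl get pods -l app=demo

# Clean up
kubectl delete deployment demo
```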
Scenario-Based
Q: kubectl can't connect to the cluster. How do you troubleshoot?
1) Check network connectivity to the API server: curl -k https://<api-server>:6443/healthz. 2) Verify kubeconfig is correct: kubectl config view. 3) Check if API server is running: kubectl get pods -n kube-system (from another context or on the master). 4) Check if etcd is healthy — if etcd is down, API server can't serve requests. 5) For managed K8s: check cloud provider status. 6) Check if it's a DNS issue — try using the IP directly in kubeconfig.
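The first two checks as commands (a sketch — substitute your API server address for the placeholder):

```bash
# Raw reachability and health, bypassing kubectl entirely
curl -k https://<api-server>:6443/healthz
curl -k "https://<api-server>:6443/livez?verbose"   # finer-grained checks on newer clusters

# Confirm kubectl is talking to the cluster you think it is
kubectl config current-context
kubectl config view --minify
```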
Q: A worker node shows NotReady. How do you investigate?
1) kubectl describe node <name> — check Conditions section (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable, Ready). 2) SSH into the node and check kubelet: systemctl status kubelet, journalctl -u kubelet --since "5 min ago". 3) Check container runtime: systemctl status containerd. 4) Check disk: df -h. 5) Check memory: free -m. 6) Check networking: can the node reach the API server? 7) Check for kernel/OS issues: dmesg | tail.
Q: How would you design a highly available production cluster?
Run 3 master nodes (odd number for etcd quorum). Use a load balancer in front of API servers. Run etcd as a 3-member cluster (or 5 for larger setups). Spread worker nodes across availability zones. Use Pod Disruption Budgets to ensure minimum replicas during node maintenance. For managed K8s (AKS/EKS), the control plane HA is handled by the provider — focus on multi-AZ node pools and PDB configurations.
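Pod Disruption Budgets can be created imperatively; a minimal sketch (the selector and counts are illustrative):

```bash
# Ensure at least 2 replicas of app=web survive voluntary disruptions (drains, upgrades)
kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2

# Verify how many disruptions are currently allowed
kubectl get pdb web-pdb
```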
Q: etcd disk usage keeps growing. What do you do?
etcd compacts revisions automatically, but the space isn't freed until defragmentation. Steps: 1) Check space: etcdctl endpoint status --write-out=table. 2) Compact old revisions: etcdctl compact <revision>. 3) Defrag: etcdctl defrag. 4) Check if too many objects exist (excessive ConfigMaps, Secrets, Events). 5) Set up etcd quota with --quota-backend-bytes. 6) Ensure regular snapshots and lifecycle management.
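The sequence as commands — a sketch that assumes the same certificate flags shown for backups and a local endpoint; run the defrag against one member at a time:

```bash
# 1. Check database size and current revision per member
etcdctl endpoint status --cluster --write-out=table

# 2. Compact history up to the current revision (extracted from JSON output)
REV=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compact "$REV"

# 3. Defragment to return freed space to the filesystem
etcdctl defrag --endpoints=https://127.0.0.1:2379
```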
Q: Pods are stuck in Pending. How do you debug?
1) Check node capacity: kubectl describe nodes | grep -A 5 "Allocated resources". 2) Check if resource requests on the pod exceed any node's allocatable capacity. 3) Check taints on nodes: kubectl get nodes -o json | jq '.items[].spec.taints' — pods without matching tolerations won't schedule. 4) Check nodeSelector or affinity rules on the pod spec. 5) Check if PersistentVolumeClaims are bound (unbound PVCs prevent pod scheduling). 6) Consider adding nodes or adjusting resource requests.
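The checks above, sketched as commands (the pod name is a placeholder):

```bash
# Why is the pod Pending? The Events section usually says
# (e.g. "0/3 nodes are available: 3 Insufficient cpu")
kubectl describe pod <pod-name> | grep -A 10 Events

# Compare pod requests against what each node has allocated
kubectl describe nodes | grep -A 5 "Allocated resources"

# Taints the pod may not tolerate
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'

# Unbound PVCs block scheduling
kubectl get pvc
```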
🌍 Real-World Use Case
A media streaming company runs 200+ microservices on Kubernetes across 3 availability zones. Their architecture decisions:
- 3 master nodes with etcd running in a stacked topology for HA
- Node pools: separate pools for stateless apps (spot/preemptible instances) and stateful workloads (on-demand)
- etcd backups: snapshots every 30 minutes to cloud storage
- Monitoring: Prometheus scraping all control plane metrics; alerts on etcd latency, API server request duration, scheduler binding failures
- When one AZ had an outage, pods automatically rescheduled to the other two zones — zero user impact
📝 Summary
- The Control Plane (API server, etcd, scheduler, controller-manager) makes all cluster decisions
- Worker Nodes (kubelet, kube-proxy, container runtime) run the actual workloads
- etcd is the single source of truth — back it up religiously
- Master failure doesn't stop running pods, but stops all management operations
- Understanding architecture is essential for debugging every K8s issue