Beginner Lesson 2 of 14

Kubernetes Architecture

Understand every component of a Kubernetes cluster — control plane, worker nodes, and how they communicate.

🧒 Simple Explanation (ELI5)

Think of Kubernetes as an airport:

The control tower (the control plane) never flies a plane itself; it schedules, sequences, and coordinates. The planes out on the runways (the worker nodes) do the actual work.

🤔 Why Understand Architecture?

You cannot debug Kubernetes effectively without understanding its architecture. When a pod won't schedule, you need to know the scheduler's role. When a node goes down, you need to know what the controller manager does. Architecture knowledge is the foundation for every interview question and production incident.

🔧 Technical Explanation

Control Plane Components

The control plane runs on master node(s) and manages the entire cluster:

| Component | Role | What Happens If It Fails |
| --- | --- | --- |
| kube-apiserver | Front door for all cluster operations. REST API that kubectl talks to. Validates and processes requests. | All cluster management stops. kubectl stops working. But running pods continue running. |
| etcd | Distributed key-value store. Stores all cluster state: pod info, configs, secrets, service accounts. | Cluster loses its memory. No new operations possible. Existing workloads keep running but can't be managed. |
| kube-scheduler | Assigns pods to nodes based on resource requirements, affinity rules, taints/tolerations. | New pods stay in Pending state. Existing pods unaffected. |
| kube-controller-manager | Runs controller loops: Deployment, ReplicaSet, Node, Job controllers. Ensures desired state = actual state. | No self-healing. Failed pods aren't replaced. Scaling doesn't work. |
| cloud-controller-manager | Integrates with cloud provider APIs (load balancers, storage, node management). | Cloud-specific features stop (LB provisioning, node auto-repair). |
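On a cluster you manage yourself, you can see these components as pods. This sketch assumes a kubeadm-style cluster, where control-plane components run as static pods in kube-system with the `tier=control-plane` label; managed offerings (EKS, GKE, AKS) hide the control plane entirely.

```shell
# Sketch: list control-plane component pods (kubeadm conventions assumed).
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; run this on a cluster"; exit 0; }

# kubeadm labels its static control-plane pods with tier=control-plane
kubectl get pods -n kube-system -l tier=control-plane -o wide

# Or grep for the component names directly
kubectl get pods -n kube-system | grep -E 'apiserver|etcd|scheduler|controller-manager'
```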

Worker Node Components

Every worker node runs these components:

| Component | Role |
| --- | --- |
| kubelet | Agent on each node. Receives pod specs from the API server, tells the container runtime to start/stop containers, reports node and pod status. |
| kube-proxy | Network proxy on each node. Maintains network rules (iptables/IPVS) for Service-based routing. Enables ClusterIP and NodePort communication. |
| Container runtime | Actually runs containers. Usually containerd or CRI-O. Implements the Container Runtime Interface (CRI). |

📊 Visual: Cluster Architecture

```
Kubernetes Cluster
├── Control Plane (Master)
│   ├── kube-apiserver
│   ├── etcd
│   ├── kube-scheduler
│   ├── kube-controller-manager
│   └── cloud-controller-manager
├── Worker Node 1
│   ├── kubelet
│   ├── kube-proxy
│   └── Pod A, Pod B
└── Worker Node 2
    ├── kubelet
    ├── kube-proxy
    └── Pod C, Pod D
```

Request flow (kubectl apply → Pod Running):

```
kubectl → API Server → etcd (store) → API Server → Scheduler (assign node) → kubelet (start pod)
```
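You can watch this flow happen on a live cluster. A hedged sketch, assuming kubectl access and permission to create a throwaway pod (the name `flow-demo` is an example):

```shell
# Sketch: trace the kubectl-apply-to-Running flow via cluster events.
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; run this on a cluster"; exit 0; }

# Request hits the API server; the pod spec is persisted in etcd
kubectl run flow-demo --image=nginx --restart=Never

# Events show Scheduled (from the scheduler), then Pulling/Started (from the kubelet)
kubectl get events --field-selector involvedObject.name=flow-demo \
  --sort-by=.lastTimestamp

# Clean up
kubectl delete pod flow-demo --wait=false
```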

⌨️ Hands-on: Explore Your Cluster Architecture

```bash
# View cluster endpoint information
kubectl cluster-info

# List all nodes and their status
kubectl get nodes -o wide

# See control plane pods (in the kube-system namespace)
kubectl get pods -n kube-system

# Describe a node to see capacity, allocatable resources, and conditions
kubectl describe node <node-name>

# Check component statuses (deprecated since v1.19, but educational)
kubectl get componentstatuses

# View all API resources available in the cluster
kubectl api-resources | head -20

# List kube-system endpoints (a quick way to spot control-plane services)
kubectl get endpoints -n kube-system
```
```bash
# Inspect the kubelet on a node (SSH into the node first)
systemctl status kubelet

# View recent kubelet logs
journalctl -u kubelet --no-pager | tail -20

# Check the container runtime
crictl ps
crictl images
```
⚠️ Warning

In managed Kubernetes (AKS, EKS, GKE), you typically don't have SSH access to master nodes. The control plane is managed by the cloud provider. You can still inspect worker nodes and kube-system pods.

🐛 Debugging Scenarios

Scenario 1: Master Node Failure

Symptom: kubectl commands hang or return "connection refused".

Impact: Existing pods continue running (kubelet manages them locally). But: no new pods are scheduled, no scaling, no self-healing, no new deployments.

Troubleshooting:
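A reasonable first-response sketch, assuming a self-managed (e.g. kubeadm) cluster with SSH access to the master; replace the placeholder address with your API server's:

```shell
# Sketch: first checks when the control plane is unreachable.

# Can kubectl reach the API server at all?
kubectl cluster-info 2>&1 | head -2

# Is the API server port answering? (6443 is the conventional port)
curl -k https://<master-ip>:6443/healthz

# On the master node itself: is the kubelet (which runs the static
# control-plane pods) healthy, and are the component containers up?
sudo systemctl status kubelet
sudo crictl ps | grep -E 'apiserver|etcd'
```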

Scenario 2: Worker Node Not Ready

Symptom: kubectl get nodes shows a node as NotReady.

Troubleshooting:

```bash
# Check node conditions
kubectl describe node <node-name> | grep -A 10 "Conditions"

# Common causes:
# - kubelet stopped:        systemctl restart kubelet
# - Disk pressure:          df -h on the node
# - Memory pressure:        free -m
# - Network issues:         ping the API server from the node
# - Container runtime down: systemctl status containerd
```

Scenario 3: etcd Data Loss

Impact: Complete cluster state loss. All resource definitions gone.

Prevention: Regular etcd snapshots. In production, run etcd as a three- or five-member cluster for HA. Use `etcdctl snapshot save` for backups.
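A minimal backup sketch with etcdctl; the certificate paths below are the kubeadm defaults and may differ on your cluster:

```shell
# Sketch: snapshot etcd and verify the snapshot (kubeadm default cert paths).
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Confirm the snapshot is readable and see its size and revision
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%F).db --write-out=table
```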

🎯 Interview Questions

Beginner

Q: What are the main components of a Kubernetes cluster?

A Kubernetes cluster has two parts: the Control Plane (kube-apiserver, etcd, kube-scheduler, kube-controller-manager) and Worker Nodes (kubelet, kube-proxy, container runtime). The control plane makes decisions about scheduling and state management; worker nodes run the actual workloads.

Q: What is etcd and why is it important?

etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. Every pod definition, service, secret, and config is stored in etcd. If etcd is lost without a backup, the entire cluster state is gone. That's why etcd backups are critical in production.

Q: What does the kubelet do?

The kubelet is an agent that runs on every worker node. It receives pod specifications (PodSpecs) from the API server, ensures the containers described in those specs are running and healthy, and reports node/pod status back to the control plane. It's the bridge between the control plane and the container runtime.

Q: What is the kube-apiserver?

The kube-apiserver is the front-end for the Kubernetes control plane. It exposes the Kubernetes API (REST). All external communication (kubectl, dashboards, CI/CD) and internal communication (scheduler, controller-manager) goes through the API server. It handles authentication, authorization, admission control, and validation.

Q: What is kube-proxy?

kube-proxy is a network proxy running on each node. It maintains network rules (using iptables or IPVS) that allow network communication to pods from inside and outside the cluster. It implements the Kubernetes Service concept — routing traffic to the right pod backends.

Intermediate

Q: What happens when a master node goes down? Do running pods stop?

No — running pods continue running. The kubelet on each worker node manages containers locally. However, no new scheduling, no self-healing (crashed pods aren't replaced), no new deployments, and no scaling will work. kubectl commands will fail. This is why production clusters run multiple master nodes for high availability.

Q: How does the scheduler decide which node to place a pod on?

The scheduler follows a two-phase process: Filtering — eliminates nodes that can't run the pod (insufficient resources, taints, node selectors, affinity rules). Scoring — ranks remaining nodes by criteria like resource balance, data locality, and spreading. The highest-scored node wins. You can influence this with nodeSelector, affinity/anti-affinity, taints/tolerations, and priority classes.
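The scheduling knobs mentioned above can be exercised from the command line; node and label names here are examples, not values from this course:

```shell
# Sketch: influencing the scheduler's filtering phase.

# A taint repels pods that lack a matching toleration
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# A node label plus spec.nodeSelector restricts where a pod may land
kubectl label nodes node1 disktype=ssd
# (then set spec.nodeSelector: {disktype: ssd} in the pod manifest)

# If a pod stays Pending, the Events section explains which filter rejected it
kubectl describe pod <pod-name> | grep -A 5 Events
```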

Q: What's the difference between iptables and IPVS mode in kube-proxy?

- **iptables** (default): creates iptables rules for each Service/endpoint. Simple, but performance degrades with thousands of Services (O(n) rule matching).
- **IPVS**: uses the Linux kernel's IP Virtual Server. Hash-table based, O(1) lookup, better performance at scale, and supports more load-balancing algorithms (round-robin, least connections, etc.).

Use IPVS for large clusters with 1000+ Services.
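To find out which mode your own kube-proxy is running in, a sketch assuming the kubeadm convention of a `kube-proxy` ConfigMap and the `k8s-app=kube-proxy` pod label:

```shell
# Sketch: discover the active kube-proxy mode.

# The configured mode (empty string means the iptables default)
kubectl get configmap kube-proxy -n kube-system -o yaml | grep -m1 'mode:'

# kube-proxy also logs which proxier it chose at startup
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20 | grep -i proxier
```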

Q: Why does Kubernetes need etcd to be distributed?

etcd uses the Raft consensus algorithm to maintain data consistency across multiple replicas. A distributed etcd cluster (3 or 5 members) provides high availability — if one member fails, the cluster continues operating as long as a majority (quorum) is alive. A single etcd instance is a single point of failure and unacceptable for production.

Q: What is the Controller Manager pattern?

The kube-controller-manager runs multiple controllers in a single process. Each controller is a control loop that watches the cluster state via the API server and makes changes to move from current state → desired state. Examples: ReplicaSet controller ensures correct replica count; Node controller marks unreachable nodes; Job controller manages batch jobs. This watch-and-reconcile pattern is fundamental to Kubernetes' declarative model.
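The watch-and-reconcile loop is easy to see in action. A sketch using a throwaway deployment (`reconcile-demo` is an example name):

```shell
# Sketch: watch the ReplicaSet controller reconcile desired vs actual state.
kubectl create deployment reconcile-demo --image=nginx --replicas=2
kubectl get pods -l app=reconcile-demo

# Delete one pod: actual count (1) no longer matches desired (2)
kubectl delete "$(kubectl get pods -l app=reconcile-demo -o name | head -1)"

# The controller notices the drift and creates a replacement within seconds
kubectl get pods -l app=reconcile-demo

# Clean up
kubectl delete deployment reconcile-demo
```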

Scenario-Based

Q: You run 'kubectl get pods' and it hangs. What do you check?

1. Check network connectivity to the API server: `curl -k https://<api-server>:6443/healthz`.
2. Verify the kubeconfig is correct: `kubectl config view`.
3. Check whether the API server is running: `kubectl get pods -n kube-system` (from another context, or on the master).
4. Check whether etcd is healthy; if etcd is down, the API server can't serve requests.
5. For managed K8s, check the cloud provider's status page.
6. Rule out DNS: try using the API server's IP directly in the kubeconfig.

Q: A node shows NotReady status. Walk through your investigation.

1. `kubectl describe node <name>`: check the Conditions section (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable, Ready).
2. SSH into the node and check the kubelet: `systemctl status kubelet`, `journalctl -u kubelet --since "5 min ago"`.
3. Check the container runtime: `systemctl status containerd`.
4. Check disk: `df -h`.
5. Check memory: `free -m`.
6. Check networking: can the node reach the API server?
7. Check for kernel/OS issues: `dmesg | tail`.

Q: Your company needs a highly available Kubernetes setup. How do you design it?

Run 3 master nodes (odd number for etcd quorum). Use a load balancer in front of API servers. Run etcd as a 3-member cluster (or 5 for larger setups). Spread worker nodes across availability zones. Use Pod Disruption Budgets to ensure minimum replicas during node maintenance. For managed K8s (AKS/EKS), the control plane HA is handled by the provider — focus on multi-AZ node pools and PDB configurations.

Q: etcd is consuming too much disk space. How do you handle it?

etcd compacts revisions automatically, but the space isn't freed until defragmentation. Steps:

1. Check space: `etcdctl endpoint status --write-out=table`.
2. Compact old revisions: `etcdctl compact <revision>`.
3. Defragment: `etcdctl defrag`.
4. Check whether too many objects exist (excessive ConfigMaps, Secrets, Events).
5. Set an etcd quota with `--quota-backend-bytes`.
6. Ensure regular snapshots and lifecycle management.
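The compaction steps above can be sketched as one sequence; this assumes etcdctl v3 with endpoints and TLS flags already configured via environment variables:

```shell
# Sketch: reclaim etcd disk space (run against each member in turn).

# See the DB size and current revision per member
etcdctl endpoint status --write-out=table

# Extract the current revision from the JSON status, then compact up to it
REV=$(etcdctl endpoint status --write-out=json | grep -o '"revision":[0-9]*' | grep -o '[0-9]*')
etcdctl compact "$REV"

# Defragment to return the freed pages to the operating system
etcdctl defrag
```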

Q: Pods are stuck in Pending and the scheduler log says "no nodes available". What do you investigate?

1. Check node capacity: `kubectl describe nodes | grep -A 5 "Allocated resources"`.
2. Check whether the pod's resource requests exceed any node's allocatable capacity.
3. Check taints on nodes: `kubectl get nodes -o json | jq '.items[].spec.taints'`; pods without matching tolerations won't schedule.
4. Check nodeSelector or affinity rules in the pod spec.
5. Check that PersistentVolumeClaims are bound (unbound PVCs prevent pod scheduling).
6. Consider adding nodes or adjusting resource requests.
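The checks above, consolidated into one pass; this assumes `jq` is installed and `<pod-name>` stands in for your Pending pod:

```shell
# Sketch: triage a Pending pod in one sweep.
kubectl describe pod <pod-name> | grep -A 5 Events        # the scheduler's own reason
kubectl describe nodes | grep -A 5 "Allocated resources"  # per-node headroom
kubectl get nodes -o json | jq '.items[].spec.taints'     # taints needing tolerations
kubectl get pvc                                           # unbound PVCs block scheduling
```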

🌍 Real-World Use Case

A media streaming company runs 200+ microservices on Kubernetes across 3 availability zones. Their architecture decisions:

📝 Summary
