Break & Fix Lab
Intentionally break Kubernetes deployments, then diagnose and fix them. The best way to learn debugging is by breaking things on purpose.
🧠 Simple Explanation (ELI5)
You know how mechanics learn by taking cars apart and putting them back together? That's what we're doing. We'll break things on purpose (wrong image, bad config, missing labels) and then fix them. After this lab, you'll be able to diagnose real K8s problems in seconds instead of hours.
Each challenge gives you a broken YAML manifest or a break command. Your job: figure out what's wrong, diagnose it with kubectl, and fix it. Solutions are provided, but try to solve each one yourself first!
Setup: Create the Lab Namespace
kubectl create namespace break-fix-lab
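If you prefer a declarative setup, the same namespace can be created from a manifest (a minimal sketch; the filename is my own choice):

```yaml
# namespace.yaml: declarative equivalent of the kubectl create command above
apiVersion: v1
kind: Namespace
metadata:
  name: break-fix-lab
```

Apply it with kubectl apply -f namespace.yaml.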
🔴 Challenge 1: Wrong Image Name
Scenario: A deployment was created but pods never become Ready.
# challenge-1.yaml: deploy this broken manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-1
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: challenge-1
  template:
    metadata:
      labels:
        app: challenge-1
    spec:
      containers:
        - name: web
          image: ngnix:latest   # ← Bug is here
          ports:
            - containerPort: 80
kubectl apply -f challenge-1.yaml
Pods are stuck. Diagnose why and fix it without deleting the deployment.
Solution
# Step 1: Check pod status
kubectl get pods -n break-fix-lab
# STATUS: ImagePullBackOff or ErrImagePull

# Step 2: Describe the pod
kubectl describe pod -l app=challenge-1 -n break-fix-lab
# Events show: "Failed to pull image 'ngnix:latest': image not found"

# Step 3: Spot the typo: "ngnix" should be "nginx"

# Step 4: Fix it
kubectl set image deployment/challenge-1 web=nginx:latest -n break-fix-lab

# Step 5: Verify
kubectl get pods -n break-fix-lab -w
# Pods transition to Running
Root cause: Image name typo: ngnix instead of nginx.
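If you'd rather fix it declaratively, correct the manifest and re-apply it. The corrected container spec (only the changed portion of challenge-1.yaml shown):

```yaml
      containers:
        - name: web
          image: nginx:latest   # fixed: "ngnix" corrected to "nginx"
          ports:
            - containerPort: 80
```

Then kubectl apply -f challenge-1.yaml triggers a normal rolling update.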
🔴 Challenge 2: Mismatched Selector Labels
Scenario: Service is created but has no endpoints. The app is unreachable.
# challenge-2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-2
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: web
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: challenge-2-svc
  namespace: break-fix-lab
spec:
  selector:
    app: my-app   # ← Bug: "my-app" vs "myapp"
  ports:
    - port: 80
      targetPort: 80
kubectl apply -f challenge-2.yaml
Pods are running but the service can't reach them. Find and fix the issue.
Solution
# Step 1: Check endpoints
kubectl get endpoints challenge-2-svc -n break-fix-lab
# ENDPOINTS: <none>
# Step 2: Compare labels
kubectl get pods -n break-fix-lab --show-labels | grep challenge-2
# Labels: app=myapp
kubectl get svc challenge-2-svc -n break-fix-lab -o yaml | grep -A2 selector
# selector: app: my-app ← MISMATCH! "my-app" ≠ "myapp"
# Step 3: Fix the service selector
kubectl patch svc challenge-2-svc -n break-fix-lab \
-p '{"spec":{"selector":{"app":"myapp"}}}'
# Step 4: Verify
kubectl get endpoints challenge-2-svc -n break-fix-lab
# Should now show pod IPs
Root cause: Service selector my-app doesn't match pod label myapp (the selector has an extra hyphen).
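The equivalent declarative fix is to correct the selector in the manifest and re-apply it. The corrected Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: challenge-2-svc
  namespace: break-fix-lab
spec:
  selector:
    app: myapp   # fixed: now matches the pod label exactly
  ports:
    - port: 80
      targetPort: 80
```

Re-running kubectl get endpoints afterwards should list one IP per running pod.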
🔴 Challenge 3: CrashLoopBackOff
Scenario: Pod starts but immediately crashes, restarting in a loop.
# challenge-3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-3
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-3
  template:
    metadata:
      labels:
        app: challenge-3
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["cat", "/config/app.conf"]   # ← File doesn't exist
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          emptyDir: {}   # ← Empty: no app.conf inside
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: CrashLoopBackOff, RESTARTS increasing
# Step 2: Check logs
kubectl logs -l app=challenge-3 -n break-fix-lab
# "cat: can't open '/config/app.conf': No such file or directory"
# Step 3: The volume is emptyDir with no content. Supply the file via ConfigMap.
# Fix: Create a ConfigMap with the expected file
kubectl create configmap challenge-3-config \
--from-literal=app.conf="server.port=8080" \
-n break-fix-lab
# Patch the deployment to use ConfigMap volume
kubectl patch deployment challenge-3 -n break-fix-lab --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/volumes/0","value":{"name":"config","configMap":{"name":"challenge-3-config"}}},{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["sh","-c","cat /config/app.conf ; sleep 3600"]}]'
# Verify
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: Running
Root cause: The container expects a file at /config/app.conf but the volume is an empty directory.
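The same fix can be expressed declaratively instead of via kubectl patch. A sketch of the two relevant pieces (the ConfigMap, and the volume entry that replaces the emptyDir in the deployment):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: challenge-3-config
  namespace: break-fix-lab
data:
  app.conf: |
    server.port=8080
---
# In challenge-3.yaml, replace the emptyDir volume with:
#   volumes:
#     - name: config
#       configMap:
#         name: challenge-3-config
```

With the ConfigMap mounted, /config/app.conf exists and cat succeeds.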
🔴 Challenge 4: Missing Secret Reference
Scenario: Pod won't start; it's stuck in CreateContainerConfigError.
# challenge-4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-4
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-4
  template:
    metadata:
      labels:
        app: challenge-4
    spec:
      containers:
        - name: app
          image: nginx:1.25-alpine
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials   # ← This secret doesn't exist
                  key: password
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: CreateContainerConfigError

# Step 2: Describe the pod
kubectl describe pod -l app=challenge-4 -n break-fix-lab
# Warning: "secret 'db-credentials' not found"

# Step 3: Create the missing secret
kubectl create secret generic db-credentials \
  --from-literal=password='MySecurePass123' \
  -n break-fix-lab

# Step 4: Restart pods to pick up the new secret
kubectl rollout restart deployment/challenge-4 -n break-fix-lab

# Verify
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: Running
Root cause: Pod references a secret that doesn't exist. K8s can't create the container without it.
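As an aside, Kubernetes supports an optional flag on secretKeyRef for cases where a key is genuinely optional. Use it with care: required credentials should fail loudly, as in this challenge, rather than silently come up unset.

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: password
        optional: true   # pod starts even if the secret is absent; the env var is simply not set
```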
🔴 Challenge 5: Resource Limit Too Low
Scenario: Pod gets OOMKilled repeatedly.
# challenge-5.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-5
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-5
  template:
    metadata:
      labels:
        app: challenge-5
    spec:
      containers:
        - name: stress
          image: progrium/stress
          command: ["stress", "--vm", "1", "--vm-bytes", "128M"]
          resources:
            limits:
              memory: 64Mi   # ← Too low for the 128M allocation
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-5
# STATUS: OOMKilled, CrashLoopBackOff
# Step 2: Pod describe or previous logs
kubectl describe pod -l app=challenge-5 -n break-fix-lab
# Last State: Terminated, Reason: OOMKilled
# Step 3: The process tries to allocate 128M but limit is 64Mi
# Fix: Increase memory limit
kubectl patch deployment challenge-5 -n break-fix-lab --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"256Mi"}]'
# Verify
kubectl get pods -n break-fix-lab -l app=challenge-5 -w
Root cause: Memory limit (64Mi) is lower than what the process needs (128M). Kubernetes OOM-kills the container.
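A healthier resource block also sets requests alongside limits, so the scheduler can place the pod on a node with enough headroom. The values below are illustrative, not tuned:

```yaml
resources:
  requests:
    memory: 128Mi   # what the scheduler reserves for the pod
    cpu: 100m
  limits:
    memory: 256Mi   # the OOM-kill ceiling; must exceed actual peak usage
    cpu: 500m
```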
🔴 Challenge 6: Liveness Probe Misconfigured
Scenario: Pods keep restarting every few minutes even though the app is healthy.
# challenge-6.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-6
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-6
  template:
    metadata:
      labels:
        app: challenge-6
    spec:
      containers:
        - name: web
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /healthz   # ← nginx returns 404 for /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
Solution
# Step 1: Watch pods: RESTARTS keeps incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w
# Step 2: Describe the pod and read the Events
kubectl describe pod -l app=challenge-6 -n break-fix-lab
# "Liveness probe failed: HTTP probe failed with statuscode: 404"
# Step 3: nginx doesn't serve /healthz; it serves / which returns 200
# Fix: Change probe path to /
kubectl patch deployment challenge-6 -n break-fix-lab --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/path","value":"/"}]'
# Verify: RESTARTS should stop incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w
Root cause: Liveness probe checks /healthz but nginx's default config doesn't have that endpoint. It returns 404 → the probe fails → K8s restarts the container.
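The other way round works too: give nginx a real /healthz endpoint so the original probe succeeds. A sketch, assuming you mount this ConfigMap (the name is my own) at /etc/nginx/conf.d/:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: challenge-6-nginx
  namespace: break-fix-lab
data:
  default.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
      location /healthz {
        return 200 'ok';   # dedicated health endpoint; always answers 200
      }
    }
```

A dedicated health path is generally preferable to probing /, since it keeps health checks working even if the site content changes.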
🧹 Cleanup
kubectl delete namespace break-fix-lab
📋 Debugging Cheat Sheet
| Status | First Command | What to Look For |
|---|---|---|
| ImagePullBackOff | kubectl describe pod | Wrong image name/tag, missing registry credentials |
| CrashLoopBackOff | kubectl logs | App error, missing file/env, exit code |
| CreateContainerConfigError | kubectl describe pod | Missing ConfigMap or Secret |
| OOMKilled | kubectl describe pod | Memory limit too low for workload |
| Pending | kubectl describe pod | Insufficient CPU/memory, no matching nodeSelector |
| Running (restarts) | kubectl describe pod | Liveness probe failing |
| Service no endpoints | kubectl get endpoints | Selector labels don't match pod labels |
📝 Summary
- Image errors: Always check the exact image name and tag spelling
- Label mismatches: Services, deployments, and selectors must agree on label keys and values
- CrashLoopBackOff: Always start with kubectl logs; the answer is usually there
- Missing references: Secrets and ConfigMaps must exist before the pod that needs them
- Resource limits: Set them, but set them high enough for your workload
- Health probes: Must match an actual endpoint your app exposes