Each challenge gives you a broken YAML or a break command. Your job: figure out what's wrong, diagnose with kubectl, and fix it. Solutions are provided — but try to solve it yourself first!
Break & Fix Lab
Intentionally break Kubernetes deployments, then diagnose and fix them. The best way to learn debugging is by breaking things on purpose.
🧒 Simple Explanation (ELI5)
You know how mechanics learn by taking cars apart and putting them back together? That's what we're doing. We'll break things on purpose (wrong image, bad config, missing labels) and then fix them. After this lab, you'll recognize the most common K8s failure modes and know which command to run first.
Setup: Create the Lab Namespace
kubectl create namespace break-fix-lab
🔴 Challenge 1: Wrong Image Name
Scenario: A deployment was created but pods never become Ready.
# challenge-1.yaml — Deploy this broken manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-1
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: challenge-1
  template:
    metadata:
      labels:
        app: challenge-1
    spec:
      containers:
      - name: web
        image: ngnix:latest # ← Bug is here
        ports:
        - containerPort: 80
kubectl apply -f challenge-1.yaml
Pods are stuck. Diagnose why and fix it without deleting the deployment.
Solution
# Step 1: Check pod status
kubectl get pods -n break-fix-lab
# STATUS: ImagePullBackOff or ErrImagePull
# Step 2: Describe the pod
kubectl describe pod -l app=challenge-1 -n break-fix-lab
# Events show: "Failed to pull image 'ngnix:latest': image not found"
# Step 3: Spot the typo: "ngnix" should be "nginx"
# Step 4: Fix it
kubectl set image deployment/challenge-1 web=nginx:latest -n break-fix-lab
# Step 5: Verify
kubectl get pods -n break-fix-lab -w
# Pods transition to Running
Root cause: Image name typo — ngnix instead of nginx.
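After the fix, it can help to confirm which image the Deployment actually references instead of inferring it from pod events. The following is a minimal sketch, assuming a POSIX shell with kubectl on the PATH; the helper name deployment_images is hypothetical and only wraps a standard jsonpath query.

```shell
# deployment_images NAME NAMESPACE
# Prints the image of every container in the Deployment's pod template.
# (Hypothetical helper; it only wraps `kubectl get ... -o jsonpath`.)
deployment_images() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}
```

Calling `deployment_images challenge-1 break-fix-lab` before Step 4 should show the misspelled name, and after Step 4 the corrected one.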
🔴 Challenge 2: Mismatched Selector Labels
Scenario: Service is created but has no endpoints. The app is unreachable.
# challenge-2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-2
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: web
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: challenge-2-svc
  namespace: break-fix-lab
spec:
  selector:
    app: my-app # ← Bug: "my-app" vs "myapp"
  ports:
  - port: 80
    targetPort: 80
kubectl apply -f challenge-2.yaml
Pods are running but the service can't reach them. Find and fix the issue.
Solution
# Step 1: Check endpoints
kubectl get endpoints challenge-2-svc -n break-fix-lab
# ENDPOINTS: <none>
# Step 2: Compare labels
kubectl get pods -n break-fix-lab --show-labels | grep challenge-2
# Labels: app=myapp
kubectl get svc challenge-2-svc -n break-fix-lab -o yaml | grep -A2 selector
# selector: app: my-app ← MISMATCH! "my-app" ≠ "myapp"
# Step 3: Fix the service selector
kubectl patch svc challenge-2-svc -n break-fix-lab \
-p '{"spec":{"selector":{"app":"myapp"}}}'
# Step 4: Verify
kubectl get endpoints challenge-2-svc -n break-fix-lab
# Should now show pod IPs
Root cause: Service selector my-app doesn't match pod label myapp (the selector has a hyphen that the pod label lacks).
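The comparison in Step 2 can also be done mechanically. The sketch below is a hypothetical helper (selector_matches is not a kubectl feature) that takes two comma-separated key=value lists, the format kubectl prints in the SELECTOR column of `kubectl get svc -o wide` and in the LABELS column of `kubectl get pods --show-labels`, and checks that every selector pair appears among the pod's labels. It assumes a POSIX shell and label values without glob characters.

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists.
# Returns 0 only if every selector pair appears among the labels.
selector_matches() {
  selector=$1
  labels=",$2,"                 # wrap in commas so pairs match whole entries
  old_ifs=$IFS
  IFS=,                         # split the selector on commas
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                    # this pair is present, keep going
      *) IFS=$old_ifs; return 1 ;;       # one missing pair means no match
    esac
  done
  IFS=$old_ifs
  return 0
}
```

With the values from this challenge, `selector_matches "app=my-app" "app=myapp"` fails, while `selector_matches "app=myapp" "app=myapp"` succeeds. Extra pod labels such as pod-template-hash do not affect the result.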
🔴 Challenge 3: CrashLoopBackOff
Scenario: Pod starts but immediately crashes, restarting in a loop.
# challenge-3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-3
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-3
  template:
    metadata:
      labels:
        app: challenge-3
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ["cat", "/config/app.conf"] # ← File doesn't exist
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        emptyDir: {} # ← Empty: no app.conf inside
kubectl apply -f challenge-3.yaml
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: CrashLoopBackOff, RESTARTS increasing
# Step 2: Check logs
kubectl logs -l app=challenge-3 -n break-fix-lab
# "cat: can't open '/config/app.conf': No such file or directory"
# Step 3: The volume is emptyDir with no content. Supply the file via ConfigMap.
# Fix: Create a ConfigMap with the expected file
kubectl create configmap challenge-3-config \
--from-literal=app.conf="server.port=8080" \
-n break-fix-lab
# Patch the deployment to use ConfigMap volume
kubectl patch deployment challenge-3 -n break-fix-lab --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/volumes/0","value":{"name":"config","configMap":{"name":"challenge-3-config"}}},{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["sh","-c","cat /config/app.conf ; sleep 3600"]}]'
# Verify
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: Running
Root cause: The container expects a file at /config/app.conf but the volume is an empty directory.
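When the logs are empty or ambiguous, the exit code of the previous container run helps narrow things down, and `kubectl logs --previous` prints the output of that earlier run. The helper below is a minimal sketch under the assumption that kubectl is on the PATH; the name last_exit_codes is hypothetical.

```shell
# last_exit_codes LABEL_SELECTOR NAMESPACE
# Prints the exit code of the last terminated run of each container in the
# matching pods (empty output if no container has terminated yet).
last_exit_codes() {
  kubectl get pods -l "$1" -n "$2" \
    -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.exitCode}'
}
```

For this challenge the call would be `last_exit_codes app=challenge-3 break-fix-lab`. A non-zero code together with a clear error line in the logs points at the container's own command rather than at the cluster.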
🔴 Challenge 4: Missing Secret Reference
Scenario: Pod won't start — stuck in CreateContainerConfigError.
# challenge-4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-4
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-4
  template:
    metadata:
      labels:
        app: challenge-4
    spec:
      containers:
      - name: app
        image: nginx:1.25-alpine
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials # ← This secret doesn't exist
              key: password
kubectl apply -f challenge-4.yaml
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: CreateContainerConfigError
# Step 2: Describe the pod
kubectl describe pod -l app=challenge-4 -n break-fix-lab
# Warning: "secret 'db-credentials' not found"
# Step 3: Create the missing secret
kubectl create secret generic db-credentials \
  --from-literal=password='MySecurePass123' \
  -n break-fix-lab
# Step 4: Restart the pods so they start right away
# (the kubelet also retries on its own, just more slowly)
kubectl rollout restart deployment/challenge-4 -n break-fix-lab
# Verify
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: Running
Root cause: Pod references a secret that doesn't exist. K8s can't create the container without it.
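A quick pre-flight check can catch this class of error before the pods are created. The sketch below lists secrets that a Deployment references through env[].valueFrom.secretKeyRef but that do not exist in the namespace. It assumes kubectl on the PATH; missing_secrets is a hypothetical helper name, and other reference types (envFrom, volumes, imagePullSecrets) are deliberately left out.

```shell
# missing_secrets DEPLOYMENT NAMESPACE
# Prints the name of each secret referenced via secretKeyRef in the pod
# template that cannot be found in the namespace.
missing_secrets() {
  names=$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].env[*].valueFrom.secretKeyRef.name}')
  for name in $names; do
    # `kubectl get secret` exits non-zero when the secret does not exist
    kubectl get secret "$name" -n "$2" >/dev/null 2>&1 || echo "$name"
  done
}
```

Running `missing_secrets challenge-4 break-fix-lab` before Step 3 should print db-credentials, and print nothing once the secret has been created.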
🔴 Challenge 5: Resource Limit Too Low
Scenario: Pod gets OOMKilled repeatedly.
# challenge-5.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-5
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-5
  template:
    metadata:
      labels:
        app: challenge-5
    spec:
      containers:
      - name: stress
        image: progrium/stress
        command: ["stress", "--vm", "1", "--vm-bytes", "128M"]
        resources:
          limits:
            memory: 64Mi # ← Too low for 128M allocation
kubectl apply -f challenge-5.yaml
Solution
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-5
# STATUS: OOMKilled, CrashLoopBackOff
# Step 2: Pod describe or previous logs
kubectl describe pod -l app=challenge-5 -n break-fix-lab
# Last State: Terminated, Reason: OOMKilled
# Step 3: The process tries to allocate 128M but limit is 64Mi
# Fix: Increase memory limit
kubectl patch deployment challenge-5 -n break-fix-lab --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"256Mi"}]'
# Verify
kubectl get pods -n break-fix-lab -l app=challenge-5 -w
Root cause: Memory limit (64Mi) is lower than what the process needs (128M). When the container exceeds its limit, the kernel's OOM killer terminates it and Kubernetes reports the container as OOMKilled.
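Comparing a limit with a workload's needs is easier once both are in bytes. In Kubernetes quantities, Mi means 2^20 bytes and M means 10^6 bytes. The sketch below is a hypothetical pair of helpers (to_bytes and limit_covers are illustrative names) that handles only whole numbers with a small set of suffixes. How the stress tool itself interprets 128M is outside this sketch; under either reading the value exceeds 64Mi.

```shell
# to_bytes QUANTITY
# Converts a Kubernetes memory quantity to bytes. Only plain integers and
# the suffixes Ki, Mi, Gi (powers of 1024) and k, M, G (powers of 1000)
# are handled; anything else is rejected or unsupported.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${1%k} * 1000 )) ;;
    *M)  echo $(( ${1%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${1%G} * 1000 * 1000 * 1000 )) ;;
    *[!0-9]*|'') echo "unsupported quantity: $1" >&2; return 1 ;;
    *)   echo "$1" ;;
  esac
}

# limit_covers LIMIT NEEDED
# Succeeds if LIMIT is at least NEEDED once both are converted to bytes.
limit_covers() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}
```

Here `to_bytes 64Mi` gives 67108864 and `to_bytes 128M` gives 128000000, so `limit_covers 64Mi 128M` fails while `limit_covers 256Mi 128M` succeeds. A real limit also needs headroom for the process's own overhead beyond the allocation it requests.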
🔴 Challenge 6: Liveness Probe Misconfigured
Scenario: Pods keep restarting every few minutes even though the app is healthy.
# challenge-6.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-6
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-6
  template:
    metadata:
      labels:
        app: challenge-6
    spec:
      containers:
      - name: web
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /healthz # ← nginx returns 404 for /healthz
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 2
kubectl apply -f challenge-6.yaml
Solution
# Step 1: Watch pods — RESTARTS keeps incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w
# Step 2: Describe → Events
kubectl describe pod -l app=challenge-6 -n break-fix-lab
# "Liveness probe failed: HTTP probe failed with statuscode: 404"
# Step 3: nginx doesn't serve /healthz — it serves / which returns 200
# Fix: Change probe path to /
kubectl patch deployment challenge-6 -n break-fix-lab --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/path","value":"/"}]'
# Verify — RESTARTS should stop incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w
Root cause: Liveness probe checks /healthz but nginx's default config doesn't have that endpoint. Returns 404 → probe fails → K8s restarts the container.
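The probe settings also determine how quickly a failing probe turns into a restart. As a rough estimate (an assumption of this sketch, not an exact kubelet guarantee), the first probe runs after initialDelaySeconds, and the container is restarted once failureThreshold consecutive probes have failed. The helper name seconds_to_first_restart is hypothetical.

```shell
# seconds_to_first_restart INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Rough lower bound on the time from container start to the first restart
# when every probe fails. Ignores probe timeouts, kubelet scheduling
# jitter, and the backoff applied to later restarts.
seconds_to_first_restart() {
  echo $(( $1 + ($3 - 1) * $2 ))
}
```

With the values from challenge-6.yaml, `seconds_to_first_restart 5 5 2` gives 10, so the first restart can happen only about ten seconds after the container starts. Later restarts are spaced further apart by the container restart backoff.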
🧹 Cleanup
kubectl delete namespace break-fix-lab
📝 Debugging Cheat Sheet
| Status | First Command | What to Look For |
|---|---|---|
| ImagePullBackOff | kubectl describe pod | Wrong image name/tag, missing registry credentials |
| CrashLoopBackOff | kubectl logs | App error, missing file/env, exit code |
| CreateContainerConfigError | kubectl describe pod | Missing ConfigMap or Secret |
| OOMKilled | kubectl describe pod | Memory limit too low for workload |
| Pending | kubectl describe pod | Insufficient CPU/memory, no matching nodeSelector |
| Running (restarts) | kubectl describe pod | Liveness probe failing |
| Service no endpoints | kubectl get endpoints | Selector labels don't match pod labels |
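The pod-status rows of the table can be captured as a small lookup, which is handy in a shell profile or a runbook script. This is a sketch only: first_command is a hypothetical helper, and ErrImagePull is grouped with ImagePullBackOff because Challenge 1 shows both statuses for the same fault.

```shell
# first_command STATUS
# Prints the first command the cheat sheet suggests for a pod status.
# Returns 1 for statuses the table does not cover.
first_command() {
  case "$1" in
    CrashLoopBackOff)
      echo "kubectl logs" ;;
    ImagePullBackOff|ErrImagePull|CreateContainerConfigError|OOMKilled|Pending)
      echo "kubectl describe pod" ;;
    *)
      echo "no cheat sheet entry for status: $1" >&2
      return 1 ;;
  esac
}
```

For example, `first_command CrashLoopBackOff` prints kubectl logs, and `first_command Pending` prints kubectl describe pod.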
📝 Summary
- Image errors: Always check the exact image name and tag spelling
- Label mismatches: Services, deployments, and selectors must agree on label keys and values
- CrashLoopBackOff: Always start with kubectl logs; the answer is usually there
- Missing references: Secrets and ConfigMaps must exist before the pod that needs them
- Resource limits: Set them, but set them high enough for your workload
- Health probes: Must match an actual endpoint your app exposes