Hands-on Lesson 12 of 14

Break & Fix Lab

Intentionally break Kubernetes deployments, then diagnose and fix them. The best way to learn debugging is by breaking things on purpose.

🧒 Simple Explanation (ELI5)

You know how mechanics learn by taking cars apart and putting them back together? That's what we're doing. We'll break things on purpose — wrong image, bad config, missing labels — and then fix them. After this lab, you'll be able to diagnose real K8s problems in seconds instead of hours.

💡 How This Lab Works

Each challenge gives you a broken YAML or a break command. Your job: figure out what's wrong, diagnose with kubectl, and fix it. Solutions are provided — but try to solve it yourself first!

Setup: Create the Lab Namespace

```bash
kubectl create namespace break-fix-lab
```

🔴 Challenge 1: Wrong Image Name

Scenario: A deployment was created but pods never become Ready.

```yaml
# challenge-1.yaml — deploy this broken manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-1
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: challenge-1
  template:
    metadata:
      labels:
        app: challenge-1
    spec:
      containers:
        - name: web
          image: ngnix:latest    # ← Bug is here
          ports:
            - containerPort: 80
```

```bash
kubectl apply -f challenge-1.yaml
```
🤔 Your Task

Pods are stuck. Diagnose why and fix it without deleting the deployment.

Solution
```bash
# Step 1: Check pod status
kubectl get pods -n break-fix-lab
# STATUS: ImagePullBackOff or ErrImagePull

# Step 2: Describe the pod
kubectl describe pod -l app=challenge-1 -n break-fix-lab
# Events show: Failed to pull image "ngnix:latest": repository does not exist

# Step 3: Spot the typo — "ngnix" should be "nginx"

# Step 4: Fix it
kubectl set image deployment/challenge-1 web=nginx:latest -n break-fix-lab

# Step 5: Verify
kubectl get pods -n break-fix-lab -w
# Pods transition to Running
```

Root cause: Image name typo — `ngnix` instead of `nginx`.
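If you prefer a declarative fix over `kubectl set image`, correct the manifest and re-apply. A sketch of the fixed container spec (the rest of challenge-1.yaml is unchanged):

```yaml
containers:
  - name: web
    image: nginx:latest    # typo fixed: "ngnix" → "nginx"
    ports:
      - containerPort: 80
```

Then `kubectl apply -f challenge-1.yaml` rolls out fresh pods. Pinning a specific tag (e.g. `nginx:1.25-alpine`) instead of `latest` also makes pull errors easier to spot in practice.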

🔴 Challenge 2: Mismatched Selector Labels

Scenario: Service is created but has no endpoints. The app is unreachable.

```yaml
# challenge-2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-2
  namespace: break-fix-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: web
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: challenge-2-svc
  namespace: break-fix-lab
spec:
  selector:
    app: my-app         # ← Bug: "my-app" vs "myapp"
  ports:
    - port: 80
      targetPort: 80
```

```bash
kubectl apply -f challenge-2.yaml
```
🤔 Your Task

Pods are running but the service can't reach them. Find and fix the issue.

Solution
```bash
# Step 1: Check endpoints
kubectl get endpoints challenge-2-svc -n break-fix-lab
# ENDPOINTS: <none>

# Step 2: Compare labels
kubectl get pods -n break-fix-lab --show-labels | grep challenge-2
# Labels: app=myapp

kubectl get svc challenge-2-svc -n break-fix-lab -o yaml | grep -A2 selector
# selector: app: my-app    ← MISMATCH: "my-app" ≠ "myapp"

# Step 3: Fix the service selector
kubectl patch svc challenge-2-svc -n break-fix-lab \
  -p '{"spec":{"selector":{"app":"myapp"}}}'

# Step 4: Verify
kubectl get endpoints challenge-2-svc -n break-fix-lab
# Should now show pod IPs
```

Root cause: Service selector `app: my-app` doesn't match pod label `app: myapp` (stray hyphen in the selector).
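The same fix, declaratively: correct the selector in challenge-2.yaml and re-apply. A sketch of the fixed Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: challenge-2-svc
  namespace: break-fix-lab
spec:
  selector:
    app: myapp          # now matches the pod label exactly
  ports:
    - port: 80
      targetPort: 80
```

Selector matching is exact string equality on every key/value pair, which is why a single character is enough to leave a Service with zero endpoints.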

🔴 Challenge 3: CrashLoopBackOff

Scenario: Pod starts but immediately crashes, restarting in a loop.

```yaml
# challenge-3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-3
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-3
  template:
    metadata:
      labels:
        app: challenge-3
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["cat", "/config/app.conf"]   # ← File doesn't exist
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          emptyDir: {}    # ← Empty — no app.conf inside
```
Solution
```bash
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: CrashLoopBackOff, RESTARTS increasing

# Step 2: Check logs
kubectl logs -l app=challenge-3 -n break-fix-lab
# "cat: can't open '/config/app.conf': No such file or directory"

# Step 3: The volume is emptyDir with no content. Supply the file via ConfigMap.

# Fix: Create a ConfigMap with the expected file
kubectl create configmap challenge-3-config \
  --from-literal=app.conf="server.port=8080" \
  -n break-fix-lab

# Patch the deployment to use the ConfigMap volume
kubectl patch deployment challenge-3 -n break-fix-lab --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/volumes/0","value":{"name":"config","configMap":{"name":"challenge-3-config"}}},{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["sh","-c","cat /config/app.conf ; sleep 3600"]}]'

# Verify
kubectl get pods -n break-fix-lab -l app=challenge-3
# STATUS: Running
```

Root cause: The container expects a file at `/config/app.conf`, but the `emptyDir` volume mounts an empty directory.
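The equivalent declarative fix, assuming the `challenge-3-config` ConfigMap from the solution exists: swap the `emptyDir` volume for a `configMap` volume in challenge-3.yaml and keep the container alive after printing the file:

```yaml
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "cat /config/app.conf && sleep 3600"]
      volumeMounts:
        - name: config
          mountPath: /config
  volumes:
    - name: config
      configMap:
        name: challenge-3-config   # supplies app.conf as a file
```

Note the `sleep`: the original `cat` command exits immediately even on success, so the pod would keep restarting (exit code 0 still counts as the container terminating) even once the file exists.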

🔴 Challenge 4: Missing Secret Reference

Scenario: Pod won't start โ€” stuck in CreateContainerConfigError.

```yaml
# challenge-4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-4
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-4
  template:
    metadata:
      labels:
        app: challenge-4
    spec:
      containers:
        - name: app
          image: nginx:1.25-alpine
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials     # ← This secret doesn't exist
                  key: password
```
Solution
```bash
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: CreateContainerConfigError

# Step 2: Describe the pod
kubectl describe pod -l app=challenge-4 -n break-fix-lab
# Warning: "secret 'db-credentials' not found"

# Step 3: Create the missing secret
kubectl create secret generic db-credentials \
  --from-literal=password='MySecurePass123' \
  -n break-fix-lab

# Step 4: The kubelet retries automatically once the secret exists;
# a rollout restart just speeds up recovery
kubectl rollout restart deployment/challenge-4 -n break-fix-lab

# Verify
kubectl get pods -n break-fix-lab -l app=challenge-4
# STATUS: Running
```

Root cause: Pod references a secret that doesn't exist. K8s can't create the container without it.
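The secret can also be created declaratively; a sketch using `stringData`, which lets you write plain text instead of base64-encoding values by hand (the password here is a placeholder):

```yaml
# db-credentials.yaml — declarative version of the Secret
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: break-fix-lab
type: Opaque
stringData:
  password: MySecurePass123   # placeholder — don't commit real credentials
```

Apply it before (or alongside) challenge-4.yaml and the pod starts cleanly.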

🔴 Challenge 5: Resource Limit Too Low

Scenario: Pod gets OOMKilled repeatedly.

```yaml
# challenge-5.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-5
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-5
  template:
    metadata:
      labels:
        app: challenge-5
    spec:
      containers:
        - name: stress
          image: progrium/stress
          command: ["stress", "--vm", "1", "--vm-bytes", "128M"]
          resources:
            limits:
              memory: 64Mi    # ← Too low for the 128M allocation
```
Solution
```bash
# Step 1: Check status
kubectl get pods -n break-fix-lab -l app=challenge-5
# STATUS: OOMKilled, then CrashLoopBackOff

# Step 2: Describe the pod (or check previous logs)
kubectl describe pod -l app=challenge-5 -n break-fix-lab
# Last State: Terminated, Reason: OOMKilled

# Step 3: The process tries to allocate 128M but the limit is 64Mi

# Fix: Increase the memory limit
kubectl patch deployment challenge-5 -n break-fix-lab --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"256Mi"}]'

# Verify
kubectl get pods -n break-fix-lab -l app=challenge-5 -w
```

Root cause: The memory limit (64Mi) is lower than the 128M the process allocates, so the kernel OOM-kills the container every time it crosses the limit.
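The declarative version of the fix; adding a `requests` value alongside the raised limit also helps the scheduler place the pod sensibly (the exact numbers are illustrative):

```yaml
resources:
  requests:
    memory: 192Mi     # what the scheduler reserves on the node
  limits:
    memory: 256Mi     # headroom above the ~128M the stress process allocates
```

A common rule of thumb is to set the limit comfortably above observed peak usage rather than at it, so normal spikes don't trigger OOM kills.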

🔴 Challenge 6: Liveness Probe Misconfigured

Scenario: Pods keep restarting every few minutes even though the app is healthy.

```yaml
# challenge-6.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge-6
  namespace: break-fix-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: challenge-6
  template:
    metadata:
      labels:
        app: challenge-6
    spec:
      containers:
        - name: web
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /healthz    # ← nginx returns 404 for /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
```
Solution
```bash
# Step 1: Watch pods — RESTARTS keeps incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w

# Step 2: Describe → Events
kubectl describe pod -l app=challenge-6 -n break-fix-lab
# "Liveness probe failed: HTTP probe failed with statuscode: 404"

# Step 3: nginx doesn't serve /healthz — it serves / which returns 200

# Fix: Change the probe path to /
kubectl patch deployment challenge-6 -n break-fix-lab --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/path","value":"/"}]'

# Verify — RESTARTS should stop incrementing
kubectl get pods -n break-fix-lab -l app=challenge-6 -w
```

Root cause: Liveness probe checks /healthz but nginx's default config doesn't have that endpoint. Returns 404 → probe fails → K8s restarts the container.
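Declaratively, the fixed probe in challenge-6.yaml might look like this (raising `failureThreshold` to 3, the Kubernetes default, also tolerates transient blips better than 2):

```yaml
livenessProbe:
  httpGet:
    path: /              # nginx's default server returns 200 here
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```

In real apps, prefer a dedicated health endpoint over `/` so the probe reflects actual application health, not just that the web server is up.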

🧹 Cleanup

```bash
kubectl delete namespace break-fix-lab
```

๐Ÿ“ Debugging Cheat Sheet

| Status | First Command | What to Look For |
|---|---|---|
| ImagePullBackOff | `kubectl describe pod` | Wrong image name/tag, missing registry credentials |
| CrashLoopBackOff | `kubectl logs` | App error, missing file/env, exit code |
| CreateContainerConfigError | `kubectl describe pod` | Missing ConfigMap or Secret |
| OOMKilled | `kubectl describe pod` | Memory limit too low for workload |
| Pending | `kubectl describe pod` | Insufficient CPU/memory, no matching nodeSelector |
| Running (restarts) | `kubectl describe pod` | Liveness probe failing |
| Service no endpoints | `kubectl get endpoints` | Selector labels don't match pod labels |
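The cheat sheet's pod-status column maps mechanically to a first command, which you can sketch as a tiny shell helper (a hypothetical `first_cmd` function for pod statuses only; the Service case needs `kubectl get endpoints` separately):

```bash
#!/usr/bin/env bash
# first_cmd: map a pod status (as shown by `kubectl get pods`) to the
# first diagnostic command suggested by the cheat sheet above.
first_cmd() {
  case "$1" in
    CrashLoopBackOff)
      echo "kubectl logs" ;;
    ImagePullBackOff|ErrImagePull|CreateContainerConfigError|OOMKilled|Pending)
      echo "kubectl describe pod" ;;
    *)
      echo "kubectl describe pod" ;;   # safe default for anything else
  esac
}

first_cmd CrashLoopBackOff    # prints: kubectl logs
first_cmd OOMKilled           # prints: kubectl describe pod
```

The asymmetry is the point: only CrashLoopBackOff means the container actually ran and produced logs; everything else failed before or outside the app, so `describe` (Events) is more informative.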

๐Ÿ“ Summary

โ† Back to Kubernetes Course