Hands-on Lesson 12 of 14

Break & Fix Challenges

Intentionally break Helm deployments, then diagnose and fix them. The best way to learn debugging is to cause the bugs yourself.

🧒 Simple Explanation (ELI5)

Doctors learn surgery by practicing on simulations, not just reading textbooks. This lab gives you controlled failure scenarios — you'll deliberately inject broken values, bad templates, wrong versions, and failed upgrades, then fix each one. When these happen in production, you'll know exactly what to do.

🔧 Setup

bash
# Create a base chart for all challenges
helm create breakfix
cd breakfix

# Install a working baseline
helm install breakfix . -n bf --create-namespace
kubectl get pods -n bf  # Should be READY 1/1

# Every challenge starts from this working state
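
Nothing here restores that state automatically. Below is a minimal sketch of a reset helper. It assumes revision 1 is still the untouched baseline install. `helm rollback` redeploys the chart and values stored with that revision, so the cluster returns to the baseline. It does not undo local edits in the chart directory, such as templates/broken.yaml or the redis dependency that a later challenge adds to Chart.yaml. `reset_baseline` is a hypothetical helper name, not a Helm command.

```bash
# Hypothetical helper: return a release to its baseline install (revision 1).
# The rollback creates a new revision that reuses revision 1's stored chart and values.
reset_baseline() {
  local release="${1:-breakfix}" ns="${2:-bf}"
  helm rollback "$release" 1 -n "$ns" --wait --timeout 2m
}

# Usage between challenges (defaults to breakfix in namespace bf):
# reset_baseline
```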

🔴 Challenge 1: Wrong Image Tag

The Break

Deploy with a non-existent image tag.

bash
# Break it — use a tag that doesn't exist
helm upgrade breakfix . -n bf --set image.tag=v999.999.999

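# Observe the failure: the upgrade itself reports success (no --wait),
# but the new pod can't pull its image
kubectl get pods -n bf
kubectl describe pod -n bf | grep -A 3 "Events"
# → New pod: ErrImagePull / ImagePullBackOff; the old pod keeps running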
💚 Fix
bash
# Option 1: Rollback to last working revision
helm rollback breakfix 1 -n bf

# Option 2: Upgrade with correct tag
helm upgrade breakfix . -n bf --set image.tag="1.25-alpine"

# Verify
kubectl get pods -n bf  # READY 1/1
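
Unless it is told to wait, Helm marks an upgrade as deployed as soon as the API server accepts the manifests. That is how the broken tag became the live revision. The sketch below assumes Helm 3, where `--atomic` implies `--wait` and rolls the release back automatically if resources are not ready before `--timeout`. `safe_upgrade` and the 2-minute timeout are illustrative choices, not part of the original lab. The chart path `.` assumes you run it from the chart directory, as in the setup.

```bash
# Hypothetical wrapper: install or upgrade, wait for readiness, and roll back
# automatically if the new pods are not ready before the timeout (Helm 3 --atomic).
safe_upgrade() {
  local release="$1" ns="$2"
  shift 2
  helm upgrade --install "$release" . -n "$ns" --atomic --timeout 2m "$@"
}

# Usage: the bad tag from this challenge is rolled back instead of left in place
# safe_upgrade breakfix bf --set image.tag=v999.999.999
```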

🔴 Challenge 2: Wrong Value Types

The Break

Pass values with the wrong types. The file is valid YAML, but the rendered manifests fail Kubernetes validation.

bash
# Create a broken values file
cat > broken-values.yaml <<EOF
replicaCount: "not-a-number"
service:
  port: abc
EOF

# Try to upgrade
helm upgrade breakfix . -n bf -f broken-values.yaml
# → Fails validation: replicas and port expect integers, got strings
#   (exact wording varies by Helm and Kubernetes version)
💚 Fix
bash
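# Step 1: Identify the error
# Plain helm template renders strings into integer fields without complaint;
# --validate checks the output against the cluster's schemas
helm template test . -f broken-values.yaml --validate 2>&1
# Should name the offending field (e.g. spec.replicas) and the type it expected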

# Step 2: Fix the values
cat > fixed-values.yaml <<EOF
replicaCount: 2
service:
  port: 80
EOF

# Step 3: Re-deploy with fixed values
helm upgrade breakfix . -n bf -f fixed-values.yaml

# Verify
kubectl get pods -n bf
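
Nothing in this chart checks value types before rendering. One way to make `helm template`, `helm lint`, and `helm install`/`upgrade` reject them up front is a values.schema.json file. When a chart contains one, Helm validates the supplied values against it. The minimal sketch below assumes you are in the chart directory. It constrains only the two keys from this challenge and leaves every other value unchecked.

```bash
# Write a minimal JSON Schema next to Chart.yaml (Helm reads values.schema.json
# from the chart root). Only replicaCount and service.port are constrained.
cat > values.schema.json <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicaCount": { "type": "integer", "minimum": 0 },
    "service": {
      "type": "object",
      "properties": {
        "port": { "type": "integer", "minimum": 1, "maximum": 65535 }
      }
    }
  }
}
EOF
```

With the file in place, `helm template test . -f broken-values.yaml` should stop with a schema error naming replicaCount and service.port, instead of passing the strings through to the manifests.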

🔴 Challenge 3: Template Syntax Error

The Break

Introduce a Go template syntax error.

bash
# Add a broken template
cat > templates/broken.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "breakfix.fullname" . }}-broken
data:
  value: {{ .Values.missing.nested.key }}
  bad_syntax: {{ if .Values.enabled }
EOF

# Try to render
helm template test .
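# → A parse error at the unclosed {{ if }} action comes first; once the syntax
#   is fixed, .Values.missing.nested.key fails with a nil pointer error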
💚 Fix
bash
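# Fix 1: .Values.missing is nil, so reading .nested from it fails; use a
#        top-level key with a default instead
# Fix 2: Close the if action with }} and finish it with {{ end }}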

cat > templates/broken.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "breakfix.fullname" . }}-fixed
data:
  value: {{ .Values.missing | default "fallback" | quote }}
  enabled: {{ if .Values.enabled }}"yes"{{ else }}"no"{{ end }}
EOF

# Verify
helm template test .
helm lint .
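
`helm lint` is the authoritative check. For a quick scan across many templates, the rough sketch below counts `{{` and `}}` per file. `unbalanced_delims` is a hypothetical helper. It catches the missing brace in this challenge but not the nil lookup, and it can be fooled by braces inside literal strings.

```bash
# Rough pre-check (hypothetical helper): report template files whose count of
# "{{" differs from their count of "}}". It cannot see logic errors such as
# nil lookups; helm lint and helm template remain the real checks.
unbalanced_delims() {
  local f opens closes
  for f in "$@"; do
    opens=$(grep -o '{{' "$f" | wc -l | tr -d ' ')
    closes=$(grep -o '}}' "$f" | wc -l | tr -d ' ')
    if [ "$opens" -ne "$closes" ]; then
      echo "$f: $opens '{{' vs $closes '}}'"
    fi
  done
}

# Usage from the chart directory:
# unbalanced_delims templates/*.yaml
```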

🔴 Challenge 4: Stuck in pending-upgrade

The Break

Simulate an interrupted upgrade that leaves the release in a broken state.

bash
# Simulate: upgrade with a very short timeout and a bad image
helm upgrade breakfix . -n bf \
  --set image.repository=invalid/image \
  --timeout 10s \
  --wait
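# → Times out; Helm usually records the revision as "failed".
#   "pending-upgrade" is left behind when the helm process is killed or loses
#   its connection to the cluster before it can record the result.

# Check status
helm status breakfix -n bf
helm list -n bf -a  # -a includes pending releases; STATUS shows "failed" or "pending-upgrade"

# If the release is stuck in pending-upgrade, another upgrade fails with:
# "Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress"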
💚 Fix
bash
# Option 1: Rollback to the last good revision
helm history breakfix -n bf    # Find the last "deployed" or "superseded" revision
helm rollback breakfix 1 -n bf # Replace 1 with that revision number

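# Option 2: If rollback won't work (rare), delete the pending release record.
# Helm stores each revision as a Secret named sh.helm.release.v1.<release>.v<revision>
kubectl get secrets -n bf -l owner=helm,name=breakfix,status=pending-upgrade
kubectl delete secret sh.helm.release.v1.breakfix.v3 -n bf   # Use the name listed above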

# Then retry the upgrade; --reset-values starts from the chart's defaults
helm upgrade breakfix . -n bf --reset-values

# Verify clean state
helm list -n bf   # STATUS: deployed
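
Picking the rollback target by eye works for a short history. The sketch below does it from `helm history`'s table output. `last_good_revision` is a hypothetical name. It assumes the default table layout with REVISION as the first column. A deployed or superseded status only means Helm applied the manifests, not that the pods were healthy (Challenge 1's bad tag was "deployed"). Check the candidate with `helm get values breakfix -n bf --revision <n>` before rolling back to it.

```bash
# Hypothetical helper: read `helm history` table output on stdin and print the
# highest revision whose STATUS is deployed or superseded.
last_good_revision() {
  awk 'NR > 1 && /[ \t](deployed|superseded)[ \t]/ { rev = $1 } END { print rev }'
}

# Usage:
# rev=$(helm history breakfix -n bf | last_good_revision)
# helm rollback breakfix "$rev" -n bf
```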

🔴 Challenge 5: Values Not Being Applied

The Break

You set values but they don't show up in the deployed resources.

bash
# Create a custom values file with a typo in the key name
cat > my-values.yaml <<EOF
replicacount: 5
service:
  Port: 9090
EOF

# Apply (note: these are case-sensitive!)
helm upgrade breakfix . -n bf -f my-values.yaml

# Check — still 1 replica, still port 80!
kubectl get deploy -n bf -o jsonpath='{.items[0].spec.replicas}'
kubectl get svc -n bf -o jsonpath='{.items[0].spec.ports[0].port}'
💚 Fix
bash
# The issue: YAML keys are case-sensitive, and Helm silently ignores
# keys that no template reads
# "replicacount" ≠ "replicaCount"
# "Port" ≠ "port"

# Check what values Helm actually used:
helm get values breakfix -n bf --all | grep -i replica
helm get values breakfix -n bf --all | grep -i port

# Fix: Use correct case
cat > my-values.yaml <<EOF
replicaCount: 5
service:
  port: 9090
EOF

helm upgrade breakfix . -n bf -f my-values.yaml

# Verify
kubectl get deploy -n bf -o jsonpath='{.items[0].spec.replicas}'  # → 5
kubectl get svc -n bf -o jsonpath='{.items[0].spec.ports[0].port}'  # → 9090
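
`helm get values --all` exposes the typo after the fact. To catch it before deploying, you can compare the keys in your override file with those in the chart's values.yaml. This is a rough sketch with hypothetical helper names. It only understands simple block-style YAML indented by two spaces. It also flags legitimate keys nested under maps that default to empty (such as `resources: {}`), so treat its output as a list to review rather than an error.

```bash
# Hypothetical helpers: list dotted key paths in a simple block-style YAML file
# (2-space indentation, no flow maps or quoted keys), then print the paths that
# appear in an override file but not in the chart's values.yaml.
yaml_key_paths() {
  awk '
    /^[ \t]*#/ { next }
    /^ *[A-Za-z0-9_-]+:/ {
      indent = match($0, /[^ ]/) - 1
      key = substr($0, indent + 1)
      sub(/:.*/, "", key)
      depth = int(indent / 2)
      path[depth] = key
      out = path[0]
      for (i = 1; i <= depth; i++) out = out "." path[i]
      print out
    }
  ' "$1"
}

unknown_value_keys() {
  local known
  known=$(mktemp)
  yaml_key_paths "$1" | LC_ALL=C sort -u > "$known"
  # comm -13 keeps lines that exist only in the override file (stdin)
  yaml_key_paths "$2" | LC_ALL=C sort -u | LC_ALL=C comm -13 "$known" -
  rm -f "$known"
}

# Usage from the chart directory:
# unknown_value_keys values.yaml my-values.yaml
```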

🔴 Challenge 6: Missing Dependency

The Break

Add a dependency but forget to download it.

bash
# Add dependency to Chart.yaml
cat >> Chart.yaml <<EOF

dependencies:
  - name: redis
    version: "17.15.0"
    repository: "https://charts.bitnami.com/bitnami"
EOF

# Try to install WITHOUT running dependency update
helm upgrade breakfix . -n bf
# → "Error: found in Chart.yaml, but missing in charts/ directory: redis"
💚 Fix
bash
# Step 1: Add the repo (if not already)
helm repo add bitnami https://charts.bitnami.com/bitnami

# Step 2: Download dependencies
helm dependency update .

# Step 3: Verify
ls charts/      # redis-17.15.0.tgz
helm dependency list .

# Step 4: Re-deploy
helm upgrade breakfix . -n bf
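
In CI the same mistake usually shows up as a fresh checkout without charts/. The sketch below is a guard for that case. It assumes `helm dependency list` prints STATUS as its last column, reporting `missing` for dependencies absent from charts/. Only that status is checked. `missing_deps` and `ensure_deps` are hypothetical names.

```bash
# Hypothetical helpers. missing_deps reads `helm dependency list` output on
# stdin and prints each dependency whose STATUS column is "missing".
missing_deps() {
  awk 'NR > 1 && $NF == "missing" { print $1 }'
}

# ensure_deps fetches dependencies only when something is missing.
# `helm dependency build` uses the versions pinned in Chart.lock
# (it behaves like `helm dependency update` when no lock file exists).
ensure_deps() {
  local chart="${1:-.}"
  if [ -n "$(helm dependency list "$chart" | missing_deps)" ]; then
    helm dependency build "$chart"
  fi
}

# Usage before deploying:
# ensure_deps . && helm upgrade breakfix . -n bf
```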

🔴 Challenge 7: Name Collision

The Break

Reuse an existing release name, first in a different namespace and then in the same one.

bash
# breakfix is already installed in namespace bf
# Install the same release name in bf2
helm install breakfix . -n bf2 --create-namespace
# This actually works! Release names are scoped to namespace.

# But try installing in the SAME namespace:
helm install breakfix . -n bf
# → "Error: INSTALLATION FAILED: cannot re-use a name that is still in use"
💚 Fix
bash
# Option 1: Use a different release name
helm install breakfix-v2 . -n bf

# Option 2: Upgrade the existing release
helm upgrade breakfix . -n bf

# Option 3: Use upgrade --install (idempotent)
helm upgrade --install breakfix . -n bf

# Key insight: helm upgrade --install is almost always
# preferred over helm install in scripts and CI/CD
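
To see the namespace scoping directly, list releases with that exact name across all namespaces. `where_is_release` is a hypothetical helper. `--all-namespaces` and `--filter` (a regular expression) are standard `helm list` flags.

```bash
# Hypothetical helper: show every namespace that has a release with exactly
# this name. Names are unique per namespace, so expect at most one row each.
where_is_release() {
  helm list --all-namespaces --filter "^$1\$"
}

# Usage: after this challenge, expect rows for both bf and bf2
# where_is_release breakfix
```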

🔴 Challenge 8: RBAC Permission Denied

The Break

Deploy as a user/service account that lacks permissions on the target namespace.

bash
# Create a limited service account
kubectl create namespace restricted
kubectl create serviceaccount helm-deployer -n restricted

# Create a Role with ONLY get/list (no create/update)
kubectl create role viewer --verb=get,list --resource=pods,deployments,services -n restricted
kubectl create rolebinding helm-deployer-viewer --role=viewer \
  --serviceaccount=restricted:helm-deployer -n restricted

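# Try to deploy as that service account. --kube-as-user makes Helm impersonate it
# (your own account needs impersonate rights, which cluster-admin has).
# Setting serviceAccount.name would only change the pods' service account,
# not the identity Helm uses to call the API.
helm install breakfix . -n restricted \
  --kube-as-user system:serviceaccount:restricted:helm-deployer
# → Error: INSTALLATION FAILED: ... is forbidden:
#   User "system:serviceaccount:restricted:helm-deployer" cannot ...
#   (which resource is named depends on the first call that gets denied)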
💚 Fix
bash
# Step 1: Confirm the missing permission
kubectl auth can-i create deployments -n restricted --as system:serviceaccount:restricted:helm-deployer
# no

# Step 2: Create a Role with the verbs Helm uses on every resource type this
# chart renders, plus secrets (where Helm stores its release records)
kubectl create role helm-deploy \
  --verb=get,list,watch,create,update,patch,delete \
  --resource=pods,deployments,services,configmaps,secrets,serviceaccounts \
  -n restricted

kubectl create rolebinding helm-deployer-deploy \
  --role=helm-deploy \
  --serviceaccount=restricted:helm-deployer \
  -n restricted

# Step 3: Verify permissions
kubectl auth can-i create deployments -n restricted \
  --as system:serviceaccount:restricted:helm-deployer
# yes

# Step 4: Retry the deploy as the service account
helm install breakfix . -n restricted \
  --kube-as-user system:serviceaccount:restricted:helm-deployer

# Key insight: Helm needs create/update/delete on ALL resource types
# that appear in your chart templates, plus access to Secrets in the
# release namespace, where it stores release history by default
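
Rather than discovering gaps one forbidden error at a time, you can ask the API server up front. This sketch assumes the service account lives in the release namespace, as it does here. `check_helm_perms` is a hypothetical helper. Its resource list mirrors the helm-deploy Role above rather than being derived from the chart automatically.

```bash
# Hypothetical preflight: ask whether a service account may perform each verb
# on each resource type Helm touches for this chart. Prints the gaps and
# returns 1 if any are found. kubectl auth can-i exits non-zero for "no".
check_helm_perms() {
  local ns="$1" sa="$2" res verb missing=0
  for res in deployments services configmaps secrets serviceaccounts pods; do
    for verb in get list create update patch delete; do
      if ! kubectl auth can-i "$verb" "$res" -n "$ns" \
          --as "system:serviceaccount:${ns}:${sa}" > /dev/null 2>&1; then
        echo "missing: $verb $res"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Usage: empty output and exit status 0 once the helm-deploy Role is bound
# check_helm_perms restricted helm-deployer
```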

🔴 Challenge 9: Resource Quota Exceeded

The Break

Deploy to a namespace with tight resource quotas that block pod creation.

bash
# Create a namespace with very tight quotas
kubectl create namespace quota-demo
kubectl apply -n quota-demo -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tight-quota
spec:
  hard:
    requests.cpu: "100m"
    requests.memory: "128Mi"
    limits.cpu: "200m"
    limits.memory: "256Mi"
    pods: "2"
EOF

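# Deploy requesting more than the quota allows (limits are set too, because a
# quota on limits.* rejects pods that don't declare them)
helm install breakfix . -n quota-demo \
  --set replicaCount=3 \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=256Mi \
  --set resources.limits.cpu=200m \
  --set resources.limits.memory=256Mi \
  --wait --timeout 30s
# → No pods are created (quota admission rejects them); helm times out waiting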
💚 Fix
bash
# Step 1: Find out why no pods exist
kubectl get events -n quota-demo --sort-by='.lastTimestamp'
# Look for FailedCreate events from the ReplicaSet, e.g.
# "... is forbidden: exceeded quota: tight-quota, requested: requests.cpu=200m,..."

# Step 2: Check current quota usage
kubectl describe resourcequota tight-quota -n quota-demo

# Step 3: Fix by reducing resource requests to fit the quota
helm upgrade --install breakfix . -n quota-demo \
  --set replicaCount=1 \
  --set resources.requests.cpu=50m \
  --set resources.requests.memory=64Mi \
  --set resources.limits.cpu=100m \
  --set resources.limits.memory=128Mi

# Verify
kubectl get pods -n quota-demo
kubectl describe resourcequota tight-quota -n quota-demo

# Key insight: ResourceQuotas are enforced by the API server when pods are created.
# Helm won't warn you: the Deployment is accepted, but its pods are rejected.
# Use values.schema.json to validate resource requests at helm time.
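
A JSON Schema sees one value at a time and cannot multiply replicas by requests. As a swapped-in alternative, a small shell preflight can do that arithmetic before you run helm. This is a sketch with hypothetical helper names. It only understands CPU written as cores ("0.5", "2") or millicores ("250m"). It also ignores quota already used by other pods, so compare against the hard limit minus the Used column from `kubectl describe resourcequota`.

```bash
# Hypothetical preflight: does replicas x per-pod CPU request fit under the
# quota's hard requests.cpu? Works in millicores to stay in integer math.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *) awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

fits_cpu_quota() {
  # $1 replicas, $2 per-pod requests.cpu, $3 quota hard requests.cpu
  local req quota
  req=$(cpu_to_millicores "$2")
  quota=$(cpu_to_millicores "$3")
  [ $(( $1 * req )) -le "$quota" ]
}

# Usage with this challenge's numbers:
# fits_cpu_quota 3 200m 100m || echo "does not fit"   # the break
# fits_cpu_quota 1 50m 100m && echo "fits"            # the fix
```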

🧹 Full Cleanup

bash
helm uninstall breakfix -n bf 2>/dev/null
helm uninstall breakfix -n bf2 2>/dev/null
helm uninstall breakfix-v2 -n bf 2>/dev/null
helm uninstall breakfix -n restricted 2>/dev/null
helm uninstall breakfix -n quota-demo 2>/dev/null
kubectl delete ns bf bf2 restricted quota-demo 2>/dev/null
cd .. && rm -rf breakfix   # the values files were created inside breakfix/

📝 Summary
