Advanced Lesson 9 of 14

Hooks & Tests

Run jobs at specific lifecycle points (pre-install, post-upgrade) and validate releases with test hooks.

🧒 Simple Explanation (ELI5)

When you install an app, sometimes you need to run a task before or after — like running a database migration before the app starts, or sending a notification after deployment completes. Hooks let you say "run this Job at exactly this point in the release lifecycle." Tests let you verify the deployment actually works.

🔧 Technical Explanation

What Are Hooks?

Hooks are regular Kubernetes resources (usually Jobs) with special annotations that tell Helm when to create them during the release lifecycle.

```yaml
# templates/pre-install-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["npm", "run", "migrate"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ include "myapp.fullname" . }}-secret
                  key: database-url
```

Hook Types

| Hook | When | Use Case |
| --- | --- | --- |
| pre-install | After templates render, before any resources are created | DB migration, create secrets |
| post-install | After all resources are loaded | Notifications, seed data |
| pre-upgrade | After templates render, before upgrade | DB migration, backup |
| post-upgrade | After upgrade completes | Cache warm-up, notifications |
| pre-delete | Before any resources are deleted | Backup, drain connections |
| post-delete | After all resources are deleted | Cleanup of external resources |
| pre-rollback | Before rollback | Backup current state |
| post-rollback | After rollback | Verify rollback, notify |
| test | When helm test is run | Smoke tests, connectivity checks |
Install Lifecycle with Hooks

helm install → pre-install hooks (sorted by weight) → create resources → post-install hooks

Hook Weight

Controls execution order when multiple hooks share the same event. Lower weight runs first. Default is 0.

```yaml
"helm.sh/hook-weight": "-5"   # Runs before weight "0" or "10"
"helm.sh/hook-weight": "0"    # Default
"helm.sh/hook-weight": "10"   # Runs last
```

Hook Delete Policies

| Policy | Behavior |
| --- | --- |
| before-hook-creation | Delete the previous hook resource before creating a new one (recommended) |
| hook-succeeded | Delete the hook after it succeeds |
| hook-failed | Delete the hook if it fails |
⚠️ Hook Gotcha

Without a delete policy, old hook Jobs accumulate. Use before-hook-creation to automatically clean up the previous hook before re-running. If a hook Job fails, the entire helm install/upgrade fails.
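Delete policies can be combined in a single comma-separated annotation. A common combination cleans up stale and successful runs while keeping failed Jobs around for debugging:

```yaml
annotations:
  "helm.sh/hook": pre-upgrade
  # Clean up the previous run before re-running, and delete on success;
  # a failed Job is left in place so you can read its logs.
  "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
```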

--atomic + Hooks: The Full Picture

When --atomic is combined with hooks, the behavior depends on when the hook runs:

```text
# Scenario: pre-upgrade hook (DB migration) fails

helm upgrade myapp . -n prod --atomic --timeout 5m

# Timeline:
# 1. Helm renders templates                  ✓
# 2. pre-upgrade hook Job created            ✓
# 3. Hook Job fails (BackoffLimitExceeded)   ✗
# 4. --atomic kicks in:
#    a) Helm does NOT create the new Deployment/Service (pre-upgrade failed)
#    b) Helm rolls back to the previous revision
#    c) Release status: "deployed" (on the previous revision)
# 5. The failed hook Job may still exist (depends on delete-policy)
```
Critical: Hook Cleanup After --atomic Rollback

When --atomic rolls back, the failed hook Job is NOT automatically deleted (unless you have hook-failed in your delete policy). You must clean it up manually before the next attempt, or use before-hook-creation policy which handles this automatically.

Deep-Dive: Investigating a Failed Hook Job

```bash
# Step 1: Find the hook Job
kubectl get jobs -n prod -l app.kubernetes.io/managed-by=Helm

# Step 2: Check Job status and completions
kubectl describe job myapp-db-migrate -n prod
# Look for:
#   Completions: 0/1
#   Pods Statuses: 0 Active / 0 Succeeded / 3 Failed
#   Events: BackoffLimitExceeded

# Step 3: Get the pod logs (the pod name includes the Job name)
kubectl get pods -n prod -l job-name=myapp-db-migrate
kubectl logs myapp-db-migrate-xxxxx -n prod
# Common errors:
#   - Connection refused (DB not reachable from this pod)
#   - Permission denied (wrong DB credentials)
#   - Migration syntax error (bad SQL)

# Step 4: Test DB connectivity from the namespace
kubectl run db-test --rm -it --image=postgres:15-alpine -n prod -- \
  psql -h myapp-postgresql -U postgres -c "SELECT 1"

# Step 5: Fix and retry
kubectl delete job myapp-db-migrate -n prod  # Clean up the failed Job
helm upgrade myapp . -n prod --atomic --timeout 5m
```

Helm Tests

Test hooks run when you invoke helm test <release>. They validate that the release is working.

```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test
  annotations:
    "helm.sh/hook": test
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox
      command: ['wget']
      args: ['{{ include "myapp.fullname" . }}:{{ .Values.service.port }}']
---
# More thorough API test
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-api-test
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: api-test
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              curl -sf http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health || exit 1
              echo "Health check passed"
```

```bash
# Run tests after install
helm install myapp ./myapp -n demo --create-namespace
helm test myapp -n demo

# View test logs
helm test myapp -n demo --logs

# Output:
# NAME: myapp
# LAST DEPLOYED: ...
# STATUS: deployed
# TEST SUITE:     myapp-test
# Last Started:   ...
# Last Completed: ...
# Phase:          Succeeded
```

⌨️ Hands-on

```bash
# Lab: Add hooks and tests to a chart

# Step 1: Create base chart
helm create hooklab
cd hooklab

# Step 2: Add a pre-install hook
cat > templates/pre-install-init.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "hooklab.fullname" . }}-init
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init
          image: busybox
          command: ["sh", "-c", "echo 'Running pre-install init...' && sleep 5 && echo 'Done!'"]
EOF

# Step 3: Add a post-install hook
cat > templates/post-install-notify.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "hooklab.fullname" . }}-notify
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: notify
          image: busybox
          command: ["sh", "-c", "echo 'Deployment complete!'"]
EOF

# Step 4: Install and watch hooks run
helm install hooklab . -n demo --create-namespace --debug

# Step 5: Run the built-in test (helm create scaffolds templates/tests/test-connection.yaml)
helm test hooklab -n demo --logs

# Step 6: Upgrade (triggers the post-upgrade notify hook)
helm upgrade hooklab . -n demo --set replicaCount=2

# Cleanup
helm uninstall hooklab -n demo
```

🐛 Debugging Scenarios

Scenario 1: Hook Job times out

```bash
# "Error: timed out waiting for the condition"
# The hook Job didn't complete within the timeout

# Increase the timeout
helm install myapp . --timeout 10m

# Check hook Job logs
kubectl get jobs -n demo
kubectl logs job/myapp-db-migrate -n demo

# Common causes:
# - Job command hangs (DB not reachable)
# - Wrong image or command
# - Missing environment variables
```

Scenario 2: Old hook Jobs preventing new install

```bash
# "Job already exists" error
# Add a delete policy to your hook annotation:
#   "helm.sh/hook-delete-policy": before-hook-creation

# Or manually clean up:
kubectl delete job myapp-db-migrate -n demo

# Then retry:
helm upgrade myapp . -n demo
```

Scenario 3: Helm test fails

```bash
# "FAILED" test output
# Step 1: Check test pod logs
helm test myapp -n demo --logs

# Step 2: Check if the service is reachable
kubectl get svc -n demo
kubectl get endpoints -n demo

# Step 3: Run the test command manually
kubectl run test-pod --rm -it --image=busybox -n demo -- \
  wget -qO- myapp:80

# Common causes:
# - Service not yet ready (add readiness probes)
# - Wrong port in the test
# - Network policy blocking the test pod
```

🎯 Interview Questions

Beginner

Q: What is a Helm hook?

A Kubernetes resource (usually a Job) with the helm.sh/hook annotation that runs at a specific point in the release lifecycle — before install, after upgrade, before delete, etc. Hooks are not managed as part of the release; they're executed and optionally cleaned up.

Q: What is the most common use case for hooks?

Database migrations as pre-install / pre-upgrade hooks. The migration Job runs before the app containers start, ensuring the database schema matches the new code. If migration fails, the install/upgrade is aborted.

Q: How do you run Helm tests?

helm test <release-name>. It creates resources with the helm.sh/hook: test annotation and waits for them to complete. Use --logs to see output. Tests are typically simple connectivity checks or API health probes.

Q: What is hook-weight?

An annotation that controls execution order among hooks of the same type. Lower weight runs first: weight "-5" runs before "0" before "10". Default is 0. Useful when you need to create a secret before running a migration that uses it.

Q: What happens if a hook fails?

The entire helm operation (install/upgrade) fails and the release is marked "failed". For a failed pre-install hook, the main resources are never created; in other cases you may need manual cleanup. Use --atomic to automatically roll back on hook failure.

Intermediate

Q: What is the difference between a hook Job and a regular Job in Helm?

Regular Jobs are managed as part of the release (tracked, upgraded, deleted with the release). Hook Jobs sit outside the release lifecycle — they run at the specified event and are cleaned up according to their delete policy. Hooks appear in the output of helm get hooks, not helm get manifest.

Q: Explain the three hook-delete-policies.

before-hook-creation: Delete old hook resource before creating a new one (prevents "already exists" errors). hook-succeeded: Delete after success (keeps failures for debugging). hook-failed: Delete after failure. You can combine them. Most common: before-hook-creation.

Q: Can a hook be something other than a Job?

Yes. Any Kubernetes resource can be a hook — ConfigMaps, Secrets, Pods, even Deployments. But Jobs are the most useful because they run to completion. A pre-install ConfigMap hook is useful for creating configuration that other hooks or resources need.
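A minimal sketch of a non-Job hook — a ConfigMap created before the migration Job runs. The name, weight, and data keys here are illustrative, not from a real chart:

```yaml
# Hypothetical pre-install ConfigMap hook; its lower weight (-10)
# means Helm creates it before a migration Job at weight "-5".
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "myapp.fullname" . }}-migrate-config
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation
data:
  MIGRATE_TIMEOUT: "60"   # example setting consumed by the migration Job
```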

Q: How does --atomic interact with hooks?

With --atomic, if any hook fails, Helm automatically rolls back the entire release to the previous revision. Without it, a failed hook leaves the release in "failed" state requiring manual intervention. Always use --atomic in CI/CD pipelines.

Q: How do you debug a hook that runs during install?

1) Use --debug flag for verbose output. 2) Check Job status: kubectl get jobs. 3) Read logs: kubectl logs job/<hook-name>. 4) If using hook-succeeded delete policy, the Job is gone — temporarily remove the policy to debug. 5) Use helm template to verify the hook manifest.

Scenario-Based

Q: Your app requires a DB migration before the new version starts. How do you set this up?

Create a Job template with annotations: helm.sh/hook: pre-install,pre-upgrade, helm.sh/hook-weight: "-5" (run before other hooks), helm.sh/hook-delete-policy: before-hook-creation. The Job container runs the migration command. If it fails, the release fails and code doesn't deploy.

Q: After every deployment, you need to warm up caches. How?

Create a post-install/post-upgrade hook Job that calls the cache warm-up endpoint: curl http://myapp:80/api/cache/warmup. Use a higher weight (e.g., "10") so it runs after other post-install hooks. Use hook-succeeded delete policy to clean up after success.
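As a sketch, the warm-up hook from this answer could look like the following; the /api/cache/warmup endpoint and service name come from the scenario, not a real API:

```yaml
# Runs after install/upgrade, after other post-* hooks (weight 10),
# and is deleted once it succeeds.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-cache-warmup
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "10"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: warmup
          image: curlimages/curl:latest
          command: ["curl", "-sf", "http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/cache/warmup"]
```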

Q: Your helm test passes locally but fails in CI. What could be wrong?

Common causes: 1) Service not ready yet — add --timeout and ensure readiness probes are configured. 2) Network policies in CI blocking test pod traffic. 3) Test image not pullable from CI cluster (registry auth). 4) Different namespace or service name in CI. Debug with helm test --logs and kubectl describe pod.

Q: You need to create an external resource (DNS record) before install. How?

pre-install hook Job that calls the DNS API (e.g., using curl or a custom CLI tool). Pass API credentials via a Secret (already existing, not managed by this chart). Set a low hook-weight so it runs first. For cleanup, use a pre-delete hook that removes the DNS record.
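A heavily hedged sketch of such a hook — the provider URL, request payload, and Secret name below are placeholders, not a real DNS API:

```yaml
# Placeholder example only: dns.example.com and dns-api-credentials
# stand in for your provider's endpoint and a pre-existing Secret.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-dns-register
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: dns
          image: curlimages/curl:latest
          envFrom:
            - secretRef:
                name: dns-api-credentials   # existing Secret, not managed by this chart
          command:
            - sh
            - -c
            - curl -sf -H "Authorization: Bearer $DNS_API_TOKEN" -X POST https://dns.example.com/records
```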

Q: Hook Jobs keep accumulating, consuming resources. How do you fix this?

Add helm.sh/hook-delete-policy: before-hook-creation — deletes the old Job before creating a new one. Or combine policies: before-hook-creation,hook-succeeded. Also set ttlSecondsAfterFinished: 300 on the Job spec as a safety net. For existing clutter: kubectl delete jobs -l helm.sh/hook -n <namespace>.
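The ttlSecondsAfterFinished safety net from this answer sits on the Job spec itself, so Kubernetes garbage-collects the Job even if Helm never re-runs the hook. A minimal sketch (names are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  ttlSecondsAfterFinished: 300   # K8s deletes the finished Job (and its pods) after 5 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: task
          image: busybox
          command: ["sh", "-c", "echo done"]
```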

🌍 Real-World Use Case

An e-commerce platform uses hooks at every stage: pre-upgrade database migrations, post-upgrade cache warm-ups and notifications, and pre-delete backups before teardown.

🔗 K8s Connection: Hooks are Just K8s Resources

Helm hooks are regular Kubernetes Jobs — they follow all K8s rules: they need RBAC permissions, they're subject to ResourceQuotas, and they run as Pods that can be affected by NetworkPolicies. When debugging hook failures, use the same kubectl describe job / kubectl logs workflow you'd use for any K8s workload. Remember: hook pods need access to whatever service they interact with (DB, APIs) — check ServiceAccount permissions and network access.
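One way to make a hook's permissions explicit is to give it its own ServiceAccount, itself created as an earlier hook. A minimal sketch with illustrative names:

```yaml
# Created at weight -10, before hook Jobs at higher weights,
# so those Jobs can reference it via serviceAccountName.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-hook
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation
```

The hook Job's pod spec would then set serviceAccountName: myapp-hook, with any required Role/RoleBinding created at the same early weight.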
