Hands-on · Lesson 15 of 16

🐛 Debugging Docker Container Failures

Master the systematic approach to diagnosing every common Docker failure — crashes, network issues, volume problems, OOM kills, and image pull failures.

🔧 Debugging Framework

Container Failure Diagnosis Flow
1. docker ps -a — check exit code and state
2. docker logs <id> — read stdout/stderr
3. docker inspect <id> — metadata, health, OOM flag
4. docker exec -it <id> /bin/sh — enter and poke around
5. docker run -it --entrypoint /bin/sh <image> — override for crashed containers
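The first steps of this flow can be chained into one helper. The sketch below is illustrative only: `triage` is a hypothetical function name, the 20-line log tail is an arbitrary choice, and it assumes the `docker` CLI is on the PATH. It reads the state fields with `docker inspect --format`, prints recent logs, and suggests step 4 or step 5 depending on whether the container is still running.

```bash
# triage <container>: hypothetical helper covering steps 1-3 of the flow,
# then suggesting step 4 (running container) or step 5 (exited container).
triage() {
  cid=$1
  if [ -z "$cid" ]; then
    echo "usage: triage <container>" >&2
    return 2
  fi

  # Steps 1 and 3: state, exit code, OOM flag, and image name from inspect
  status=$(docker inspect --format '{{.State.Status}}' "$cid") || return 1
  code=$(docker inspect --format '{{.State.ExitCode}}' "$cid")
  oom=$(docker inspect --format '{{.State.OOMKilled}}' "$cid")
  image=$(docker inspect --format '{{.Config.Image}}' "$cid")
  echo "status=$status exit_code=$code oom_killed=$oom"

  # Step 2: recent stdout/stderr
  echo "--- last 20 log lines ---"
  docker logs --tail 20 "$cid" 2>&1

  # Steps 4/5: exec only works on a running container
  echo "--- suggested next step ---"
  if [ "$status" = "running" ]; then
    echo "docker exec -it $cid /bin/sh"
  else
    echo "docker run -it --entrypoint /bin/sh $image"
  fi
}
```

Call it as `triage <id>` with any container name or ID shown by `docker ps -a`.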

🔢 Exit Code Reference

text
# Exit Code Cheat Sheet
0   = Clean exit (process finished successfully)
1   = Application error (check logs for a stack trace)
126 = Command cannot be invoked: CMD/ENTRYPOINT found but not executable (permission denied)
127 = Command not found: executable missing from the image or PATH
137 = SIGKILL (128 + 9): OOM kill, docker kill, or docker stop after the grace period expired
139 = Segmentation fault (SIGSEGV, 128 + 11)
143 = Terminated by SIGTERM (128 + 15), e.g. docker stop
255 = Exit status out of range / abnormal exit
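Codes above 128 follow the shell convention of 128 plus the signal number, which is where 137, 139, and 143 come from. A minimal sketch of that arithmetic, using a hypothetical helper name `describe_exit_code` and assuming a numeric argument:

```bash
# describe_exit_code <code>: hypothetical helper that explains an exit status.
describe_exit_code() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      # 129..254: the process was ended by signal (code - 128)
      if [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-defined error"
      fi
      ;;
  esac
}

describe_exit_code 137   # killed by signal 9  (SIGKILL)
describe_exit_code 143   # killed by signal 15 (SIGTERM)
```

The exit code of a stopped container can be read with `docker inspect --format '{{.State.ExitCode}}' <id>` and passed to the helper.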

🔥 Scenario 1: Container Exits Immediately (Exit 1)

bash
# Symptom: docker ps -a shows "Exited (1) 2 seconds ago"

# Diagnose
docker logs <container_id>
# Look for: stack traces, missing env vars, missing files, port conflicts

# Interactive debug (override the ENTRYPOINT to get a shell instead of the app)
docker run -it --entrypoint /bin/sh myapp:1.0
# Then, inside the container's shell:
node server.js   # run it manually to see the full error

# Common fixes:
# - Missing environment variable: pass it with docker run -e REQUIRED_VAR=value
# - Wrong file path in CMD: from the shell above, check ls -la /app
#   (docker exec only works while the container is running)
# - Port conflict: another process is using the host port (publish a different one with -p)
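A missing environment variable is easier to diagnose when the container fails with an explicit message instead of a stack trace. One possible approach is a small guard at the top of an entrypoint script. The function name `require_env` is hypothetical, and this sketch treats an empty value the same as an unset one:

```bash
# require_env NAME...: hypothetical guard for an entrypoint script.
# Returns non-zero and lists every variable that is unset or empty.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required environment variables:$missing" >&2
    return 1
  fi
}
```

In an entrypoint this might be used as `require_env REQUIRED_VAR || exit 1` before starting the app, so `docker logs` shows the missing name directly.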

🔥 Scenario 2: OOMKilled (Exit 137)

bash
# Symptom: container exits with code 137 / "Killed"
docker inspect <id> --format "{{.State.OOMKilled}}"   # true if the kernel OOM killer ended it

# Diagnose: see how much memory it uses (only works while the container is running)
docker stats --no-stream <id>

# Fixes:
# a) Increase the memory limit
docker run -d --memory 1g myapp:1.0

# b) Cap the Node.js heap below the limit (Node.js does not necessarily size its heap from --memory)
docker run -d --memory 1g -e NODE_OPTIONS="--max-old-space-size=800" myapp:1.0

# c) Profile memory leaks in the application
#    Use node --inspect and connect Chrome DevTools
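The heap cap in fix (b) has to stay below the container limit, because the process also needs memory outside the V8 heap. A rough sketch of the arithmetic follows. The helper name `heap_mb_for_limit` and the 75% default are assumptions for illustration, not a recommendation from Docker or Node.js; tune the percentage to the application.

```bash
# heap_mb_for_limit <limit_mb> [percent]: hypothetical helper.
# Prints a --max-old-space-size value as a percentage of the container limit.
heap_mb_for_limit() {
  limit_mb=$1
  percent=${2:-75}   # assumed default: leave 25% headroom for non-heap memory
  echo $(( limit_mb * percent / 100 ))
}

heap_mb_for_limit 1024      # 768
heap_mb_for_limit 1024 50   # 512
```

It could then feed the run command, for example `docker run -d --memory 1g -e NODE_OPTIONS="--max-old-space-size=$(heap_mb_for_limit 1024)" myapp:1.0`.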

🔥 Scenario 3: Network — "Connection Refused"

bash
# Symptom: API cannot connect to DB container

# Step 1: Are both on the same network?
docker network inspect myapp-net
# Check the "Containers" section: both must be listed

# Step 2: Is the DB actually running?
docker ps | grep db
docker logs db

# Step 3: Test DNS and reachability from inside the API container
# (these tools may be missing from minimal images)
docker exec -it api nslookup db
docker exec -it api ping -c 3 db
docker exec -it api nc -zv db 5432   # Postgres does not speak HTTP, so probe the TCP port instead of using wget/curl

# Step 4: Check the DB port is listening
docker exec db netstat -tlnp | grep 5432

# Fix: ensure both containers use the same user-defined network
docker network create myapp-net
# The postgres image exits at startup unless POSTGRES_PASSWORD is set
docker run -d --name db --network myapp-net -e POSTGRES_PASSWORD=change-me postgres:15
docker run -d --name api --network myapp-net myapi:latest
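Step 1 can also be checked from the containers' side by comparing the networks each one is attached to. The sketch below is one way to do that; the function names `networks_of`, `common_networks`, and `check_shared_network` are hypothetical, and it assumes the `docker` CLI is available.

```bash
# networks_of <container>: space-separated list of attached network names.
networks_of() {
  docker inspect \
    --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' "$1"
}

# common_networks "<list a>" "<list b>": print names present in both lists.
common_networks() {
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then
        echo "$a"
      fi
    done
  done
  return 0
}

# check_shared_network <container1> <container2>
check_shared_network() {
  shared=$(common_networks "$(networks_of "$1")" "$(networks_of "$2")")
  if [ -n "$shared" ]; then
    echo "shared network(s): $shared"
  else
    echo "no shared network between $1 and $2"
    return 1
  fi
}
```

For the example above this would be `check_shared_network api db`. If the only shared network is the default `bridge`, the containers can reach each other by IP but not by name, since automatic DNS resolution is a feature of user-defined networks.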

🔥 Scenario 4: Image Pull Failure (ImagePullBackOff in Kubernetes)

bash
# Symptom: docker pull fails with "unauthorized" or "not found"

# Check the exact error
docker pull myacr.azurecr.io/myapp:1.0

# Fix: re-authenticate
az acr login --name myacr   # Azure
docker login                # Docker Hub

# Verify the image exists in the registry
az acr repository show-tags --name myacr --repository myapp

# Verify the full image name and tag (typos are common)
docker pull myacr.azurecr.io/myapp:1.2.3   # typo in tag?
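Typos are easier to spot when the reference is split into its parts. The sketch below uses a hypothetical helper, `parse_image_ref`, that approximates how an image name is read: the first path component counts as a registry only if it contains a dot or a colon or is `localhost`, and a missing tag means `latest`. It is a simplification and does not handle digest references (`name@sha256:...`).

```bash
# parse_image_ref <image>: hypothetical helper; simplified, no digest support.
parse_image_ref() {
  ref=$1
  last=${ref##*/}   # final path component, e.g. "myapp:1.0"

  # A colon in the final component separates the tag
  case "$last" in
    *:*) tag=${last##*:}; name=${ref%:*} ;;
    *)   tag=latest;      name=$ref ;;
  esac

  # Default to Docker Hub unless the first component looks like a host
  registry=docker.io
  repo=$name
  first=${name%%/*}
  case "$name" in
    */*)
      case "$first" in
        *.*|*:*|localhost) registry=$first; repo=${name#*/} ;;
      esac
      ;;
  esac

  echo "registry=$registry repository=$repo tag=$tag"
}

parse_image_ref myacr.azurecr.io/myapp:1.0
# registry=myacr.azurecr.io repository=myapp tag=1.0
```

Comparing the printed repository and tag against the output of `az acr repository show-tags` makes a mistyped tag or a missing registry prefix stand out.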

🔥 Scenario 5: Volume Mount Empty

bash
# Symptom: app can't find config files; the mount appears empty

# Check what is actually mounted
docker inspect myapp --format "{{json .Mounts}}"

# Exec in and check the path
docker exec -it myapp ls -la /etc/myapp/

# Common causes:
# - Host path is not absolute. Older Docker releases reject ./config, and a bare
#   name such as config is treated as a new, empty named volume.
#   Avoid: -v ./config:/etc/myapp
#   Right: -v $(pwd)/config:/etc/myapp          # Linux/Mac
#   Right: -v ${PWD}/config:/etc/myapp          # PowerShell
#   Right: -v C:\Users\me\config:/etc/myapp     # Windows absolute path
# - Host path does not exist: with -v, Docker creates it as an empty directory
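Both causes can be caught before the container starts by resolving the host path first. A minimal sketch for Linux/Mac shells, with a hypothetical helper name `bind_mount_arg`:

```bash
# bind_mount_arg <host_dir> <container_path>: hypothetical helper.
# Prints "<absolute host dir>:<container path>" or fails if the dir is missing.
bind_mount_arg() {
  host=$1
  target=$2
  if [ ! -d "$host" ]; then
    echo "host directory not found: $host" >&2
    return 1
  fi
  # Resolve to an absolute path in a subshell so the caller's cwd is untouched
  abs=$(cd "$host" && pwd)
  echo "$abs:$target"
}
```

Usage might look like `docker run -v "$(bind_mount_arg ./config /etc/myapp)" myapp:1.0`, after checking that the helper succeeded, so that a mistyped directory fails loudly instead of mounting an empty one.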

🎯 Practice Exercises

  1. Intentionally create a container with a missing environment variable — diagnose and fix it.
  2. Run a Node.js app in a container with --memory 10m, watch it get OOM-killed, and confirm with docker inspect.
  3. Start two containers: one on the default bridge, one on a custom network — try to connect them and observe the failure.
  4. Build an image with a typo in the CMD path — diagnose with docker logs and --entrypoint /bin/sh.