🐛 Debugging Docker Container Failures
Master the systematic approach to diagnosing every common Docker failure — crashes, network issues, volume problems, OOM kills, and image pull failures.
🔧 Debugging Framework
Container Failure Diagnosis Flow
1. docker ps -a — check exit code and state
↓
2. docker logs <id> — read stdout/stderr
↓
3. docker inspect <id> — metadata, health, OOM flag
↓
4. docker exec -it <id> /bin/sh — enter and poke around
↓
5. docker run -it --entrypoint /bin/sh <image> — override for crashed containers
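The first three steps of the flow are non-interactive, so they can be bundled into one helper. The sketch below is only an illustration: the function name `triage` and the 50-line log tail are assumptions, not part of any Docker tooling. Steps 4 and 5 need a terminal and are left out.

```shell
#!/bin/sh
# triage CONTAINER (hypothetical helper): run the non-interactive
# diagnosis steps for one container, in the order of the flow above.
triage() {
    id="$1"

    # Steps 1 and 3: state, exit code, and OOM flag from the container metadata.
    docker inspect \
        --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' \
        "$id" || return 1

    # Step 2: the most recent stdout/stderr output (50 lines is an arbitrary choice).
    docker logs --tail 50 "$id" 2>&1
}
```

Calling `triage <id>` prints one summary line followed by the log tail, which is usually enough to decide which scenario below applies.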
🔢 Exit Code Reference
text
# Exit Code Cheat Sheet
0   = Clean exit (process ended intentionally)
1   = Application error (check logs for stack trace)
126 = Permission denied: CMD/ENTRYPOINT not executable
127 = Command not found: executable missing in image
137 = SIGKILL (128 + 9): usually the OOM killer, but also docker kill or a docker stop that timed out
139 = Segmentation fault (SIGSEGV, 128 + 11)
143 = Terminated by SIGTERM (128 + 15), typically from docker stop
255 = Exit status out of range / abnormal exit
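Codes above 128 follow the shell convention of 128 plus the signal number, so they can be decoded rather than memorized. A minimal sketch, assuming a hypothetical helper named `describe_exit_code` (the wording of its messages is illustrative):

```shell
#!/bin/sh
# describe_exit_code CODE (hypothetical helper): translate a container
# exit code into a short hint, decoding 128+N as "killed by signal N".
describe_exit_code() {
    code="$1"
    case "$code" in
        0)   echo "clean exit" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found" ;;
        *)
            if [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
                echo "killed by signal $((code - 128))"
            else
                echo "application-defined error"
            fi
            ;;
    esac
}
```

For example, `describe_exit_code 137` reports signal 9 (SIGKILL) and `describe_exit_code 143` reports signal 15 (SIGTERM). Whether a SIGKILL came from the OOM killer still has to be confirmed with `docker inspect`.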
🔥 Scenario 1: Container Exits Immediately (Exit 1)
bash
# Symptom: docker ps -a shows "Exited (1) 2 seconds ago"

# Diagnose
docker logs <container_id>
# Look for: stack traces, missing env vars, missing files, port conflicts

# Interactive debug (override the ENTRYPOINT to get a shell)
docker run -it --entrypoint /bin/sh myapp:1.0
node server.js   # inside the container: run it manually to see the full error

# Common fixes:
# - Missing environment variable: pass it with docker run -e REQUIRED_VAR=value
# - Wrong file path in CMD: start a shell with --entrypoint /bin/sh and check ls -la /app
# - Port conflict: another process is using the host port (publish a different one with -p)
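A missing environment variable is much easier to spot in `docker logs` when the container fails fast with a clear message. One way to do that, sketched here under the assumption that the image starts through a shell entrypoint script, is a small guard that runs before the application. The helper name `require_env` is hypothetical.

```shell
#!/bin/sh
# require_env NAME... (hypothetical helper): report every listed variable
# that is unset or empty on stderr, and return non-zero if any is missing.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

An entrypoint script could call `require_env REQUIRED_VAR || exit 1` before starting the application, so the Exit 1 in `docker ps -a` comes with a one-line explanation in the logs.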
🔥 Scenario 2: OOMKilled (Exit 137)
bash
# Symptom: container exits with code 137 / "Killed"
docker inspect <id> --format "{{.State.OOMKilled}}" # true
# Diagnose: see how much memory it was using
docker stats --no-stream <id> # right before it dies
# Fixes:
# a) Increase memory limit
docker run -d --memory 1g myapp:1.0
# b) Cap Node.js heap (depending on the version, Node.js may not size its heap from the container limit)
docker run -e NODE_OPTIONS="--max-old-space-size=800" myapp:1.0
# c) Profile memory leaks in the application
# Use node --inspect and connect Chrome DevTools
🔥 Scenario 3: Network ("Connection Refused")
bash
# Symptom: API cannot connect to DB container

# Step 1: Are both on the same network?
docker network inspect myapp-net
# Check the Containers section: both must be listed

# Step 2: Is the DB actually running?
docker ps | grep db
docker logs db

# Step 3: Test DNS and reachability from inside the API container
docker exec -it api nslookup db
docker exec -it api ping -c 3 db
docker exec -it api nc -zv db 5432   # PostgreSQL does not speak HTTP, so probe the TCP port (needs nc in the image)

# Step 4: Check the DB port is listening (needs netstat in the image)
docker exec db netstat -tlnp | grep 5432

# Fix: ensure both containers use the same user-defined network
docker network create myapp-net
docker run -d --name db --network myapp-net -e POSTGRES_PASSWORD=example postgres:15   # the official image will not start without a password
docker run -d --name api --network myapp-net myapi:latest
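Step 1 can also be checked from the container side instead of reading the network's JSON by eye. The sketch below lists the networks two containers have in common. The helper names `networks_of` and `shared_networks` are hypothetical, and the Go template assumes the usual `.NetworkSettings.Networks` layout of `docker inspect` output.

```shell
#!/bin/sh
# networks_of CONTAINER (hypothetical helper): print the names of the
# networks a container is attached to, separated by spaces.
networks_of() {
    docker inspect \
        --format '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}' \
        "$1"
}

# shared_networks A B (hypothetical helper): print each network that both
# containers are attached to, one per line. Prints nothing if none match.
shared_networks() {
    a_nets=$(networks_of "$1") || return 1
    b_nets=$(networks_of "$2") || return 1
    for net in $a_nets; do
        for other in $b_nets; do
            [ "$net" = "$other" ] && echo "$net"
        done
    done
    return 0
}
```

If `shared_networks api db` prints nothing, the two containers cannot reach each other by name. If the only shared network is the default `bridge`, they can reach each other by IP address, but container-name DNS will still fail, because name resolution is only provided on user-defined networks.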
🔥 Scenario 4: Image Pull Failure (ImagePullBackOff in Kubernetes)
bash
# Symptom: docker pull fails with "unauthorized" or "not found"

# Check the exact error
docker pull myacr.azurecr.io/myapp:1.0

# Fix: re-authenticate
az acr login --name myacr   # Azure
docker login                # Docker Hub

# Verify image exists in registry
az acr repository show-tags --name myacr --repository myapp

# Verify full image name and tag (typos are common)
docker pull myacr.azurecr.io/myapp:1.2.3   # typo in tag?
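Typos are easier to catch when the image reference is split into its parts before comparing it with what the registry reports. A minimal sketch, assuming a hypothetical helper named `split_image_ref`. It follows the usual convention that the first path component is a registry only if it contains a dot, a colon, or is `localhost`, and it does not handle digest references (`@sha256:...`).

```shell
#!/bin/sh
# split_image_ref REF (hypothetical helper): print the registry, repository,
# and tag of an image reference. Digest references are not supported.
split_image_ref() {
    ref="$1"
    tag="latest"
    name="$ref"

    # A tag is a ":suffix" on the last path component only; a colon earlier
    # in the reference belongs to a registry port such as localhost:5000.
    last="${ref##*/}"
    case "$last" in
        *:*) tag="${last##*:}"; name="${ref%:*}" ;;
    esac

    registry="docker.io"
    repo="$name"
    first="${name%%/*}"
    case "$name" in
        */*)
            case "$first" in
                *.*|*:*|localhost) registry="$first"; repo="${name#*/}" ;;
            esac
            ;;
    esac

    echo "registry=$registry repository=$repo tag=$tag"
}
```

Running `split_image_ref myacr.azurecr.io/myapp:1.0` separates the registry to log in to (`myacr.azurecr.io`), the repository to look up (`myapp`), and the tag to check against the output of `az acr repository show-tags`.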
🔥 Scenario 5: Volume Mount Empty
bash
# Symptom: app can't find config files — mount appears empty
# Check what is actually mounted
docker inspect myapp --format "{{json .Mounts}}"
# Exec in and check the path
docker exec -it myapp ls -la /etc/myapp/
# Common causes:
# - Relative path in bind mount: older Docker CLIs require an absolute host path, so use one to be safe
# Avoid: -v ./config:/etc/myapp
# Right: -v $(pwd)/config:/etc/myapp # Linux/Mac
# Right: -v ${PWD}/config:/etc/myapp # PowerShell
# Right: -v C:\Users\me\config:/etc/myapp # Windows absolute path
🎯 Practice Exercises
- Intentionally create a container with a missing environment variable — diagnose and fix it.
- Run a container with --memory 10m on a Node.js app and watch it get OOM-killed.
- Start two containers: one on the default bridge, one on a custom network. Try to connect them and observe the failure.
- Build an image with a typo in the CMD path, then diagnose with docker logs and --entrypoint /bin/sh.