Without limits, a memory leak or runaway process can exhaust host memory and take down every container on that host. In Kubernetes, pods that set no resource requests or limits run in the BestEffort QoS class and are among the first to be evicted under node pressure. Every production container should have memory and CPU limits.
⚙️ Production Operations, Health Checks and Resource Limits
Run containers safely in production — set memory and CPU limits, configure health checks, implement graceful shutdown, and handle logging properly.
🧒 Simple Explanation (ELI5)
Running a container in production without limits is like letting a tenant in your building use unlimited electricity and water with no circuit breakers. One runaway process can starve all other containers on the host. Resource limits are the fuses and breakers — they protect everyone by capping what any one container can consume.
🔧 Resource Limits
# Memory limits
# --memory: hard limit; the container is OOMKilled if it exceeds it
# --memory-reservation: soft limit, enforced when the host is under memory pressure
# --memory-swap: memory + swap total; setting it equal to --memory disables swap
docker run -d \
  --memory 512m \
  --memory-reservation 256m \
  --memory-swap 512m \
  myapp:1.0

# CPU limits
# --cpus: at most 1.5 CPU cores
# --cpu-shares: relative weight under contention (default 1024)
docker run -d \
  --cpus 1.5 \
  --cpu-shares 512 \
  myapp:1.0

# Both together (typical production settings)
docker run -d \
  --name api \
  --memory 512m \
  --memory-reservation 256m \
  --cpus 1.0 \
  --restart unless-stopped \
  myapp:1.0

# Monitor resource usage
docker stats                  # live view of all containers
docker stats --no-stream api  # single snapshot
🏥 Health Checks
# Dockerfile HEALTHCHECK instruction
# (only the last HEALTHCHECK in a Dockerfile takes effect, so pick one of these)
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8080/healthz || exit 1

# Alternative for images without curl:
HEALTHCHECK CMD wget -qO- http://localhost:8080/healthz || exit 1

# For Node.js, use a lightweight custom check script:
HEALTHCHECK CMD node healthcheck.js
# Check container health status
docker inspect --format "{{.State.Health.Status}}" myapp
# Outputs: healthy / unhealthy / starting
# View health check log
docker inspect --format "{{json .State.Health}}" myapp | python -m json.tool
🚪 Graceful Shutdown
// Node.js graceful shutdown: handle SIGTERM from docker stop / K8s
const server = app.listen(PORT, () => console.log('Server started'));

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  server.close(() => {
    // Close DB connections, flush logs, etc.
    console.log('Server closed');
    process.exit(0);
  });
  // Force exit after 30s if server.close hangs.
  // Note: docker stop sends SIGKILL after 10s by default; raise it with -t or lower this value.
  setTimeout(() => process.exit(1), 30000).unref();
});
📝 Production Logging
# Containers should log to stdout/stderr; Docker captures both streams
# Never write logs to files inside the container

# Configure the log driver (default is json-file)
# max-size: rotate at 10 MB; max-file: keep 3 rotated files
docker run -d \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  myapp:1.0

# Use syslog / fluentd / Azure Monitor for production log aggregation
docker run -d \
  --log-driver fluentd \
  --log-opt fluentd-address=localhost:24224 \
  myapp:1.0

# View logs regardless of driver (Docker Engine 20.10+; older engines
# support docker logs only for local drivers such as json-file and journald)
docker logs myapp
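On the application side, logging to stdout/stderr can be as simple as writing one JSON object per line. The sketch below shows one way to do that in Node.js. The `createLogger` name and the `time`/`level`/`msg` field names are assumptions made for illustration, not a convention the source prescribes.

```javascript
// Minimal structured logger: one JSON object per line, no log files.
// Info goes to stdout and errors to stderr; the Docker log driver captures both.
function createLogger(out = process.stdout, err = process.stderr) {
  const write = (stream, level, msg, fields) => {
    const entry = { time: new Date().toISOString(), level, msg, ...fields };
    stream.write(JSON.stringify(entry) + '\n');
  };
  return {
    info: (msg, fields = {}) => write(out, 'info', msg, fields),
    error: (msg, fields = {}) => write(err, 'error', msg, fields),
  };
}

const log = createLogger();
log.info('server started', { port: 8080 });
```

Line-delimited JSON tends to be easy for aggregators such as fluentd to parse, and rotation stays the job of the log driver rather than the application.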
🐛 Debugging Scenario
Problem: Container keeps restarting with exit code 137 — OOMKilled.
# Step 1: confirm the OOM kill
docker inspect <id> | grep -i oom
# "OOMKilled": true

# Step 2: check memory usage while the container is running, before the next crash
docker stats --no-stream <id>

# Step 3: look at historical usage (if monitoring is set up)
# Azure Monitor / Grafana / Prometheus node_memory metrics

# Step 4: fix options
# a) Increase the memory limit
docker run -d --memory 1g myapp:1.0
# b) Fix the memory leak in the application (profiling needed)
# c) Set NODE_OPTIONS=--max-old-space-size=512 to cap the Node.js heap
#    (value in MiB; keep it below the container memory limit)
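For option (c), it can help to see from inside the process how the V8 heap ceiling compares with actual usage. Here is a small sketch using Node's built-in `v8.getHeapStatistics()` and `process.memoryUsage()`. The `memorySnapshot` name and the MiB rounding are illustrative choices.

```javascript
const v8 = require('v8');

const toMiB = (bytes) => Math.round(bytes / 1024 / 1024);

// Snapshot of the numbers that matter when chasing an OOMKilled Node.js container.
function memorySnapshot() {
  const { rss, heapUsed } = process.memoryUsage();
  const { heap_size_limit } = v8.getHeapStatistics();
  return {
    rssMiB: toMiB(rss),                   // resident memory of this process
    heapUsedMiB: toMiB(heapUsed),         // live JavaScript objects
    heapLimitMiB: toMiB(heap_size_limit), // V8 ceiling, affected by --max-old-space-size
  };
}

console.log(JSON.stringify(memorySnapshot()));
```

If `heapLimitMiB` sits at or above the container's `--memory` value, the kernel is likely to kill the container before V8 ever reports a heap out-of-memory error.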
🎯 Interview Questions
The Linux kernel's Out-of-Memory killer terminates the container process with SIGKILL. The container exits with code 137. Docker (and Kubernetes) may then restart it depending on the restart policy. This is OOMKilled. To diagnose: docker inspect <id> | grep OOMKilled. Fix: increase the memory limit, fix the memory leak, or reduce heap usage.
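The 137 in the answer above follows the usual convention that a process killed by signal N is reported as exit code 128 + N, and SIGKILL is signal 9. The helper below is a sketch of that arithmetic; the `signalFromExitCode` name is hypothetical. It relies on Node's `os.constants.signals` table.

```javascript
const os = require('os');

// Shells and Docker report "killed by signal N" as exit code 128 + N.
function signalFromExitCode(code) {
  if (code <= 128) return null; // ordinary exit status, not a signal
  const num = code - 128;
  const entry = Object.entries(os.constants.signals).find(([, n]) => n === num);
  return entry ? entry[0] : null;
}

console.log(signalFromExitCode(137)); // SIGKILL (128 + 9)
console.log(signalFromExitCode(143)); // SIGTERM (128 + 15)
```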
The health endpoint should return HTTP 200 when the application is ready to receive traffic. It should check the minimum required dependencies (DB connection, config loaded) — not deep dependencies that could cause false positives. A full dependency check should be a separate /ready endpoint. The check should be fast (under 2s) because it runs every 30s. Return non-200 or non-zero exit code when unhealthy.
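The answer above can be sketched as a small Node.js handler: run a handful of cheap checks and return 200 only if all of them pass, 503 otherwise. The `evaluateHealth` and `createHealthServer` names, and the shape of the `checks` object, are assumptions for illustration. A real service would more likely mount the route on its existing server.

```javascript
const http = require('http');

// Runs each named check and reports 200 only if all of them pass.
// `checks` maps a name to a function returning a boolean or Promise<boolean>.
async function evaluateHealth(checks) {
  const results = {};
  for (const [name, check] of Object.entries(checks)) {
    try {
      results[name] = Boolean(await check());
    } catch {
      results[name] = false; // a throwing check counts as a failure
    }
  }
  const ok = Object.values(results).every(Boolean);
  return { status: ok ? 200 : 503, body: { ok, checks: results } };
}

// Hypothetical wiring: real checks would ping the DB pool, verify config, etc.
function createHealthServer(checks) {
  return http.createServer(async (req, res) => {
    if (req.url !== '/healthz') {
      res.statusCode = 404;
      res.end();
      return;
    }
    const { status, body } = await evaluateHealth(checks);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

Calling `createHealthServer({ config: () => true }).listen(8080)` would expose the endpoint on the port that the HEALTHCHECK examples probe.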
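This is a liveness probe failure scenario. Without a health check, Docker has no way to detect an application that is hung while its process is still running. Fix: 1) Add a HEALTHCHECK to the Dockerfile. 2) Alternatively, configure --health-cmd, --health-interval, and --health-retries at run time. 3) In Kubernetes, add a livenessProbe; the kubelet restarts the container when the probe fails. 4) On standalone Docker, a restart policy such as --restart unless-stopped reacts only to the container exiting, not to an unhealthy status, so pair the health check with an orchestrator (Swarm replaces unhealthy tasks) or a watchdog that restarts unhealthy containers. 5) Add application-level watchdog timeouts for hung requests.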
📋 Summary
- Always set --memory and --cpus limits in production; unbounded containers can starve the host.
- Exit 137 = OOMKilled: increase the memory limit or fix the leak.
- Add HEALTHCHECK to Dockerfiles: Docker marks failing containers unhealthy. Swarm replaces them, while plain docker run only reports the status.
- Log to stdout/stderr and configure rotation with max-size and max-file.
- Handle SIGTERM for graceful shutdown: drain connections before exiting.