Beginner · Lesson 3 of 16

📄 Dockerfile and Image Build Process

Write production-quality Dockerfiles — master every instruction, optimize layer caching, use .dockerignore, and build images efficiently with BuildKit.

🧒 Simple Explanation (ELI5)

A Dockerfile is a recipe card. It tells Docker: "Start with this base ingredient (Ubuntu/Alpine), add these tools (Node, Python), copy my app files in, and when someone runs this container, execute this command." Docker executes the instructions in order, and each instruction that changes the filesystem (RUN, COPY, ADD) produces a layer that can be reused from cache on later builds.

🔒 Security: Always run as non-root

By default, containers run as root, so a container breakout could compromise the host. Create a dedicated non-root user with RUN adduser and switch to it with USER. Running as non-root is also required by the Restricted profile of the Kubernetes Pod Security Standards, which many production clusters enforce.
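The user-creation command depends on the base image. As a minimal sketch (the appuser and appgroup names and the node:18-slim base are illustrative assumptions), a Debian-based image would use groupadd and useradd rather than the BusyBox addgroup/adduser flags used on Alpine:

```dockerfile
# Debian-based image (assumed here for illustration): shadow-utils syntax
FROM node:18-slim

# --system creates a system account; --no-create-home skips the home directory
RUN groupadd --system appgroup \
 && useradd --system --gid appgroup --no-create-home appuser

# Later RUN, CMD, and ENTRYPOINT instructions run as this user
USER appuser
```

The official Node.js images also ship with a ready-made unprivileged user named node, so USER node is an alternative when a custom account is not needed.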

🔧 Core Dockerfile Instructions

dockerfile
# FROM — base image (first instruction; only ARG or a parser directive may come before it)
FROM node:18-alpine

# WORKDIR — set working directory inside the container
WORKDIR /app

# COPY — copy files from build context into image
# Copy package files first for layer cache optimization
COPY package*.json ./

# RUN — execute command during build (creates a new layer)
# --omit=dev supersedes the deprecated --only=production flag
RUN npm ci --omit=dev

# Now copy the rest of the source
COPY . .

# ENV — environment variables available at runtime
ENV NODE_ENV=production
ENV PORT=3000

# EXPOSE — document the port (informational — does NOT publish)
EXPOSE 3000

# ARG — build-time variable (NOT available at runtime)
ARG APP_VERSION=1.0

# LABEL — metadata on the image
LABEL maintainer="team@company.com" version="${APP_VERSION}"

# USER — switch to non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# CMD — default command (can be overridden at docker run)
CMD ["node", "server.js"]

# ENTRYPOINT — fixed executable (CMD becomes default args)
# ENTRYPOINT ["node"]
# CMD ["server.js"]
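
To see why the package files are copied before the rest of the source, compare the two orderings below. This is a sketch of the caching behavior rather than a complete production Dockerfile: Docker reuses a cached layer only while that instruction, its inputs, and everything before it are unchanged.

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Cache-unfriendly ordering (shown commented out): any source edit changes
# the COPY layer, so the dependency install reruns on every build.
# COPY . .
# RUN npm ci --omit=dev

# Cache-friendly ordering: the install reruns only when package.json or
# package-lock.json change; ordinary source edits reuse the cached layer.
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
```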

📁 .dockerignore

Like .gitignore, this file lists paths to exclude from the build context that Docker sends to the builder. Without it, COPY . . pulls in your entire project directory, including node_modules, .git, and any .env files, which slows builds and risks leaking secrets.

.dockerignore
node_modules
.git
.env
.env.*
*.log
dist
coverage
__pycache__
.pytest_cache
README.md
.DS_Store
.vscode
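
One way to catch a missing entry before it reaches an image is a small pre-build check. The function below is a hypothetical helper (the name check_dockerignore is an assumption, not a Docker feature), and it is deliberately simplified: it looks only for an exact `.env` line and does not understand wildcard patterns such as `.env*` or `**/.env`.

```shell
#!/bin/sh
# Hypothetical pre-build guard: returns non-zero when a .env file exists in
# the context directory but .dockerignore has no exact ".env" entry.
check_dockerignore() {
    ctx="${1:-.}"
    # Nothing to protect if the context has no .env file.
    [ -f "$ctx/.env" ] || return 0
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF '.env' "$ctx/.dockerignore" 2>/dev/null; then
        return 0
    fi
    echo "refusing to build: $ctx/.env is not excluded by .dockerignore" >&2
    return 1
}
```

A typical use would be `check_dockerignore . && docker build -t myapp:1.0 .`, so the build only starts when the check passes.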

💻 Building Images

bash
# Standard build — tag as myapp:1.0
docker build -t myapp:1.0 .

# Use a specific Dockerfile (useful for multi-stage or env-specific builds)
docker build -t myapp:1.0 -f Dockerfile.prod .

# Pass build args (e.g. for versioning or feature flags)
docker build --build-arg APP_VERSION=2.1 -t myapp:2.1 .

# BuildKit is the default builder since Docker Engine 23.0;
# on older versions, enable it explicitly for faster parallel builds and better caching
DOCKER_BUILDKIT=1 docker build -t myapp:1.0 .

# List your built images
docker image ls myapp

💡 CMD vs ENTRYPOINT

Use ENTRYPOINT when the container has one clear purpose (e.g., always run node). Use CMD for the default arguments. Combine them: ENTRYPOINT ["node"] + CMD ["server.js"]. Users can then override the script at runtime without changing the entrypoint.
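
The combination can be sketched as follows. The image tag myapp:1.0 is taken from the build examples in this lesson, and worker.js is a hypothetical second script used only to show the override:

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .

# Fixed executable plus default argument
ENTRYPOINT ["node"]
CMD ["server.js"]

# Resulting commands for different invocations:
#   docker run myapp:1.0                  runs: node server.js
#   docker run myapp:1.0 worker.js        runs: node worker.js  (CMD replaced)
#   docker run --entrypoint sh myapp:1.0  runs: sh  (ENTRYPOINT replaced, CMD cleared)
```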

🧪 Hands-on Exercises

  1. Write a Dockerfile for a Node.js API: use Alpine base, non-root user, and npm ci.
  2. Create a .dockerignore file, then compare build context size with and without it using docker build output.
  3. Use docker image history myapp:1.0 to inspect each layer and its size.
  4. Try both CMD and ENTRYPOINT in a test image — override CMD at docker run time to see the difference.
  5. Build using DOCKER_BUILDKIT=1 and notice how parallel steps are handled.

🐛 Debugging Scenario

Problem: The build succeeds, but the container crashes immediately with a "permission denied" error.

bash
# Check logs first
docker logs <container_id>

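# Run interactively to investigate (replaces the image's entrypoint with a shell)
docker run -it --entrypoint /bin/sh myapp:1.0
# The next two commands run inside the container shell
ls -la /app                        # check file permissions
id                                 # confirm which user you are

# Fix in the Dockerfile (not in the shell): make the start script executable
# while the build is still running as root, i.e. before the USER instruction
RUN chmod +x /app/start.sh

# Or fix ownership when copying (the user and group must already exist in the image)
COPY --chown=appuser:appgroup . .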
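
For COPY --chown to resolve names, the account has to exist before the COPY instruction runs. A minimal sketch of that ordering, reusing the appuser and appgroup names from this lesson (the rest of the layout is an assumption about the project):

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Create the account first so COPY --chown can look up the names
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

COPY --chown=appuser:appgroup package*.json ./
RUN npm ci --omit=dev
COPY --chown=appuser:appgroup . .

# Drop privileges only after the root-level setup steps are done
USER appuser
CMD ["node", "server.js"]
```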
⚠️ Never store secrets in Dockerfile or image

Any secret set with ARG or ENV, or written to a file in a RUN step, is stored in the image's layers or metadata, where anyone who pulls the image can read it (for example with docker history). Use Docker BuildKit secrets (--secret) for build-time secrets, and in production inject runtime secrets from a secret store such as Azure Key Vault or Kubernetes Secrets.
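
As a sketch of the BuildKit approach, the fragment below mounts a private npm configuration file for a single RUN step. The secret id npmrc and the assumption that the registry credentials live in an .npmrc file are illustrative choices, not requirements:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./

# The secret file is available only while this RUN step executes
# and is not written into any image layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
```

The matching build command passes the file from the host, for example `docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp:1.0 .`, so the credentials never appear in the Dockerfile or in the image history.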

🎯 Interview Questions

What is the difference between CMD and ENTRYPOINT?

ENTRYPOINT sets the fixed executable that always runs. CMD provides default arguments to ENTRYPOINT, or acts as the default command if no ENTRYPOINT is set. CMD is replaced entirely by any arguments passed after the image name (docker run <image> <args>). ENTRYPOINT can only be overridden with --entrypoint. A common pattern: ENTRYPOINT ["node"], CMD ["server.js"].

Why should you use a .dockerignore file?

Without .dockerignore, Docker sends the full build context to the daemon — including node_modules (hundreds of MB), .git history, .env secrets, and test data. This makes builds slow and risks leaking secrets into the image layers. .dockerignore keeps the context small, builds fast, and images clean.

Scenario: You accidentally baked a .env file with secrets into an image and pushed it to a registry. What do you do?

  1. Immediately rotate ALL secrets in the .env and assume they are compromised.
  2. Delete the image from the registry. Removing the tag alone may leave the manifest and layers pullable by digest, so delete the manifest as well.
  3. Add .env to .dockerignore.
  4. Rebuild and push a clean image.
  5. Audit who pulled the compromised image using the registry access logs.
  6. Switch to proper secrets management: Azure Key Vault, Kubernetes Secrets, or Docker BuildKit secrets. Never bake secret files into images.

What is the difference between RUN, CMD, and ENTRYPOINT?

RUN executes commands at build time to create image layers (installing packages, compiling code). CMD and ENTRYPOINT specify what runs at container startup. The key distinction: RUN = build time, CMD/ENTRYPOINT = runtime. Use RUN for setup, CMD/ENTRYPOINT for the application start command.

📋 Summary