Hands-on Lesson 12 of 14

Build a Full Pipeline

End-to-end hands-on lab: lint → test → build Docker → push to ACR → Helm deploy to AKS → smoke test — a production-grade CI/CD pipeline from scratch.

🧒 Simple Explanation (ELI5)

You're the head chef opening a brand-new restaurant. This lab sets up the entire operation from scratch.

By the end of this lab, you'll have an automated restaurant — every code push triggers the entire operation with zero human intervention until the final production approval.

📋 Lab Overview

What you'll build: A complete CI/CD pipeline for a Node.js Express application — from first commit to production deployment on AKS.
| | |
|---|---|
| Architecture | GitHub → Actions → Docker → ACR → Helm → AKS |
| Time estimate | 45–60 minutes |
| Difficulty | Intermediate to Advanced |
| Pipeline stages | Lint → Test → Build & Push → Deploy Staging → Smoke Test → Deploy Production |

Prerequisites

Inferred from the steps below, you'll need:

- An Azure subscription with an AKS cluster and an Azure Container Registry
- Azure CLI (az), kubectl, and Helm installed locally
- Node.js 20+ and npm
- A GitHub repository where you can configure secrets, variables, and environments

⌨️ Step 1 — Project Setup

Create a simple Express application with a health endpoint, a Dockerfile, tests, and a Helm chart. This gives us everything we need for a full pipeline.

1a. Application Code

package.json

json
{
  "name": "myapp",
  "version": "1.0.0",
  "description": "Demo app for full CI/CD pipeline",
  "main": "app.js",
  "scripts": {
    "start": "node app.js",
    "test": "jest",
    "lint": "eslint ."
  },
  "dependencies": {
    "express": "^4.18.2"
  },
  "devDependencies": {
    "eslint": "^8.56.0",
    "jest": "^29.7.0",
    "supertest": "^6.3.3"
  }
}

app.js

javascript
const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

// Health endpoint — used by smoke tests and K8s probes
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'healthy',
    version: process.env.APP_VERSION || '1.0.0',
    environment: process.env.NODE_ENV || 'development',
    timestamp: new Date().toISOString()
  });
});

app.get('/', (req, res) => {
  res.json({ message: 'Hello from myapp!' });
});

// Only start the server if not in test mode
if (process.env.NODE_ENV !== 'test') {
  app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`);
  });
}

module.exports = app;

app.test.js

javascript
const request = require('supertest');
const app = require('./app');

describe('GET /health', () => {
  it('should return 200 and healthy status', async () => {
    const res = await request(app).get('/health');
    expect(res.statusCode).toBe(200);
    expect(res.body.status).toBe('healthy');
  });
});

describe('GET /', () => {
  it('should return welcome message', async () => {
    const res = await request(app).get('/');
    expect(res.statusCode).toBe(200);
    expect(res.body.message).toBeDefined();
  });
});

1b. Dockerfile

dockerfile
# --- Build stage ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install production dependencies only (--only=production is deprecated)
RUN npm ci --omit=dev

# --- Production stage ---
FROM node:20-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY app.js ./
ENV NODE_ENV=production
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "app.js"]

1c. Helm Chart

Create the chart directory structure:

text
charts/myapp/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    └── ingress.yaml

Chart.yaml

yaml
apiVersion: v2
name: myapp
description: A Helm chart for myapp CI/CD demo
type: application
version: 0.1.0
appVersion: "1.0.0"

values.yaml

yaml
replicaCount: 2

image:
  repository: myacr.azurecr.io/myapp
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: nginx
  host: myapp.example.com

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

env: production

templates/deployment.yaml

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  labels:
    app: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: {{ .Values.service.targetPort }}
          env:
            - name: NODE_ENV
              value: {{ .Values.env | quote }}
          livenessProbe:
            httpGet:
              path: /health
              port: {{ .Values.service.targetPort }}
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: {{ .Values.service.targetPort }}
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

templates/service.yaml

yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
spec:
  type: {{ .Values.service.type }}
  selector:
    app: {{ .Release.Name }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
      protocol: TCP

templates/ingress.yaml

yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Release.Name }}
                port:
                  number: {{ .Values.service.port }}
{{- end }}
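
Before wiring the chart into the pipeline, it's worth a quick validation. A sketch of an optional chart-lint job — not part of the lab's pipeline, but helm comes preinstalled on GitHub's ubuntu-latest runners (you could also run the same commands locally):

```yaml
  # Optional chart-validation job: catches template errors before any deploy
  chart-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: helm lint charts/myapp
      # Render the templates with default values; fails on template errors
      - run: helm template charts/myapp > /dev/null
```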

🔐 Step 2 — Configure Secrets & Variables

Go to your GitHub repository → Settings → Secrets and variables → Actions.

Required Secrets (OIDC — no passwords stored!)

| Secret Name | Description | Example |
|---|---|---|
| AZURE_CLIENT_ID | App registration client ID | 12345678-abcd-... |
| AZURE_TENANT_ID | Azure AD tenant ID | abcdef12-3456-... |
| AZURE_SUBSCRIPTION_ID | Azure subscription ID | aaaabbbb-cccc-... |

Required Variables

| Variable Name | Description | Example |
|---|---|---|
| ACR_NAME | Azure Container Registry name | myappacr |
| AKS_RESOURCE_GROUP | AKS cluster resource group | myapp-rg |
| AKS_CLUSTER_NAME | AKS cluster name | myapp-aks |

Set Up OIDC Federated Credential

Instead of storing Azure passwords, we use OpenID Connect. Create a federated credential on your Azure App Registration:

bash
# Create app registration
az ad app create --display-name "github-actions-myapp"
APP_ID=$(az ad app list --display-name "github-actions-myapp" --query "[0].appId" -o tsv)

# Create service principal
az ad sp create --id $APP_ID

# Add federated credential for main branch
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Add federated credential for staging environment
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-staging",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:environment:staging",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Add federated credential for production environment
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-production",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:environment:production",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Grant Contributor role on AKS resource group
az role assignment create --assignee $APP_ID \
  --role Contributor \
  --scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg

# Grant AcrPush role on ACR
az role assignment create --assignee $APP_ID \
  --role AcrPush \
  --scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg/providers/Microsoft.ContainerRegistry/registries/myappacr
Replace placeholders: Update YOUR_ORG/YOUR_REPO, YOUR_SUB_ID, resource group and ACR names with your actual values.

🚀 Step 3 — Create the Pipeline

Create .github/workflows/ci-cd.yml — the full pipeline in a single file:

yaml
name: Full CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write
  contents: read
  packages: write

env:
  ACR_NAME: ${{ vars.ACR_NAME }}
  IMAGE_NAME: myapp

jobs:
  # ──────────────────────── STAGE 1: LINT ────────────────────────
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint

  # ──────────────────────── STAGE 2: TEST ────────────────────────
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/

  # ──────────────────── STAGE 3: BUILD & PUSH ──────────────────
  build-and-push:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ github.sha }}
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - run: az acr login --name ${{ env.ACR_NAME }}

      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:latest

  # ──────────────── STAGE 4: DEPLOY TO STAGING ─────────────────
  deploy-staging:
    needs: build-and-push
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.myapp.com
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}

      - run: |
          helm upgrade --install myapp-staging ./charts/myapp \
            --namespace staging --create-namespace \
            --set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
            --set image.tag=${{ github.sha }} \
            --set env=staging \
            --set ingress.host=staging.myapp.com \
            --wait --timeout 5m

  # ──────────────── STAGE 5: SMOKE TEST ────────────────────────
  smoke-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - name: Health check with retries
        run: |
          for i in {1..10}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.myapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "✅ Health check passed!"
              exit 0
            fi
            echo "Attempt $i: Status $STATUS, retrying in 10s..."
            sleep 10
          done
          echo "❌ Smoke test failed after 10 attempts!"
          exit 1

  # ──────────── STAGE 6: DEPLOY TO PRODUCTION ──────────────────
  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://myapp.com
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}

      - run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production --create-namespace \
            --set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
            --set image.tag=${{ github.sha }} \
            --set env=production \
            --set ingress.host=myapp.com \
            --set replicaCount=3 \
            --wait --timeout 5m

Pipeline Breakdown

| Stage | Purpose | Depends On | Runs On |
|---|---|---|---|
| lint | Code quality — catch syntax and style errors early | — | Every push & PR |
| test | Run unit tests, generate coverage report | lint | Every push & PR |
| build-and-push | Build Docker image, push to ACR | test | main branch only |
| deploy-staging | Helm deploy to staging namespace | build-and-push | main branch only |
| smoke-test | Hit /health endpoint, verify 200 | deploy-staging | main branch only |
| deploy-production | Helm deploy to production namespace | smoke-test | main + manual approval |

⚙️ Step 4 — Configure Environments

Go to your repository → Settings → Environments.

Staging Environment

  1. Click "New environment" and create staging
  2. Leave protection rules off — staging should deploy automatically after build-and-push

Production Environment

  1. Create an environment named production
  2. Enable Required reviewers and add at least one approver
  3. Optionally set a wait timer (the flow diagram assumes 5 minutes)
  4. Define environment-level secrets here if staging and production use different credentials

Why separate environment secrets? You can use different Azure credentials or even different Azure subscriptions for staging vs production. This is a security best practice — a compromised staging credential can't touch production.
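
The override works by name resolution. A sketch, assuming you define AZURE_CLIENT_ID both as a repository secret and as a production environment secret:

```yaml
  deploy-production:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: azure/login@v2
        with:
          # Because this job declares environment: production, the
          # production-scoped AZURE_CLIENT_ID wins over the repo-level one.
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```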

🧪 Step 5 — Test the Pipeline

5a. Trigger the Pipeline

bash
# Push your code to main
git add .
git commit -m "feat: add full CI/CD pipeline"
git push origin main

5b. Watch Each Stage

Go to Actions tab in your repository. You should see:

  1. lint — runs ESLint, should pass in ~30s
  2. test — runs Jest with coverage, ~45s
  3. build-and-push — builds Docker image, pushes to ACR, ~2–3 min
  4. deploy-staging — Helm installs to staging namespace, ~1–2 min
  5. smoke-test — curls the health endpoint, ~10s–2 min
  6. deploy-production — ⏸️ waiting for approval

5c. Verify Staging

bash
# Check staging pods
kubectl get pods -n staging
# NAME                              READY   STATUS    RESTARTS   AGE
# myapp-staging-6d4f8b7c9-abc12    1/1     Running   0          2m

# Check the deployment
kubectl describe deployment myapp-staging -n staging

# Port-forward and test locally
kubectl port-forward svc/myapp-staging 8080:80 -n staging &
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.0","environment":"staging",...}

5d. Approve Production Deployment

  1. In the Actions tab, click the pending deploy-production job
  2. Click "Review deployments"
  3. Select the production environment checkbox
  4. Add an optional comment: "Staging verified, approving production"
  5. Click "Approve and deploy"

5e. Verify Production

bash
# Check production pods (should have 3 replicas)
kubectl get pods -n production
# NAME                     READY   STATUS    RESTARTS   AGE
# myapp-7f9d8c6b5-x1y2z   1/1     Running   0          1m
# myapp-7f9d8c6b5-a3b4c   1/1     Running   0          1m
# myapp-7f9d8c6b5-d5e6f   1/1     Running   0          1m

# Verify health
curl https://myapp.com/health
# {"status":"healthy","environment":"production",...}

🗺️ Pipeline Flow Diagram

text
┌─────────────────────────────────────────────────────────────────────────────┐
│                          FULL CI/CD PIPELINE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐     Every push & PR                                          │
│  │ git push │─────────────┐                                                │
│  └──────────┘             ▼                                                │
│                    ┌──────────────┐                                         │
│                    │    LINT      │  ESLint code quality                    │
│                    │  (~30 sec)   │                                         │
│                    └──────┬───────┘                                         │
│                           ▼                                                │
│                    ┌──────────────┐                                         │
│                    │    TEST      │  Jest + coverage report                 │
│                    │  (~45 sec)   │  → artifact: coverage/                  │
│                    └──────┬───────┘                                         │
│                           │                                                │
│           ┌───────────────┼────────────────────┐                           │
│           │ PR stops here │ main branch only   │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │ BUILD & PUSH │            │                           │
│           │        │  Docker→ACR  │            │                           │
│           │        │  (~2-3 min)  │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │   DEPLOY     │            │                           │
│           │        │   STAGING    │            │                           │
│           │        │  Helm→AKS   │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │ SMOKE TEST   │            │                           │
│           │        │ curl /health │            │                           │
│           │        │ 10 retries   │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │  ⏸️ APPROVAL  │  Required reviewers                    │
│           │        │  (manual)    │  + 5-min wait timer                    │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │   DEPLOY     │            │                           │
│           │        │  PRODUCTION  │  3 replicas                            │
│           │        │  Helm→AKS   │                                        │
│           │        └──────────────┘            │                           │
│           └────────────────────────────────────┘                           │
└─────────────────────────────────────────────────────────────────────────────┘

🔧 Troubleshooting

Common issues at each stage and how to fix them:

Lint Stage Failures

| Error | Cause | Fix |
|---|---|---|
| ESLint: command not found | ESLint not in devDependencies | Run npm install --save-dev eslint |
| Parsing error: Unexpected token | Missing ESLint config or wrong parser | Add .eslintrc.json with {"env":{"node":true,"es2021":true}} |
| Lint passes locally but fails in CI | Different Node version or missing .eslintrc | Ensure .eslintrc.json is committed and Node version matches |

Test Stage Failures

| Error | Cause | Fix |
|---|---|---|
| Cannot find module 'supertest' | Missing dependency | Check devDependencies, ensure npm ci runs first |
| Tests pass locally, fail in CI | Port conflicts, timing issues | Don't start the server in tests — use supertest(app) directly |
| --coverage flag not recognized | Wrong test runner | Ensure Jest is configured: "test": "jest" in package.json |

Build & Push Stage Failures

| Error | Cause | Fix |
|---|---|---|
| AADSTS700024: Client assertion not within valid time range | OIDC token expired or clock drift | Re-run the workflow; check federated credential config |
| unauthorized: authentication required | ACR login failed | Verify ACR_NAME variable, check RBAC: AcrPush role assigned |
| denied: requested access to the resource is denied | Service principal lacks push permission | Run az role assignment create --role AcrPush |
| Docker build fails — COPY failed | Build context wrong or file missing | Add context: . to docker/build-push-action |

Deploy Staging Failures

| Error | Cause | Fix |
|---|---|---|
| Error: timed out waiting for condition | Pod not becoming ready | Check kubectl describe pod, verify image pull, check health probe |
| Error: UPGRADE FAILED: release not found | First deploy uses upgrade without --install | Use helm upgrade --install (already in our pipeline) |
| ErrImagePull / ImagePullBackOff | AKS can't pull from ACR | Attach ACR: az aks update -n CLUSTER -g RG --attach-acr ACR_NAME |

Smoke Test Failures

| Error | Cause | Fix |
|---|---|---|
| All 10 attempts return 000 | DNS not configured or Ingress missing | Check the Ingress resource, verify the DNS A record points to the load balancer IP |
| Returns 503 | Backend pods not ready | Increase --wait --timeout in the deploy step, check the readiness probe |
| Returns 502 | Ingress controller can't reach backend | Check the Service selector matches Pod labels, verify port numbers |

Production Deploy Failures

| Error | Cause | Fix |
|---|---|---|
| Job stays in "Waiting" forever | No reviewer approved | Check Environment settings, ensure reviewers are added |
| Approval granted but job fails | OIDC federated credential not set for production environment | Add a federated credential with subject: ...environment:production |

🏆 Challenge Extensions

You've built the core pipeline. Now level up with these challenges:

Challenge 1: Add Slack Notifications

Notify your team channel on deploy success or failure:

yaml
  notify:
    needs: [deploy-production]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          fields: repo,message,commit,author,action,eventName,ref,workflow
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Challenge 2: Auto-Rollback on Failed Smoke Test

If the smoke test fails, automatically roll back the staging deployment:

yaml
  rollback-staging:
    needs: smoke-test
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}
      - run: |
          echo "🔄 Rolling back staging..."
          helm rollback myapp-staging -n staging
          echo "✅ Rollback complete"

Challenge 3: Add Canary Deployment

Deploy to a small percentage of traffic first, then shift 100% if healthy:

yaml
  canary-deploy:
    needs: smoke-test
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      # ... Azure login & AKS context ...
      - name: Deploy canary (10% traffic)
        run: |
          helm upgrade --install myapp-canary ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --set replicaCount=1 \
            --wait --timeout 3m

      - name: Monitor canary (5 minutes)
        run: |
          for i in {1..30}; do
            ERROR_RATE=$(curl -s https://myapp.com/metrics | grep error_rate | awk '{print $2}')
            ERROR_RATE=${ERROR_RATE:-0}  # default to 0 if the metric is missing
            if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
              echo "❌ Error rate too high: $ERROR_RATE"
              helm rollback myapp-canary -n production
              exit 1
            fi
            sleep 10
          done
          echo "✅ Canary healthy — promoting to full deploy"

      - name: Promote to full production
        run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --set replicaCount=3 \
            --wait --timeout 5m
          helm uninstall myapp-canary -n production

💬 Interview Questions

Q1. Walk me through how you'd design a CI/CD pipeline for a microservice deployed to Kubernetes.

A: I'd design a multi-stage pipeline: (1) Lint & static analysis for fast feedback on code quality, (2) Unit & integration tests with coverage thresholds, (3) Docker build & push to a container registry with tags based on commit SHA for traceability, (4) Deploy to staging using Helm with --wait to ensure pods are healthy, (5) Automated smoke/integration tests against staging, (6) Manual approval gate for production, (7) Production deploy with Helm, and optionally (8) Post-deploy verification. Each stage depends on the previous one succeeding. The pipeline uses OIDC for Azure auth (no stored secrets), GitHub Environments for protection rules, and artifacts for passing data between jobs.

Q2. Why separate lint and test into different jobs instead of one job?

A: Three reasons: (1) Fast fail — lint catches syntax errors in seconds without waiting for a full test suite, (2) Parallelism opportunity — in more complex pipelines, independent checks can run in parallel, and (3) Clear feedback — developers see exactly which stage failed. A lint failure means "fix your code style," while a test failure means "fix your logic." One combined job would give ambiguous feedback.

Q3. What is OIDC authentication and why is it preferred over service principal secrets?

A: OIDC (OpenID Connect) lets GitHub Actions request a short-lived token from Azure AD without storing any secrets. The workflow presents a JWT to Azure, which validates it against the configured federated credential (checking the repo, branch, and environment). Benefits: (1) No secret rotation — there's no password to expire or rotate, (2) Scoped access — tokens are valid only for the specific repo/branch/environment, (3) Audit trail — every token request is logged in Azure AD, (4) No leakage risk — there's nothing stored that can be exposed in logs or compromised repos.

Q4. How does helm upgrade --install differ from helm install?

A: helm install creates a new release and fails if it already exists. helm upgrade --install upgrades an existing release or creates it if it doesn't exist. This makes it idempotent — safe to run repeatedly in CI/CD without checking whether it's the first deploy or the 100th. The --wait flag ensures Helm waits for all pods to be Ready before reporting success, which is critical for pipeline reliability.

Q5. A deployment to staging succeeds but the smoke test fails with 503 errors. How do you debug?

A: Step-by-step: (1) Check if pods are actually Running and Ready: kubectl get pods -n staging, (2) Check the readiness probe — if the probe fails, the Service won't route traffic, (3) Check the Service: kubectl describe svc -n staging — are the endpoints populated?, (4) Check the Ingress: kubectl describe ingress -n staging — is the backend correctly mapped?, (5) Check the Ingress Controller logs for upstream errors, (6) Port-forward directly to the pod: kubectl port-forward pod/xxx 3000:3000 — does the app respond? If yes, the issue is in Service/Ingress networking, not the app.

Q6. How do you handle database migrations in a CI/CD pipeline?

A: Migrations should be a separate step before the application deploy. Options: (1) A Helm pre-upgrade hook that runs a migration Job, (2) A dedicated pipeline job between build and deploy that applies migrations, (3) An init container in the deployment that runs migrations before the app starts. Key rules: migrations must be backward-compatible (additive only), never rename/drop columns in the same release as code changes, and always support rollback by making each migration reversible.
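
Option (1) can be sketched as a Helm hook template added to the chart — the migrate.js entrypoint is hypothetical, not part of the lab's app:

```yaml
# charts/myapp/templates/migration-job.yaml — illustrative sketch
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migrate
  annotations:
    # Run this Job before the Deployment is installed or upgraded
    "helm.sh/hook": pre-install,pre-upgrade
    # Delete the previous hook Job before creating a new one
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["node", "migrate.js"]  # hypothetical migration entrypoint
```

If the hook Job fails, helm upgrade fails before any application pods are touched.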

Q7. What happens if the production deploy fails halfway? How do you recover?

A: Helm's --wait flag ensures that if the new version's pods don't become Ready within the timeout, the upgrade is marked as failed. Recovery: (1) Automatic: Run helm rollback myapp -n production to revert to the last successful revision, (2) Pipeline-level: Add a rollback job triggered by if: failure() in the deploy job, (3) Kubernetes-level: Kubernetes keeps the old ReplicaSet — you can also use kubectl rollout undo deployment/myapp. Prevention: use blue-green or canary deployments so the old version keeps serving while the new one is verified.

Q8. How do you prevent two developers from deploying to production simultaneously?

A: Use GitHub Actions concurrency control: concurrency: { group: deploy-production, cancel-in-progress: false }. This ensures only one deploy-production job runs at a time. Setting cancel-in-progress: false is critical for deploys — you don't want to cancel a running deployment. Instead, the second deploy queues until the first finishes. Additionally, the environment protection rules (required reviewers) act as a human gate — only one deployment can be approved and proceed at a time.
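
In this lab's workflow, the concurrency block could sit at either the workflow or the job level; a job-level sketch for deploy-production:

```yaml
  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    # Queue concurrent production deploys instead of cancelling the running one
    concurrency:
      group: deploy-production
      cancel-in-progress: false
    environment:
      name: production
```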

Q9. Why use the commit SHA as the image tag instead of latest?

A: (1) Traceability — every running container can be traced back to the exact commit that produced it, (2) Immutability — the tag abc123 always points to the same image; latest is mutable and can be overwritten, (3) Rollback precision — to roll back, deploy the previous commit's SHA tag, (4) Cache invalidation — Kubernetes detects tag changes and pulls the new image; redeploying latest may serve a cached old image. We still also tag latest for convenience, but deployments always reference the SHA.
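
The hand-written tags in Stage 3 could also be generated with docker/metadata-action — a sketch, not part of the lab's pipeline as written:

```yaml
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,format=long   # sha-<full commit SHA>
            type=raw,value=latest
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
```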

Q10. How would you add a security scanning stage to this pipeline?

A: Insert a security scan job after the build step: (1) Container scan: Use aquasecurity/trivy-action@master to scan the Docker image for CVEs before pushing, (2) SAST: Add CodeQL or Semgrep in the lint stage to catch security bugs in source code, (3) Dependency audit: Add npm audit --audit-level=high after npm ci, (4) Secret scanning: Enable GitHub's built-in secret scanning and add trufflesecurity/trufflehog in CI. Gate the pipeline — if any critical vulnerability is found, block the deploy. For non-critical findings, report to a dashboard but don't block.
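
A container-scan job slotted between test and the push stage could look like this — a sketch using Trivy; the local scan tag and the severity gate are assumptions you'd tune:

```yaml
  scan:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build locally without pushing so the image can be scanned first
      - uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: myapp:scan
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:scan
          severity: CRITICAL,HIGH
          exit-code: '1'   # block the pipeline on critical/high CVEs
```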

📝 Summary

You now have a six-stage pipeline: lint and test run on every push and pull request; on main, the image is built, tagged with the commit SHA, and pushed to ACR; Helm deploys it to staging; a retrying smoke test verifies /health; and a manual approval gates the production deploy. Authentication is handled end to end with OIDC federated credentials, so no Azure passwords are stored in GitHub.
