Hands-on Lesson 12 of 14

Build a Full Pipeline

End-to-end hands-on lab: lint → test → build Docker → push to ACR → Helm deploy to AKS → smoke test — a production-grade CI/CD pipeline from scratch.

🧒 Simple Explanation (ELI5)

You're the head chef opening a brand-new restaurant. This lab sets up the entire operation from scratch.

By the end of this lab, you'll have an automated restaurant — every code push triggers the entire operation with zero human intervention until the final production approval.

📋 Lab Overview

What you'll build: A complete CI/CD pipeline for a Node.js Express application — from first commit to production deployment on AKS.
| | |
|---|---|
| Architecture | GitHub → Actions → Docker → ACR → Helm → AKS |
| Time estimate | 45–60 minutes |
| Difficulty | Intermediate to Advanced |
| Pipeline stages | Lint → Test → Build & Push → Deploy Staging → Smoke Test → Deploy Production |

Prerequisites

Inferred from the steps below, you'll need:

- An Azure subscription with an AKS cluster and an Azure Container Registry
- Azure CLI (az), kubectl, and Helm installed locally
- Node.js 20+ and npm
- A GitHub repository where you can configure secrets, variables, and environments

⌨️ Step 1 — Project Setup

Create a simple Express application with a health endpoint, a Dockerfile, tests, and a Helm chart. This gives us everything we need for a full pipeline.

1a. Application Code

package.json

json
{
  "name": "myapp",
  "version": "1.0.0",
  "description": "Demo app for full CI/CD pipeline",
  "main": "app.js",
  "scripts": {
    "start": "node app.js",
    "test": "jest",
    "lint": "eslint ."
  },
  "dependencies": {
    "express": "^4.18.2"
  },
  "devDependencies": {
    "eslint": "^8.56.0",
    "jest": "^29.7.0",
    "supertest": "^6.3.3"
  }
}

app.js

javascript
const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

// Health endpoint — used by smoke tests and K8s probes
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'healthy',
    version: process.env.APP_VERSION || '1.0.0',
    environment: process.env.NODE_ENV || 'development',
    timestamp: new Date().toISOString()
  });
});

app.get('/', (req, res) => {
  res.json({ message: 'Hello from myapp!' });
});

// Only start the server if not in test mode
if (process.env.NODE_ENV !== 'test') {
  app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`);
  });
}

module.exports = app;

app.test.js

javascript
const request = require('supertest');
const app = require('./app');

describe('GET /health', () => {
  it('should return 200 and healthy status', async () => {
    const res = await request(app).get('/health');
    expect(res.statusCode).toBe(200);
    expect(res.body.status).toBe('healthy');
  });
});

describe('GET /', () => {
  it('should return welcome message', async () => {
    const res = await request(app).get('/');
    expect(res.statusCode).toBe(200);
    expect(res.body.message).toBeDefined();
  });
});

1b. Dockerfile

dockerfile
# --- Build stage ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install production dependencies only (--only=production is deprecated)
RUN npm ci --omit=dev

# --- Production stage ---
FROM node:20-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY app.js ./
ENV NODE_ENV=production
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "app.js"]

1c. Helm Chart

Create the chart directory structure:

text
charts/myapp/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    └── ingress.yaml

Chart.yaml

yaml
apiVersion: v2
name: myapp
description: A Helm chart for myapp CI/CD demo
type: application
version: 0.1.0
appVersion: "1.0.0"

values.yaml

yaml
replicaCount: 2

image:
  repository: myacr.azurecr.io/myapp
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: nginx
  host: myapp.example.com

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

env: production

templates/deployment.yaml

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  labels:
    app: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: {{ .Values.service.targetPort }}
          env:
            - name: NODE_ENV
              value: {{ .Values.env | quote }}
          livenessProbe:
            httpGet:
              path: /health
              port: {{ .Values.service.targetPort }}
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: {{ .Values.service.targetPort }}
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

templates/service.yaml

yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
spec:
  type: {{ .Values.service.type }}
  selector:
    app: {{ .Release.Name }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
      protocol: TCP

templates/ingress.yaml

yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Release.Name }}
                port:
                  number: {{ .Values.service.port }}
{{- end }}
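
Before wiring the chart into the pipeline, it's worth a quick validation. A sketch of an optional chart-lint job — not part of the lab's pipeline, but helm comes preinstalled on GitHub's ubuntu-latest runners (you could also run the same commands locally):

```yaml
  # Optional chart-validation job: catches template errors before any deploy
  chart-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: helm lint charts/myapp
      # Render the templates with default values; fails on template errors
      - run: helm template charts/myapp > /dev/null
```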

🔐 Step 2 — Configure Secrets & Variables

Go to your GitHub repository → Settings → Secrets and variables → Actions.

Required Secrets (OIDC — no passwords stored!)

| Secret Name | Description | Example |
|---|---|---|
| AZURE_CLIENT_ID | App registration client ID | 12345678-abcd-... |
| AZURE_TENANT_ID | Azure AD tenant ID | abcdef12-3456-... |
| AZURE_SUBSCRIPTION_ID | Azure subscription ID | aaaabbbb-cccc-... |

Required Variables

| Variable Name | Description | Example |
|---|---|---|
| ACR_NAME | Azure Container Registry name | myappacr |
| AKS_RESOURCE_GROUP | AKS cluster resource group | myapp-rg |
| AKS_CLUSTER_NAME | AKS cluster name | myapp-aks |

Set Up OIDC Federated Credential

Instead of storing Azure passwords, we use OpenID Connect. Create a federated credential on your Azure App Registration:

bash
# Create app registration
az ad app create --display-name "github-actions-myapp"
APP_ID=$(az ad app list --display-name "github-actions-myapp" --query "[0].appId" -o tsv)

# Create service principal
az ad sp create --id $APP_ID

# Add federated credential for main branch
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Add federated credential for staging environment
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-staging",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:environment:staging",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Add federated credential for production environment
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-production",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:YOUR_ORG/YOUR_REPO:environment:production",
  "audiences": ["api://AzureADTokenExchange"]
}'

# Grant Contributor role on AKS resource group
az role assignment create --assignee $APP_ID \
  --role Contributor \
  --scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg

# Grant AcrPush role on ACR
az role assignment create --assignee $APP_ID \
  --role AcrPush \
  --scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg/providers/Microsoft.ContainerRegistry/registries/myappacr
Replace placeholders: Update YOUR_ORG/YOUR_REPO, YOUR_SUB_ID, resource group and ACR names with your actual values.

🚀 Step 3 — Create the Pipeline

Create .github/workflows/ci-cd.yml — the full pipeline in a single file:

yaml
name: Full CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write
  contents: read
  packages: write

env:
  ACR_NAME: ${{ vars.ACR_NAME }}
  IMAGE_NAME: myapp

jobs:
  # ──────────────────────── STAGE 1: LINT ────────────────────────
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint

  # ──────────────────────── STAGE 2: TEST ────────────────────────
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/

  # ──────────────────── STAGE 3: BUILD & PUSH ──────────────────
  build-and-push:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ github.sha }}
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - run: az acr login --name ${{ env.ACR_NAME }}

      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:latest

  # ──────────────── STAGE 4: DEPLOY TO STAGING ─────────────────
  deploy-staging:
    needs: build-and-push
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.myapp.com
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}

      - run: |
          helm upgrade --install myapp-staging ./charts/myapp \
            --namespace staging --create-namespace \
            --set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
            --set image.tag=${{ github.sha }} \
            --set env=staging \
            --set ingress.host=staging.myapp.com \
            --wait --timeout 5m

  # ──────────────── STAGE 5: SMOKE TEST ────────────────────────
  smoke-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - name: Health check with retries
        run: |
          for i in {1..10}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.myapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "✅ Health check passed!"
              exit 0
            fi
            echo "Attempt $i: Status $STATUS, retrying in 10s..."
            sleep 10
          done
          echo "❌ Smoke test failed after 10 attempts!"
          exit 1

  # ──────────── STAGE 6: DEPLOY TO PRODUCTION ──────────────────
  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://myapp.com
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}

      - run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production --create-namespace \
            --set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
            --set image.tag=${{ github.sha }} \
            --set env=production \
            --set ingress.host=myapp.com \
            --set replicaCount=3 \
            --wait --timeout 5m

Pipeline Breakdown

| Stage | Purpose | Depends On | Runs On |
|---|---|---|---|
| lint | Code quality — catch syntax and style errors early | — | Every push & PR |
| test | Run unit tests, generate coverage report | lint | Every push & PR |
| build-and-push | Build Docker image, push to ACR | test | main branch only |
| deploy-staging | Helm deploy to staging namespace | build-and-push | main branch only |
| smoke-test | Hit /health endpoint, verify 200 | deploy-staging | main branch only |
| deploy-production | Helm deploy to production namespace | smoke-test | main + manual approval |

⚙️ Step 4 — Configure Environments

Go to your repository → Settings → Environments.

Staging Environment

  1. Click "New environment" and create staging
  2. Leave protection rules off — staging should deploy automatically after build-and-push

Production Environment

  1. Create an environment named production
  2. Enable Required reviewers and add at least one approver
  3. Optionally set a wait timer (the flow diagram assumes 5 minutes)
  4. Define environment-level secrets here if staging and production use different credentials

Why separate environment secrets? You can use different Azure credentials or even different Azure subscriptions for staging vs production. This is a security best practice — a compromised staging credential can't touch production.
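
The override works by name resolution. A sketch, assuming you define AZURE_CLIENT_ID both as a repository secret and as a production environment secret:

```yaml
  deploy-production:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: azure/login@v2
        with:
          # Because this job declares environment: production, the
          # production-scoped AZURE_CLIENT_ID wins over the repo-level one.
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```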

🧪 Step 5 — Test the Pipeline

5a. Trigger the Pipeline

bash
# Push your code to main
git add .
git commit -m "feat: add full CI/CD pipeline"
git push origin main

5b. Watch Each Stage

Go to Actions tab in your repository. You should see:

  1. lint — runs ESLint, should pass in ~30s
  2. test — runs Jest with coverage, ~45s
  3. build-and-push — builds Docker image, pushes to ACR, ~2–3 min
  4. deploy-staging — Helm installs to staging namespace, ~1–2 min
  5. smoke-test — curls the health endpoint, ~10s–2 min
  6. deploy-production — ⏸️ waiting for approval

5c. Verify Staging

bash
# Check staging pods
kubectl get pods -n staging
# NAME                              READY   STATUS    RESTARTS   AGE
# myapp-staging-6d4f8b7c9-abc12    1/1     Running   0          2m

# Check the deployment
kubectl describe deployment myapp-staging -n staging

# Port-forward and test locally
kubectl port-forward svc/myapp-staging 8080:80 -n staging &
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.0","environment":"staging",...}

5d. Approve Production Deployment

  1. In the Actions tab, click the pending deploy-production job
  2. Click "Review deployments"
  3. Select the production environment checkbox
  4. Add an optional comment: "Staging verified, approving production"
  5. Click "Approve and deploy"

5e. Verify Production

bash
# Check production pods (should have 3 replicas)
kubectl get pods -n production
# NAME                     READY   STATUS    RESTARTS   AGE
# myapp-7f9d8c6b5-x1y2z   1/1     Running   0          1m
# myapp-7f9d8c6b5-a3b4c   1/1     Running   0          1m
# myapp-7f9d8c6b5-d5e6f   1/1     Running   0          1m

# Verify health
curl https://myapp.com/health
# {"status":"healthy","environment":"production",...}

🗺️ Pipeline Flow Diagram

text
┌─────────────────────────────────────────────────────────────────────────────┐
│                          FULL CI/CD PIPELINE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐     Every push & PR                                          │
│  │ git push │─────────────┐                                                │
│  └──────────┘             ▼                                                │
│                    ┌──────────────┐                                         │
│                    │    LINT      │  ESLint code quality                    │
│                    │  (~30 sec)   │                                         │
│                    └──────┬───────┘                                         │
│                           ▼                                                │
│                    ┌──────────────┐                                         │
│                    │    TEST      │  Jest + coverage report                 │
│                    │  (~45 sec)   │  → artifact: coverage/                  │
│                    └──────┬───────┘                                         │
│                           │                                                │
│           ┌───────────────┼────────────────────┐                           │
│           │ PR stops here │ main branch only   │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │ BUILD & PUSH │            │                           │
│           │        │  Docker→ACR  │            │                           │
│           │        │  (~2-3 min)  │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │   DEPLOY     │            │                           │
│           │        │   STAGING    │            │                           │
│           │        │  Helm→AKS   │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │ SMOKE TEST   │            │                           │
│           │        │ curl /health │            │                           │
│           │        │ 10 retries   │            │                           │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │  ⏸️ APPROVAL  │  Required reviewers                    │
│           │        │  (manual)    │  + 5-min wait timer                    │
│           │        └──────┬───────┘            │                           │
│           │               ▼                    │                           │
│           │        ┌──────────────┐            │                           │
│           │        │   DEPLOY     │            │                           │
│           │        │  PRODUCTION  │  3 replicas                            │
│           │        │  Helm→AKS   │                                        │
│           │        └──────────────┘            │                           │
│           └────────────────────────────────────┘                           │
└─────────────────────────────────────────────────────────────────────────────┘

🔧 Troubleshooting

Common issues at each stage and how to fix them:

Lint Stage Failures

| Error | Cause | Fix |
|---|---|---|
| ESLint: command not found | ESLint not in devDependencies | Run npm install --save-dev eslint |
| Parsing error: Unexpected token | Missing ESLint config or wrong parser | Add .eslintrc.json with {"env":{"node":true,"es2021":true}} |
| Lint passes locally but fails in CI | Different Node version or missing .eslintrc | Ensure .eslintrc.json is committed and Node version matches |

Test Stage Failures

| Error | Cause | Fix |
|---|---|---|
| Cannot find module 'supertest' | Missing dependency | Check devDependencies, ensure npm ci runs first |
| Tests pass locally, fail in CI | Port conflicts, timing issues | Don't start the server in tests — use supertest(app) directly |
| --coverage flag not recognized | Wrong test runner | Ensure Jest is configured: "test": "jest" in package.json |

Build & Push Stage Failures

| Error | Cause | Fix |
|---|---|---|
| AADSTS700024: Client assertion not within valid time range | OIDC token expired or clock drift | Re-run the workflow; check federated credential config |
| unauthorized: authentication required | ACR login failed | Verify ACR_NAME variable, check RBAC: AcrPush role assigned |
| denied: requested access to the resource is denied | Service principal lacks push permission | Run az role assignment create --role AcrPush |
| Docker build fails — COPY failed | Build context wrong or file missing | Add context: . to docker/build-push-action |

Deploy Staging Failures

| Error | Cause | Fix |
|---|---|---|
| Error: timed out waiting for condition | Pod not becoming ready | Check kubectl describe pod, verify image pull, check health probe |
| Error: UPGRADE FAILED: release not found | First deploy uses upgrade without --install | Use helm upgrade --install (already in our pipeline) |
| ErrImagePull / ImagePullBackOff | AKS can't pull from ACR | Attach ACR: az aks update -n CLUSTER -g RG --attach-acr ACR_NAME |

Smoke Test Failures

| Error | Cause | Fix |
|---|---|---|
| All 10 attempts return 000 | DNS not configured or Ingress missing | Check the Ingress resource, verify the DNS A record points to the load balancer IP |
| Returns 503 | Backend pods not ready | Increase --wait --timeout in the deploy step, check the readiness probe |
| Returns 502 | Ingress controller can't reach backend | Check the Service selector matches Pod labels, verify port numbers |

Production Deploy Failures

| Error | Cause | Fix |
|---|---|---|
| Job stays in "Waiting" forever | No reviewer approved | Check Environment settings, ensure reviewers are added |
| Approval granted but job fails | OIDC federated credential not set for production environment | Add a federated credential with subject: ...environment:production |

🏆 Challenge Extensions

You've built the core pipeline. Now level up with these challenges:

Challenge 1: Add Slack Notifications

Notify your team channel on deploy success or failure:

yaml
  notify:
    needs: [deploy-production]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          fields: repo,message,commit,author,action,eventName,ref,workflow
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Challenge 2: Auto-Rollback on Failed Smoke Test

If the smoke test fails, automatically roll back the staging deployment:

yaml
  rollback-staging:
    needs: smoke-test
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
          cluster-name: ${{ vars.AKS_CLUSTER_NAME }}
      - run: |
          echo "🔄 Rolling back staging..."
          helm rollback myapp-staging -n staging
          echo "✅ Rollback complete"

Challenge 3: Add Canary Deployment

Deploy to a small percentage of traffic first, then shift 100% if healthy:

yaml
  canary-deploy:
    needs: smoke-test
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      # ... Azure login & AKS context ...
      - name: Deploy canary (10% traffic)
        run: |
          helm upgrade --install myapp-canary ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --set replicaCount=1 \
            --wait --timeout 3m

      - name: Monitor canary (5 minutes)
        run: |
          for i in {1..30}; do
            ERROR_RATE=$(curl -s https://myapp.com/metrics | grep error_rate | awk '{print $2}')
            ERROR_RATE=${ERROR_RATE:-0}  # default to 0 if the metric is missing
            if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
              echo "❌ Error rate too high: $ERROR_RATE"
              helm rollback myapp-canary -n production
              exit 1
            fi
            sleep 10
          done
          echo "✅ Canary healthy — promoting to full deploy"

      - name: Promote to full production
        run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --set replicaCount=3 \
            --wait --timeout 5m
          helm uninstall myapp-canary -n production

💬 Interview Questions

Q1. Walk me through how you'd design a CI/CD pipeline for a microservice deployed to Kubernetes.

A: I'd design a multi-stage pipeline: (1) Lint & static analysis for fast feedback on code quality, (2) Unit & integration tests with coverage thresholds, (3) Docker build & push to a container registry with tags based on commit SHA for traceability, (4) Deploy to staging using Helm with --wait to ensure pods are healthy, (5) Automated smoke/integration tests against staging, (6) Manual approval gate for production, (7) Production deploy with Helm, and optionally (8) Post-deploy verification. Each stage depends on the previous one succeeding. The pipeline uses OIDC for Azure auth (no stored secrets), GitHub Environments for protection rules, and artifacts for passing data between jobs.

Q2. Why separate lint and test into different jobs instead of one job?

A: Three reasons: (1) Fast fail — lint catches syntax errors in seconds without waiting for a full test suite, (2) Parallelism opportunity — in more complex pipelines, independent checks can run in parallel, and (3) Clear feedback — developers see exactly which stage failed. A lint failure means "fix your code style," while a test failure means "fix your logic." One combined job would give ambiguous feedback.

Q3. What is OIDC authentication and why is it preferred over service principal secrets?

A: OIDC (OpenID Connect) lets GitHub Actions request a short-lived token from Azure AD without storing any secrets. The workflow presents a JWT to Azure, which validates it against the configured federated credential (checking the repo, branch, and environment). Benefits: (1) No secret rotation — there's no password to expire or rotate, (2) Scoped access — tokens are valid only for the specific repo/branch/environment, (3) Audit trail — every token request is logged in Azure AD, (4) No leakage risk — there's nothing stored that can be exposed in logs or compromised repos.

Q4. How does helm upgrade --install differ from helm install?

A: helm install creates a new release and fails if it already exists. helm upgrade --install upgrades an existing release or creates it if it doesn't exist. This makes it idempotent — safe to run repeatedly in CI/CD without checking whether it's the first deploy or the 100th. The --wait flag ensures Helm waits for all pods to be Ready before reporting success, which is critical for pipeline reliability.

Q5. A deployment to staging succeeds but the smoke test fails with 503 errors. How do you debug?

A: Step-by-step: (1) Check if pods are actually Running and Ready: kubectl get pods -n staging, (2) Check the readiness probe — if the probe fails, the Service won't route traffic, (3) Check the Service: kubectl describe svc -n staging — are the endpoints populated?, (4) Check the Ingress: kubectl describe ingress -n staging — is the backend correctly mapped?, (5) Check the Ingress Controller logs for upstream errors, (6) Port-forward directly to the pod: kubectl port-forward pod/xxx 3000:3000 — does the app respond? If yes, the issue is in Service/Ingress networking, not the app.

Q6. How do you handle database migrations in a CI/CD pipeline?

A: Migrations should be a separate step before the application deploy. Options: (1) A Helm pre-upgrade hook that runs a migration Job, (2) A dedicated pipeline job between build and deploy that applies migrations, (3) An init container in the deployment that runs migrations before the app starts. Key rules: migrations must be backward-compatible (additive only), never rename/drop columns in the same release as code changes, and always support rollback by making each migration reversible.
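
Option (1) can be sketched as a Helm hook template added to the chart — the migrate.js entrypoint is hypothetical, not part of the lab's app:

```yaml
# charts/myapp/templates/migration-job.yaml — illustrative sketch
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migrate
  annotations:
    # Run this Job before the Deployment is installed or upgraded
    "helm.sh/hook": pre-install,pre-upgrade
    # Delete the previous hook Job before creating a new one
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["node", "migrate.js"]  # hypothetical migration entrypoint
```

If the hook Job fails, helm upgrade fails before any application pods are touched.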

Q7. What happens if the production deploy fails halfway? How do you recover?

A: Helm's --wait flag ensures that if the new version's pods don't become Ready within the timeout, the upgrade is marked as failed. Recovery: (1) Automatic: Run helm rollback myapp -n production to revert to the last successful revision, (2) Pipeline-level: Add a rollback job triggered by if: failure() in the deploy job, (3) Kubernetes-level: Kubernetes keeps the old ReplicaSet — you can also use kubectl rollout undo deployment/myapp. Prevention: use blue-green or canary deployments so the old version keeps serving while the new one is verified.

Q8. How do you prevent two developers from deploying to production simultaneously?

A: Use GitHub Actions concurrency control: concurrency: { group: deploy-production, cancel-in-progress: false }. This ensures only one deploy-production job runs at a time. Setting cancel-in-progress: false is critical for deploys — you don't want to cancel a running deployment. Instead, the second deploy queues until the first finishes. Additionally, the environment protection rules (required reviewers) act as a human gate — only one deployment can be approved and proceed at a time.
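
In this lab's workflow, the concurrency block could sit at either the workflow or the job level; a job-level sketch for deploy-production:

```yaml
  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    # Queue concurrent production deploys instead of cancelling the running one
    concurrency:
      group: deploy-production
      cancel-in-progress: false
    environment:
      name: production
```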

Q9. Why use the commit SHA as the image tag instead of latest?

A: (1) Traceability — every running container can be traced back to the exact commit that produced it, (2) Immutability — the tag abc123 always points to the same image; latest is mutable and can be overwritten, (3) Rollback precision — to roll back, deploy the previous commit's SHA tag, (4) Cache invalidation — Kubernetes detects tag changes and pulls the new image; redeploying latest may serve a cached old image. We still also tag latest for convenience, but deployments always reference the SHA.
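
The hand-written tags in Stage 3 could also be generated with docker/metadata-action — a sketch, not part of the lab's pipeline as written:

```yaml
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,format=long   # sha-<full commit SHA>
            type=raw,value=latest
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
```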

Q10. How would you add a security scanning stage to this pipeline?

A: Insert a security scan job after the build step: (1) Container scan: Use aquasecurity/trivy-action@master to scan the Docker image for CVEs before pushing, (2) SAST: Add CodeQL or Semgrep in the lint stage to catch security bugs in source code, (3) Dependency audit: Add npm audit --audit-level=high after npm ci, (4) Secret scanning: Enable GitHub's built-in secret scanning and add trufflesecurity/trufflehog in CI. Gate the pipeline — if any critical vulnerability is found, block the deploy. For non-critical findings, report to a dashboard but don't block.
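
A container-scan job slotted between test and the push stage could look like this — a sketch using Trivy; the local scan tag and the severity gate are assumptions you'd tune:

```yaml
  scan:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build locally without pushing so the image can be scanned first
      - uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: myapp:scan
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:scan
          severity: CRITICAL,HIGH
          exit-code: '1'   # block the pipeline on critical/high CVEs
```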

📝 Summary

You now have a six-stage pipeline: lint and test run on every push and pull request; on main, the image is built, tagged with the commit SHA, and pushed to ACR; Helm deploys it to staging; a retrying smoke test verifies /health; and a manual approval gates the production deploy. Authentication is handled end to end with OIDC federated credentials, so no Azure passwords are stored in GitHub.
