Build a Full Pipeline
End-to-end hands-on lab: lint → test → build Docker → push ACR → Helm deploy to AKS → smoke test — a production-grade CI/CD pipeline from scratch.
🧒 Simple Explanation (ELI5)
You're the head chef opening a brand-new restaurant. This lab is creating the entire operation from scratch:
- Ingredient quality check (lint) — inspect every ingredient before it enters the kitchen. Rotten lettuce? Rejected immediately.
- Cooking (build) — combine the ingredients into a dish following the recipe exactly.
- Taste test (test) — the sous chef tastes every dish before it leaves the kitchen. Bad flavor? Back to the stove.
- Plating (Docker) — package the dish beautifully on a plate so it looks and works the same every single time.
- Sending to the dining room (AKS deploy) — the waiter carries the plated dish to the customer's table (your live cluster).
- Checking the customer is happy (smoke test) — the manager walks by the table: "Is everything okay?" If yes, success. If not, pull the dish and fix it.
By the end of this lab, you'll have an automated restaurant — every code push triggers the entire operation with zero human intervention until the final production approval.
📋 Lab Overview
| Item | Details |
|---|---|
| Architecture | GitHub → Actions → Docker → ACR → Helm → AKS |
| Time estimate | 45–60 minutes |
| Difficulty | Intermediate to Advanced |
| Pipeline stages | Lint → Test → Build & Push → Deploy Staging → Smoke Test → Deploy Production |
Prerequisites
- GitHub account with a repository
- Azure subscription (free tier works)
- Azure CLI installed locally (az --version)
- Basic YAML knowledge (covered in Lesson 3)
- Docker basics (building images, Dockerfiles)
- Helm basics (charts, values — covered in Lessons 9–10)
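Before starting, it may help to confirm the required CLIs are on your PATH. A minimal check script (the tool list is an assumption based on the steps in this lab):

```shell
# Report which of the CLIs used in this lab are installed.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for tool in az docker helm kubectl node npm git; do
  check_tool "$tool"
done
```

Install anything reported missing before moving on; every later step assumes these commands work.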
⌨️ Step 1 — Project Setup
Create a simple Express application with a health endpoint, a Dockerfile, tests, and a Helm chart. This gives us everything we need for a full pipeline.
1a. Application Code
package.json
{
"name": "myapp",
"version": "1.0.0",
"description": "Demo app for full CI/CD pipeline",
"main": "app.js",
"scripts": {
"start": "node app.js",
"test": "jest",
"lint": "eslint ."
},
"dependencies": {
"express": "^4.18.2"
},
"devDependencies": {
"eslint": "^8.56.0",
"jest": "^29.7.0",
"supertest": "^6.3.3"
}
}
app.js
const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;
// Health endpoint — used by smoke tests and K8s probes
app.get('/health', (req, res) => {
res.status(200).json({
status: 'healthy',
version: process.env.APP_VERSION || '1.0.0',
environment: process.env.NODE_ENV || 'development',
timestamp: new Date().toISOString()
});
});
app.get('/', (req, res) => {
res.json({ message: 'Hello from myapp!' });
});
// Only start the server if not in test mode
if (process.env.NODE_ENV !== 'test') {
app.listen(PORT, () => {
console.log(`Server running on port ${PORT}`);
});
}
module.exports = app;
app.test.js
const request = require('supertest');
const app = require('./app');
describe('GET /health', () => {
it('should return 200 and healthy status', async () => {
const res = await request(app).get('/health');
expect(res.statusCode).toBe(200);
expect(res.body.status).toBe('healthy');
});
});
describe('GET /', () => {
it('should return welcome message', async () => {
const res = await request(app).get('/');
expect(res.statusCode).toBe(200);
expect(res.body.message).toBeDefined();
});
});
1b. Dockerfile
# --- Build stage ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# --- Production stage ---
FROM node:20-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY app.js ./
ENV NODE_ENV=production
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "app.js"]
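Before wiring the image build into CI, you can sanity-check it locally. A rough sketch (the myapp:dev tag is just a local convention, and the script is guarded so it no-ops where Docker or the Dockerfile is unavailable):

```shell
# Build the image and hit /health locally before relying on the pipeline.
health_url() { echo "http://localhost:${1:-3000}/health"; }

if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
  docker build -t myapp:dev .
  docker run -d --rm --name myapp-dev -p 3000:3000 myapp:dev
  sleep 3
  curl -fsS "$(health_url 3000)" && echo "local smoke test passed"
  docker stop myapp-dev
else
  echo "docker or Dockerfile not found; skipping local build check"
fi
```

Catching a broken Dockerfile here is far faster than waiting for the build-and-push stage to fail.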
1c. Helm Chart
Create the chart directory structure:
charts/myapp/
├── Chart.yaml
├── values.yaml
└── templates/
├── deployment.yaml
├── service.yaml
└── ingress.yaml
Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for myapp CI/CD demo
type: application
version: 0.1.0
appVersion: "1.0.0"
values.yaml
replicaCount: 2
image:
repository: myacr.azurecr.io/myapp
tag: latest
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 80
targetPort: 3000
ingress:
enabled: true
className: nginx
host: myapp.example.com
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 250m
memory: 256Mi
env: production
templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}
labels:
app: {{ .Release.Name }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ .Release.Name }}
template:
metadata:
labels:
app: {{ .Release.Name }}
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: {{ .Values.service.targetPort }}
env:
- name: NODE_ENV
value: {{ .Values.env | quote }}
livenessProbe:
httpGet:
path: /health
port: {{ .Values.service.targetPort }}
initialDelaySeconds: 10
periodSeconds: 15
readinessProbe:
httpGet:
path: /health
port: {{ .Values.service.targetPort }}
initialDelaySeconds: 5
periodSeconds: 10
resources:
{{- toYaml .Values.resources | nindent 12 }}
templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: {{ .Release.Name }}
spec:
type: {{ .Values.service.type }}
selector:
app: {{ .Release.Name }}
ports:
- port: {{ .Values.service.port }}
targetPort: {{ .Values.service.targetPort }}
protocol: TCP
templates/ingress.yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ .Release.Name }}
spec:
ingressClassName: {{ .Values.ingress.className }}
rules:
- host: {{ .Values.ingress.host }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ .Release.Name }}
port:
number: {{ .Values.service.port }}
{{- end }}
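With the chart in place, you can validate it locally before the pipeline ever runs it. A sketch (guarded so it skips cleanly where helm or the chart directory is missing):

```shell
# Lint the chart and render its manifests without touching a cluster.
chart_path() { echo "charts/${1}"; }

if command -v helm >/dev/null 2>&1 && [ -d "$(chart_path myapp)" ]; then
  helm lint "$(chart_path myapp)"                      # static chart checks
  helm template myapp-staging "$(chart_path myapp)" \
    --set image.tag=test --set env=staging             # render the YAML locally
else
  echo "helm or chart directory not found; skipping chart validation"
fi
```

helm template shows exactly what the pipeline's helm upgrade will apply, so templating mistakes surface before a deploy fails at the --wait stage.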
🔐 Step 2 — Configure Secrets & Variables
Go to your GitHub repository → Settings → Secrets and variables → Actions.
Required Secrets (OIDC — no passwords stored!)
| Secret Name | Description | Example |
|---|---|---|
| AZURE_CLIENT_ID | App registration client ID | 12345678-abcd-... |
| AZURE_TENANT_ID | Azure AD tenant ID | abcdef12-3456-... |
| AZURE_SUBSCRIPTION_ID | Azure subscription ID | aaaabbbb-cccc-... |
Required Variables
| Variable Name | Description | Example |
|---|---|---|
| ACR_NAME | Azure Container Registry name | myappacr |
| AKS_RESOURCE_GROUP | AKS cluster resource group | myapp-rg |
| AKS_CLUSTER_NAME | AKS cluster name | myapp-aks |
Set Up OIDC Federated Credential
Instead of storing Azure passwords, we use OpenID Connect. Create a federated credential on your Azure App Registration:
# Create app registration
az ad app create --display-name "github-actions-myapp"
APP_ID=$(az ad app list --display-name "github-actions-myapp" --query "[0].appId" -o tsv)
# Create service principal
az ad sp create --id $APP_ID
# Add federated credential for main branch
az ad app federated-credential create --id $APP_ID --parameters '{
"name": "github-main",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
# Add federated credential for staging environment
az ad app federated-credential create --id $APP_ID --parameters '{
"name": "github-staging",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:YOUR_ORG/YOUR_REPO:environment:staging",
"audiences": ["api://AzureADTokenExchange"]
}'
# Add federated credential for production environment
az ad app federated-credential create --id $APP_ID --parameters '{
"name": "github-production",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:YOUR_ORG/YOUR_REPO:environment:production",
"audiences": ["api://AzureADTokenExchange"]
}'
# Grant Contributor role on AKS resource group
az role assignment create --assignee $APP_ID \
--role Contributor \
--scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg
# Grant AcrPush role on ACR
az role assignment create --assignee $APP_ID \
--role AcrPush \
--scope /subscriptions/YOUR_SUB_ID/resourceGroups/myapp-rg/providers/Microsoft.ContainerRegistry/registries/myappacr
Replace YOUR_ORG/YOUR_REPO, YOUR_SUB_ID, and the resource group and ACR names with your actual values.
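After running the commands above, it's worth confirming everything landed. A verification sketch (the expected subject format is shown by a small helper; the app display name matches the one created above):

```shell
# Verify the OIDC federated credentials and role assignments.
fed_subject() { echo "repo:${1}:environment:${2}"; }   # expected credential subject format

if command -v az >/dev/null 2>&1; then
  APP_ID=$(az ad app list --display-name "github-actions-myapp" --query "[0].appId" -o tsv)
  # Each credential's subject must match your repo/environment exactly, e.g.:
  echo "expect: $(fed_subject YOUR_ORG/YOUR_REPO production)"
  az ad app federated-credential list --id "$APP_ID" -o table
  # Confirm the Contributor and AcrPush assignments exist.
  az role assignment list --assignee "$APP_ID" --query "[].roleDefinitionName" -o tsv
else
  echo "az CLI not found; skipping verification"
fi
```

A single character mismatch in the subject string is the most common reason the azure/login step later fails, so check those three subjects carefully.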
🚀 Step 3 — Create the Pipeline
Create .github/workflows/ci-cd.yml — the full pipeline in a single file:
name: Full CI/CD Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
permissions:
id-token: write
contents: read
packages: write
env:
ACR_NAME: ${{ vars.ACR_NAME }}
IMAGE_NAME: myapp
jobs:
# ──────────────────────── STAGE 1: LINT ────────────────────────
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm run lint
# ──────────────────────── STAGE 2: TEST ────────────────────────
test:
needs: lint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm test -- --coverage
- uses: actions/upload-artifact@v4
with:
name: coverage-report
path: coverage/
# ──────────────────── STAGE 3: BUILD & PUSH ──────────────────
build-and-push:
needs: test
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
outputs:
image-tag: ${{ github.sha }}
steps:
- uses: actions/checkout@v4
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- run: az acr login --name ${{ env.ACR_NAME }}
- uses: docker/build-push-action@v5
with:
push: true
tags: |
${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:${{ github.sha }}
${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }}:latest
# ──────────────── STAGE 4: DEPLOY TO STAGING ─────────────────
deploy-staging:
needs: build-and-push
runs-on: ubuntu-latest
environment:
name: staging
url: https://staging.myapp.com
steps:
- uses: actions/checkout@v4
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- uses: azure/aks-set-context@v3
with:
resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
cluster-name: ${{ vars.AKS_CLUSTER_NAME }}
- run: |
helm upgrade --install myapp-staging ./charts/myapp \
--namespace staging --create-namespace \
--set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
--set image.tag=${{ github.sha }} \
--set env=staging \
--wait --timeout 5m
# ──────────────── STAGE 5: SMOKE TEST ────────────────────────
smoke-test:
needs: deploy-staging
runs-on: ubuntu-latest
steps:
- name: Health check with retries
run: |
for i in {1..10}; do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.myapp.com/health)
if [ "$STATUS" = "200" ]; then
echo "✅ Health check passed!"
exit 0
fi
echo "Attempt $i: Status $STATUS, retrying in 10s..."
sleep 10
done
echo "❌ Smoke test failed after 10 attempts!"
exit 1
# ──────────── STAGE 6: DEPLOY TO PRODUCTION ──────────────────
deploy-production:
needs: smoke-test
runs-on: ubuntu-latest
environment:
name: production
url: https://myapp.com
steps:
- uses: actions/checkout@v4
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- uses: azure/aks-set-context@v3
with:
resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
cluster-name: ${{ vars.AKS_CLUSTER_NAME }}
- run: |
helm upgrade --install myapp ./charts/myapp \
--namespace production --create-namespace \
--set image.repository=${{ env.ACR_NAME }}.azurecr.io/${{ env.IMAGE_NAME }} \
--set image.tag=${{ github.sha }} \
--set env=production \
--set replicaCount=3 \
--wait --timeout 5m
Pipeline Breakdown
| Stage | Purpose | Depends On | Runs When |
|---|---|---|---|
| lint | Code quality — catch syntax and style errors early | — | Every push & PR |
| test | Run unit tests, generate coverage report | lint | Every push & PR |
| build-and-push | Build Docker image, push to ACR | test | main branch only |
| deploy-staging | Helm deploy to staging namespace | build-and-push | main branch only |
| smoke-test | Hit /health endpoint, verify 200 | deploy-staging | main branch only |
| deploy-production | Helm deploy to production namespace | smoke-test | main + manual approval |
⚙️ Step 4 — Configure Environments
Go to your repository → Settings → Environments.
Staging Environment
- Name: staging
- No protection rules — deploys automatically after build
- Add the AZURE_* secrets scoped to this environment
Production Environment
- Name: production
- Required reviewers: Add 1–2 team leads who must approve before deploy
- Wait timer: 5 minutes — gives reviewers time to check staging
- Deployment branches: Restrict to main branch only
- Add the AZURE_* secrets scoped to this environment
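If you prefer to script this instead of clicking through the UI, GitHub's REST API has a "create or update environment" endpoint. A hedged sketch using the gh CLI (the repo slug is a placeholder, and required reviewers still need real user or team IDs, so they are omitted here):

```shell
# Create the two environments via the GitHub API (assumes gh is authenticated).
env_url() { echo "repos/${1}/environments/${2}"; }

if command -v gh >/dev/null 2>&1; then
  # Staging: no protection rules.
  gh api -X PUT "$(env_url YOUR_ORG/YOUR_REPO staging)"

  # Production: 5-minute wait timer + custom deployment-branch policy.
  gh api -X PUT "$(env_url YOUR_ORG/YOUR_REPO production)" --input - <<'EOF'
{
  "wait_timer": 5,
  "deployment_branch_policy": { "protected_branches": false, "custom_branch_policies": true }
}
EOF
else
  echo "gh CLI not found; skipping environment setup"
fi
```

Scripting the setup makes environment protection reproducible across repositories, but verify the created rules in Settings → Environments afterwards.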
🧪 Step 5 — Test the Pipeline
5a. Trigger the Pipeline
# Push your code to main
git add .
git commit -m "feat: add full CI/CD pipeline"
git push origin main
5b. Watch Each Stage
Go to Actions tab in your repository. You should see:
- lint — runs ESLint, should pass in ~30s
- test — runs Jest with coverage, ~45s
- build-and-push — builds Docker image, pushes to ACR, ~2–3 min
- deploy-staging — Helm installs to staging namespace, ~1–2 min
- smoke-test — curls the health endpoint, ~10s–2 min
- deploy-production — ⏸️ waiting for approval
5c. Verify Staging
# Check staging pods
kubectl get pods -n staging
# NAME READY STATUS RESTARTS AGE
# myapp-staging-6d4f8b7c9-abc12 1/1 Running 0 2m
# Check the deployment
kubectl describe deployment myapp-staging -n staging
# Port-forward and test locally
kubectl port-forward svc/myapp-staging 8080:80 -n staging &
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.0","environment":"staging",...}
5d. Approve Production Deployment
- In the Actions tab, click the pending deploy-production job
- Click "Review deployments"
- Select the production environment checkbox
- Add an optional comment: "Staging verified, approving production"
- Click "Approve and deploy"
5e. Verify Production
# Check production pods (should have 3 replicas)
kubectl get pods -n production
# NAME READY STATUS RESTARTS AGE
# myapp-7f9d8c6b5-x1y2z 1/1 Running 0 1m
# myapp-7f9d8c6b5-a3b4c 1/1 Running 0 1m
# myapp-7f9d8c6b5-d5e6f 1/1 Running 0 1m
# Verify health
curl https://myapp.com/health
# {"status":"healthy","environment":"production",...}
🗺️ Pipeline Flow Diagram
                 FULL CI/CD PIPELINE

              git push  (every push & PR)
                    │
                    ▼
             ┌──────────────┐
             │     LINT     │  ESLint code quality (~30 sec)
             └──────┬───────┘
                    ▼
             ┌──────────────┐
             │     TEST     │  Jest + coverage report (~45 sec)
             └──────┬───────┘  → artifact: coverage/
                    │
     PR stops here ─┤  main branch only continues
                    ▼
             ┌──────────────┐
             │ BUILD & PUSH │  Docker → ACR (~2-3 min)
             └──────┬───────┘
                    ▼
             ┌──────────────┐
             │    DEPLOY    │
             │   STAGING    │  Helm → AKS
             └──────┬───────┘
                    ▼
             ┌──────────────┐
             │  SMOKE TEST  │  curl /health, 10 retries
             └──────┬───────┘
                    ▼
             ┌──────────────┐
             │ ⏸️  APPROVAL  │  Required reviewers + 5-min wait timer
             └──────┬───────┘
                    ▼
             ┌──────────────┐
             │    DEPLOY    │
             │  PRODUCTION  │  Helm → AKS, 3 replicas
             └──────────────┘
🔧 Troubleshooting
Common issues at each stage and how to fix them:
Lint Stage Failures
| Error | Cause | Fix |
|---|---|---|
| ESLint: command not found | ESLint not in devDependencies | Run npm install --save-dev eslint |
| Parsing error: Unexpected token | Missing ESLint config or wrong parser | Add .eslintrc.json with {"env":{"node":true,"es2021":true}} |
| Lint passes locally but fails in CI | Different Node version or missing .eslintrc | Ensure .eslintrc.json is committed and Node version matches |
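For reference, a minimal .eslintrc.json that avoids the config errors above (the jest env entry and the rule shown are illustrative choices, not requirements):

```json
{
  "env": { "node": true, "es2021": true, "jest": true },
  "parserOptions": { "ecmaVersion": 2021 },
  "extends": "eslint:recommended",
  "rules": {
    "no-unused-vars": "warn"
  }
}
```

Commit this file alongside package.json so local runs and CI use the same configuration.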
Test Stage Failures
| Error | Cause | Fix |
|---|---|---|
| Cannot find module 'supertest' | Missing dependency | Check devDependencies, ensure npm ci runs first |
| Tests pass locally, fail in CI | Port conflicts, timing issues | Don't start server in tests — use supertest(app) directly |
| --coverage flag not recognized | Wrong test runner | Ensure Jest is configured: "test": "jest" in package.json |
Build & Push Stage Failures
| Error | Cause | Fix |
|---|---|---|
| AADSTS700024: Client assertion not within valid time range | OIDC token expired or clock drift | Re-run the workflow; check federated credential config |
| unauthorized: authentication required | ACR login failed | Verify ACR_NAME variable, check RBAC: AcrPush role assigned |
| denied: requested access to the resource is denied | Service principal lacks push permission | Run az role assignment create --role AcrPush |
| Docker build fails — COPY failed | Build context wrong or file missing | Add context: . to docker/build-push-action |
Deploy Staging Failures
| Error | Cause | Fix |
|---|---|---|
| Error: timed out waiting for condition | Pod not becoming ready | Check kubectl describe pod, verify image pull, check health probe |
| Error: UPGRADE FAILED: release not found | First deploy uses upgrade without --install | Use helm upgrade --install (already in our pipeline) |
| ErrImagePull / ImagePullBackOff | AKS can't pull from ACR | Attach ACR: az aks update -n CLUSTER -g RG --attach-acr ACR_NAME |
Smoke Test Failures
| Error | Cause | Fix |
|---|---|---|
| All 10 attempts return 000 | DNS not configured or Ingress missing | Check Ingress resource, verify DNS A record points to load balancer IP |
| Returns 503 | Backend pods not ready | Increase --wait --timeout in deploy step, check readiness probe |
| Returns 502 | Ingress controller can't reach backend | Check Service selector matches Pod labels, verify port numbers |
Production Deploy Failures
| Error | Cause | Fix |
|---|---|---|
| Job stays in "Waiting" forever | No reviewer approved | Check Environment settings, ensure reviewers are added |
| Approval granted but job fails | OIDC federated credential not set for production environment | Add federated credential with subject: ...environment:production |
🏆 Challenge Extensions
You've built the core pipeline. Now level up with these challenges:
Challenge 1: Add Slack Notifications
Notify your team channel on deploy success or failure:
notify:
needs: [deploy-production]
if: always()
runs-on: ubuntu-latest
steps:
- uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
fields: repo,message,commit,author,action,eventName,ref,workflow
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
Challenge 2: Auto-Rollback on Failed Smoke Test
If the smoke test fails, automatically roll back the staging deployment:
rollback-staging:
needs: smoke-test
if: failure()
runs-on: ubuntu-latest
steps:
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- uses: azure/aks-set-context@v3
with:
resource-group: ${{ vars.AKS_RESOURCE_GROUP }}
cluster-name: ${{ vars.AKS_CLUSTER_NAME }}
- run: |
echo "🔄 Rolling back staging..."
helm rollback myapp-staging -n staging
echo "✅ Rollback complete"
Challenge 3: Add Canary Deployment
Deploy to a small percentage of traffic first, then shift 100% if healthy:
canary-deploy:
needs: smoke-test
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
# ... Azure login & AKS context ...
- name: Deploy canary (10% traffic)
run: |
helm upgrade --install myapp-canary ./charts/myapp \
--namespace production \
--set image.tag=${{ github.sha }} \
--set replicaCount=1 \
--wait --timeout 3m
- name: Monitor canary (5 minutes)
run: |
for i in {1..30}; do
ERROR_RATE=$(curl -s https://myapp.com/metrics | grep error_rate | awk '{print $2}')
if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
echo "❌ Error rate too high: $ERROR_RATE"
helm rollback myapp-canary -n production
exit 1
fi
sleep 10
done
echo "✅ Canary healthy — promoting to full deploy"
- name: Promote to full production
run: |
helm upgrade --install myapp ./charts/myapp \
--namespace production \
--set image.tag=${{ github.sha }} \
--set replicaCount=3 \
--wait --timeout 5m
helm uninstall myapp-canary -n production
💬 Interview Questions
Q1. Walk me through how you'd design a CI/CD pipeline for a microservice deployed to Kubernetes.
A: I'd design a multi-stage pipeline: (1) Lint & static analysis for fast feedback on code quality, (2) Unit & integration tests with coverage thresholds, (3) Docker build & push to a container registry with tags based on commit SHA for traceability, (4) Deploy to staging using Helm with --wait to ensure pods are healthy, (5) Automated smoke/integration tests against staging, (6) Manual approval gate for production, (7) Production deploy with Helm, and optionally (8) Post-deploy verification. Each stage depends on the previous one succeeding. The pipeline uses OIDC for Azure auth (no stored secrets), GitHub Environments for protection rules, and artifacts for passing data between jobs.
Q2. Why separate lint and test into different jobs instead of one job?
A: Three reasons: (1) Fast fail — lint catches syntax errors in seconds without waiting for a full test suite, (2) Parallelism opportunity — in more complex pipelines, independent checks can run in parallel, and (3) Clear feedback — developers see exactly which stage failed. A lint failure means "fix your code style," while a test failure means "fix your logic." One combined job would give ambiguous feedback.
Q3. What is OIDC authentication and why is it preferred over service principal secrets?
A: OIDC (OpenID Connect) lets GitHub Actions request a short-lived token from Azure AD without storing any secrets. The workflow presents a JWT to Azure, which validates it against the configured federated credential (checking the repo, branch, and environment). Benefits: (1) No secret rotation — there's no password to expire or rotate, (2) Scoped access — tokens are valid only for the specific repo/branch/environment, (3) Audit trail — every token request is logged in Azure AD, (4) No leakage risk — there's nothing stored that can be exposed in logs or compromised repos.
Q4. How does helm upgrade --install differ from helm install?
A: helm install creates a new release and fails if it already exists. helm upgrade --install upgrades an existing release or creates it if it doesn't exist. This makes it idempotent — safe to run repeatedly in CI/CD without checking whether it's the first deploy or the 100th. The --wait flag ensures Helm waits for all pods to be Ready before reporting success, which is critical for pipeline reliability.
Q5. A deployment to staging succeeds but the smoke test fails with 503 errors. How do you debug?
A: Step-by-step: (1) Check if pods are actually Running and Ready: kubectl get pods -n staging, (2) Check the readiness probe — if the probe fails, the Service won't route traffic, (3) Check the Service: kubectl describe svc -n staging — are the endpoints populated?, (4) Check the Ingress: kubectl describe ingress -n staging — is the backend correctly mapped?, (5) Check the Ingress Controller logs for upstream errors, (6) Port-forward directly to the pod: kubectl port-forward pod/xxx 3000:3000 — does the app respond? If yes, the issue is in Service/Ingress networking, not the app.
Q6. How do you handle database migrations in a CI/CD pipeline?
A: Migrations should be a separate step before the application deploy. Options: (1) A Helm pre-upgrade hook that runs a migration Job, (2) A dedicated pipeline job between build and deploy that applies migrations, (3) An init container in the deployment that runs migrations before the app starts. Key rules: migrations must be backward-compatible (additive only), never rename/drop columns in the same release as code changes, and always support rollback by making each migration reversible.
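For option (1), here is a sketch of a Helm pre-upgrade hook Job (the migration entrypoint node migrate.js is a placeholder; adapt it to your migration tool):

```yaml
# charts/myapp/templates/migrate-job.yaml (illustrative)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migrate
  annotations:
    # Run before install and upgrade; clean up succeeded Jobs automatically.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["node", "migrate.js"]   # placeholder migration entrypoint
```

Because the hook runs before the Deployment is updated, a failed migration aborts the release and the old pods keep serving.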
Q7. What happens if the production deploy fails halfway? How do you recover?
A: Helm's --wait flag ensures that if the new version's pods don't become Ready within the timeout, the upgrade is marked as failed. Recovery: (1) Automatic: Run helm rollback myapp -n production to revert to the last successful revision, (2) Pipeline-level: Add a rollback job triggered by if: failure() in the deploy job, (3) Kubernetes-level: Kubernetes keeps the old ReplicaSet — you can also use kubectl rollout undo deployment/myapp. Prevention: use blue-green or canary deployments so the old version keeps serving while the new one is verified.
Q8. How do you prevent two developers from deploying to production simultaneously?
A: Use GitHub Actions concurrency control: concurrency: { group: deploy-production, cancel-in-progress: false }. This ensures only one deploy-production job runs at a time. Setting cancel-in-progress: false is critical for deploys — you don't want to cancel a running deployment. Instead, the second deploy queues until the first finishes. Additionally, the environment protection rules (required reviewers) act as a human gate — only one deployment can be approved and proceed at a time.
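In workflow YAML, the concurrency block from this answer looks like the following fragment (shown at the job level; it can also sit at the workflow level):

```yaml
deploy-production:
  # ...needs, environment, and steps as in the pipeline above...
  concurrency:
    group: deploy-production
    cancel-in-progress: false   # queue follow-up deploys; never kill one in flight
```

With this in place, a second push to main waits for the running production deploy to finish instead of racing it.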
Q9. Why use the commit SHA as the image tag instead of latest?
A: (1) Traceability — every running container can be traced back to the exact commit that produced it, (2) Immutability — the tag abc123 always points to the same image; latest is mutable and can be overwritten, (3) Rollback precision — to roll back, deploy the previous commit's SHA tag, (4) Cache invalidation — Kubernetes detects tag changes and pulls the new image; redeploying latest may serve a cached old image. We still also tag latest for convenience, but deployments always reference the SHA.
Q10. How would you add a security scanning stage to this pipeline?
A: Insert a security scan job after the build step: (1) Container scan: Use aquasecurity/trivy-action@master to scan the Docker image for CVEs before pushing, (2) SAST: Add CodeQL or Semgrep in the lint stage to catch security bugs in source code, (3) Dependency audit: Add npm audit --audit-level=high after npm ci, (4) Secret scanning: Enable GitHub's built-in secret scanning and add trufflesecurity/trufflehog in CI. Gate the pipeline — if any critical vulnerability is found, block the deploy. For non-critical findings, report to a dashboard but don't block.
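A sketch of option (1), a Trivy image-scan job slotted between build and deploy (the job name and tag scheme mirror this lab's pipeline; to actually gate the rollout, point deploy-staging's needs at this job):

```yaml
security-scan:
  needs: build-and-push
  runs-on: ubuntu-latest
  steps:
    - uses: azure/login@v2
      with:
        client-id: ${{ secrets.AZURE_CLIENT_ID }}
        tenant-id: ${{ secrets.AZURE_TENANT_ID }}
        subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
    - run: az acr login --name ${{ vars.ACR_NAME }}
    - uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ vars.ACR_NAME }}.azurecr.io/myapp:${{ github.sha }}
        severity: CRITICAL,HIGH
        exit-code: '1'   # non-zero exit on findings fails the job and blocks deploy
```

Scanning the image that was just pushed (by SHA tag) means the exact artifact headed to staging is what gets vetted.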
📝 Summary
- Project setup: Express app + Dockerfile + Helm chart — the three pillars of a K8s-deployed microservice
- OIDC auth: Federated credentials eliminate stored secrets. Configure separate credentials for each environment
- Pipeline stages: Lint → Test → Build & Push → Deploy Staging → Smoke Test → Deploy Production. Each stage gates the next
- Environments: Staging auto-deploys; Production requires manual approval with wait timer and branch restrictions
- Smoke testing: Automated health check with retries catches broken deploys before they reach production
- Image tagging: Use commit SHA for traceability and immutability; latest as a convenience alias only
- Troubleshooting: Work backwards from the failing stage — check logs, describe resources, port-forward to isolate
- Extensions: Slack notifications, auto-rollback, canary deployments — build on the core pipeline incrementally