Interview Preparation
50+ curated interview questions covering CI/CD fundamentals, GitHub Actions architecture, YAML pipelines, deployment strategies, security, and real-world troubleshooting, each with a detailed answer.
- Questions are grouped by category and difficulty.
- Each question has a collapsible detailed answer; try answering out loud before expanding.
- ★ marks questions that come up in 80%+ of interviews.
- "Scenario" questions simulate real whiteboard / design sessions.
Category 1: CI/CD Fundamentals
Q1: What is CI/CD and why is it important?
CI (Continuous Integration): every developer commit triggers an automated build and test cycle. The goal is to catch bugs early and provide fast feedback. If a test fails, the team knows within minutes, not days.
CD (Continuous Delivery / Continuous Deployment): the pipeline automatically packages and delivers the application to staging or production, reducing manual errors and shipping faster.
Key distinction:
- Continuous Integration: merge → build → test (automated)
- Continuous Delivery: …→ deploy to staging (automated), deploy to production (manual gate)
- Continuous Deployment: …→ deploy to staging AND production (fully automated, no human approval)
Why it matters: without CI/CD, teams accumulate "integration debt"; the longer you wait to merge, the more painful it becomes. CI/CD converts that big-bang integration into many small, safe, reversible changes.
Q2: Explain the difference between Continuous Delivery and Continuous Deployment.
Continuous Delivery: the pipeline automates everything up to staging. Production deployments require a manual approval gate: a human clicks "approve" before the code goes live. This is common in regulated industries (finance, healthcare).
Continuous Deployment: fully automated. Every commit that passes all tests goes straight to production with no human approval. This requires high confidence in your test suite and monitoring systems.
Interview tip: Most companies practice Continuous Delivery, not Deployment. Saying "we use CD with manual production gates and environment protection rules" shows pragmatism.
Q3: What are the benefits of CI/CD for a development team?
- Faster feedback loops: developers know within minutes if their change broke something.
- Reduced integration risk: small, frequent merges are far less risky than large, infrequent ones.
- Repeatable deployments: the same pipeline runs every time, eliminating "it works on my machine" problems.
- Audit trail: every deployment is linked to a commit, a build, and a set of test results.
- Developer confidence: teams ship more often when they trust the pipeline to catch mistakes.
- Cost reduction: catching bugs in CI is 10–100× cheaper than finding them in production.
Q4: Describe a typical CI/CD pipeline for a microservice.
A standard pipeline for a containerized microservice:
Lint → Unit Test → Build Docker Image → Push to Registry
→ Deploy to Staging → Integration Tests → Manual Approval
→ Deploy to Production → Smoke Test
In GitHub Actions terms:
- Lint job: ESLint, Pylint, or golangci-lint to catch style and error issues.
- Test job: run unit tests with coverage thresholds.
- Build & Push job: `docker/build-push-action` with the image tagged as the commit SHA.
- Deploy staging job: `helm upgrade --install` in a staging namespace.
- Integration test job: hit staging endpoints and verify responses.
- Deploy production job: requires `environment: production` with reviewer approval.
- Smoke test job: verify the health endpoint returns 200 in production.
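A minimal workflow skeleton wiring these jobs together with `needs:` might look like the following sketch (job names, registry, and chart path are illustrative placeholders, not from a real project):

```yaml
name: Microservice CI/CD
on:
  push:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  build-push:
    needs: [lint, test]            # Waits for both parallel jobs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: registry.example.com/myapp:${{ github.sha }}
  deploy-staging:
    needs: build-push
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: helm upgrade --install myapp ./chart --namespace staging
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production        # Reviewer approval gates this job
    steps:
      - uses: actions/checkout@v4
      - run: helm upgrade --install myapp ./chart --namespace production
```

The `needs:` chains encode the arrow diagram above; everything not chained (lint, test) runs in parallel.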
Q5: What is a blue-green deployment?
Two identical environments run in parallel:
- Blue = current production (serving live traffic).
- Green = new version (deployed, tested, but not yet receiving traffic).
Once green passes health checks, the load balancer switches 100% of traffic from blue to green instantly. If something goes wrong, switch back to blue: an instant rollback with zero downtime.
Trade-off: Requires double the infrastructure during the switch, so it's more expensive but safer.
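In Kubernetes, the "switch" is often just a Service selector change. A minimal sketch, assuming two Deployments labeled `version: blue` and `version: green` (names and labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue   # Flip to "green" to cut traffic over; flip back to roll back
  ports:
    - port: 80
      targetPort: 8080
```

Both Deployments stay running during the switch, which is exactly where the double-infrastructure cost comes from.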
Q6: What is a canary deployment?
Route a small percentage (5–10%) of live traffic to the new version while the remaining 90–95% continues hitting the old version. Monitor error rates, latency, and resource usage for a defined period.
- If metrics look good: gradually increase traffic to the new version (25% → 50% → 100%).
- If metrics degrade: immediately route all traffic back to the old version (rollback).
Canary deployments are less risky than blue-green because only a fraction of users are affected if something goes wrong. Often implemented via Ingress weights, service mesh (Istio), or Flagger.
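With the NGINX ingress controller, the weighting can be expressed as annotations on a second Ingress. A hedged sketch (host and service names are made up):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # ~10% of traffic to the canary
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80
```

Raising the weight step by step (10 → 25 → 50 → 100) implements the gradual promotion described above.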
Q7: How do you handle database migrations in CI/CD?
Rule #1: Never run destructive migrations automatically.
Best practices:
- Run migrations as a separate step before application deployment.
- Use migration tools like Flyway, Liquibase, or language-specific tools (Alembic for Python, TypeORM for Node.js).
- Make migrations backward-compatible: the old app version should still work after the migration runs (expand-and-contract pattern).
- Separate schema migration from data migration: schema changes are fast, data backfills are slow.
- Add a manual gate before destructive operations (dropping columns, truncating tables).
```yaml
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: flyway -url=${{ secrets.DB_URL }} migrate
  deploy:
    needs: migrate   # Deploy only after migration succeeds
    runs-on: ubuntu-latest
    steps:
      - run: helm upgrade myapp ./chart
```
Q8: What is GitOps and how does it relate to CI/CD?
GitOps = Git as the single source of truth for both infrastructure and application state.
- CI pushes changes to a Git repository (e.g., updates an image tag in a Helm values file).
- CD is handled by a GitOps operator (ArgoCD, Flux) running inside the cluster. The operator watches Git and reconciles cluster state to match the desired state in Git.
Key difference from traditional CI/CD: the pipeline doesn't directly `kubectl apply` or `helm upgrade`. Instead, it commits to Git, and the in-cluster operator pulls and applies. This gives you an audit trail (every change is a Git commit), drift detection (the operator alerts if someone changes the cluster manually), and declarative state management.
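As an illustration, a minimal Argo CD `Application` that reconciles a Helm chart from Git might look like this sketch (the repo URL, path, and namespace are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config.git
    targetRevision: main
    path: charts/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual drift in the cluster
```

CI's only job in this model is to update the image tag in `deploy-config`; the operator does the rest.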
Q9: What's the difference between a build artifact and a container image?
Build artifact: the compiled output of your build process (a JAR file, a Go binary, a webpack bundle, or an npm package). It's portable but needs a compatible runtime environment to execute.
Container image: an artifact plus its runtime and OS-level dependencies, packaged as an OCI image (layers). It's a self-contained deployable unit: it runs identically on any machine with a container runtime.
Analogy: an artifact is the recipe. A container image is the meal-prep kit (recipe, ingredients, and utensils, all in one box). You just heat and serve.
Q10: How do you implement rollback in a CI/CD pipeline?
Multiple strategies, often combined:
- Helm rollback: `helm rollback myapp <previous-revision>` restores the previous Helm release.
- Re-deploy previous image: trigger the pipeline with the previous commit SHA. Since images are tagged with the SHA, this deploys the known-good version.
- Git revert: `git revert <bad-commit>` and push; the pipeline deploys the reverted code automatically.
- Blue-green switch: route traffic back to the previous environment instantly.
Critical practice: always tag images with the commit SHA, not `latest`. This gives you full traceability: you can always map a running container back to the exact code that produced it.
Category 2: GitHub Actions Architecture
Q11: Explain the relationship between events, workflows, jobs, and steps in GitHub Actions.
The full hierarchy:
```text
Event (push, PR, schedule, workflow_dispatch)
└── Workflow (.github/workflows/ci.yml)
    └── Job (runs on a specific runner)
        └── Step (a single unit: action or shell command)
```
- Event: the trigger (a push, a pull request, a cron schedule, or a manual dispatch).
- Workflow: a YAML file in `.github/workflows/` that responds to one or more events.
- Job: a set of steps that run on the same runner. Jobs run in parallel by default, but you can create dependencies with `needs:`.
- Step: a single unit of work, either a shell command (`run:`) or an action (`uses:`). Steps run sequentially within a job and share the same filesystem.
Key detail: steps share the runner workspace (filesystem + environment variables). Jobs do not: each job gets a fresh runner. To share data between jobs, use outputs or artifacts.
Q12: What is the difference between `uses` and `run` in a step?
- `uses:` invokes a pre-built action: a JavaScript action, Docker action, or composite action from the Marketplace or a local path. Example: `uses: actions/checkout@v4`
- `run:` executes inline shell commands directly on the runner. Example: `run: npm test`
Important: you cannot combine `uses:` and `run:` in the same step. Each step is either an action invocation or a shell command, never both.
```yaml
# Correct: separate steps
- uses: actions/checkout@v4
- run: npm install && npm test

# Invalid: can't mix uses and run in one step
- uses: actions/checkout@v4
  run: npm test
```
Q13: How do jobs communicate with each other?
Three mechanisms:
- `needs:` defines execution order: Job B waits for Job A to complete before starting.
- Outputs pass small data (strings) between jobs:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.tag.outputs.tag }}
    steps:
      - id: tag
        run: echo "tag=sha-$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying ${{ needs.build.outputs.image-tag }}"
```

- Artifacts share files between jobs using `actions/upload-artifact` and `actions/download-artifact`. Use them for build outputs, test reports, or any data too large for outputs (which have a 1 MB limit).
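The artifact mechanism, sketched (artifact name and paths are arbitrary examples):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: mkdir -p dist && echo "built" > dist/app.txt   # Stand-in for a real build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - run: ls dist/   # The files built in the other job are now here
```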
Q14: What is the GITHUB_TOKEN and how does it work?
`GITHUB_TOKEN` is an auto-generated, short-lived token created for every workflow run. Key properties:
- Scoped to the current repository: it cannot access other repos.
- Expires when the job completes: not reusable after the run.
- Permissions are configurable via the `permissions:` block in the workflow YAML.
- Default permissions depend on repository settings and can be "read all" or "read + write" (Settings → Actions → General → Workflow permissions).
How it differs from a PAT (Personal Access Token): PATs are created by a user, can access any repo the user has access to, and must be rotated manually. `GITHUB_TOKEN` is automatic, scoped, and ephemeral, so always prefer it over PATs in workflows.
Q15: Explain GitHub-hosted vs self-hosted runners. When would you choose each?
| Aspect | GitHub-hosted | Self-hosted |
|---|---|---|
| Maintenance | Zero: GitHub manages the VMs | You maintain hardware, OS, tools |
| Environment | Fresh VM per job (clean state) | Persistent (state carries over) |
| Tools | Pre-installed (Node, Python, Docker, etc.) | You install what you need |
| Cost | Pay per minute (free tier for public repos) | Your hardware cost, no per-minute charges |
| Network | Public internet only | Access to internal networks, on-prem resources |
Choose self-hosted when: you need internal network access, custom hardware (GPUs), compliance requirements, or high volume to reduce cost.
Security warning: Never use self-hosted runners on public repositories. A malicious PR could run arbitrary code on your infrastructure.
Q16: What is `workflow_dispatch` and when would you use it?
`workflow_dispatch` enables manual triggering of a workflow from the GitHub UI or API, with optional custom inputs.
```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        type: choice
        options: [staging, production]
      version:
        description: 'Image tag to deploy'
        required: true
        type: string
```
Use cases: on-demand deployments, hotfix releases, maintenance tasks (database backups, cache purges), testing workflows during development.
Q17: How does matrix strategy work? Give an example.
Matrix strategy runs the same job across multiple configurations in parallel. It creates a Cartesian product of the matrix dimensions.
```yaml
strategy:
  matrix:
    node-version: [18, 20, 22]
    os: [ubuntu-latest, windows-latest]
    # Creates 3 × 2 = 6 parallel jobs
    include:
      - node-version: 22
        os: ubuntu-latest
        coverage: true          # Extra variable for this combo
    exclude:
      - node-version: 18
        os: windows-latest      # Skip this combination
```
Access values in steps via ${{ matrix.node-version }} and ${{ matrix.os }}. Use include to add specific combos or extra variables, and exclude to remove unwanted combos.
Q18: What are reusable workflows vs composite actions? When do you use each?
| Feature | Reusable Workflow | Composite Action |
|---|---|---|
| Level | Workflow-level (has jobs) | Step-level (runs in calling job) |
| Trigger | `workflow_call` | `uses:` in a step |
| Secrets | Passed explicitly or inherit | Via inputs / env |
| Environments | Can reference environments | Cannot |
| Best for | Standardized pipelines across repos | Shared step sequences (lint + test) |
Rule of thumb: if you need full jobs with environments and secrets, use a reusable workflow. If you need a few steps bundled together, use a composite action.
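A minimal composite action as a sketch; the file would live at a path like `.github/actions/lint-test/action.yml` (path and commands are illustrative):

```yaml
name: Lint and test
description: Shared lint + test step sequence
runs:
  using: composite
  steps:
    - run: npm ci
      shell: bash   # shell: is required for run steps in composite actions
    - run: npm run lint
      shell: bash
    - run: npm test
      shell: bash
```

A calling workflow would then use it as a single step: `- uses: ./.github/actions/lint-test`.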
Q19: How does concurrency work in GitHub Actions?
The `concurrency` key groups workflow runs and controls what happens when multiple runs of the same group are active.
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```
How it works: If a new run starts with the same concurrency group, the previous in-progress run is cancelled. This is critical for PR builds β if a developer pushes 3 commits in quick succession, only the latest commit's run proceeds.
Without `cancel-in-progress`, the new run queues until the previous one finishes.
Q20: Explain the `needs` keyword and how job dependencies work.
`needs:` creates a Directed Acyclic Graph (DAG) of job execution order.
```yaml
jobs:
  lint: ...    # Runs immediately
  test: ...    # Runs immediately (parallel with lint)
  build:
    needs: [lint, test]   # Waits for BOTH lint and test
  deploy:
    needs: build          # Waits for build
```
Behavior when a needed job fails: all dependent jobs are skipped. To override this, use if: always() or if: failure() on the dependent job.
Accessing outputs: ${{ needs.build.outputs.image-tag }} β you can only access outputs from jobs listed in your needs:.
Category 3: YAML Pipelines & Syntax
Q21: Walk me through a GitHub Actions YAML file from top to bottom.
```yaml
name: CI Pipeline          # Display name in the Actions tab
on:                        # Trigger events
  push:
    branches: [main]
  pull_request:
    branches: [main]
permissions:               # Token permissions (least privilege)
  contents: read
env:                       # Workflow-level environment variables
  REGISTRY: myacr.azurecr.io
jobs:
  build:                   # Job ID
    runs-on: ubuntu-latest                # Runner selection
    needs: [lint]                         # Job dependency (optional)
    if: github.ref == 'refs/heads/main'   # Condition (optional)
    strategy:                             # Matrix (optional)
      matrix:
        node: [18, 20]
    steps:
      - uses: actions/checkout@v4   # Action step
      - run: npm install            # Shell step
        env:
          NODE_ENV: production      # Step-level env var
```
Order matters for readability, but only `on:` and `jobs:` are required at the top level.
Q22: What are common YAML mistakes that break workflows?
- Tabs instead of spaces: YAML only allows spaces for indentation. A single tab will silently break the workflow.
- Inconsistent indentation: mixing 2-space and 4-space indentation within the same level.
- Unquoted colons in `run:` commands: `run: echo Time: 12:30` breaks because YAML treats the colon-plus-space after `Time` as another key-value separator. Quote the whole command.
- Missing `${{ }}` in expressions: outside `if:`, you must wrap expressions, e.g. `${{ env.MY_VAR }}`. Inside `if:`, expressions are implicit.
- Duplicate keys: two `env:` blocks at the same level silently overwrite each other.
- Wrong file path: workflow files must be in `.github/workflows/` with a `.yml` or `.yaml` extension.
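For example, the unquoted-colon mistake and its fix, as a minimal sketch:

```yaml
# Broken: YAML parses "Time: 12:30" as a nested mapping
# - run: echo Time: 12:30

# Fixed: quote the scalar so the colon is literal text
- run: 'echo "Time: 12:30"'
```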
Q23: How do path filters and branch filters work?
```yaml
on:
  push:
    branches: [main, 'release/**']       # Only these branches
    paths: ['src/**', 'package.json']    # Only when these files change
    # Alternatively (mutually exclusive with paths):
    # paths-ignore: ['docs/**', '*.md']  # Ignore these files
```
Rules:
- `branches` and `paths` can be combined; both must match.
- `paths` and `paths-ignore` cannot be used together.
- Path filters use glob patterns: `*` matches a single level, `**` matches multiple levels.
- If only docs changed and your `paths` filter doesn't include docs, the workflow won't run.
Q24: Explain `if:` conditions with an example.
Conditional execution at the job or step level:
```yaml
# Only deploy from main branch
- run: helm upgrade myapp ./chart
  if: github.ref == 'refs/heads/main'

# Only run on pull requests
- run: echo "This is a PR"
  if: github.event_name == 'pull_request'

# Run even if previous steps failed
- run: echo "Sending failure notification"
  if: failure()

# Always run (cleanup)
- run: rm -rf ./tmp
  if: always()

# Skip CI based on commit message
- run: npm test
  if: "!contains(github.event.head_commit.message, '[skip ci]')"
```
Note: Expressions in if: are automatically wrapped in ${{ }}, so you don't need to write them explicitly.
Q25: What is the ${{ }} expression syntax?
The expression evaluator for GitHub Actions. It can access contexts and call functions:
Contexts: `github`, `env`, `secrets`, `matrix`, `needs`, `steps`, `runner`, `job`, `inputs`
Functions:
- `contains(search, item)`: string or array containment check.
- `startsWith(string, prefix)` / `endsWith(string, suffix)`
- `toJSON(value)` / `fromJSON(value)`: serialize/deserialize (critical for dynamic matrices).
- `hashFiles('**/package-lock.json')`: generates a hash for cache keys.
- `format()`: string formatting.
Key rule: Inside if:, expressions are implicit. Everywhere else, you must use ${{ }}.
Q26: How do outputs work between steps and between jobs?
Between steps (same job):
```yaml
steps:
  - id: version
    run: echo "tag=v1.2.3" >> $GITHUB_OUTPUT
  - run: echo "Version is ${{ steps.version.outputs.tag }}"
```
Between jobs:
```yaml
jobs:
  build:
    outputs:
      tag: ${{ steps.version.outputs.tag }}
    steps:
      - id: version
        run: echo "tag=v1.2.3" >> $GITHUB_OUTPUT
  deploy:
    needs: build
    steps:
      - run: echo "${{ needs.build.outputs.tag }}"
```
Important: `$GITHUB_OUTPUT` replaced the deprecated `set-output` command. Always use the file-based approach.
Q27: What does continue-on-error: true do vs if: always()?
- `continue-on-error: true`: the step/job reports success even if it fails, so the overall job status stays green. Use it for non-critical steps (e.g., optional code quality tools).
- `if: always()`: the step runs regardless of previous step results, but its own result affects the job status normally. Use it for cleanup or notification steps.
Example: A notification step should use if: always() so it runs when things fail. An experimental lint tool should use continue-on-error: true so it doesn't block the pipeline.
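Side by side, as a sketch (the linter and notification script are hypothetical stand-ins):

```yaml
steps:
  - run: npx experimental-linter .     # Hypothetical optional tool
    continue-on-error: true            # Its failure never turns the job red
  - run: npm test
  - run: ./notify.sh "Build finished"  # Hypothetical notification script
    if: always()                       # Runs even if npm test failed;
                                       # its own failure still fails the job
```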
Q28: Explain the timeout-minutes setting.
Default job timeout is 360 minutes (6 hours). This is dangerously high for most workflows.
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15      # Job-level timeout
    steps:
      - run: npm test
        timeout-minutes: 10  # Step-level timeout
```
Best practice: always set a timeout. A stuck `npm install` or a hanging test can run for 6 hours, consuming your minutes budget. Set the timeout to 2–3× the expected duration.
Category 4: Security & Secrets
Q29: How are secrets managed in GitHub Actions? Explain the hierarchy.
Three scopes, from broadest to narrowest:
- Organization secrets: shared across selected (or all) repositories in the org.
- Repository secrets: available to all workflows in that repo.
- Environment secrets: only available to jobs that reference that specific environment.
Override order: Environment > Repository > Organization. If the same secret name exists at multiple levels, the most specific scope wins.
Security properties:
- Encrypted at rest using libsodium sealed boxes.
- Automatically masked in logs (the raw value is replaced with `***`).
- Not accessible in fork PRs, which prevents secret exfiltration from malicious forks.
- Never exposed in workflow YAML: accessed only through `${{ secrets.NAME }}`.
Q30: What is OIDC and why is it better than stored secrets for cloud auth?
OIDC (OpenID Connect) enables passwordless authentication between GitHub Actions and cloud providers (Azure, AWS, GCP).
How it works:
- GitHub Actions presents a JWT (JSON Web Token) with claims about the workflow (repo, branch, environment).
- The cloud provider validates the JWT against GitHub's OIDC endpoint.
- If the claims match the trust policy, the provider issues short-lived credentials (valid 15–60 minutes).
Why it's better than stored secrets:
- No secret rotation: there's nothing stored to rotate.
- No leak risk: no long-lived credentials exist anywhere.
- Fine-grained trust: the cloud provider can restrict access by repo, branch, environment, and even workflow.
- Audit trail: every token request is logged with full context.
```yaml
permissions:
  id-token: write   # Required for OIDC
  contents: read

steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```
Trade-off: OIDC is better but requires initial setup in Azure AD (federated credentials). Worth the one-time effort for every production workflow.
Q31: A secret value appears in your workflow logs. How did this happen and how do you fix it?
Common causes:
- Secret used in a URL: `git clone https://user:${{ secrets.TOKEN }}@github.com/...` logs the URL, and masking may not catch the secret as part of a larger string.
- Base64 encoded: masking only hides the original value; the encoded version is a different string.
- Sub-string match failure: very short secrets (e.g., 3 characters) might not get masked reliably.
- Environment variable expansion: some tools echo environment variables as part of their verbose output.
Fixes:
- Always pass secrets via `env:` blocks, never inline in `run:`.
- Use `add-mask` for derived values: `echo "::add-mask::$DERIVED_VALUE"`.
- Suppress output for sensitive commands: redirect to a file or `/dev/null`.
- Audit all steps that reference secrets and check their output behavior.
Q32: How do you handle third-party action security?
Third-party actions run code on your runner with access to your secrets. Best practices:
- Pin by SHA, not tag: `uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29`. Tags can be moved; SHAs are immutable.
- Enable Dependabot for actions: automatically creates PRs when pinned actions have updates.
- Review source code before first use: check what the action does with secrets and network access.
- Limit to trusted publishers: look for the GitHub verified-creator badge.
- Use org-level allow-lists: restrict which actions can be used across all repositories.
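Enabling Dependabot for actions is a one-time file at `.github/dependabot.yml`; a minimal sketch:

```yaml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"        # Scans .github/workflows/
    schedule:
      interval: "weekly"
```

Dependabot then opens PRs that bump pinned SHAs/tags, so SHA pinning stays maintainable.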
Q33: What permissions should a production deployment workflow have?
Follow the principle of least privilege:
```yaml
permissions:
  contents: read    # Read repo code
  id-token: write   # OIDC authentication
  # Everything else: implicitly denied
```
Additionally:
- Environment protection rules: required reviewers for the `production` environment.
- Wait timers: enforce a delay between staging and production deploys.
- Branch restrictions: only allow deployments from `main` or `release/*` branches.
- Deployment branch policy: prevent ad-hoc deployments from feature branches.
Never use `permissions: write-all` in production workflows.
Q34: What happens when a workflow runs from a forked repository?
When a fork opens a PR against your repo:
- `GITHUB_TOKEN` is read-only: it cannot push, create releases, or modify the target repo.
- Secrets are NOT available: `${{ secrets.MY_SECRET }}` resolves to an empty string.
- First-time contributors require maintainer approval before any workflow run (configurable in Settings → Actions → Fork pull request workflows).
Why? This prevents secret exfiltration. A malicious actor could fork your repo, modify the workflow to `echo ${{ secrets.PROD_PASSWORD }}`, and steal credentials. The fork restriction prevents this entirely.
Q35: How do you implement environment-based deployment gates?
1. Create environments in Settings → Environments (e.g., `staging`, `production`).
2. Add protection rules:
   - Required reviewers: 1+ people must approve before the job proceeds.
   - Wait timer: enforce a cooldown (e.g., 30 minutes between staging and production).
   - Branch restrictions: only `main` can deploy to production.
3. Reference the environment in the workflow:
```yaml
jobs:
  deploy-production:
    runs-on: ubuntu-latest
    environment: production   # ← This pauses for approval
    steps:
      - run: helm upgrade myapp ./chart --namespace production
```
The job will pause and wait for a reviewer to approve in the GitHub UI before executing any steps.
Q36: How would you audit all secrets usage across an org's workflows?
- GitHub audit log: tracks which workflows accessed which secrets and when.
- Code search: search all workflow files for `secrets.` references: `org:myorg path:.github/workflows "secrets."`
- Org-level secret policies: centrally manage which repos can access which secrets.
- Migrate to OIDC: reduces the number of stored secrets that need auditing.
- Automate with a workflow: create a scheduled workflow that scans all repos' workflow files and reports secret usage.
Category 5: Real-World Deployment Scenarios
Q37 ★★: Design a CI/CD pipeline to deploy a containerized app to AKS using GitHub Actions and Helm.
Full pipeline architecture:
```text
┌──────┐   ┌──────┐   ┌────────────┐   ┌──────────┐
│ Lint │──►│ Test │──►│ Build+Push │──►│ Deploy   │
└──────┘   └──────┘   │ Docker→ACR │   │ Staging  │
                      └────────────┘   └────┬─────┘
                                            │
                      ┌──────────┐   ┌──────▼───┐
                      │ Approval │◄──│  Smoke   │
                      │   Gate   │   │   Test   │
                      └────┬─────┘   └──────────┘
                           │
                    ┌──────▼─────┐
                    │  Deploy    │
                    │ Production │
                    └────────────┘
```
```yaml
name: Deploy to AKS
on:
  push:
    branches: [main]
permissions:
  id-token: write
  contents: read
env:
  ACR_NAME: myacr
  IMAGE: myacr.azurecr.io/myapp
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  build-push:
    needs: [lint, test]
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.meta.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az acr login --name $ACR_NAME
      - id: meta
        run: echo "tag=sha-$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
      - run: |
          docker build -t $IMAGE:${{ steps.meta.outputs.tag }} .
          docker push $IMAGE:${{ steps.meta.outputs.tag }}
  deploy-staging:
    needs: build-push
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az aks get-credentials --resource-group myRG --name myAKS
      - run: |
          helm upgrade --install myapp ./chart \
            --namespace staging \
            --set image.tag=${{ needs.build-push.outputs.image-tag }} \
            --atomic --wait --timeout 5m
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production   # Pauses for approval
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az aks get-credentials --resource-group myRG --name myAKS
      - run: |
          helm upgrade --install myapp ./chart \
            --namespace production \
            --set image.tag=${{ needs.build-push.outputs.image-tag }} \
            --atomic --wait --timeout 5m
```
Key decisions: OIDC for Azure auth (no stored credentials), `--atomic --wait` for safe deploys (automatic rollback on failure), commit-SHA image tags for traceability, and environments for approval gates.
Q38: Your production deployment via Helm fails with a timeout. Walk me through your debugging process.
Step-by-step diagnosis:
1. Check GitHub Actions logs: identify which step failed. Is it the `helm upgrade` step? What does the error say?
2. Check pod status:

```bash
kubectl get pods -n production
# Look for: CrashLoopBackOff, ImagePullBackOff, Pending
```

3. Describe the failing pod:

```bash
kubectl describe pod <pod-name> -n production
# Check the Events section: image pull errors, resource limits, scheduling failures
```

4. Check application logs:

```bash
kubectl logs <pod-name> -n production
kubectl logs <pod-name> -n production --previous   # Previous crash
```

5. Verify the image exists in ACR: `az acr repository show-tags --name myacr --repository myapp`
6. Check resource quotas: `kubectl describe resourcequota -n production`
7. Check readiness/liveness probes: misconfigured probes are the #1 cause of Helm timeouts.
8. Roll back if needed: `helm rollback myapp -n production`
Root cause hierarchy: 80% of Helm timeouts are caused by: 1) bad readiness probes, 2) wrong image tag, 3) insufficient resources, 4) missing secrets/configmaps.
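Since bad readiness probes top that list, it helps to know what a sane one looks like. A sketch of a pod-spec fragment (path, port, and timings are illustrative):

```yaml
# In the Deployment's pod spec
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5   # Too-short delays make slow-starting pods flap NotReady
  periodSeconds: 10
  failureThreshold: 3      # Marked NotReady after 3 consecutive failures
```

If pods never become Ready, `helm upgrade --wait` sits until its timeout and then fails.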
Q39: How would you implement a canary deployment in GitHub Actions?
1. Deploy the canary as a separate Helm release with 1 replica and a low traffic weight:

```bash
helm upgrade --install myapp-canary ./chart \
  --set replicaCount=1 \
  --set ingress.canaryWeight=10 \
  --namespace production
```

2. Run health checks for N minutes: monitor error rates and latency via Prometheus/Grafana or an API health endpoint.
3. If healthy, promote: update the main release with the new image and remove the canary.

```bash
helm upgrade myapp ./chart --set image.tag=$NEW_TAG
helm uninstall myapp-canary
```

4. If unhealthy, roll back: remove the canary; the main release is untouched.

```bash
helm uninstall myapp-canary
```
For advanced canary releases, use Flagger with Istio or NGINX; it automates the traffic shifting and analysis.
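A Flagger `Canary` resource automates the weight stepping; a hedged sketch (field values are illustrative, and the exact schema should be checked against the Flagger docs):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m     # Evaluate metrics every minute
    threshold: 5     # Roll back after 5 failed checks
    maxWeight: 50    # Never send more than 50% to the canary
    stepWeight: 10   # Increase traffic in 10% steps
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99    # Require ≥99% success rate
        interval: 1m
```

With this in place, the pipeline only pushes a new image; Flagger handles promotion and rollback.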
Q40: How do you handle rollback in a Helm-based CI/CD pipeline?
Automatic rollback: use the `--atomic` flag with `helm upgrade`. If the deployment fails (pods don't become ready within the timeout), Helm automatically rolls back to the previous release.
```bash
helm upgrade --install myapp ./chart --atomic --wait --timeout 5m
```
Manual rollback:
```bash
helm history myapp -n production      # List revisions
helm rollback myapp 3 -n production   # Roll back to revision 3
```
CI-triggered rollback: create a separate `workflow_dispatch` workflow that accepts a revision number and runs `helm rollback`. Or simply re-run a previous successful workflow; since images are tagged with the commit SHA, it deploys the known-good version.
Best practice: Always test your rollback procedure. A rollback you've never tested is a rollback that will fail when you need it most.
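A rollback workflow along those lines might look like this sketch (resource-group and cluster names are placeholders reused from the Q37 example):

```yaml
name: Rollback
on:
  workflow_dispatch:
    inputs:
      revision:
        description: 'Helm revision to roll back to (see helm history)'
        required: true
        type: string
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production   # Reuse the same approval gate as deploys
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az aks get-credentials --resource-group myRG --name myAKS
      - run: helm rollback myapp ${{ github.event.inputs.revision }} -n production
```

Running it in a drill against staging is one way to act on the "always test your rollback" advice.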
Q41: A workflow takes 15 minutes. How would you optimize it?
Optimization strategies (ordered by impact):
1. Parallelize independent jobs: lint, test, and security scan can run in parallel instead of sequentially.
2. Add dependency caching:

```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: 'npm'   # Built-in caching
```

3. Docker layer caching: use `cache-from: type=gha` with `docker/build-push-action`.
4. Reduce test scope for PRs: only run tests for changed modules using path filters or a test-impact-analysis tool.
5. Use larger runners: `ubuntu-latest-4-cores` (GitHub-hosted) for compute-heavy jobs.
6. Skip unnecessary steps: use path filters to skip CI for docs-only changes.
7. Cancel stale runs: use `concurrency` with `cancel-in-progress: true`.
Q42: How do you manage CI/CD for a monorepo with 10+ microservices?
Key strategies:
- Path filters per service: each workflow triggers only when its service's files change.
- Dynamic matrix to detect changed services and build only those:

```yaml
- uses: dorny/paths-filter@v3
  id: changes
  with:
    filters: |
      api: 'services/api/**'
      web: 'services/web/**'
      auth: 'services/auth/**'
# Later: build a dynamic matrix from the changed services
- id: matrix
  run: |
    echo "services=$(echo '${{ toJSON(steps.changes.outputs) }}' | jq -c '[to_entries[] | select(.value=="true") | .key]')" >> $GITHUB_OUTPUT
```

- Reusable workflows for standardized CI: every service calls the same build-and-deploy workflow.
- Per-service Helm charts: each service has its own chart in `services/<name>/chart/`.
- Parallel deploys: use a matrix job to deploy all changed services simultaneously.
Q43: How would you implement a multi-region deployment?
Strategy:
- Deploy to primary region first.
- Run smoke tests against the primary region.
- Deploy to secondary regions in parallel using a matrix strategy:
```yaml
deploy-secondary:
  needs: deploy-primary
  strategy:
    matrix:
      region: [westeurope, southeastasia, brazilsouth]
      include:
        - region: westeurope
          cluster: aks-eu
          rg: rg-eu
        - region: southeastasia
          cluster: aks-asia
          rg: rg-asia
        - region: brazilsouth
          cluster: aks-brazil
          rg: rg-brazil
```

- Use environment-per-region with individual approval gates if needed.
Considerations: DNS failover (Azure Traffic Manager / Front Door), database replication lag (deploy DB migrations to primary first, wait for replication), and regional configuration differences (CDN endpoints, compliance settings).
Q44 — Your colleague's workflow passes in their fork but fails in the main repo. What are the possible causes?
Common causes:
- Secrets differ between fork and upstream: workflow runs in the fork use the fork's own secrets, and PRs from forks don't receive the upstream repo's secrets — a step that needs a missing secret silently gets an empty string. The run can "pass" in one context and fail in the other.
- Different branch protection rules: the main repo may require status checks or CODEOWNERS review that the fork doesn't.
- Org policy restrictions: the main repo's organization may restrict which actions are allowed.
- Environment protection rules: environments with required reviewers exist in the main repo but not the fork.
- Different runner availability: the main repo might use org-specific self-hosted runners that the fork doesn't have access to.
- CODEOWNERS restrictions: changes to certain paths require review from specific code owners in the main repo.
Debugging approach: Compare the workflow runs — check exactly which step fails and what the error message says. Most often it's a secrets or permissions issue.
🎯 Category 6 — Scenario-Based Deep Dives
These are long-form whiteboard-style questions. Interviewers expect structured, phased answers — not a single sentence.
Q45 ★ — You are hired as the first DevOps engineer at a startup with 5 developers. They deploy manually via SSH. Design their CI/CD strategy using GitHub Actions.
Phased approach (don't try to do everything at once):
| Phase | Timeline | Goal |
|---|---|---|
| Phase 1 | Week 1 | Add basic CI β lint + unit tests on every PR. This gives immediate feedback and builds trust in automation. |
| Phase 2 | Week 2 | Containerize the app β create a Dockerfile, add a Docker build step to CI. Ensure every build produces an immutable image. |
| Phase 3 | Week 3 | Set up AKS (or ECS), create Helm charts, add automated deployment to a staging environment. Developers start seeing their code running in a real cluster. |
| Phase 4 | Week 4 | Add production environment with approval gates. Migrate from SSH deployments to pipeline-driven deployments. Remove SSH access to production. |
| Phase 5 | Ongoing | Add monitoring and alerting (Prometheus + Grafana), add rollback automation, optimize pipeline speed, add security scanning. |
Key principles:
- Start small and iterate — don't build the entire pipeline in week 1.
- Make it the default — remove the old SSH workflow completely once the pipeline is stable.
- Get developer buy-in — show them the pipeline saves time, don't force it.
- Document everything — the team needs to understand and maintain the pipeline without you.
Q46: Your organization has 200 repositories. How do you standardize CI/CD?
- Central `.github` repository — contains reusable workflows that all repos call via `workflow_call`. Update once, all 200 repos benefit.
- Org-level secrets and variables — shared credentials (registry URL, OIDC config) managed centrally.
- Custom actions for common patterns — create composite actions for standard tasks (lint, build Docker, deploy Helm).
- Required status checks — branch protection rules enforce that specific workflows must pass before merging.
- Org-wide runner groups — self-hosted runners shared across repos with proper access controls.
- Dependabot for action updates — a single Dependabot config template keeps all repos' action versions current.
- Internal documentation — a "How to CI/CD at [Company]" guide with templates and examples.
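With a central `.github` repository in place, each consumer repo needs only a thin wrapper. A sketch (the org name, workflow path, and inputs are assumptions):

```yaml
# .github/workflows/ci.yml in each of the 200 repos
name: CI
on: [push, pull_request]

jobs:
  standard-ci:
    uses: my-org/.github/.github/workflows/build-and-deploy.yml@main
    with:
      service-name: api
    secrets: inherit   # pass repo/org secrets through to the reusable workflow
```

Pinning `@main` keeps every repo on the latest standard automatically; teams wanting stability can pin a tag or SHA instead, at the cost of manual upgrades.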
Q47 — A deployment to production succeeded in GitHub Actions but users report the app is down. Walk through your incident response.
Incident response sequence:
- Verify the pipeline — check GitHub Actions logs. All steps green? Build, push, and deploy all succeeded?
- Check pod status:

```bash
kubectl get pods -n production   # Are pods Running? Ready? Restarting?
```

- Check application logs: `kubectl logs <pod> -n production` — look for runtime errors, connection failures, OOM kills.
- Check ingress / DNS: Is the ingress controller healthy? Did the DNS update propagate? Can you curl the health endpoint directly from inside the cluster?
- Check health endpoints: `curl -v https://myapp.com/healthz`
- Compare Helm release values with expected: `helm get values myapp -n production`
- Rollback immediately: `helm rollback myapp -n production` — restore service first, investigate later.
- Post-mortem: Add smoke tests to the pipeline (they would have caught this), add readiness probe checks, add synthetic monitoring.
Golden rule: Restore service first, investigate second. A 5-minute rollback is better than 30 minutes of debugging while users are impacted.
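The smoke test mentioned in the post-mortem can be a single step at the end of the deploy job. A sketch, assuming the app exposes a `/healthz` endpoint (the URL is a placeholder):

```yaml
- name: Smoke test
  run: |
    # Retry for up to ~2 minutes, then fail the workflow so the team
    # knows the deploy is bad before users do
    for i in $(seq 1 12); do
      if curl -fsS https://myapp.example.com/healthz; then exit 0; fi
      sleep 10
    done
    echo "Smoke test failed: app not healthy after deploy"
    exit 1
```

A failing smoke test turns "users report the app is down" into "the pipeline reports the deploy is bad", which is exactly the signal a rollback step can act on.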
Q48: How would you implement feature flags with CI/CD?
Core principle: Decouple deployment from release.
- Deployment = code is on production servers (handled by CI/CD).
- Release = feature is visible to users (handled by feature flags).
Implementation:
- Use a feature flag service — LaunchDarkly, Unleash (open source), or a simple config-based system.
- Deploy code with flags off — the feature is in production but invisible to users.
- Enable gradually via the flag service — 5% → 25% → 50% → 100%.
- If issues arise, disable the flag instantly — no redeployment needed.
CI/CD integration: The pipeline deploys code. The flag service controls behavior. They are independent systems. Never use CI/CD to toggle feature flags — that defeats the purpose of instant rollback.
Q49: Explain how you'd set up CI/CD for a regulated industry (healthcare/finance).
Compliance-focused pipeline requirements:
- Signed commits — require GPG or SSH commit signatures via branch protection rules.
- Required reviewers (2+) — enforce the four-eyes principle for all production changes.
- SOC2-compliant audit trail — GitHub's audit log captures every workflow run, every secret access, every deployment event.
- OIDC only — no stored secrets for cloud authentication. Eliminates the credential rotation compliance burden.
- SHA-pinned actions — immutable action references prevent supply chain attacks.
- SLSA provenance — generate build provenance attestations (`actions/attest-build-provenance`) to prove what code produced which artifact.
- Environment wait timers — mandatory cooling periods between staging and production.
- Separate environments — staging → pre-production → production, each with its own approval chain.
- Automated compliance scanning β add SAST (CodeQL), dependency scanning (Dependabot), container scanning (Trivy) as required pipeline stages.
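Two of these controls show up directly in workflow YAML. A sketch (the `<full-commit-sha>` values are placeholders: pin to the real, audited SHA behind the tag you trust):

```yaml
permissions:
  id-token: write    # lets the job request an OIDC token
  contents: read

steps:
  # SHA-pinned action: an immutable reference instead of a mutable tag
  - uses: actions/checkout@<full-commit-sha>   # the audited SHA behind v4
  # OIDC login: federated identity, no long-lived cloud credential stored
  - uses: azure/login@<full-commit-sha>
    with:
      client-id: ${{ vars.AZURE_CLIENT_ID }}
      tenant-id: ${{ vars.AZURE_TENANT_ID }}
      subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
```

Note that client/tenant/subscription IDs are identifiers, not secrets, so storing them as variables rather than secrets is acceptable; the trust comes from the federated credential configuration in Azure AD.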
Q50: Your CI pipeline costs $5,000/month. How do you reduce it?
Cost reduction strategies (ordered by impact):
- Analyze runner minutes usage — find the top 10 workflows by minutes consumed. Focus optimization efforts there.
- Add path filters — skip CI entirely for docs-only, README, or config-only changes.
- Enable caching — `npm ci` with cache saves 1–3 minutes per run. Docker layer caching saves 5+ minutes.
- Use `concurrency` — cancel stale runs. If a developer pushes 5 commits, only the last one needs to complete.
- Reduce matrix scope for PRs — run the full matrix (all OS × all versions) only on `main`. For PRs, test on the most common configuration only.
- Consider self-hosted runners — for high-volume repos (100+ runs/day), the break-even point is often reached within a month.
- Use spot instances for non-critical jobs — self-hosted runners on spot/preemptible VMs cost 60–90% less.
- Right-size runner specs — not every job needs a 4-core runner. Use the default 2-core for simple tasks.
Quick win: Adding `concurrency` with `cancel-in-progress: true` and path filters alone often reduces costs by 30–40%.
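The path-filter quick win is a few lines on the trigger:

```yaml
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'
  pull_request:
    paths-ignore:
      - '**.md'
      - 'docs/**'
```

One caveat worth mentioning in interviews: if a path-filtered workflow is also a required status check, PRs that skip it may be left waiting on that check, so filter at the job level (or handle the skip) for required workflows.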
📌 Tips for Interviews
- Always mention trade-offs — e.g., "OIDC is better for security but requires Azure AD federated credential setup."
- Use specific tool names — say `actions/checkout@v4`, `helm upgrade --install --atomic`, `dorny/paths-filter` instead of vague descriptions.
- Draw pipeline diagrams when explaining architecture — interviewers love visual thinkers.
- Share real examples from your experience — "In my last project, we reduced pipeline time from 12 minutes to 4 by…"
- Know the differences: GitHub Actions vs Azure DevOps vs Jenkins vs GitLab CI — each has trade-offs. Be prepared to compare.
🏁 Summary & Course Completion
You've completed the GitHub Actions — Zero to Hero course. You've covered CI/CD fundamentals, YAML pipelines, secrets management, reusable workflows, AKS deployments with Helm, security hardening, debugging, and interview preparation.
You now have the knowledge to design, build, secure, and troubleshoot production-grade CI/CD pipelines. Go build something great — and ace that interview. 🚀