Matrix & Advanced Patterns
Matrix strategies, dynamic matrices, concurrency control, conditional workflows, and workflow orchestration patterns.
🧒 Simple Explanation (ELI5)
Imagine you run a restaurant kitchen.
- Matrix strategy is like having multiple cooks prepare the same dish with different ingredients simultaneously. Instead of testing your app on Node 18, then Node 20, then Node 22 one after another, the matrix fires up three kitchens (runners) at the same time — one for each version. Add three operating systems and you have 9 cooks working in parallel (3 OS × 3 Node versions).
- fail-fast is the head chef shouting "Stop everything!" the moment one cook burns their dish. If disabled, every cook finishes regardless — useful when you want the full picture of what's broken.
- Concurrency is the kitchen capacity rule. You can say "only one dessert order at a time" so two soufflés don't collide in the oven. In CI/CD, this means "only one deploy to production at a time" — if a new one starts, cancel the old one that's still baking.
- Dynamic matrix is like the head chef reading tonight's reservation list and dynamically deciding which dishes to prep — instead of a fixed menu, the matrix is generated on the fly based on what's actually needed.
Put it together: the matrix multiplies your testing, concurrency prevents collisions, and dynamic matrices make everything smart and adaptive.
🔢 Matrix Strategy
A matrix strategy lets you run the same job across multiple combinations of variables — OS versions, language versions, configurations — all in parallel.
Basic Matrix
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
This creates 9 jobs (3 OS × 3 Node versions). Each combination runs as an independent job. Access values via ${{ matrix.os }} and ${{ matrix.node }}.
fail-fast & max-parallel
- fail-fast: true (the default) means that if any matrix job fails, GitHub cancels all in-progress and queued jobs in the matrix. Fast feedback, but you lose visibility into other failures.
- fail-fast: false lets all jobs run to completion regardless of failures. Essential when you need the full compatibility picture.
- max-parallel limits how many matrix jobs run concurrently. Useful when jobs consume expensive resources (e.g., limited self-hosted runners).
strategy:
  fail-fast: false
  max-parallel: 4
  matrix:
    os: [ubuntu-latest, windows-latest, macos-latest]
    node: [18, 20, 22]
Include & Exclude
Fine-tune the matrix by adding or removing specific combinations:
strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
    node: [18, 20]
    include:
      # Add an extra combination not in the base matrix
      - os: ubuntu-latest
        node: 22
        experimental: true
    exclude:
      # Remove a specific combination
      - os: windows-latest
        node: 18
- include adds extra combinations (or extra variables to existing ones). The experimental: true variable is accessible as ${{ matrix.experimental }}.
- exclude removes specific combinations from the Cartesian product. Here, Windows + Node 18 is skipped entirely.
- Result: ubuntu-18, ubuntu-20, windows-20, and ubuntu-22 (included), for a total of 4 jobs (4 base, minus 1 excluded, plus 1 included).
Using Extra Matrix Variables
strategy:
  matrix:
    include:
      - os: ubuntu-latest
        node: 22
        experimental: true
      - os: ubuntu-latest
        node: 20
        experimental: false
steps:
  - run: npm test
    continue-on-error: ${{ matrix.experimental }}
This lets experimental builds fail without breaking the overall workflow — perfect for testing pre-release versions.
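The same idea can also be applied at the job level rather than the step level. The following is a minimal sketch, assuming the same two include entries as above; the job name test and the Node setup steps are illustrative, not taken from a specific project. Job-level continue-on-error keeps a failing experimental job from failing the workflow run.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Job-level: a failing experimental job does not fail the workflow run
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - os: ubuntu-latest
            node: 22
            experimental: true
          - os: ubuntu-latest
            node: 20
            experimental: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```

Which level to use is a design choice: the step-level form lets later steps in the same job keep running, while the job-level form treats the whole job as allowed to fail.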
⚡ Dynamic Matrix
Instead of hardcoding the matrix in YAML, generate it dynamically from a previous job's output. This is essential for monorepos, version detection, and conditional testing.
Basic Pattern — JSON File
jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: set-matrix
        run: echo "matrix=$(jq -c . matrix.json)" >> "$GITHUB_OUTPUT"
  build:
    needs: setup
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJson(needs.setup.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "Building ${{ matrix.service }} v${{ matrix.version }}"
Example matrix.json
{
  "service": ["api", "web", "worker"],
  "version": ["1.0", "2.0"]
}
Monorepo — Detect Changed Services
jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.changed.outputs.services }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: changed
        run: |
          # Compares only the last commit; produces a compact JSON array such as ["api","web"], or [] if nothing matched
          CHANGED=$(git diff --name-only HEAD~1 HEAD | grep -oP '^services/\K[^/]+' | sort -u | jq -Rc . | jq -sc .)
          echo "services=$CHANGED" >> "$GITHUB_OUTPUT"
  build:
    needs: detect
    if: needs.detect.outputs.services != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJson(needs.detect.outputs.services) }}
    steps:
      - uses: actions/checkout@v4
      - run: cd services/${{ matrix.service }} && make build
Use cases: monorepo (only build changed services), dynamic version lists (fetch supported versions from an API), environment-specific deploys (generate targets from a config file).
The fromJson() expression function converts a JSON string into a GitHub Actions object. The JSON must be valid, and because the name=value form of $GITHUB_OUTPUT takes a single-line value, it should also be compact (no newlines). Pipe through jq -c to guarantee compact output before setting it as a step output.
🚦 Concurrency Control
Concurrency groups prevent multiple instances of the same workflow from running simultaneously — critical for deployments, releases, and resource-intensive jobs.
Basic Concurrency
# Workflow-level concurrency
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
- group: a name that identifies concurrent runs. Runs with the same group name are serialized or cancelled.
- cancel-in-progress: true means that if a new run starts in the same group, the currently running one is cancelled. Perfect for PR checks where only the latest push matters.
- cancel-in-progress: false means a new run waits as pending until the current one finishes. Only one pending run is kept per group, so a newer pending run replaces an older one. Use for deployments where you never want to interrupt a running deploy.
Workflow-Level vs Job-Level
# Job-level concurrency: only the deploy job is serialized
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: npm test
  deploy:
    needs: test
    runs-on: ubuntu-latest
    concurrency:
      group: deploy-production
      cancel-in-progress: false
    steps:
      - run: ./deploy.sh production
Workflow-level concurrency applies to the entire workflow run. Job-level concurrency lets you serialize only specific jobs — tests can run concurrently, but deploys are serialized.
Concurrency Group Naming Patterns
| Group Expression | Scope | Use Case |
|---|---|---|
| ${{ github.workflow }}-${{ github.ref }} | Per workflow + branch | Cancel stale PR checks, keep branch builds independent |
| deploy-${{ github.ref }} | Per branch deploy | Prevent parallel deploys to the same environment |
| deploy-production | Global production | Only one production deploy at a time, any branch |
| pr-${{ github.event.number }} | Per pull request | Cancel previous checks when new commits are pushed to a PR |
A group name like ${{ github.workflow }} (without ${{ github.ref }}) means all branches share the same concurrency group. A push to feature-x would cancel a running deploy from main. Always include the branch or PR number in the group name unless you intentionally want global serialization.
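One way to combine these naming rules is sketched below. It assumes a workflow triggered by both push and pull_request events and a default branch named main; neither assumption comes from a specific project. The group falls back from the PR number to the ref, and cancel-in-progress is an expression so that runs on main are never cancelled.

```yaml
concurrency:
  # PR runs are grouped by PR number; other events fall back to the ref
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  # Cancel superseded runs everywhere except on main (assumed default branch)
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```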
🧩 Advanced Workflow Patterns
Conditional Workflows
Skip jobs based on event context, file changes, or custom conditions:
jobs:
  deploy:
    # Skip deploys for draft PRs
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh
  docs:
    # Only run when the head commit message contains [docs] (push events only)
    if: contains(github.event.head_commit.message, '[docs]')
    runs-on: ubuntu-latest
    steps:
      - run: ./build-docs.sh
Path-Based Job Execution
Use dorny/paths-filter to conditionally run jobs based on which files changed:
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
      frontend: ${{ steps.filter.outputs.frontend }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            backend:
              - 'server/**'
            frontend:
              - 'client/**'
  backend-tests:
    needs: changes
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: cd server && npm test
  frontend-tests:
    needs: changes
    if: needs.changes.outputs.frontend == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: cd client && npm test
Workflow Chaining — workflow_run
Trigger a workflow after another workflow completes:
# deploy.yml: runs after CI completes on main
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
    branches: [main]
jobs:
  deploy:
    if: github.event.workflow_run.conclusion == 'success'
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh
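A workflow triggered by workflow_run executes against the latest commit on the default branch, which is not necessarily the commit the upstream workflow tested. The sketch below, assuming the same CI workflow name and deploy.sh script as above, checks out the triggering run's commit explicitly.

```yaml
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
    branches: [main]

jobs:
  deploy:
    if: github.event.workflow_run.conclusion == 'success'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the commit that CI validated, not the current branch tip
          ref: ${{ github.event.workflow_run.head_sha }}
      - run: ./deploy.sh
```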
Repository Dispatch — Cross-Repo Triggers
# In Repo B: listens for dispatch events
on:
  repository_dispatch:
    types: [deploy-frontend]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying version ${{ github.event.client_payload.version }}"

# In Repo A: a step that sends the dispatch event
- uses: peter-evans/repository-dispatch@v3
  with:
    token: ${{ secrets.REPO_B_PAT }}
    repository: myorg/repo-b
    event-type: deploy-frontend
    client-payload: '{"version": "1.2.3"}'
Timeout & Retry Patterns
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 30 # Kill job if it exceeds 30 minutes
    steps:
      - name: Deploy with retry
        uses: nick-fields/retry@v3
        with:
          timeout_minutes: 10
          max_attempts: 3
          retry_on: error
          command: ./deploy.sh
      - name: Flaky integration test
        run: npm run test:e2e
        continue-on-error: true # Don't fail the job if this step fails
- timeout-minutes: hard limit on job duration (default: 360 minutes). Set it to prevent stuck jobs from burning runner hours.
- continue-on-error: true lets the step fail without failing the job. Combine it with a later step that checks steps.<id>.outcome for retry logic.
- Retry action: nick-fields/retry retries a command on failure with a configurable number of attempts and wait time between them.
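The continue-on-error plus status check combination can be sketched as follows. The step id e2e is a name chosen for illustration. The outcome property holds the step result before continue-on-error is applied, so it still reports failure even though the job continues.

```yaml
steps:
  - name: Flaky integration test
    id: e2e
    run: npm run test:e2e
    continue-on-error: true

  # Runs only when the first attempt failed; a second failure fails the job
  - name: Retry once if the first attempt failed
    if: steps.e2e.outcome == 'failure'
    run: npm run test:e2e
```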
📋 Pattern Catalog
| Pattern | When to Use | Example Config |
|---|---|---|
| Fan-out / Fan-in | Run parallel jobs, then aggregate results | Matrix builds → single deploy job with needs: [build] |
| Deploy Pipeline | Sequential multi-environment deploys | test → staging → approval → production with environment protection |
| Canary Deploy | Gradual rollout to a subset of users | Deploy to canary environment → run smoke tests → promote to production |
| Blue-Green Deploy | Zero-downtime with instant rollback | Deploy to green slot → health check → swap traffic → tear down blue |
| Feature-Flag Toggle | Enable/disable features without deploy | repository_dispatch event triggers config update in feature flag service |
| Monorepo Selective | Build only changed services | dorny/paths-filter + dynamic matrix from changed directories |
| Scheduled Maintenance | Periodic tasks (cleanup, rotation) | on: schedule: cron: '0 3 * * 1' (every Monday at 03:00 UTC) |
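As an illustration of the first row, a fan-out / fan-in workflow might look like the sketch below. The service names, the services/ directory layout, and deploy.sh are assumptions carried over from the earlier examples, not a prescribed structure.

```yaml
jobs:
  build:
    # Fan-out: one job per service
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [api, web, worker]
    steps:
      - uses: actions/checkout@v4
      - run: cd services/${{ matrix.service }} && make build

  deploy:
    # Fan-in: waits for every matrix job in build; skipped if any of them failed
    needs: [build]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
```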
📊 Visual Diagrams
Matrix Expansion
strategy:
  matrix:
    os: [ubuntu, windows, macos]
    node: [18, 20, 22]
┌─────────────────────────────────────────────────────┐
│ Matrix Expansion (9 Jobs) │
└─────────────────────────────────────────────────────┘
ubuntu + Node 18 ──┐
ubuntu + Node 20 ──┤
ubuntu + Node 22 ──┤
windows + Node 18 ─┤──▶ All run in parallel
windows + Node 20 ─┤ (limited by max-parallel)
windows + Node 22 ─┤
macos + Node 18 ───┤
macos + Node 20 ───┤
macos + Node 22 ───┘
With exclude: [{os: windows, node: 18}]
→ 8 jobs (windows + Node 18 removed)
With include: [{os: ubuntu, node: 23, experimental: true}]
→ 10 jobs (extra combination added)
Concurrency Group Cancellation Flow
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
Timeline:
─────────────────────────────────────────────────────▶ time
Push A (branch: feat-1)
├── Run #1 starts ─────────────────────▶ running...
│
Push B (branch: feat-1) ← same group
│ Run #1 ──❌ CANCELLED
├── Run #2 starts ─────────────────────▶ running...
│
Push C (branch: feat-2) ← different group
├── Run #3 starts ─────────────────────▶ ✅ completes
│ (Run #2 continues — different group)
│
Push D (branch: feat-1) ← same group as #2
Run #2 ──❌ CANCELLED
├── Run #4 starts ─────────────────────▶ ✅ completes
🛠️ Hands-on Lab
Lab 1: Create a Multi-OS × Multi-Node Matrix
- Create .github/workflows/matrix-test.yml
- Define a matrix with os: [ubuntu-latest, windows-latest] and node: [18, 20, 22]
- Set fail-fast: false to see all results
- Push and verify 6 jobs appear in the Actions tab
name: Matrix Test
on: push
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        node: [18, 20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: node --version
      - run: npm ci
      - run: npm test
Lab 2: Add Include / Exclude
- Exclude windows-latest + Node 18 (known incompatibility)
- Include ubuntu-latest + Node 23 with experimental: true
- Use continue-on-error: ${{ matrix.experimental || false }} on the test step
- Verify: 6 jobs (6 base, minus 1 excluded, plus 1 included), experimental allowed to fail
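One possible solution for this lab is sketched below, building on the Lab 1 workflow. The workflow name is made up for illustration. The || false fallback matters because matrix.experimental is undefined for the five base combinations.

```yaml
name: Matrix Include Exclude
on: push
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        node: [18, 20, 22]
        exclude:
          # Known incompatibility
          - os: windows-latest
            node: 18
        include:
          # Extra combination outside the base matrix
          - os: ubuntu-latest
            node: 23
            experimental: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
        continue-on-error: ${{ matrix.experimental || false }}
```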
Lab 3: Dynamic Matrix from JSON
- Create a matrix.json file: {"node": [18, 20, 22], "os": ["ubuntu-latest"]}
- Create a setup job that reads the file and outputs the matrix
- Create a test job that consumes fromJson(needs.setup.outputs.matrix)
- Modify the JSON and push, then observe the matrix changing without editing the workflow
Lab 4: Concurrency for Deployments
- Add a workflow-level concurrency group: ${{ github.workflow }}-${{ github.ref }}
- Set cancel-in-progress: true
- Push two commits in rapid succession to the same branch
- Observe the first run being cancelled and only the second completing
- Change to cancel-in-progress: false and repeat, then observe the queuing behavior
🐛 Debugging Common Issues
"Matrix generates 0 jobs"
- Cause: An empty array in the matrix definition, or a dynamic matrix that evaluates to []
- Fix: Add an if guard: if: fromJson(needs.setup.outputs.matrix).service[0] != null. Or ensure the array always has at least one element
- Check: Print the matrix output in the setup job, using single quotes so the JSON's double quotes survive the shell: echo '${{ steps.set-matrix.outputs.matrix }}'
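Applied to a consuming job, the guard might look like the sketch below. It assumes a setup job that exposes a matrix output containing a service array, as in the JSON file pattern earlier. With the guard in place, the build job is skipped rather than failing when the list is empty.

```yaml
jobs:
  build:
    needs: setup   # assumed: the setup job that emits the matrix output
    # Skip the job instead of failing when the generated list is empty
    if: fromJson(needs.setup.outputs.matrix).service[0] != null
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJson(needs.setup.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "Building ${{ matrix.service }}"
```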
"Concurrency cancels the wrong run"
- Cause: The concurrency group name is too broad, e.g., using ${{ github.workflow }} without ${{ github.ref }}
- Fix: Include branch/PR context: group: ${{ github.workflow }}-${{ github.ref }}
- Debugging: Check the "Concurrency group" label in the Actions run summary to see which group the run belongs to
"Dynamic matrix: invalid JSON"
- Cause: fromJson() received a malformed string: newlines, trailing commas, or shell escaping issues
- Fix: Always use jq -c (compact output) and wrap the value in double quotes when setting the output. Verify with: echo "$JSON" | jq .
- Common mistake: Multi-line JSON written to $GITHUB_OUTPUT in name=value form, which accepts only a single-line value
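A setup step can catch these problems before the downstream job ever calls fromJson(). The sketch below assumes the matrix lives in matrix.json and that every top-level value should be a non-empty array; both are assumptions about your layout. It relies on jq exiting non-zero for malformed input and on jq -e exiting non-zero when the filter result is false.

```yaml
steps:
  - uses: actions/checkout@v4
  - id: set-matrix
    run: |
      # jq exits non-zero on malformed JSON, so the step fails here
      # rather than fromJson() failing in the downstream job
      MATRIX=$(jq -c . matrix.json)
      # Require an object whose values are all non-empty arrays
      echo "$MATRIX" | jq -e 'type == "object" and all(.[]; type == "array" and length > 0)' > /dev/null
      echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
```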
"Include adds new combinations but variables are empty"
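- Cause: An include entry is merged into existing combinations only when none of its keys would overwrite a base matrix value. An entry that cannot be merged becomes a new combination containing only the keys it lists, so variables defined in other include entries are empty there
- Fix: To attach a variable to existing combinations, list only the matching dimensions plus the new variable. For a new combination, spell out every variable it needs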
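The two include behaviours can be seen side by side in the sketch below. The coverage variable and the macos-latest entry are made up for illustration. The first entry is merged into existing combinations; the second cannot be merged and becomes a fifth job.

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
    node: [18, 20]
    include:
      # Merges into both ubuntu combinations without overwriting anything,
      # so matrix.coverage is true for ubuntu/18 and ubuntu/20
      - os: ubuntu-latest
        coverage: true
      # Would overwrite os on every existing combination, so it is added
      # as a new combination; matrix.coverage is empty in that job
      - os: macos-latest
        node: 22
```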
🎯 Interview Questions
Basic (5)
1. What is a matrix strategy in GitHub Actions?
A matrix strategy lets you run the same job multiple times with different variable combinations. You define arrays of values (e.g., OS versions, language versions) and GitHub creates a job for every combination (Cartesian product). Each job runs independently and in parallel, with matrix values accessible via ${{ matrix.<key> }}.
2. How many jobs does a matrix with 3 OS and 4 Node versions create?
12 jobs (3 × 4 = 12). Every combination of OS and Node version gets its own independent runner. Each job appears separately in the Actions UI with the matrix values shown in the job name.
3. What does fail-fast do in a matrix strategy?
fail-fast: true (the default) cancels all remaining matrix jobs as soon as any single job fails. This gives fast feedback but hides other potential failures. Setting fail-fast: false lets all jobs complete regardless, which is useful for seeing the full compatibility picture across all matrix combinations.
4. What is a concurrency group?
A concurrency group is a named label that serializes workflow or job runs. If a new run starts with the same group name as a currently running one, it either queues (waits) or cancels the in-progress run, depending on the cancel-in-progress setting. Groups prevent duplicate deploys and wasted runner minutes.
5. What is the difference between workflow_run and workflow_call?
workflow_run triggers a workflow after another workflow completes; the two run as separate workflow runs with their own logs. workflow_call is for reusable workflows: the called workflow's jobs run inside the caller's workflow run, with inputs and secrets passed explicitly (or with secrets: inherit). Think of workflow_run as event-based chaining and workflow_call as function-like composition.
Intermediate (5)
6. How do include and exclude work in a matrix?
exclude removes specific combinations from the Cartesian product — you specify the exact values to skip. include adds extra combinations or extra variables: if an include entry matches an existing combination (same matrix keys), it adds the extra variables to that combo; if it doesn't match, it creates a new combination. Include is processed after the base matrix and exclude.
7. Explain how to create a dynamic matrix.
A dynamic matrix is generated at runtime by a preceding job. The setup job outputs a JSON string (e.g., from a file, API, or script), and the downstream job uses strategy: matrix: ${{ fromJson(needs.setup.outputs.matrix) }} to consume it. The JSON must be compact (single-line) and valid. This pattern enables monorepo selective builds, dynamic version testing, and config-driven pipelines.
8. When should you use cancel-in-progress: false?
Use cancel-in-progress: false for deployments and other operations that shouldn't be interrupted mid-way. Interrupting a database migration or infrastructure provisioning can leave systems in a broken state. With false, new runs queue and wait. Use true for CI checks on PRs where only the latest commit matters.
9. How does dorny/paths-filter help in monorepos?
dorny/paths-filter detects which files changed and outputs boolean flags for each configured filter. Downstream jobs use if: needs.changes.outputs.backend == 'true' to conditionally run. This avoids running all tests for every change — frontend changes only trigger frontend tests, backend changes only trigger backend tests, saving runner minutes and time.
10. What is repository_dispatch and when would you use it?
repository_dispatch is a webhook event that can be triggered via the GitHub API. It's used for cross-repository triggers — Repo A can dispatch an event to Repo B to trigger a deployment. It supports a client_payload for passing data (e.g., version number, commit SHA). The calling repo needs a PAT or GitHub App token with write access to the target repo.
Senior (5)
11. Design a CI matrix strategy for a library supporting 5 OS × 4 language versions with experimental builds.
Base matrix: os: [ubuntu, windows, macos, alpine, amazonlinux] × lang: [3.10, 3.11, 3.12, 3.13] = 20 jobs. Include: add {os: ubuntu, lang: 3.14-rc, experimental: true} for pre-release testing. Exclude: remove known-incompatible combos (e.g., alpine + 3.10). fail-fast: false to see all results. continue-on-error: ${{ matrix.experimental || false }} so pre-release failures don't block the pipeline. max-parallel: 10 to avoid exhausting runner pool. Upload per-job test artifacts for centralized reporting.
12. A monorepo with 30 microservices has slow CI. How do you optimize with dynamic matrices and concurrency?
(1) Path detection job: detect which services changed (git diff + paths-filter). (2) Dynamic matrix: generate a matrix containing only changed services — 30-service builds become 2-3 on average. (3) Concurrency per-PR: group: ci-pr-${{ github.event.number }} with cancel-in-progress: true — only test the latest push. (4) Caching: separate dependency caches per service. (5) Fan-out/fan-in: matrix build jobs → single integration test job → single deploy job. (6) Conditional deploy: only deploy services whose builds changed. Result: 90%+ reduction in runner minutes.
13. Explain the tradeoffs between workflow_run, workflow_call, and repository_dispatch.
workflow_call: a reusable workflow that runs inside the caller's run and shows as a single workflow; secrets are passed explicitly or with secrets: inherit. Best for code reuse within the same org. workflow_run: event-based chaining that runs as a separate workflow with its own permissions and secrets, and can react to success, failure, or requested. Best for decoupled pipeline stages. repository_dispatch: API-triggered and cross-repo; requires a PAT or App token, supports custom payloads, fully decoupled. Best for microservice architectures where repos are independently managed. Tradeoff: tighter coupling means simpler debugging, looser coupling means more flexibility.
14. How would you implement a canary deployment pipeline using GitHub Actions?
(1) Build job: build and tag Docker image. (2) Deploy canary: deploy to canary environment (5% traffic). Concurrency group deploy-canary prevents parallel canaries. (3) Smoke tests: run against canary endpoint — error rate, latency, health checks. (4) Manual approval gate: use GitHub Environments with required reviewers. (5) Progressive rollout: matrix with [25, 50, 100] percent targets, each with smoke test validation. (6) Auto-rollback: if smoke tests fail, automatically scale canary to 0 and notify. Use continue-on-error on validation + conditional rollback step.
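A reduced version of that pipeline is sketched below, covering steps 1 to 4 and the rollback. The script names (build.sh, deploy.sh, smoke-test.sh, rollback.sh) and the environment names are hypothetical, and the approval gate assumes required reviewers are configured for the production environment in the repository settings.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh                 # hypothetical build script

  canary:
    needs: build
    runs-on: ubuntu-latest
    environment: canary
    concurrency:
      group: deploy-canary              # one canary at a time
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh canary
      - id: smoke
        run: ./smoke-test.sh canary
        continue-on-error: true         # keep going so the rollback can run
      - name: Roll back canary
        if: steps.smoke.outcome == 'failure'
        run: ./rollback.sh canary
      - name: Fail the job after rollback
        if: steps.smoke.outcome == 'failure'
        run: exit 1

  production:
    needs: canary                       # skipped when the canary job failed
    runs-on: ubuntu-latest
    environment: production             # reviewers on this environment act as the approval gate
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production
```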
15. How do you handle matrix strategies when the matrix dimensions come from external sources that might be unavailable?
(1) Fallback defaults: if the API/file is unavailable, output a hardcoded default matrix. (2) Retry logic: use nick-fields/retry on the setup job's fetch step. (3) Cache previous matrix: store the last successful matrix in an artifact or repo file; fall back to it on failure. (4) Validation: validate JSON schema before passing to fromJson(). (5) Empty matrix guard: if: fromJson(needs.setup.outputs.matrix).versions[0] != null prevents "0 jobs" failures. (6) Alerting: send a Slack/Teams notification if the matrix source is unavailable, then use the cached fallback.
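The fallback default from point (1) can be sketched in a few lines of shell inside the setup step. The default version list and the matrix.json source are assumptions; an API call could stand in for the file read.

```yaml
steps:
  - uses: actions/checkout@v4
  - id: set-matrix
    run: |
      # Hardcoded default used when the source is missing or malformed (assumed values)
      DEFAULT='{"versions":[18,20,22]}'
      # jq fails on a missing file or invalid JSON, which triggers the fallback
      MATRIX=$(jq -c . matrix.json 2>/dev/null) || MATRIX="$DEFAULT"
      echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
```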
🏭 Real-World Scenario
An open-source Node.js library with 2,000+ stars needed comprehensive cross-platform testing before every release. Here's how they implemented it:
Challenge:
- Support 5 OS targets (Ubuntu, Windows, macOS, Alpine, Amazon Linux) × 4 Node versions (18, 20, 22, 23)
- Node 23 was pre-release — failures should not block the pipeline
- PRs were getting rapid-fire commits, wasting runner minutes on stale checks
- Release builds needed to publish to npm only once, not per-matrix-job
Solution:
- Matrix with include/exclude: Base: 5 OS × 3 stable Node = 15 jobs. Include: {os: ubuntu-latest, node: 23, experimental: true}. Exclude: {os: amazonlinux, node: 18} (EOL). Total: 15 jobs
- fail-fast: false, because maintainers wanted to see the full compatibility matrix in every run
- continue-on-error: ${{ matrix.experimental || false }}, so Node 23 failures were logged but didn't block
- Concurrency: group: ci-${{ github.event.pull_request.number || github.ref }} with cancel-in-progress: true, so only the latest push is tested
- Fan-in publish job: needs: [test] runs after all matrix jobs complete. Only triggers on tag push (if: startsWith(github.ref, 'refs/tags/v')) and publishes a single npm package
- Dynamic nightly matrix: A scheduled workflow fetches the list of active Node versions from the Node.js release API and generates the matrix dynamically. No YAML changes needed when Node 24 drops
Results:
- PR check time reduced from 45 minutes to 12 minutes (concurrency cancellation + caching)
- Caught 3 platform-specific bugs in the first month that would have shipped to users
- Node 23 compatibility ready on day one of stable release — zero scramble
- Runner minutes dropped 60% from concurrency alone
📝 Summary
- Matrix strategy: Run the same job across every combination of OS, language, and config. Use fail-fast: false for full visibility, max-parallel to control resource usage
- Include / Exclude: Fine-tune the matrix by adding experimental builds with include and removing incompatible combos with exclude
- Dynamic matrix: Generate the matrix from JSON files, APIs, or git diffs. Essential for monorepos and config-driven pipelines. Always use jq -c and fromJson()
- Concurrency: Prevent duplicate runs with named groups. Use cancel-in-progress: true for PR checks, false for deployments. Always include github.ref in the group name
- Advanced patterns: Conditional jobs, path-based filtering, workflow chaining (workflow_run), cross-repo dispatch, timeout and retry strategies
- Orchestration: Fan-out/fan-in, canary, blue-green, feature flags. Combine matrix, concurrency, and environments for production-grade pipelines