Intermediate Lesson 8 of 14

Artifacts & Caching

Speed up workflows with dependency caching and share build outputs between jobs using artifacts.

🧒 Simple Explanation (ELI5)

Imagine it's moving day. Artifacts are the labeled boxes you pack in one room and unpack in another: one job packs the output, a later job opens the box instead of redoing the work. Caching is the storage unit you keep between moves: the bulky stuff you always need is waiting for you next time instead of being bought again.

Artifacts move outputs between jobs. Caching keeps dependencies between runs. Together, they turn a slow, repetitive workflow into a fast, efficient one.

📦 Artifacts

Artifacts let you persist data after a job completes and share it with other jobs in the same workflow — or download it later from the GitHub UI. They're ideal for build outputs, test reports, coverage files, and logs.

Upload & Download Actions

Use actions/upload-artifact@v4 to persist files from a job and actions/download-artifact@v4 to retrieve them in a later job of the same workflow run.

What to Upload

Good candidates are build outputs (dist/, binaries), test and coverage reports, and logs — anything another job or a human needs after the run. Don't upload dependency directories like node_modules/; that's what caching is for.

Retention & Limits

Artifacts are kept for 90 days by default, configurable per upload with retention-days (1–90), and they count against your account's Actions storage quota.

Sharing Between Jobs

The most common pattern: a build job uploads the compiled output, and a deploy job (declared with needs: build) downloads it. This avoids rebuilding the same code in every job.

yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 7

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
      - run: ls -la dist/
💡
Downloading from the UI

Every artifact uploaded during a workflow run is available on the run's Summary page under the Artifacts section. Click the artifact name to download a ZIP file — handy for grabbing test reports or build outputs without re-running the workflow.

⚠️
Artifact Names Must Be Unique

Within a single workflow run, each artifact must have a unique name. If two jobs upload artifacts with the same name, the second upload will fail. Use dynamic names (e.g., test-results-${{ matrix.os }}) when running matrix builds.
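A minimal sketch of the matrix pattern the warning describes (the test command and output path are illustrative):

```yaml
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # hypothetical test command; assumes results land in test-results/
      - run: npm test
      - uses: actions/upload-artifact@v4
        with:
          # one artifact per matrix leg, so names never collide
          name: test-results-${{ matrix.os }}
          path: test-results/
```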

⚡ Dependency Caching

Caching stores files (like node_modules/ or pip wheels) between workflow runs so you don't re-download them every time. The actions/cache@v4 action is the core building block.

How It Works

At the start of a job, actions/cache looks up the provided key and restores the cached files on a match. At the end of the job, a "Post" step saves the files under that key — but only if the key doesn't already exist, because cache entries are immutable once saved.

Cache Key Strategies

The best cache keys include the OS, the package manager name, and a hash of the lockfile. When the lockfile changes, a new cache is created. When it doesn't, you get an instant restore.

| Ecosystem | Cache Key Pattern |
|---|---|
| Node.js | ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }} |
| Python | ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} |
| Go | ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }} |
| .NET | ${{ runner.os }}-nuget-${{ hashFiles('**/*.csproj') }} |
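As a sketch, the Python pattern from the table plugs into actions/cache like this (~/.cache/pip is pip's default cache directory on Linux runners):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    # new lockfile hash -> new cache; unchanged hash -> instant restore
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
```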

Cache Limits

Each repository gets 10 GB of cache storage across all entries. Caches unused for 7 days are evicted automatically, and when a repository exceeds the limit, the least recently used entries are deleted first.

Setup Actions with Built-in Caching

Many official setup actions have a cache parameter that handles caching automatically — no need for a separate actions/cache step:

yaml
# Built-in caching — one line does it all
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'

This is equivalent to manually configuring the cache, but much simpler. The action automatically determines the correct path and cache key.

Explicit Cache (Full Control)

When you need more control — custom paths, fallback keys, or caching something the setup action doesn't support — use actions/cache@v4 directly:

yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
💡
restore-keys Fallback

When the exact key doesn't match, restore-keys provides prefix-based fallback. For example, Linux-npm- would match any previous cache for Linux npm, even if the lockfile hash differs. This gives you a "stale but close" cache that still saves significant download time — npm only fetches the diff.

📊 Artifacts vs Cache — Comparison

| Feature | Artifacts | Cache |
|---|---|---|
| Purpose | Share outputs between jobs; download results | Speed up dependency installation across runs |
| Lifetime | 1–90 days (configurable) | 7 days since last access; LRU eviction |
| Size limit | Counts against Actions storage quota | 10 GB per repository |
| Cross-workflow | Not shared between workflows (per-run) | Shared across all workflows in the repo |
| Cross-job | Yes — upload in one job, download in another | Yes — saved on completion, restored on start |
| Downloadable from UI | Yes — ZIP download from run summary | No — only restored within workflow runs |
| Typical use cases | Build binaries, test reports, coverage, logs | node_modules, pip packages, Go modules, Docker layers |

🐳 Docker Layer Caching

Docker builds can be painfully slow when every layer is rebuilt from scratch. Docker BuildKit supports a GHA cache backend (type=gha) that stores individual layers in the GitHub Actions cache and only rebuilds what changed.

cache-from / cache-to with GHA Backend

cache-from: type=gha restores previously built layers from the Actions cache; cache-to: type=gha,mode=max exports all layers, including intermediate stages, back to it. Both require Buildx, set up via docker/setup-buildx-action.

Example with docker/build-push-action

yaml
- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: myregistry.azurecr.io/myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
💡
Dramatic Speedup

Docker layer caching can reduce image build times from 5+ minutes to under 30 seconds when only application code changes (base image and dependency layers are cached). This is especially impactful for large images with heavy system dependencies.

⏱️ Performance Impact

Here's a typical before-and-after when adding caching and artifacts to a CI workflow:

text
BEFORE (no caching, no artifacts — 8 min total)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  install deps ████████████████  3 min
  build        ████████████      2 min
  test         ████████          1.5 min
  docker build ████████          1.5 min

AFTER (with caching + artifacts — 3 min total)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  restore cache ██               0.3 min  (npm cache hit)
  build         ████████████     2 min
  test          ████████         1.5 min  (parallel with build via artifacts)
  docker build  ██               0.3 min  (layer cache hit)
                                          ─── saved ~5 min (62%) ───

The biggest wins come from npm/pip cache restores (skipping dependency downloads) and Docker layer caching (skipping unchanged layers). Artifact sharing between jobs also enables parallelism — test can start as soon as build uploads its artifacts.

🛠️ Hands-on Lab

Lab 1: Upload Test Results as an Artifact

  1. Create a workflow with a test job that runs your tests with a JUnit XML reporter (e.g. Jest with the jest-junit reporter, or any framework that outputs JUnit XML)
  2. Add actions/upload-artifact@v4 to upload the test results directory
  3. Run the workflow and download the artifact from the run's Summary page
  4. Verify the XML files are present in the downloaded ZIP
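A starting point for Lab 1 — a sketch that assumes Jest with the jest-junit reporter configured to write XML into ./test-results (adjust the test step for your framework):

```yaml
name: lab1-test-artifacts
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      # assumes jest-junit is installed and configured to write into ./test-results
      - run: npm test -- --reporters=default --reporters=jest-junit
      - uses: actions/upload-artifact@v4
        if: always()   # upload results even when tests fail
        with:
          name: test-results
          path: test-results/
```

The if: always() matters here: without it, a failing test run would skip the upload, and the failures are exactly the reports you want to inspect.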

Lab 2: Add npm Caching

  1. Add cache: 'npm' to your actions/setup-node@v4 step
  2. Run the workflow twice — observe the first run says "Cache not found" and the second says "Cache restored"
  3. Compare the npm ci duration: first run (full install) vs second run (cached)
  4. Modify package-lock.json slightly and run again — observe a cache miss and new cache save

Lab 3: Compare Workflow Times

  1. Create two workflow files: no-cache.yml (no caching) and with-cache.yml (npm + Docker caching)
  2. Trigger both on the same commit using workflow_dispatch
  3. Compare total run times on the Actions tab
  4. Document the time savings in a comment on the PR

Lab 4: Artifact Sharing Between Jobs

  1. Create a workflow with two jobs: build and deploy
  2. In build, compile the app and upload dist/ as an artifact
  3. In deploy, use needs: build and download the artifact
  4. Verify the deploy job has access to the build output without re-compiling
💡
Measuring Cache Hits

Check the Post actions/cache step in your workflow logs. It will report "Cache hit" (exact match), "Cache restored from key" (prefix fallback), or "Cache not found." The actions/cache action also sets a cache-hit output you can use in conditional steps.
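The cache-hit output mentioned above can gate later steps. A sketch that skips the install on an exact hit (caching node_modules/ here is purely to make the skip meaningful; see question 9 below on why ~/.npm is usually preferred):

```yaml
- uses: actions/cache@v4
  id: deps-cache
  with:
    path: node_modules
    key: ${{ runner.os }}-modules-${{ hashFiles('**/package-lock.json') }}
# cache-hit is 'true' only on an exact key match, not a restore-keys fallback
- if: steps.deps-cache.outputs.cache-hit != 'true'
  run: npm ci
```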

🐛 Debugging Common Issues

"Cache miss every time"

"Artifact not found"

"Cache size exceeded"

🎯 Interview Questions

Basic (5)

1. What is an artifact in GitHub Actions?

An artifact is a file or collection of files produced during a workflow run that can be persisted and shared between jobs. You upload artifacts using actions/upload-artifact and download them with actions/download-artifact or from the GitHub UI.

2. What is the purpose of caching in CI/CD workflows?

Caching stores dependencies (like node_modules or pip packages) between workflow runs so they don't need to be re-downloaded every time. This significantly reduces workflow execution time, especially for projects with large dependency trees.

3. How long are artifacts retained by default?

By default, artifacts are retained for 90 days. You can override this per-upload using the retention-days parameter, and organization/repository settings can enforce maximum retention periods.

4. What's the difference between artifacts and caches?

Artifacts share outputs between jobs within a run (build binaries, test reports) and are downloadable from the UI. Caches persist dependencies between runs (npm packages, pip wheels) to speed up installations. They have different lifetimes, size limits, and use cases.

5. How do you enable built-in caching in actions/setup-node?

Add cache: 'npm' (or 'yarn' or 'pnpm') to the with: block. The action automatically determines the cache path and generates a key from the lockfile hash.

Intermediate (5)

6. Explain how restore-keys works as a fallback mechanism.

When the exact key doesn't match any cached entry, restore-keys provides prefix-based fallback. GitHub searches for the most recent cache whose key starts with the given prefix. This gives you a "stale but close" cache — the package manager then only downloads the difference, which is much faster than a full install.

7. What is the cache size limit per repository, and how is eviction handled?

The limit is 10 GB per repository across all cache entries. When the limit is reached, GitHub uses LRU (Least Recently Used) eviction — the caches accessed longest ago are deleted first. Caches not accessed within 7 days are also automatically evicted.

8. How do you share build outputs between jobs without rebuilding?

The build job uploads the output directory as an artifact using actions/upload-artifact. Downstream jobs declare needs: build and use actions/download-artifact to retrieve the output. This avoids redundant compilation across jobs.

9. Why is caching ~/.npm preferred over caching node_modules/?

~/.npm is the npm cache directory containing downloaded tarballs. npm ci uses this cache to avoid network downloads but still creates a clean node_modules/ from the lockfile. Caching node_modules/ directly can lead to stale or inconsistent dependencies if the lockfile changes.

10. What happens when an artifact upload step matches no files?

By default (if-no-files-found: warn), actions/upload-artifact@v4 warns but does not fail when the path matches no files. Set if-no-files-found: error to make the step fail explicitly — recommended so you don't silently lose artifacts.
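A sketch of the fail-fast setting (artifact name and path are illustrative):

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: coverage-report
    path: coverage/
    # fail the step loudly instead of uploading nothing
    if-no-files-found: error
```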

Senior (5)

11. Design a cache key strategy for a monorepo with multiple services using different package managers.

Use a composite key that includes the service name, OS, package manager, and lockfile hash: ${{ runner.os }}-<service>-npm-${{ hashFiles('services/<service>/package-lock.json') }}. Each service gets its own cache entry, so changes to one service don't invalidate another's cache. Use restore-keys with the service prefix for partial hits. For shared dependencies, consider a separate cache entry for the root lockfile.
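A sketch of that strategy for a hypothetical monorepo with services/api and services/web (service names, paths, and the matrix are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [api, web]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          # per-service key: one service's lockfile change doesn't invalidate another's cache
          key: ${{ runner.os }}-${{ matrix.service }}-npm-${{ hashFiles(format('services/{0}/package-lock.json', matrix.service)) }}
          restore-keys: |
            ${{ runner.os }}-${{ matrix.service }}-npm-
      - run: npm ci
        working-directory: services/${{ matrix.service }}
```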

12. How does Docker layer caching with type=gha work, and what's the difference between mode=min and mode=max?

The GHA cache backend stores Docker BuildKit layers in GitHub Actions cache. mode=min caches only the layers of the final stage — useful for simple Dockerfiles. mode=max caches all layers including intermediate multi-stage build stages — essential for multi-stage builds where base/dependency stages rarely change. mode=max uses more cache space but provides much better hit rates for complex Dockerfiles.

13. A workflow's cache hit rate dropped from 95% to 10% after a refactor. What do you investigate?

Check: (1) Did lockfile paths change? hashFiles() may no longer match. (2) Did the repo structure change (monorepo reorganization)? The glob pattern may need updating. (3) Were cache keys refactored to include new variables that change frequently? (4) Did branch strategy change? Caches are branch-scoped. (5) Check gh actions-cache list to see what caches exist and their keys. (6) Verify restore-keys still provide valid prefixes.

14. How would you manage the 10 GB cache limit in a large repository with many workflows?

Audit existing caches with gh actions-cache list --sort size. Trim cached paths to essentials (cache ~/.npm not node_modules/). Use granular keys so stale entries get evicted naturally. Implement a scheduled workflow that runs gh actions-cache delete for caches older than a threshold. Avoid caching build outputs (use artifacts instead). For Docker, use mode=min if mode=max consumes too much space. Consider splitting large caches by concern (deps vs build tools).

15. Explain how you'd implement an end-to-end CI pipeline that uses both artifacts and caching optimally.

Structure the pipeline in parallel jobs connected by artifacts: (1) Install job — restore npm cache, run npm ci, upload node_modules as artifact. (2) Lint, Test, Build jobs run in parallel, each downloading the artifact. (3) Build uploads dist/ as artifact. (4) Docker job downloads dist/, builds image with GHA layer cache. (5) Deploy job runs after all checks pass. Caching handles cross-run dependency persistence; artifacts handle cross-job data sharing within a run. This maximizes parallelism while minimizing redundant work.
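A skeleton of that pipeline shape — a simplified sketch where built-in npm caching stands in for the answer's node_modules artifact, and build commands, names, and tags are illustrative:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }   # cross-run dependency cache
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with: { name: dist, path: dist/ }            # cross-job data sharing

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { name: dist, path: dist }
      - run: npm test                                # tests run against the built output

  docker:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with: { name: dist, path: dist }
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          context: .
          cache-from: type=gha                       # cross-run layer cache
          cache-to: type=gha,mode=max
```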

🏭 Real-World Scenario

A team's CI pipeline for a React + Node.js monorepo was taking 12 minutes per push. Every job re-installed 800+ npm packages from scratch, rebuilt the Docker image from the base layer, and the test job re-compiled the app before running tests.

After optimization:

  - npm dependencies restored from cache via cache: 'npm' on actions/setup-node
  - Docker builds switched to the GHA layer cache (type=gha)
  - The build job uploads its compiled output as an artifact, so the test job downloads it instead of re-compiling

Result: 12 minutes → 4 minutes — a 67% reduction. The team estimates this saves ~200 developer-hours per month in waiting time across 50+ daily pushes. The cache hit rate stabilized at 93%, and artifact storage costs remained negligible thanks to 7-day retention on non-essential uploads.

📝 Summary

Artifacts persist outputs for the life of a run: upload in one job, download in another, or grab them from the run's Summary page. Caching persists dependencies across runs, keyed on the OS and a lockfile hash, with restore-keys as a prefix fallback. Prefer the built-in cache parameter of setup actions when it exists, reach for actions/cache@v4 when you need control, and use the type=gha backend for Docker layers. Artifacts count against your storage quota for 1–90 days; caches get 10 GB per repository with 7-day LRU eviction.

← Back to GitHub Actions Course