Intermediate Lesson 8 of 14

Artifacts & Caching

Speed up workflows with dependency caching and share build outputs between jobs using artifacts.

🧒 Simple Explanation (ELI5)

Imagine it's moving day. Artifacts are the labeled boxes you pack in one room and unpack in another: one job packs the output, a later job opens the box instead of redoing the work. Caching is the storage unit you keep between moves: the bulky stuff you always need is waiting for you next time instead of being bought again.

Artifacts move outputs between jobs. Caching keeps dependencies between runs. Together, they turn a slow, repetitive workflow into a fast, efficient one.

📦 Artifacts

Artifacts let you persist data after a job completes and share it with other jobs in the same workflow — or download it later from the GitHub UI. They're ideal for build outputs, test reports, coverage files, and logs.

Upload & Download Actions

Use actions/upload-artifact@v4 to persist files from a job and actions/download-artifact@v4 to retrieve them in a later job of the same workflow run.

What to Upload

Good candidates are build outputs (dist/, binaries), test and coverage reports, and logs — anything another job or a human needs after the run. Don't upload dependency directories like node_modules/; that's what caching is for.

Retention & Limits

Artifacts are kept for 90 days by default, configurable per upload with retention-days (1–90), and they count against your account's Actions storage quota.

Sharing Between Jobs

The most common pattern: a build job uploads the compiled output, and a deploy job (declared with needs: build) downloads it. This avoids rebuilding the same code in every job.

yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 7

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
      - run: ls -la dist/
💡
Downloading from the UI

Every artifact uploaded during a workflow run is available on the run's Summary page under the Artifacts section. Click the artifact name to download a ZIP file — handy for grabbing test reports or build outputs without re-running the workflow.

⚠️
Artifact Names Must Be Unique

Within a single workflow run, each artifact must have a unique name. If two jobs upload artifacts with the same name, the second upload will fail. Use dynamic names (e.g., test-results-${{ matrix.os }}) when running matrix builds.
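A minimal sketch of the matrix pattern the warning describes (the test command and output path are illustrative):

```yaml
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # hypothetical test command; assumes results land in test-results/
      - run: npm test
      - uses: actions/upload-artifact@v4
        with:
          # one artifact per matrix leg, so names never collide
          name: test-results-${{ matrix.os }}
          path: test-results/
```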

⚡ Dependency Caching

Caching stores files (like node_modules/ or pip wheels) between workflow runs so you don't re-download them every time. The actions/cache@v4 action is the core building block.

How It Works

At the start of a job, actions/cache looks up the provided key and restores the cached files on a match. At the end of the job, a "Post" step saves the files under that key — but only if the key doesn't already exist, because cache entries are immutable once saved.

Cache Key Strategies

The best cache keys include the OS, the package manager name, and a hash of the lockfile. When the lockfile changes, a new cache is created. When it doesn't, you get an instant restore.

| Ecosystem | Cache Key Pattern |
|---|---|
| Node.js | ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }} |
| Python | ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} |
| Go | ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }} |
| .NET | ${{ runner.os }}-nuget-${{ hashFiles('**/*.csproj') }} |
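As a sketch, the Python pattern from the table plugs into actions/cache like this (~/.cache/pip is pip's default cache directory on Linux runners):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    # new lockfile hash -> new cache; unchanged hash -> instant restore
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
```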

Cache Limits

Each repository gets 10 GB of cache storage across all entries. Caches unused for 7 days are evicted automatically, and when a repository exceeds the limit, the least recently used entries are deleted first.

Setup Actions with Built-in Caching

Many official setup actions have a cache parameter that handles caching automatically — no need for a separate actions/cache step:

yaml
# Built-in caching — one line does it all
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'

This is equivalent to manually configuring the cache, but much simpler. The action automatically determines the correct path and cache key.

Explicit Cache (Full Control)

When you need more control — custom paths, fallback keys, or caching something the setup action doesn't support — use actions/cache@v4 directly:

yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
💡
restore-keys Fallback

When the exact key doesn't match, restore-keys provides prefix-based fallback. For example, Linux-npm- would match any previous cache for Linux npm, even if the lockfile hash differs. This gives you a "stale but close" cache that still saves significant download time — npm only fetches the diff.

📊 Artifacts vs Cache — Comparison

| Feature | Artifacts | Cache |
|---|---|---|
| Purpose | Share outputs between jobs; download results | Speed up dependency installation across runs |
| Lifetime | 1–90 days (configurable) | 7 days since last access; LRU eviction |
| Size limit | Counts against Actions storage quota | 10 GB per repository |
| Cross-workflow | Not shared between workflows (per-run) | Shared across all workflows in the repo |
| Cross-job | Yes — upload in one job, download in another | Yes — saved on completion, restored on start |
| Downloadable from UI | Yes — ZIP download from run summary | No — only restored within workflow runs |
| Typical use cases | Build binaries, test reports, coverage, logs | node_modules, pip packages, Go modules, Docker layers |

🐳 Docker Layer Caching

Docker builds can be painfully slow when every layer is rebuilt from scratch. Docker BuildKit supports a GHA cache backend (type=gha) that stores individual layers in the GitHub Actions cache and only rebuilds what changed.

cache-from / cache-to with GHA Backend

cache-from: type=gha restores previously built layers from the Actions cache; cache-to: type=gha,mode=max exports all layers, including intermediate stages, back to it. Both require Buildx, set up via docker/setup-buildx-action.

Example with docker/build-push-action

yaml
- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: myregistry.azurecr.io/myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
💡
Dramatic Speedup

Docker layer caching can reduce image build times from 5+ minutes to under 30 seconds when only application code changes (base image and dependency layers are cached). This is especially impactful for large images with heavy system dependencies.

⏱️ Performance Impact

Here's a typical before-and-after when adding caching and artifacts to a CI workflow:

text
BEFORE (no caching, no artifacts — 8 min total)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  install deps ████████████████  3 min
  build        ████████████      2 min
  test         ████████          1.5 min
  docker build ████████          1.5 min

AFTER (with caching + artifacts — 3 min total)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  restore cache ██               0.3 min  (npm cache hit)
  build         ████████████     2 min
  test          ████████         1.5 min  (parallel with build via artifacts)
  docker build  ██               0.3 min  (layer cache hit)
                                          ─── saved ~5 min (62%) ───

The biggest wins come from npm/pip cache restores (skipping dependency downloads) and Docker layer caching (skipping unchanged layers). Artifact sharing between jobs also enables parallelism — test can start as soon as build uploads its artifacts.

🛠️ Hands-on Lab

Lab 1: Upload Test Results as an Artifact

  1. Create a workflow with a test job that runs your tests with a JUnit XML reporter (e.g. Jest with the jest-junit reporter, or any framework that outputs JUnit XML)
  2. Add actions/upload-artifact@v4 to upload the test results directory
  3. Run the workflow and download the artifact from the run's Summary page
  4. Verify the XML files are present in the downloaded ZIP
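A starting point for Lab 1 — a sketch that assumes Jest with the jest-junit reporter configured to write XML into ./test-results (adjust the test step for your framework):

```yaml
name: lab1-test-artifacts
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      # assumes jest-junit is installed and configured to write into ./test-results
      - run: npm test -- --reporters=default --reporters=jest-junit
      - uses: actions/upload-artifact@v4
        if: always()   # upload results even when tests fail
        with:
          name: test-results
          path: test-results/
```

The if: always() matters here: without it, a failing test run would skip the upload, and the failures are exactly the reports you want to inspect.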

Lab 2: Add npm Caching

  1. Add cache: 'npm' to your actions/setup-node@v4 step
  2. Run the workflow twice — observe the first run says "Cache not found" and the second says "Cache restored"
  3. Compare the npm ci duration: first run (full install) vs second run (cached)
  4. Modify package-lock.json slightly and run again — observe a cache miss and new cache save

Lab 3: Compare Workflow Times

  1. Create two workflow files: no-cache.yml (no caching) and with-cache.yml (npm + Docker caching)
  2. Trigger both on the same commit using workflow_dispatch
  3. Compare total run times on the Actions tab
  4. Document the time savings in a comment on the PR

Lab 4: Artifact Sharing Between Jobs

  1. Create a workflow with two jobs: build and deploy
  2. In build, compile the app and upload dist/ as an artifact
  3. In deploy, use needs: build and download the artifact
  4. Verify the deploy job has access to the build output without re-compiling
💡
Measuring Cache Hits

Check the Post actions/cache step in your workflow logs. It will report "Cache hit" (exact match), "Cache restored from key" (prefix fallback), or "Cache not found." The actions/cache action also sets a cache-hit output you can use in conditional steps.
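The cache-hit output mentioned above can gate later steps. A sketch that skips the install on an exact hit (caching node_modules/ here is purely to make the skip meaningful; see question 9 below on why ~/.npm is usually preferred):

```yaml
- uses: actions/cache@v4
  id: deps-cache
  with:
    path: node_modules
    key: ${{ runner.os }}-modules-${{ hashFiles('**/package-lock.json') }}
# cache-hit is 'true' only on an exact key match, not a restore-keys fallback
- if: steps.deps-cache.outputs.cache-hit != 'true'
  run: npm ci
```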

🐛 Debugging Common Issues

"Cache miss every time"

"Artifact not found"

"Cache size exceeded"

🎯 Interview Questions

Basic (5)

1. What is an artifact in GitHub Actions?

An artifact is a file or collection of files produced during a workflow run that can be persisted and shared between jobs. You upload artifacts using actions/upload-artifact and download them with actions/download-artifact or from the GitHub UI.

2. What is the purpose of caching in CI/CD workflows?

Caching stores dependencies (like node_modules or pip packages) between workflow runs so they don't need to be re-downloaded every time. This significantly reduces workflow execution time, especially for projects with large dependency trees.

3. How long are artifacts retained by default?

By default, artifacts are retained for 90 days. You can override this per-upload using the retention-days parameter, and organization/repository settings can enforce maximum retention periods.

4. What's the difference between artifacts and caches?

Artifacts share outputs between jobs within a run (build binaries, test reports) and are downloadable from the UI. Caches persist dependencies between runs (npm packages, pip wheels) to speed up installations. They have different lifetimes, size limits, and use cases.

5. How do you enable built-in caching in actions/setup-node?

Add cache: 'npm' (or 'yarn' or 'pnpm') to the with: block. The action automatically determines the cache path and generates a key from the lockfile hash.

Intermediate (5)

6. Explain how restore-keys works as a fallback mechanism.

When the exact key doesn't match any cached entry, restore-keys provides prefix-based fallback. GitHub searches for the most recent cache whose key starts with the given prefix. This gives you a "stale but close" cache — the package manager then only downloads the difference, which is much faster than a full install.

7. What is the cache size limit per repository, and how is eviction handled?

The limit is 10 GB per repository across all cache entries. When the limit is reached, GitHub uses LRU (Least Recently Used) eviction — the caches accessed longest ago are deleted first. Caches not accessed within 7 days are also automatically evicted.

8. How do you share build outputs between jobs without rebuilding?

The build job uploads the output directory as an artifact using actions/upload-artifact. Downstream jobs declare needs: build and use actions/download-artifact to retrieve the output. This avoids redundant compilation across jobs.

9. Why is caching ~/.npm preferred over caching node_modules/?

~/.npm is the npm cache directory containing downloaded tarballs. npm ci uses this cache to avoid network downloads but still creates a clean node_modules/ from the lockfile. Caching node_modules/ directly can lead to stale or inconsistent dependencies if the lockfile changes.

10. What happens when an artifact upload step matches no files?

By default (if-no-files-found: warn), actions/upload-artifact@v4 warns but does not fail when the path matches no files. Set if-no-files-found: error to make the step fail explicitly — recommended so you don't silently lose artifacts.
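A sketch of the fail-fast setting (artifact name and path are illustrative):

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: coverage-report
    path: coverage/
    # fail the step loudly instead of uploading nothing
    if-no-files-found: error
```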

Senior (5)

11. Design a cache key strategy for a monorepo with multiple services using different package managers.

Use a composite key that includes the service name, OS, package manager, and lockfile hash: ${{ runner.os }}-<service>-npm-${{ hashFiles('services/<service>/package-lock.json') }}. Each service gets its own cache entry, so changes to one service don't invalidate another's cache. Use restore-keys with the service prefix for partial hits. For shared dependencies, consider a separate cache entry for the root lockfile.
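A sketch of that strategy for a hypothetical monorepo with services/api and services/web (service names, paths, and the matrix are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [api, web]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          # per-service key: one service's lockfile change doesn't invalidate another's cache
          key: ${{ runner.os }}-${{ matrix.service }}-npm-${{ hashFiles(format('services/{0}/package-lock.json', matrix.service)) }}
          restore-keys: |
            ${{ runner.os }}-${{ matrix.service }}-npm-
      - run: npm ci
        working-directory: services/${{ matrix.service }}
```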

12. How does Docker layer caching with type=gha work, and what's the difference between mode=min and mode=max?

The GHA cache backend stores Docker BuildKit layers in GitHub Actions cache. mode=min caches only the layers of the final stage — useful for simple Dockerfiles. mode=max caches all layers including intermediate multi-stage build stages — essential for multi-stage builds where base/dependency stages rarely change. mode=max uses more cache space but provides much better hit rates for complex Dockerfiles.

13. A workflow's cache hit rate dropped from 95% to 10% after a refactor. What do you investigate?

Check: (1) Did lockfile paths change? hashFiles() may no longer match. (2) Did the repo structure change (monorepo reorganization)? The glob pattern may need updating. (3) Were cache keys refactored to include new variables that change frequently? (4) Did branch strategy change? Caches are branch-scoped. (5) Check gh actions-cache list to see what caches exist and their keys. (6) Verify restore-keys still provide valid prefixes.

14. How would you manage the 10 GB cache limit in a large repository with many workflows?

Audit existing caches with gh actions-cache list --sort size. Trim cached paths to essentials (cache ~/.npm not node_modules/). Use granular keys so stale entries get evicted naturally. Implement a scheduled workflow that runs gh actions-cache delete for caches older than a threshold. Avoid caching build outputs (use artifacts instead). For Docker, use mode=min if mode=max consumes too much space. Consider splitting large caches by concern (deps vs build tools).

15. Explain how you'd implement an end-to-end CI pipeline that uses both artifacts and caching optimally.

Structure the pipeline in parallel jobs connected by artifacts: (1) Install job — restore npm cache, run npm ci, upload node_modules as artifact. (2) Lint, Test, Build jobs run in parallel, each downloading the artifact. (3) Build uploads dist/ as artifact. (4) Docker job downloads dist/, builds image with GHA layer cache. (5) Deploy job runs after all checks pass. Caching handles cross-run dependency persistence; artifacts handle cross-job data sharing within a run. This maximizes parallelism while minimizing redundant work.
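A skeleton of that pipeline shape — a simplified sketch where built-in npm caching stands in for the answer's node_modules artifact, and build commands, names, and tags are illustrative:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }   # cross-run dependency cache
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with: { name: dist, path: dist/ }            # cross-job data sharing

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { name: dist, path: dist }
      - run: npm test                                # tests run against the built output

  docker:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with: { name: dist, path: dist }
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          context: .
          cache-from: type=gha                       # cross-run layer cache
          cache-to: type=gha,mode=max
```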

🏭 Real-World Scenario

A team's CI pipeline for a React + Node.js monorepo was taking 12 minutes per push. Every job re-installed 800+ npm packages from scratch, rebuilt the Docker image from the base layer, and the test job re-compiled the app before running tests.

After optimization:

  - npm dependencies restored from cache via cache: 'npm' on actions/setup-node
  - Docker builds switched to the GHA layer cache (type=gha)
  - The build job uploads its compiled output as an artifact, so the test job downloads it instead of re-compiling

Result: 12 minutes → 4 minutes — a 67% reduction. The team estimates this saves ~200 developer-hours per month in waiting time across 50+ daily pushes. The cache hit rate stabilized at 93%, and artifact storage costs remained negligible thanks to 7-day retention on non-essential uploads.

📝 Summary

Artifacts persist outputs for the life of a run: upload in one job, download in another, or grab them from the run's Summary page. Caching persists dependencies across runs, keyed on the OS and a lockfile hash, with restore-keys as a prefix fallback. Prefer the built-in cache parameter of setup actions when it exists, reach for actions/cache@v4 when you need control, and use the type=gha backend for Docker layers. Artifacts count against your storage quota for 1–90 days; caches get 10 GB per repository with 7-day LRU eviction.

← Back to GitHub Actions Course