
How We Cut Our CI Pipeline from 45 Minutes to 8 Minutes

Our CI pipeline was the bottleneck every developer complained about. Here's how we broke it down and parallelized it, and what we learned about caching strategies along the way.

The Problem

It started as a joke in standup: “Go grab coffee, the CI will be done by the time you’re back.” Then it stopped being funny. Our CI pipeline was taking 45 minutes on a good day. On bad days — dependency cache misses, flaky tests, runner queue waits — it pushed past an hour.

The pipeline was a monolith: one workflow file, one job, everything sequential. Lint, type-check, unit tests, integration tests, build, e2e tests, deploy preview. All in series.

The Investigation

We started by instrumenting every step:

Step                    Duration    Cache hit?
──────────────────────────────────────────────
Install dependencies    4m 20s      ❌ (miss)
Lint                    2m 10s      N/A
Type check              3m 45s      N/A
Unit tests              8m 30s      N/A
Build                   6m 15s      ❌ (miss)
Integration tests       12m 40s     N/A
E2E tests               7m 20s      N/A
──────────────────────────────────────────────
Total:                  45m 00s

Two things jumped out:

  1. Dependency caching was broken: the cache missed on 60% of runs
  2. Everything ran in sequence — no parallelism at all
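
For readers who want to reproduce this kind of breakdown, here is a minimal sketch that parses the durations from the table above and ranks the steps by their share of the total. The helper names (to_seconds, ranked_shares) are illustrative and not part of our tooling; only the durations come from the table.

```python
import re

# Step durations copied from the timing table above.
STEPS = {
    "Install dependencies": "4m 20s",
    "Lint": "2m 10s",
    "Type check": "3m 45s",
    "Unit tests": "8m 30s",
    "Build": "6m 15s",
    "Integration tests": "12m 40s",
    "E2E tests": "7m 20s",
}


def to_seconds(text: str) -> int:
    """Convert a duration such as '4m 20s' into seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def ranked_shares(steps: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (step, seconds, share of total) rows, slowest step first."""
    seconds = {name: to_seconds(duration) for name, duration in steps.items()}
    total = sum(seconds.values())
    return sorted(
        ((name, secs, secs / total) for name, secs in seconds.items()),
        key=lambda row: row[1],
        reverse=True,
    )


if __name__ == "__main__":
    for name, secs, share in ranked_shares(STEPS):
        print(f"{name:<22}{secs // 60:>3}m {secs % 60:02d}s {share:7.1%}")
```

Run against the table, the seven steps sum to 2,700 seconds (45 minutes), with integration tests as the largest single slice at roughly 28%.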

The Solution

Step 1: Fix Dependency Caching

The root cause was naive cache keys. We keyed the cache on the yarn.lock hash alone, so any PR branch that changed the lockfile missed the cache. The fix was multi-layered:

- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      **/node_modules
      ~/.cache/yarn
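    # One line on purpose: the key must contain no spaces or newlines.
    # github.base_ref is set only on pull_request events (empty on push).
    key: ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-${{ github.base_ref }}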
    restore-keys: |
      ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-
      ${{ runner.os }}-yarn-

The key insight: we added github.base_ref to the key so PRs against main could share a cache, while restore-keys fall back to the most recent cache whose key starts with a matching prefix. Cache hit rate went from 40% to 92%.
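
To make the fallback order concrete, here is a simplified model of how a cache lookup resolves: an exact match on the key first, then a prefix match on the key, then each restore-keys entry in order, with the newest matching cache winning. It is a sketch of the documented matching rules only. It ignores branch scoping and cache versions, and the names CacheEntry and restore and the hashes "aaa" and "bbb" are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # larger value means created more recently


def restore(
    entries: list[CacheEntry], key: str, restore_keys: list[str]
) -> Optional[CacheEntry]:
    """Pick the cache a lookup would restore, or None on a full miss."""
    # 1. An exact match on the primary key always wins.
    exact = [entry for entry in entries if entry.key == key]
    if exact:
        return max(exact, key=lambda entry: entry.created_at)
    # 2. Otherwise try the key itself as a prefix, then each restore key
    #    in the order listed; the newest cache for the first prefix wins.
    for prefix in [key, *restore_keys]:
        matches = [entry for entry in entries if entry.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda entry: entry.created_at)
    return None


# Hypothetical stored caches; "aaa" and "bbb" stand in for lockfile hashes.
CACHES = [
    CacheEntry("Linux-yarn-aaa-main", created_at=1),
    CacheEntry("Linux-yarn-bbb-main", created_at=2),
    CacheEntry("Linux-yarn-bbb-develop", created_at=3),
]

if __name__ == "__main__":
    # A new lockfile hash ("ccc") misses the first restore key and falls
    # through to the broad "Linux-yarn-" prefix.
    hit = restore(CACHES, "Linux-yarn-ccc-main", ["Linux-yarn-ccc-", "Linux-yarn-"])
    print(hit.key if hit else "miss")
```

In this model a PR with a brand-new lockfile still restores a slightly stale cache instead of starting empty, which is the behavior the layered restore-keys are meant to provide.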

Step 2: Parallel Job Decomposition

We broke the monolith into parallel jobs with explicit dependencies:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn lint

  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn typecheck

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn test:unit

  build:
    needs: [lint, typecheck, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn build
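
A quick way to reason about a job graph like this is to compute its critical path. The sketch below is a rough model, not a measurement: it assumes the step durations from the timing table carry over unchanged to separate jobs, and it ignores per-job checkout, setup, and runner queue time. The function name finish_times is illustrative.

```python
# Durations in seconds, taken from the timing table earlier in the post.
DURATIONS = {
    "lint": 130,        # 2m 10s
    "typecheck": 225,   # 3m 45s
    "unit-tests": 510,  # 8m 30s
    "build": 375,       # 6m 15s
}

# Mirrors the `needs:` edges in the workflow above.
NEEDS = {
    "lint": [],
    "typecheck": [],
    "unit-tests": [],
    "build": ["lint", "typecheck", "unit-tests"],
}


def finish_times(
    durations: dict[str, int], needs: dict[str, list[str]]
) -> dict[str, int]:
    """Earliest finish time per job, assuming unlimited runners and no cycles."""
    done: dict[str, int] = {}

    def finish(job: str) -> int:
        if job not in done:
            # A job starts once its slowest dependency has finished.
            start = max((finish(dep) for dep in needs.get(job, [])), default=0)
            done[job] = start + durations[job]
        return done[job]

    for job in durations:
        finish(job)
    return done


if __name__ == "__main__":
    times = finish_times(DURATIONS, NEEDS)
    print("sequential:", sum(DURATIONS.values()), "s")
    print("critical path:", max(times.values()), "s")
```

Under those assumptions the four jobs take 20m 40s back to back but 14m 45s on the critical path (unit tests, then build), which is why the unit-test job is the next thing worth shortening.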

Step 3: Test Splitting

Unit tests were the longest parallelizable step. We split them across 4 runners:

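unit-tests:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/setup
    # Each matrix runner executes one quarter of the suite.
    - run: yarn test:unit --shard=${{ matrix.shard }}/4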

The Results

Metric                  Before      After
──────────────────────────────────────────────
Pipeline duration       45m         8m
Cache hit rate          40%         92%
Runner minutes/day      1,350       240
Developer wait time     ~3h/day     ~30min/day
──────────────────────────────────────────────

Trade-offs

What we lost:

  • Simplicity — One workflow file became a directory of composable actions
  • Debugging ease — Parallel jobs make it harder to trace failures
  • Runner cost — More parallel runners = more concurrent minutes

What we learned:

  • Cache key design was the most impactful CI optimization we made
  • Test splitting should be adaptive, not fixed
  • Always measure before optimizing — our “slow build” assumption was wrong; dependency install was the real culprit
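
One reading of "adaptive" splitting is to assign test files to shards from recorded timings rather than a fixed, even split. The sketch below uses a greedy longest-first heuristic: sort files by duration and always hand the next file to the currently lightest shard. It is an illustration of that idea under stated assumptions, not the setup described in this post; the file names and timings are made up, and assign_shards is a hypothetical helper.

```python
import heapq


def assign_shards(timings: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedily balance test files across shards using recorded durations."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Min-heap of (accumulated seconds, shard index): lightest shard on top.
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Longest files first; ties are broken by name for a stable result.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


# Hypothetical per-file timings in seconds, e.g. from a previous CI run.
TIMINGS = {
    "a.test.ts": 120,
    "b.test.ts": 90,
    "c.test.ts": 60,
    "d.test.ts": 45,
    "e.test.ts": 30,
    "f.test.ts": 15,
}

if __name__ == "__main__":
    for index, files in enumerate(assign_shards(TIMINGS, 2), start=1):
        total = sum(TIMINGS[name] for name in files)
        print(f"shard {index}: {files} ({total}s)")
```

With these example timings, two shards each end up with 180 seconds of tests. The heuristic is not optimal in general, but it is simple and tends to keep the slowest shard close to the average.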