How We Cut Our CI Pipeline from 45 Minutes to 8 Minutes
Our CI pipeline was the bottleneck every developer complained about. Here's how we broke it down, parallelized it, and what we learned about caching strategies along the way.
The Problem
It started as a joke in standup: “Go grab coffee, the CI will be done by the time you’re back.” Then it stopped being funny. Our CI pipeline was taking 45 minutes on a good day. On bad days — dependency cache misses, flaky tests, runner queue waits — it pushed past an hour.
The pipeline was a monolith: one workflow file, one job, everything sequential. Lint, type-check, unit tests, integration tests, build, e2e tests, deploy preview. All in series.
The Investigation
We started by instrumenting every step:
| Step | Duration | Cache hit? |
|---|---|---|
| Install dependencies | 4m 20s | ❌ (miss) |
| Lint | 2m 10s | N/A |
| Type check | 3m 45s | N/A |
| Unit tests | 8m 30s | N/A |
| Build | 6m 15s | ❌ (miss) |
| Integration tests | 12m 40s | N/A |
| E2E tests | 7m 20s | N/A |
| Total | 45m 00s | |
Two things jumped out:
- Dependency caching was broken: the cache missed 60% of the time
- Everything ran in sequence, with no parallelism at all
The Solution
Step 1: Fix Dependency Caching
The root cause was naive cache keys. We keyed the cache on the yarn.lock hash alone, but every PR branch changed the lockfile, so most runs started without a usable cache. The fix was multi-layered:
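- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      **/node_modules
      ~/.cache/yarn
    # Keep the key on one line: a folded scalar (>) would insert spaces between
    # the segments, so the first restore-keys prefix would never match it.
    key: ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-${{ github.base_ref }}
    restore-keys: |
      ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-
      ${{ runner.os }}-yarn-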
The key insight: we added github.base_ref to the key so that PRs against main share a cache, while restore-keys provide a fallback to the most recent cache whose key starts with one of the listed prefixes. Cache hit rate went from 40% to 92%.
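One caveat is worth sketching. `github.base_ref` is only populated for `pull_request` events, so on a push to main that key segment is empty. A possible variant, assuming the workflow also runs on pushes, falls back to `github.ref_name` so that builds on main save their cache under the same `-main` suffix that PRs targeting main look up. The fallback expression is an assumption for illustration, not part of the configuration above.

```yaml
- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      **/node_modules
      ~/.cache/yarn
    # github.base_ref is empty outside pull_request events; fall back to the
    # branch name so pushes to main and PRs against main use the same suffix.
    key: ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-${{ github.base_ref || github.ref_name }}
    restore-keys: |
      ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-
      ${{ runner.os }}-yarn-
```

Because a PR run can restore caches created on its base branch, a cache written by a push build on main becomes an exact-key hit for any PR that has not touched yarn.lock.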
Step 2: Parallel Job Decomposition
We broke the monolith into parallel jobs with explicit dependencies:
jobs:
  # lint, typecheck, and unit-tests declare no `needs`, so they start in parallel
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn lint
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn typecheck
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn test:unit
  # build starts only after all three checks have passed
  build:
    needs: [lint, typecheck, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn build
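Every job calls a local composite action, `./.github/actions/setup`, whose contents are not shown in this post. A minimal sketch of what such an action might contain, assuming it lives at `.github/actions/setup/action.yml`. The Node version and the `--frozen-lockfile` flag (Yarn 1) are assumptions, not details taken from our repository.

```yaml
# .github/actions/setup/action.yml (hypothetical contents)
name: Setup
description: Install Node and restore cached dependencies
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: 20   # assumed; use the version the project targets
    - name: Cache dependencies
      uses: actions/cache@v4
      with:
        path: |
          **/node_modules
          ~/.cache/yarn
        key: ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-${{ github.base_ref }}
        restore-keys: |
          ${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}-
          ${{ runner.os }}-yarn-
    # Run steps inside a composite action must declare their shell explicitly.
    - name: Install dependencies
      shell: bash
      run: yarn install --frozen-lockfile
```

Keeping setup in one place means a change to the cache key or Node version applies to every job at once, at the cost of one more file to read when debugging.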
Step 3: Test Splitting
Unit tests were the longest parallelizable step. We split them across 4 runners:
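unit-tests:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/setup
    # Each runner executes one quarter of the suite
    - run: yarn test:unit --shard=${{ matrix.shard }}/4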
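The hard-coded `/4` has to be kept in sync with the matrix list by hand. A small variant, sketched here as an assumption rather than the configuration used above, reads the total from the `strategy.job-total` context and sets `fail-fast: false` so the remaining shards keep running when one fails.

```yaml
unit-tests:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false   # assumption: let every shard report its own failures
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/setup
    # strategy.job-total is the number of jobs in this matrix (4 here),
    # so adding a shard to the list updates the denominator automatically.
    - run: yarn test:unit --shard=${{ matrix.shard }}/${{ strategy.job-total }}
```

A job that lists `unit-tests` in its `needs`, such as `build`, still waits for all shards to finish before it starts.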
The Results
| Metric | Before | After |
|---|---|---|
| Pipeline duration | 45m | 8m |
| Cache hit rate | 40% | 92% |
| Pipeline minutes/day (wall-clock) | 1,350 | 240 |
| Developer wait time | ~3h/day | ~30min/day |
Trade-offs
What we lost:
- Simplicity — One workflow file became a directory of composable actions
- Debugging ease — Parallel jobs make it harder to trace failures
- Runner cost — More parallel runners = more concurrent minutes
What we learned:
- Cache key design is the most impactful optimization you can make in CI
- Test splitting should be adaptive, not fixed
- Always measure before optimizing — our “slow build” assumption was wrong; dependency install was the real culprit
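One reading of "adaptive, not fixed" is to compute the shard count on each run instead of hard-coding four. The sketch below rests on several assumptions: a hypothetical `plan` job, test files that match `*.test.ts`, and an arbitrary target of 50 test files per shard. None of these values are taken from the pipeline described above.

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      shards: ${{ steps.count.outputs.shards }}
    steps:
      - uses: actions/checkout@v4
      - id: count
        run: |
          # Assumed convention: test files end in .test.ts
          files=$(git ls-files '*.test.ts' | wc -l)
          # Assumed target: about 50 test files per shard, never fewer than 1 shard
          total=$(( (files + 49) / 50 ))
          if [ "$total" -lt 1 ]; then total=1; fi
          # Emit a JSON array such as [1,2,3] for the matrix below
          echo "shards=[$(seq -s, 1 "$total")]" >> "$GITHUB_OUTPUT"

  unit-tests:
    needs: plan
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: ${{ fromJSON(needs.plan.outputs.shards) }}
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: yarn test:unit --shard=${{ matrix.shard }}/${{ strategy.job-total }}
```

Counting files is a crude proxy for runtime. Balancing shards by recorded test durations would be a further step, and it needs timing data persisted between runs.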