The CI Pipeline I Actually Want

I wrote previously about giving Claude permission to deploy. That post was about trust. This one is about the safety net that makes that trust rational.

I just shipped a CI pipeline for Claudia that covers iOS, web, backend, and database — all with the same three-layer pattern. It took a conversation, not a sprint. Here's the design behind it, and the books that informed it.


The Problem With Most CI Setups

Most CI pipelines are accidental. Someone adds a lint step. Someone else adds a test runner. Six months later you have a YAML file nobody understands and a Slack channel full of red notifications nobody reads.

The failure mode isn't "no CI." It's CI that doesn't earn trust. Tests that flake. Checks that run on every push regardless of what changed. Pipelines that take 20 minutes so people stop waiting for them.

I wanted something different: a pipeline I'd actually trust with my deploying-AI-agent workflow. If Claude is going to ship code autonomously, the safety net has to be real.


The Design: Two Stages, Not One

Martin Fowler's CI writing describes a two-stage model that most teams skip:

Stage 1: "Commit build." Fast. Runs locally. Catches the obvious stuff before code leaves your machine. Target: under 15 seconds.

Stage 2: "Secondary build." Thorough. Runs on CI. Takes minutes. Does everything you can't afford to do locally.

Most teams only have Stage 2. They push, wait, context-switch, come back, see red, fix, push again. The feedback loop is measured in minutes.

With both stages, the feedback loop for common mistakes is seconds. The red notification on CI becomes rare — and when it fires, you take it seriously.

Here's what that looks like concretely. The local stage is a git hook that targets under 15 seconds. The remote stage runs on GitHub Actions, takes minutes, and does everything you can't afford to run on every commit locally.

The local hook detects which files changed and only runs relevant checks. Push a Swift file? You don't wait for npm test. Push a migration? You don't wait for SwiftLint.
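A minimal sketch of that hook, assuming hypothetical path patterns and check names (the real mapping lives in the repo):

```shell
#!/usr/bin/env bash
# Stage-1 pre-commit hook sketch: map each changed file to the check
# it requires, so a Swift-only commit never waits for npm test.
# Paths and check names below are illustrative, not from the actual repo.

# Map a changed file to the check it requires.
checks_for() {
  case "$1" in
    *.swift)               echo "swiftlint" ;;
    *.ts|*.tsx)            echo "eslint" ;;
    supabase/migrations/*) echo "migration-lint" ;;
  esac
}

# Union of checks needed for the staged files.
needed=$(git diff --cached --name-only 2>/dev/null | while IFS= read -r f; do
  checks_for "$f"
done | sort -u)

for check in $needed; do
  echo "running $check"
  # run the matching script here; a non-zero exit blocks the commit
done
```

The point is the routing, not the specific checks: the hook stays under the 15-second budget because it only runs what the staged files actually need.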


Same Three Layers, Every Surface

This is the structural idea that made everything click. Every surface — iOS, web, backend, database — gets the same three layers:

1. Lint — Is the code well-formed?
2. Test — Does the code work?
3. Safety net — Will we notice when something drifts?

For iOS, that's SwiftLint → unit tests → snapshot tests.
For web, that's ESLint → Vitest → Playwright screenshots.
For backend, that's Deno lint → function tests → migration safety lint.
For database, that's migration safety lint → RLS policy tests → schema drift detection.

Same pattern. Four times. No surface gets special treatment, no surface gets forgotten.
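For the database safety-net layer, one common way to detect schema drift (a sketch, assuming Postgres and hypothetical file paths, not the repo's actual setup) is to diff a schema-only dump against the snapshot committed to git:

```shell
# Dump the live schema: structure only, no data, no ownership noise.
pg_dump --schema-only --no-owner "$DATABASE_URL" > /tmp/live_schema.sql

# Diff it against the snapshot committed to the repo.
# A non-empty diff means the database drifted from what git says.
diff -u db/schema.sql /tmp/live_schema.sql
```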


SwiftLint Baseline: How to Add Linting to an Existing Codebase

One of the best ideas I borrowed: SwiftLint's baseline feature.

The problem with adding a linter to a mature codebase is obvious. You turn it on, get 344 warnings, feel bad, turn it off. Nobody wants to stop feature work for a week of lint fixes.

The baseline freezes all existing violations. Day one: 344 warnings, 0 failures. Day two: if you add one new violation, you get exactly 1 failure. The ratchet only moves forward.

This is the only sane way to adopt a linter retroactively. You get the benefit for all new code immediately, and you can chip away at the baseline over time — or not. Your choice.
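Assuming a SwiftLint version with baseline support (added around 0.55), the workflow is two commands:

```shell
# Record every existing violation into a baseline file (run once).
swiftlint lint --write-baseline Baseline.json

# From then on, lint against the baseline: only new violations fail.
# --strict promotes warnings to errors so the ratchet actually bites.
swiftlint lint --baseline Baseline.json --strict
```

Commit Baseline.json so every machine, including CI, freezes the same set of violations.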


Migration Safety Lint: The Check Nobody Writes

Every project has migration files. Almost nobody lints them.

I wrote a 60-line bash script that catches the destructive patterns before they ship.

These aren't style issues. These are "you just deleted production data" issues. The script runs locally in milliseconds and in CI on every push that touches a migration file.
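A minimal sketch of the idea, with illustrative patterns that stand in for the actual rule set:

```shell
#!/usr/bin/env bash
# Illustrative migration safety lint. The regexes below are example
# patterns for destructive SQL, not the actual rules from the repo.

fail=0

# check <file> <regex> <message>: flag a file matching a dangerous pattern.
check() {
  if grep -qiE "$2" "$1"; then
    echo "$1: $3"
    fail=1
  fi
}

for f in "$@"; do
  check "$f" 'drop (table|column)'     "destructive DROP"
  check "$f" 'truncate'                "TRUNCATE wipes data"
  check "$f" 'alter table .* not null' "NOT NULL on an existing column can fail on old rows"
done

# The real hook would end with: exit "$fail"
```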

Most teams discover these problems in production. At 2am. On a Friday.


Path Filtering: Don't Waste Money

GitHub Actions macOS runners cost $0.08/minute. An iOS build takes 5-10 minutes. That's $0.40-0.80 per run.

If your iOS CI runs on every push — including documentation changes, web fixes, migration updates — you're burning money and training your team to ignore CI.

Path filtering fixes this. Each workflow declares which paths trigger it:

on:
  push:
    paths:
      - 'Claudia/**'
      - 'ClaudiaTests/**'
      - '.swiftlint.yml'
Push a web change? The iOS workflow doesn't run. Push a migration? Only the backend workflow fires. Monthly CI cost: under $15 at 5-10 pushes per week.


What Kent Beck Taught Me About Tests

Kent Beck's Test Desiderata lists 12 properties of good tests. I used six as a checklist for every test I wrote:

  1. Fast — unit tests under 100ms each. My full web suite runs in 770ms.
  2. Isolated — no shared state, no ordering dependency. Every test creates its own data.
  3. Deterministic — same input, same result. No network calls, no Date.now() in assertions.
  4. Behavioral — tests behavior, not implementation. If I rename a private method, no tests break.
  5. Structure-insensitive — tests survive refactoring. They test "what" not "how."
  6. Readable — the test name describes the scenario. "Monthly: Jan 31 advances to Feb 28 in non-leap year" tells you exactly what's being verified.

The temptation with AI-generated tests is to go for quantity. 200 tests that cover every method signature. Beck's framework says: fewer tests that are actually good. Tests you trust enough to deploy when they're green.


The Failure Protocol

Fowler's most quotable CI principle: "Nobody has a higher priority task than fixing the build."

When CI goes red on main, the next push must fix it or revert the breaking change. Don't stack more changes on a red main. Don't open a ticket. Don't "get to it later."

This sounds dogmatic. It is. That's the point. A red build that stays red for a day teaches your team that CI is optional. A red build that gets fixed in the next commit teaches your team that CI is the floor.

For a solo operator or small team, this is even more important. There's no one else to notice. If you let a red build slide, it slides forever.


The DORA Connection

Google's DORA research defines four metrics for elite engineering teams: deployment frequency, lead time for changes, change failure rate, and time to restore service.

A good CI pipeline directly enables all four. Fast feedback → confident deploys → more deploys per day → shorter lead time. Comprehensive checks → fewer broken deploys → lower change failure rate. Automated testing → faster diagnosis → faster restore.

The 2025 DORA report adds an uncomfortable finding: "AI doesn't fix a team; it amplifies what's already there." If your pipeline is solid, AI makes you faster. If your pipeline is missing, AI makes you break things faster.

This is why I built the pipeline before leaning harder into autonomous deployment. The safety net has to exist before you need it.


What I Skipped (For Now)

Charity Majors argues that pre-production testing has diminishing returns. The real safety comes from observability in production: feature flags, canary deploys, structured logging, SLOs.

She's right. But she's talking about teams with production traffic. Claudia isn't there yet. When it is, the pipeline extends naturally into those production-side practices.

The pipeline is the foundation. Observability is the next floor.


The Meta-Point

This pipeline took one Claude Code session to design and implement. Not because the work is trivial — it's 21 new files across 5 phases. But because the design was clear before I started.

That's the leverage of reading Fowler, Beck, and the DORA reports before writing YAML. The frameworks compress decades of CI/CD experience into principles you can apply in an afternoon.

Most developers learn CI by copying someone else's workflow file and tweaking it until the tests pass. That gives you a pipeline. Reading the sources gives you a pipeline you can reason about, extend, and trust.


Try This

If you're building a multi-surface product (mobile + web + backend):

  1. Two stages. Local hook for fast feedback, CI for thorough checks.
  2. Same layers everywhere. Lint, test, safety net — every surface.
  3. Path filter everything. Don't run iOS CI on web changes.
  4. Baseline your linter. Freeze existing violations, enforce on new code.
  5. Lint your migrations. 60 lines of bash. Catches real problems.
  6. Read Beck's Test Desiderata. Then delete half the tests you were going to write.

The code is in the Claudia repo. Adapt it for your stack.


Sources

These are worth reading in full — not just for CI, but for how to think about software quality:

Martin Fowler on Continuous Integration
Kent Beck, "Test Desiderata"
Google's DORA State of DevOps reports
Charity Majors on testing in production