Build Log
How we build the AI operating layer that powers every engagement. Real work, shipped to production. Most entries are one line. Some become full posts.
April 2026
- Apr 12 Built Pluma, a document pipeline that turns Markdown into branded Google Docs. Same source file produces the website version and the client-facing version. Sharing is a separate, gated action.
- Apr 12 An agent fabricated quotes in a $3K/mo client-facing synthesis doc. Built a source fidelity gate: 7 new rules that mechanically block unverified claims from reaching clients. The model is not the problem. The guardrails are.
- Apr 12 Shipped the Jamie meeting transcription pipeline end to end. Webhook receiver with HMAC verification, per-speaker attribution, action items extracted and routed to the right agent automatically.
- Apr 12 Named the system. Andale is the AI operating layer. Arriba launches agent sessions. Pulse watches all inbound channels. Dispatch manages tasks. Scribe captures meetings. Pluma renders deliverables. Critique checks quality. Snitch audits outbound sends.
- Apr 11 Killed a potential Zoom AI Companion migration in an hour by verifying four independent kill criteria before writing a single line of code. The 30-second search that saves the 90-minute build.
- Apr 11 Ran a full market eval of bot-free meeting transcription tools in a single session. Reversed the verdict three times as new constraints surfaced. Ended on Jamie after ruling out Fireflies, Otter.ai (active wiretapping lawsuit), Granola (2-stream limit), and six others.
- Apr 11 Migrated all 7 agents from Slack to Telegram in a single afternoon. One new transport module, five script patches. The agents kept working through the transition.
- Apr 10 Shipped Client Mode Phase 1. Agent recommendations are now tier-scoped per client. A $1K/mo engagement gets one next step per session. A $15K/mo engagement gets five. The tier rules are injected at boot, not left to judgment.
- Apr 10 Pressure-tested a client's new monetization thesis against their own CRM data on day one of the engagement. The compiled context surfaced three inconsistencies between what the team believed and what their pipeline showed.
- Apr 9 Type a client codename and a dedicated Opus session opens with that client's entire vault: goals, product roadmap, meeting transcripts dating back to the first call, every email thread. Nothing else loaded. No cross-client bleed.
- Apr 9 Added session resume. Close a client session, reopen it the next day, and the agent picks up exactly where you left off with a lightweight delta of what changed overnight.
- Apr 9 CamTune (now Ojo) optimizes against what video call participants actually see, not the raw camera feed. Switched from camera capture to window capture. Your Zoom tile is the optimization target.
- Apr 8 Built a knowledge wiki using the Karpathy pattern. LLM-compiled reference articles from 638 Lenny Rachitsky episodes and 156 Marketing Examples posts. Agents cite frameworks from the wiki instead of hallucinating advice.
- Apr 8 Shipped 611 tests across the platform. Started at zero in late March. The test suite catches contract mismatches between pipeline components before they reach production.
- Apr 7 Built implicit performance tracking. A plugin detects approvals and corrections from natural conversation without any special syntax. 5,200 scored interactions across 7 agents. The data showed 56% approval in general chat, 98% in dedicated client sessions.
- Apr 7 Upgraded all 7 agents from Sonnet to Opus for interactive sessions. 192 corrections traced to model capability, not bad instructions. Heartbeats stay on a cheap model. Three-tier routing: Opus for client work, Sonnet for background tasks, DeepSeek for breathing.
- Apr 7 Pulse was feeding every agent all 323 contacts and 24 cross-domain projects. The marketing agent got fitness protocols. The Spanish tutor got deal pipelines. Added domain filtering so each agent sees only what belongs to its world.
- Apr 6 Distilled 192 agent corrections into 31 behavioral rules and deployed them as mechanical enforcement. A gateway plugin blocks unauthorized external sends before they leave the system. Rules in a prompt are suggestions. This is a wall.
- Apr 6 Diagnosed why agents kept asking me to explain my own life every 48 hours. Working memory had no persistent layer. Added durable knowledge sections that survive automatic context refreshes.
- Apr 6 Every agent now knows which topics belong to which peer and routes accordingly. Previously one of seven had explicit domain boundaries. The chief of staff was texting me about fitness supplements at 5 AM.
- Apr 6 An agent upgraded its own runtime from inside a Slack session and crashed the entire platform for 8 hours. Added a governance rule: package upgrades require a dedicated session with rollback.
- Apr 6 Built gateway cost tracking after the dashboard showed $146/day but only $3-5 was actually billed. The rest was covered by a flat-rate plan. Without separating billing tiers, every cost report was fiction.
- Apr 5 Built tab notifications for the terminal. When an agent finishes thinking, the tab lights up with a bell icon and a colored border. Small thing. Eliminated 30 minutes of daily context-switch overhead.
- Apr 5 Launched a family WhatsApp group with the chief-of-staff agent as coordinator. It recommends restaurants with walk-time estimates and handles scheduling. My wife uses it more than I do.
- Apr 5 One command shows full system health across 7 dimensions. Pipeline freshness, credential TTLs, agent payload staleness, delivery timeouts, contract test results. Replaced inspecting 17 scattered JSON files.
- Apr 3 Built a shared context layer so all 7 agents know about each other. Two tiers: universal awareness for everyone, business context only for agents that need it. No cross-domain bleed.
- Apr 3 Graded the platform across 8 pillars. Overall: C+. Architecture is B+ material dragged down by D+ observability and D+ engineering quality. Posted the grades publicly on the build log because the point is honesty, not polish.
- Apr 2 Type "piper" in any terminal and a session opens with that agent's full compiled context. Identity, architecture, domain knowledge, live task state. Zero file reads. The CLI is the interface.
- Apr 2 Swapped 7 agent heartbeats from a 671B parameter model to a 3B parameter model. Same job, 85% cost reduction. Low-reasoning tasks run 100x more often than deep work, so model choice there dominates cost.
March 2026
- Mar 30 Built two-stage CI/CD. Local commit hook under 15 seconds, remote GitHub Actions for the full suite. Path filtering so a docs change doesn't trigger a build test.
- Mar 28 Took the codebase from D- to A-. 213 bare exception handlers eliminated. 876 print() calls converted to structured logging. 200 unit tests from zero. 14 shared libraries extracted from 5 monolithic scripts.
- Mar 15 Shipped the intent engine. 31 detectors scan Gmail, Calendar, iMessage, WhatsApp, and task state every 5 minutes. Each agent gets a ranked priority queue of what needs attention right now. The system's nose for what matters.
- Mar 1 One 400-line SOP.md that teaches Claude permission to deploy autonomously. Features go from code to production in a single conversation. No staging environment, no manual deploys.
February 2026
- Feb 24 Built Flare after a single session burned 430M tokens in 90 minutes from an auto-compact loop. Three-layer monitoring: a Claude Code hook tracks usage in real time, a daemon projects pace against the billing window, and the agent sees the projection at boot.
- Feb 19 Built the boot compiler that gives each agent photographic recall. 350 lines of Python, runs every 5 minutes, produces per-agent payloads up to 306KB. Seven agents boot with full domain awareness without reading a single file.
- Feb 19 Built Pulse, the unified inbox. iMessage, WhatsApp, Apple Notes, Slack, Gmail, Granola normalized into one JSONL stream. 37 iMessages and 219 Apple Notes flowed in on the first run. Every agent reads one file.
January 2026
- Jan 20 Built Critique, a two-layer design quality gate. Code analysis checks source files for wrong patterns. Playwright captures screenshots in four variants and Claude vision judges the pixels. The design system document is the rule engine.