Shipping agents in 6 weeks

The boutique cadence and where the time actually goes.

52.2297° N, 21.0122° E · Warsaw

TL;DR

Six weeks is a forcing function, not a comfortable timeline. The model is 10% of the work — data access, authentication, evaluation, and observability are the other 90%. Four phases: scope lock (week 1), build the skeleton on real data (weeks 2–3), harden and test with real users (weeks 4–5), document and hand over (week 6). Triple your estimate for data access time, then double it again.

Why six weeks?

Six weeks is not a comfortable number. It’s a forcing function.

When I started structuring mid-market agentic AI engagements, I experimented with longer timelines — eight weeks, twelve, six months. The longer the timeline, the more the work expanded to fill the space. A twelve-week engagement became a twelve-week discovery phase with a two-week build at the end, shipped under pressure and missing most of the original scope.

Six weeks is short enough that there’s no room for a discovery phase that replaces the build. It’s long enough to ship something real. That combination is harder to achieve than it sounds.

“The six-week constraint saved us. Without it, we would have spent three months mapping processes we already understood.”

— from a project retrospective, Jan 2026

The four phases

Week 1: Scope lock

One workflow. One owner. One system. That’s the entire output of week one. We write it on a single page — the workflow, the input data, the output action, the human-in-the-loop fallback, the production owner’s name and number.

If we can’t fit it on one page, the scope isn’t narrow enough yet.
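As a rough illustration, the one-page scope can be treated as a record where every field must be filled before week one is considered done. This is a minimal sketch, not a template we actually ship: the class name, field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScopePage:
    """The week-one one-pager. All field names are illustrative assumptions."""
    workflow: str        # the single workflow being automated
    input_data: str      # where the agent's input comes from
    output_action: str   # what the agent does with its result
    human_fallback: str  # who takes over when the agent cannot proceed
    owner_name: str      # production owner
    owner_phone: str     # production owner's number


def missing_fields(page: ScopePage) -> list[str]:
    """Return the names of fields that are still blank."""
    return [f.name for f in fields(page) if not getattr(page, f.name).strip()]


def is_locked(page: ScopePage) -> bool:
    """Scope is locked only when every field on the page is filled in."""
    return not missing_fields(page)


# Hypothetical draft: everything decided except the owner's number.
draft = ScopePage(
    workflow="Triage inbound supplier invoices",
    input_data="Shared finance mailbox",
    output_action="Create a draft entry in the accounting system",
    human_fallback="Route to the accounts payable queue",
    owner_name="Example Owner",
    owner_phone="",
)
print(missing_fields(draft))
print(is_locked(draft))
```

A blank field is treated as an open decision, which keeps the "one page, fully filled" rule checkable rather than aspirational.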

Weeks 2–3: Build the skeleton

We build the agent end-to-end in the narrowest possible form. It runs on real data, in a staging environment that mirrors production as closely as possible. It doesn’t have to be good. It has to be real.

This phase is where most clients get surprised. They expect the model to be the hard part. It never is. The hard part is the data access, the authentication, the format normalization, the edge cases that don’t show up in the first ten test runs.

KEY INSIGHT

The model is 10% of the work. The surrounding system — data, auth, evals, handoffs — is 90%.
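One way to picture the skeleton is as a single narrow path in which the model call is one step among several. The sketch below is an assumption about shape only: the function and parameter names are hypothetical, and the stages are passed in as plain callables so that real data access, authentication, and normalization can replace the stubs.

```python
from typing import Callable, Optional


def run_skeleton(
    fetch: Callable[[], dict],
    normalize: Callable[[dict], Optional[dict]],
    decide: Callable[[dict], str],
    act: Callable[[str], None],
    hand_to_human: Callable[[dict], None],
) -> str:
    """Run one record through the narrowest end-to-end path.

    Returns "acted" or "handed_off" so the caller can count outcomes.
    """
    raw = fetch()                # data access and auth live behind this call
    record = normalize(raw)      # format normalization; None means unusable
    if record is None:
        hand_to_human(raw)       # edge case: fall back to a person
        return "handed_off"
    act(decide(record))          # the model call is one step of the path
    return "acted"


# Stub stages for illustration; a real build swaps these for real systems.
actions: list[str] = []
handoffs: list[dict] = []

outcome = run_skeleton(
    fetch=lambda: {"id": 1, "amount": "10.00"},
    normalize=lambda raw: raw if "amount" in raw else None,
    decide=lambda record: "approve",
    act=actions.append,
    hand_to_human=handoffs.append,
)
print(outcome, actions, handoffs)
```

Four of the five arguments have nothing to do with the model, which is roughly the proportion the rest of the work takes.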

Weeks 4–5: Harden and test

This is where we add the things that make the agent trustworthy in production: evaluation loops, monitoring, guardrails, the escalation path when the agent hits an edge case it can’t handle.

We also run the first real users through the system in this phase. Real feedback from real users is more valuable than any amount of internal testing.
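The evaluation loop and the escalation path from this phase can be sketched in a few lines. This is a simplified illustration under stated assumptions: exact-match scoring stands in for whatever metric the workflow needs, and the names EvalCase, pass_rate, guarded_call, and the toy agent are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the agent's output matches the expected answer."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for c in cases if agent(c.prompt).strip() == c.expected)
    return passed / len(cases)


def guarded_call(
    agent: Callable[[str], str],
    prompt: str,
    is_acceptable: Callable[[str], bool],
    escalate: Callable[[str, str], str],
) -> str:
    """Run the agent; escalate when it fails or the guardrail rejects the output."""
    try:
        output = agent(prompt)
    except Exception as exc:  # any agent failure takes the escalation path
        return escalate(prompt, f"agent error: {exc}")
    if not is_acceptable(output):
        return escalate(prompt, "guardrail rejected output")
    return output


# Toy agent for illustration only: it upper-cases its input.
def toy_agent(prompt: str) -> str:
    return prompt.upper()


cases = [EvalCase("a", "A"), EvalCase("b", "x")]
print(pass_rate(toy_agent, cases))
```

In practice the pass rate would be tracked against an alert threshold, and the escalate callable would open a ticket or page the production owner instead of returning a string.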

Week 6: Hand over

We document everything. Runbooks. Alert thresholds. The three edge cases we know about. The one the production owner needs to watch for. We do a live handover session, and we set the retainer terms for ongoing support.

Where the time actually goes

In a typical six-week engagement, the breakdown looks roughly like this:

Data access and normalization — about 30% of total engineering time
Evaluation setup and testing — about 25%
Core agent logic — about 20%
Monitoring and observability — about 15%
Documentation and handover — about 10%

The model itself (the prompts, the context window management, the output parsing) is only part of the core agent logic line, and accounts for maybe 10 to 15% of total time. Everything else is infrastructure.
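To make the breakdown above concrete, the percentages can be turned into hours. The shares come from the list above; the total budget is purely an assumption for illustration (two engineers at 40 hours a week for six weeks), not a figure from any engagement.

```python
# Shares of engineering time from the breakdown above (percent).
BREAKDOWN = {
    "Data access and normalization": 30,
    "Evaluation setup and testing": 25,
    "Core agent logic": 20,
    "Monitoring and observability": 15,
    "Documentation and handover": 10,
}


def hours_by_category(total_hours: float) -> dict[str, float]:
    """Split a total engineering budget according to BREAKDOWN."""
    assert sum(BREAKDOWN.values()) == 100, "shares should cover the whole budget"
    return {name: total_hours * pct / 100 for name, pct in BREAKDOWN.items()}


# Assumption for illustration only: 2 engineers x 40 hours x 6 weeks.
TOTAL_HOURS = 2 * 40 * 6  # 480

for name, hours in hours_by_category(TOTAL_HOURS).items():
    print(f"{name}: {hours:.0f} h")
```

Under that assumed budget, data access and normalization alone comes to 144 hours, against 96 for the core agent logic.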

What this means for your first pilot

If you’re scoping your first agentic AI pilot, plan for six weeks. Not because it’s always enough — sometimes it isn’t — but because it forces the decisions that make shipping possible. Scope lock. Named owner. Production-first mindset. Real data from day one.

And when you’re estimating the engineering time, triple the time you think you’ll spend on data access. Then double it again.
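Tripling and then doubling works out to a factor of six. A minimal worked example, with a hypothetical function name and a made-up first guess:

```python
def padded_data_access_estimate(first_guess_days: float) -> float:
    """Triple the first guess, then double the result (a net factor of 6)."""
    return first_guess_days * 3 * 2


# Hypothetical first guess of 2 days for data access.
print(padded_data_access_estimate(2))
```

So a first guess of 2 days becomes a planning figure of 12.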

Work with me

Want to talk through this?

30-minute call with Antoni. No pitch, no deck — just the conversation this essay opened.
