Method: every claim tracked, reviewed every 30–90 days, marked Holding, Partial, or Not holding. Drafted by Claude; signed off by Peter. How this works →
OPS-029 · published 28 Apr 2026 · revised 28 Apr 2026 · 9 min read · in AI Implementation

Three launches with AI: what shipping DealVex, Rhino-basketball, and agentmodeai taught me about building as a small-team operator

Three ventures in three categories shipped in the same 90-day window with AI-paired development. The lesson that compounded across all three is that AI inverts the build-vs-buy decision: the bottleneck is no longer engineering capacity; it’s whether you can specify the desired behaviour clearly enough.

Holding · reviewed 28 Apr 2026 · next review +59d

Three ventures in three different categories shipped in the same 90-day window: a B2B SaaS for an autodealer, an operations system for a basketball club, and the publication you are reading. All three were built primarily by one person paired with AI tooling. The cross-cutting lesson, the one that compounded across all three, is the lesson this piece is for.

This is the operators-register companion to the enterprise build-log series. The enterprise pieces argue the production-model case for senior IT leaders. This piece is for small-team operators trying to figure out what AI-paired development actually means for a 1-to-50-person business, written from the perspective of someone who shipped three of them and can name what worked and what didn’t.

Three ventures, three categories, three lessons

DealVex is a B2B SaaS for autodealers. It started as a tool Peter’s son-in-law needed for his autodealer business: inventory tracking, taxation workflows, BTW (Dutch VAT) reporting, and lead intake from multiple channels. Off-the-shelf options existed but didn’t fit the workflow precisely; the legacy software in the Dutch autodealer market is heavy, slow, and not designed for a small operation. The build started as a one-user internal tool. It became a multi-user system over the 90-day window.

The DealVex lesson: what AI made possible was building for a market segment that the off-the-shelf vendors did not see. Small autodealers (under 200 cars, owner-operated, Dutch-language) are too small to interest the established vendors and too specific to be served by generic SaaS. The economics that previously made this segment unbuildable (a developer week costs more than a year of off-the-shelf licence fees) inverted when the developer week became a developer evening with AI assistance. The product exists because the math changed, not because the need was new.

Rhino-basketball is the operations system Peter built for the basketball club he chairs in Oosterwolde. The club runs on volunteer labour: scheduling trainers, tracking member attendance, handling subscription payments, communicating with parents about training cancellations. The historical solution was a mix of WhatsApp groups, an Excel spreadsheet, and one person who carried the institutional memory in their head. The trigger for the build was that person retiring.

The Rhino lesson: AI shifted what counts as “small enough to write custom software for”. A basketball club at this scale (a few hundred members, a dozen volunteer trainers, three age categories, two competitive divisions) would never have justified a custom system before AI-paired development. The custom system shipped in roughly one quarter of evenings and weekends, costs nothing per month to operate, and replaces three separate tools the club was using inconsistently. The threshold for “build it ourselves” dropped by an order of magnitude.

agentmodeai is the publication you are reading. It pivoted in April 2026 from a previous autonomous-engine thesis (a Postgres schema, LaunchAgents, Writer/Critic agents; all retired) to a Next.js + MDX publication on Vercel, with Peter drafting briefs and Claude drafting content. The infrastructure layer (claim ledger, retraction register, vendor firewall, build-time credibility scanner) is custom, because the publication’s premise required content infrastructure that does not exist as off-the-shelf software.

The agentmodeai lesson: AI made it economically viable to build the publication infrastructure that the publication’s premise required. A platform like Substack or Ghost would have shipped the publication faster, but neither has a public claim ledger, a tracked-correction protocol, or a build-time voice scanner, and those tools are what enforce the publication’s premise. Building them required custom development; AI-paired development made the custom build affordable for a one-person operation.
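
For readers wondering what the claim-ledger layer amounts to structurally, here is roughly the shape an entry implies, inferred from the statuses and review cadence in the method note above; the field names are illustrative, not agentmodeai’s actual schema.

```ts
// Illustrative shape of a claim-ledger entry, inferred from this
// publication's stated statuses and review cadence. Not the real schema.
type ClaimStatus = "Holding" | "Partial" | "Not holding";

interface ClaimLedgerEntry {
  id: string;           // e.g. "OPS-029"
  assertion: string;    // the trackable claim, stated in one sentence
  status: ClaimStatus;
  reviewedAt: string;   // ISO date of the most recent public review
  nextReviewAt: string; // 30-90 days out, per the method note
}
```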

Three different categories. Three different problems. The same shape underneath all three: AI changed what was buildable inside a small-team budget.

The cross-cutting lesson: specification, not capacity

The historical build-vs-buy decision tree for a small-team operator went like this:

  1. Do we have the engineering capacity to build this? Almost always no.
  2. Therefore, buy.
  3. Settle for the closest match in the market and adapt the workflow to fit.

The AI-paired version of the decision tree is different:

  1. Can we specify the desired behaviour clearly enough that AI can build it?
  2. If yes, build. If no, either sharpen the spec or buy.

The bottleneck has moved from capacity to specification. This is the most important shift, and the one most operators are not yet calibrated for.

What “specify clearly enough” looks like is precise. Not “we need a CRM”; that is the level of specification that gets you to a buying decision. Build-grade specification looks like this: when a customer abandons a quote on step 3 of the intake form, the system should send a follow-up email after 4 hours containing the partial quote and a 5% discount code, but only if it is the customer’s first abandonment this week, and only between 09:00 and 21:00 in their local timezone, with the discount code being unique per customer to prevent reuse.

That level of specification translates into running code. AI can build to it. A human pair-programmer would have built to it too, slower and at greater cost. The level of specification that does not translate is the level a typical buying brief lives at: “we need to follow up with abandoned customers more effectively.”
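
To make the translation concrete, here is a minimal sketch of that spec as decision logic, assuming a TypeScript stack. Every name in it (Abandonment, shouldSendFollowUp, discountCodeFor) is illustrative, not DealVex’s actual code.

```ts
// Minimal sketch of the abandoned-quote spec above. Illustrative only.
import { randomUUID } from "node:crypto";

interface Abandonment {
  customerId: string;
  step: number;                 // intake-form step where the customer dropped off
  abandonedAt: Date;            // when the abandonment happened
  abandonmentsThisWeek: number; // prior abandonments by this customer this week
  localHour: number;            // customer's local hour at send time, 0-23
}

function shouldSendFollowUp(a: Abandonment, now: Date): boolean {
  const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;
  const waitedLongEnough = now.getTime() - a.abandonedAt.getTime() >= FOUR_HOURS_MS;
  const firstThisWeek = a.abandonmentsThisWeek === 0;            // first abandonment this week
  const insideSendWindow = a.localHour >= 9 && a.localHour < 21; // 09:00-21:00 local
  return a.step === 3 && waitedLongEnough && firstThisWeek && insideSendWindow;
}

// Unique per customer, so a code cannot be reused across customers.
function discountCodeFor(customerId: string): string {
  return `Q5-${customerId}-${randomUUID().slice(0, 8)}`;
}
```

Each clause of the prose spec maps to exactly one boolean in the sketch; that one-to-one mapping is what build-grade means.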

The teams that win at AI-paired development are the teams that can describe their workflow in operational detail. The teams that cannot still cannot ship, regardless of how good the underlying AI is. The bottleneck is upstream of the AI; the AI just made the downstream cheap.

What didn’t work

Three categories of failure across the three ventures. Worth naming explicitly because the marketing around AI-paired development does not name them.

AI-paired development does not eliminate the need for testing. Both DealVex and agentmodeai shipped bugs that automated testing would have caught earlier than manual QA did. The AI is good at producing code that compiles, parses, and looks reasonable. It is not good at predicting which edge cases will fire on real production data. The discipline of writing tests, or, at minimum, writing acceptance criteria the code can be checked against, does not transfer from the human to the AI just because the AI is good at code generation. The operator still owns the verification step.
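
As a hedged illustration of what “acceptance criteria the code can be checked against” can look like, here is one check against the illustrative shouldSendFollowUp sketch from earlier, using Node’s built-in test runner; the module path is hypothetical.

```ts
import test from "node:test";
import assert from "node:assert/strict";
// Hypothetical import: the shouldSendFollowUp sketch from earlier,
// assumed to live in a local module.
import { shouldSendFollowUp } from "./followUp";

test("no follow-up outside the 09:00-21:00 local send window", () => {
  const abandonment = {
    customerId: "c-1",
    step: 3,
    abandonedAt: new Date("2026-04-01T18:00:00Z"),
    abandonmentsThisWeek: 0,
    localHour: 23, // everything else qualifies; only the window check fails
  };
  const now = new Date("2026-04-01T23:00:00Z"); // five hours after abandonment
  assert.equal(shouldSendFollowUp(abandonment, now), false);
});
```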

AI-paired development does not eliminate the need for design discipline. agentmodeai went through three significant UI revisions because the first two were AI-suggested patterns that did not match how readers actually behave. The AI suggested reasonable conventions; the conventions were wrong for this specific publication. Design works the same way in AI-paired development as without it: ship something, watch real readers use it, iterate. The AI accelerates the building part of the loop; it does not shortcut the seeing-readers-use-it part.

AI-paired development does not replace domain expertise. Rhino-basketball’s calendar logic took multiple rewrites because the small details of how a basketball club actually schedules trainers (overlapping role assignments, last-minute substitutions, court-availability cascades) required someone who understood the domain to specify them. The AI cannot guess them; it cannot infer them from the generic concept of “basketball club scheduling”; it has to be told. Operators who think they can hand a domain problem to AI without domain expertise are the operators who ship something that looks right and breaks the moment a real edge case fires.
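
An invented but representative example of the kind of rule that has to be told rather than inferred, again sketched in TypeScript; this is not Rhino-basketball’s actual scheduling code.

```ts
interface TrainingSlot {
  trainerId: string;
  court: string;
  start: Date;
  end: Date;
}

// Hypothetical domain rule: the same trainer may take two overlapping
// sessions only when both run on the same court (one trainer supervising
// two adjacent age groups side by side). A generic "no double booking"
// constraint, which is what an AI infers from "basketball club scheduling"
// on its own, would reject a pattern the club relies on.
function sameTrainerOverlapAllowed(a: TrainingSlot, b: TrainingSlot): boolean {
  const overlaps =
    a.start.getTime() < b.end.getTime() && b.start.getTime() < a.end.getTime();
  if (!overlaps) return true;  // no overlap, nothing to decide
  return a.court === b.court;  // overlapping sessions are fine only on a shared court
}
```

The point is not this particular rule; it is that every rule at this level of specificity lives in the operator’s head until someone writes it down.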

The pattern across all three failure modes is the same: AI accelerates the parts of the workflow that are downstream of decisions, but the decisions themselves still belong to the operator. Testing, design judgement, domain expertise: these are human-owned, and they cost time even with AI in the loop.

What an operator should actually do in 2026

Five steps, derived from what worked across the three ventures. None of them are clever. All of them are doable in 1-to-50-person contexts.

One: pick one specific workflow that is currently painful and small enough to specify in detail. Not the whole business. One workflow. The customer-onboarding sequence, the supplier-invoice processing, the inventory-reconciliation step. Pick the one where the cost of being wrong is bounded and the size of the prize is visible.

Two: write the specification before involving AI. Describe inputs, outputs, edge cases, acceptance criteria. If you cannot write this in plain language, you cannot build with AI on this workflow yet; sharpen the spec or pick a different workflow. The temptation is to skip this step and let the AI figure out the spec from context. The AI can, partially, and the partial answer is what produces the rewrites.
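
A sketch of what step two’s capture can look like: the four sections held in one structure the eventual code, and its tests, can be checked against. The workflow is the supplier-invoice example from step one; every value is a placeholder, not a real spec.

```ts
// Placeholder spec skeleton for one workflow; all values are illustrative.
const supplierInvoiceSpec = {
  workflow: "supplier-invoice processing",
  inputs: ["PDF invoice via email", "supplier id", "PO number (optional)"],
  outputs: ["draft ledger entry", "approval task when the amount exceeds a threshold"],
  edgeCases: [
    "duplicate invoice number from the same supplier",
    "invoice with no matching PO",
    "amount in a non-EUR currency",
  ],
  acceptanceCriteria: [
    "a duplicate invoice is flagged and never booked twice",
    "an invoice without a PO routes to manual review, not straight to the ledger",
  ],
} as const;
```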

Three: ship a working version end-to-end before optimising any piece. AI is good at producing the second-best version of every component fast. The trap is optimising components in isolation before the end-to-end flow exists. Ship the ugly working version first. Real users surface the bottleneck; iterate on the bottleneck.

Four: budget 15–25% of build time for maintenance over the first 12 months. Custom systems break. AI-paired custom systems break at roughly the same rate as human-built custom systems, in this builder’s experience across the three ventures. The builder who didn’t budget for maintenance is the builder who hates the system six months later.

Five: treat AI as a paired drafter with a senior intern’s competence. Capable of producing the right output most of the time. Requires review on every output. Improves fast when given specific feedback. This is the right calibration. Treating it as a junior intern under-uses it; treating it as a senior engineer over-trusts it; treating it as a senior intern is the working frame.

These five are not novel. They are what disciplined small-team development has always looked like. The new part is that AI made the package affordable for ventures that previously could not justify the engineering investment.

What we are tracking

Claim OPS-029 is logged with a 60-day review on 27 June 2026. The trackable assertion: across the three Q1 2026 ventures, the build-vs-buy inversion held; specification became the bottleneck rather than engineering capacity. By 27 June 2026, either the pattern has held on the Q2 ventures the publication is tracking, or specific cases have surfaced where AI-paired development failed despite good specification.

Three review checks at 60 days:

  1. Has any of the three ventures stalled because of capacity rather than specification?
  2. Has any new venture launched under the same model in Q2 2026, which would extend the pattern?
  3. Have specific failure modes around specification quality surfaced in defensible enough form to write up?

If the inversion holds on the Q2 cohort and no specification-failure modes surface, the claim is marked Holding. If failure modes surface but the inversion still holds, it is marked Partial and the next piece names the failure modes. If a Q2 venture stalls because of capacity rather than specification, it is marked Not holding and the OPS-build-log series owes the reader an explanation of what changed.

The claim is on the ledger. It will be reviewed in public, and if it does not hold, the correction will be on the same page.


Spotted an error? See corrections policy →

OPS-LEDGER · 60 reviewed