Picking your first AI agent: the 4-question filter for SMBs
Most SMB-deployed agents fail not on technology but on the four questions nobody asked at the demo: what does success look like in numbers, who owns it on Monday, what breaks if it fails silently, what's the rollback. If a candidate use case can't answer all four, it's not ready.
Holding · reviewed 26 Apr 2026 · next +60d

If your SMB has not yet deployed its first AI agent and a vendor is in your inbox this week, the question we keep getting is which use case to start with. The right answer is not “the one that looks best in the demo.” It is the one that survives a four-question filter we have arrived at after watching the failure modes of agents deployed at companies of fewer than fifty people.
The filter is short, deliberately. A longer checklist gets ignored by the owner-operator who will actually run the deployment. A four-question filter fits on a Post-it, and it catches roughly the same failure modes a 60-row enterprise GRC artefact would catch at this size.
The four questions
For each candidate AI agent use case, in this order:
1. What does success look like in numbers?
If you cannot finish the sentence “this agent is succeeding when X is happening at Y rate” with concrete values, the use case is not ready. Vague answers (“it’ll save us time”, “it’ll improve customer experience”) are not numbers. Numbers are: response time under 4 minutes, classification accuracy above 92%, hours saved per week above 6, escalation rate under 8%. If the use case cannot generate a number you would post on a wall, the agent has no anchor for the inevitable “is this working” conversation in week three.
What this question filters out: vendor demos that show a single impressive output and skip the operating-rhythm question.
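To make question one concrete, here is a minimal sketch of “success in numbers” written down as data instead of prose. The metric names and thresholds are hypothetical examples built from the figures above, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class SuccessMetric:
    name: str                # what you are measuring
    threshold: float         # the number you would post on a wall
    higher_is_better: bool   # which side of the threshold counts as success

    def passing(self, observed: float) -> bool:
        """True when the observed value clears the threshold."""
        return observed >= self.threshold if self.higher_is_better else observed <= self.threshold

# Hypothetical targets for a support-triage agent; replace with your own.
METRICS = [
    SuccessMetric("median_response_minutes", 4.0, higher_is_better=False),
    SuccessMetric("classification_accuracy", 0.92, higher_is_better=True),
    SuccessMetric("hours_saved_per_week", 6.0, higher_is_better=True),
    SuccessMetric("escalation_rate", 0.08, higher_is_better=False),
]

# The week-three "is this working" conversation, answered with numbers.
observed = {
    "median_response_minutes": 3.2,
    "classification_accuracy": 0.94,
    "hours_saved_per_week": 7.5,
    "escalation_rate": 0.11,
}
for m in METRICS:
    status = "PASS" if m.passing(observed[m.name]) else "FAIL"
    print(f"{m.name}: {observed[m.name]} (target {m.threshold}) {status}")
```

Note the last line of the toy output: an escalation rate of 11% against a target of 8% is a FAIL you can point at, which is exactly what the vague answers cannot give you.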
2. Who owns it on Monday?
The agent has a named human owner whose job it is to look at it, fix it when it breaks, and report on it. Not “the team.” Not “we’ll figure it out.” A name and an hour each week. If you cannot name the owner before signing the contract, the agent will be orphaned within thirty days, and orphaned agents either silently degrade or get switched off.
What this question filters out: cross-functional use cases where every department wants the benefit and no department wants the operational burden.
3. What breaks if it fails silently?
Agents fail in two modes: noisy (they error and stop) and silent (they continue producing wrong output that looks right). The silent mode is the one that costs money, reputation, or compliance posture. For each candidate use case, name the worst plausible silent-failure outcome: “The agent classifies an urgent ticket as low-priority and the customer walks.” “The agent agrees to a refund policy we do not have.” “The agent emails a vendor with the wrong PO number.”
If the silent-failure consequence is recoverable in under a day at low cost, the use case is a fit. If the consequence is irreversible (data loss, reputational damage, regulatory breach), the use case needs human-in-the-loop and is not “your first agent.”
What this question filters out: agents pointed at high-stakes outputs (legal, financial, medical, customer-facing irreversible actions) before the team has the operational maturity to monitor them.
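One way to make the silent mode visible is a weekly spot-check: sample recent agent decisions, have a human label the same items, and alert when agreement drops below a floor. A minimal sketch with toy data; the 90% floor, the ticket IDs, and the labels are illustrative assumptions, not a standard:

```python
import random

AGREEMENT_FLOOR = 0.90  # illustrative floor; below this, treat the agent as silently failing

def spot_check(agent_labels: dict[str, str], human_labels: dict[str, str],
               sample_size: int = 20) -> float:
    """Agreement rate between the agent's labels and fresh human labels on a random sample."""
    ids = random.sample(sorted(human_labels), min(sample_size, len(human_labels)))
    agree = sum(agent_labels.get(i) == human_labels[i] for i in ids)
    return agree / len(ids)

# Toy data standing in for last week's tickets: the agent's label vs. a human's label.
agent = {"t1": "low", "t2": "urgent", "t3": "low", "t4": "low"}
human = {"t1": "low", "t2": "urgent", "t3": "urgent", "t4": "low"}  # t3 was misfiled

rate = spot_check(agent, human)
if rate < AGREEMENT_FLOOR:
    # Silent failure is the expensive mode; this check turns it into a noisy one.
    print(f"ALERT: agreement dropped to {rate:.0%}. Pause the agent and review.")
```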
4. What’s the rollback?
If the agent stops producing useful output tomorrow, what happens? The honest answer cannot be “we’re back to where we were before the agent” if the agent replaced a process that has since been dismantled. The rollback path is one of three things: a human who still does the work (in which case the agent is augmenting, not replacing), a vendor alternative backed by a clear SLA, or a documented manual procedure the team has run within the last 30 days.
If none of the three exists, you do not have a rollback. You have a dependency.
What this question filters out: aggressive “AI-first” deployments that decommission the human process before the agent has run a quarter of clean operation.
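The rollback question has a direct technical counterpart: a kill switch that routes work back to the manual path the moment the agent is turned off. A minimal sketch; the function names and ticket shape are invented for illustration, and the code assumes the manual procedure still exists and is staffed:

```python
AGENT_ENABLED = True  # flip to False to roll back; no redeploy, no renegotiation

def handle_with_agent(ticket: dict) -> str:
    return f"agent handled {ticket['id']}"  # stand-in for the real agent call

def enqueue_for_human(ticket: dict) -> str:
    # This branch only exists if the manual procedure is documented and staffed.
    # If nobody can run it, you do not have a rollback; you have a dependency.
    return f"{ticket['id']} queued for the manual process"

def route_ticket(ticket: dict) -> str:
    """Route to the agent when enabled, otherwise to the documented manual path."""
    return handle_with_agent(ticket) if AGENT_ENABLED else enqueue_for_human(ticket)

print(route_ticket({"id": "t42"}))
```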
How to run the filter in 30 minutes
Take the candidate use case to a 30-minute meeting with three people: the owner-operator (you), the proposed agent owner (whoever will look at it on Monday), and one person from the team the agent’s output reaches. Walk through the four questions in order. Write the answers in one document. If any of the four is “we don’t know” or “we’ll figure that out,” stop. The agent is not ready.
A 30-minute meeting that ends with “we are not deploying this yet” is not a failed meeting. It is a successful filter.
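If you want the meeting’s output in a form you can re-run next quarter, the whole filter reduces to four answers and a hard stop on any vague one. A minimal sketch; the example answers, the owner’s name, and the list of vague phrases are placeholders:

```python
FILTER = {
    "success_in_numbers": "escalation rate under 8%, measured weekly",
    "owner_on_monday":    "Dana (ops), one hour booked every Monday at 09:00",
    "silent_failure":     "urgent ticket misfiled as low-priority; recoverable same day",
    "rollback":           "manual triage runbook, last exercised within the past 30 days",
}

VAGUE = {"", "we don't know", "we'll figure that out", "the team", "tbd"}

def ready_to_deploy(answers: dict[str, str]) -> bool:
    """The use case clears the filter only when all four answers are concrete."""
    for question, answer in answers.items():
        if answer.strip().lower() in VAGUE:
            print(f"STOP: no concrete answer yet for '{question}'.")
            return False
    return True

print("deploy" if ready_to_deploy(FILTER) else "not ready")
```

An answer that trips the string check would also have stopped the meeting; the code is just the Post-it with a stop condition attached.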
What this filter is deliberately not
It is not a vendor evaluation framework. There is a separate piece for that (Vendor due diligence in one Saturday). Vendor selection happens after the use case clears this filter; running them simultaneously confuses “is this the right vendor” with “is this the right use case,” and both questions get worse answers.
It is not a measurement of agent capability. The agent might be technically perfect at the use case and still fail the filter because the surrounding operational context is not ready. The filter measures organisational readiness, not model quality.
It is not a substitute for a risk register, a governance policy, or an AI policy. Those are documents that exist at higher organisational scale; this filter is the four questions that an owner-operator can run in 30 minutes before any of those documents exist.
What changes this filter
Cadence on this piece is 60 days because the filter is an editorial framework, not data, and the surface that would change it is the failure modes we observe in real SMB deployments. The two things that would flip the recommendation:
- A fifth failure mode emerges that the four questions do not catch. The current four are derived from the most common failure modes observed across SMB-scale deployments through 2025-2026. If a new mode (say, agent-vs-agent collision in multi-agent setups, or regulatory-driven failure modes from EU AI Act enforcement) becomes common at SMB scale, the filter expands.
- Agent operational tooling matures to the point where “who owns it Monday” becomes a self-service surface rather than a named human responsibility. There are early signs of this in observability platforms (Langfuse, Helicone, Arize); when one becomes the SMB default, question two changes from “who” to “what tool.”
We will re-test this filter against actual SMB deployment outcomes on or before 26 Jun 2026. If either of the preceding conditions has triggered, this claim moves to Partial.