You have a bounded business problem that AI might solve: a workflow that needs retrieval, tool use, drafting, classification, review, or multi-step execution. You do not need a vendor to promise delivery; you need a small factory process that turns the problem into an evaluated, operable system without losing the plot halfway through.

This is the engagement: narrow the problem, write the evaluation, put a real user in the loop, build against the eval, and leave your team with the operating assets they need to run the system without us.

What we build

Agentic workflow apps: systems that plan a bounded sequence of steps, call tools, update records, ask for approval, and stop when the next action belongs to a human.
AI tools inside existing products: copilots, review queues, drafting assistants, triage systems, knowledge search, support workflows, sales workflows, and internal operations tools.
Tool-calling infrastructure: typed tool contracts, MCP servers, API adapters, permission checks, audit trails, queue workers, and fallback paths.
Retrieval and knowledge systems: RAG that answers concrete questions, refuses when the source is weak, and measures answerability instead of only semantic similarity.
Verifier-gated agents: loops where a model proposes work and a separate check verifies structure, grounding, policy, cost, or business rules before anything ships downstream.

The shape

Weeks 1–2: Discovery + evaluation. We write the eval set before the prompt. If we can't write a defensible eval in two weeks, we stop the engagement and refund the discovery fee. We've done this twice in 2024–2025.
Week 2: First real user inside the system, usually one person on your team. We watch them use what we have so far. Architecture decisions get made here, not on whiteboards.
Weeks 3–8 (small) or 3–12 (medium): Build. Two-week increments. Demo every Friday. The eval gates every merge.
Final 2 weeks: Handover. Repo, runbook, on-call playbook, eval harness, credentials transferred. Your team uses the system live with us standing behind them. Then we leave.

Operating assets

The Production System: Deployed on your infrastructure, running under your credentials, fully under your control.
The Automated Eval Harness: The release gate we used on every merge. Your team runs it on every PR to catch model drift, prompt regressions, tool failures, and quality drops before they ship.
The Immutable Decision Log: The anti-tribal-knowledge asset: model selections, latency/cost tradeoffs, discarded assumptions, and why each call was made.
The Tool and Permission Map: Every API, model, queue, data store, and external action the system can touch, with owners and limits.
The 3 A.M. Runbook: Alerts, degradation modes, escalation paths, and rollback scripts written for the on-call engineer, not the boardroom.
30-Day Engineering Warranty: Bugs in our delivered code that surface within 30 days are resolved on our time.

The arithmetic

Small build: $40–80k, 6–8 weeks. 1 Dedicated Senior Engineer + 1 Lead Architect (part-time).
Medium build: $80–160k, 10–14 weeks. 1 Dedicated Senior Engineer + 1 Dedicated Product Designer + 1 Lead Architect (part-time).
Discovery only: from $8k, 1–2 weeks, written eval + recommendation report. The fee is credited against a build engagement if we go ahead.

Fixed fee where the eval is clean. Time and materials (T&M) where the problem is genuinely uncertain. We tell you which we're proposing on the first call.

What we won't do

Start without a written eval. If the success criteria can't be written, the factory has no gate.
Take a project where the same team can't carry it from scope to handover.
Subcontract. The process depends on project memory staying with the people doing the work.
Treat the prompt as the product. The product is the system that runs around it: evals, workflow, observability, safety, and handover.

Who this works for

Engineering or product leaders with a real, bounded problem and a real budget.
Teams that intend to operate the system after we hand it over. Not teams looking to outsource AI to a vendor permanently.
Companies where the AI feature has to pass procurement and security review on the same timeline as the rest of the product.

Bring us the problem, the owner, the budget range, and the date. The first call is 30 minutes, and we'll tell you on that call whether we're a fit.