We reach for AWS when the client already has mature cloud governance, procurement, and security review around it. It is not the simplest place to start an AI product, but it is often where production has to live.
The reason to choose AWS is not that AI is easier there. The reason is that the client already knows how to operate AWS: accounts, IAM, VPCs, logs, incident response, procurement, and security exceptions all have an owner.
Where AWS fits
- Enterprise teams that already operate IAM, VPCs, logging, and incident response in AWS.
- AI workflows that need private data paths, queue-backed jobs, and clear ownership boundaries.
- Products where procurement cares more about operational control than the fastest prototype.
- Agentic systems that need queue-backed tool calls, scoped credentials, and clear blast-radius boundaries.
What we watch closely
- IAM sprawl. The fastest way to make an AI system unreviewable is to let every worker, retriever, and export path grow its own permissions.
- Hidden orchestration cost. Model tokens are usually not the only bill. Queues, object storage, logs, egress, embeddings, and retries need their own budget line.
- Observability gaps. LLM traces, retrieval traces, and application logs need to join back to the same user action, or the runbook will fail during the first incident.
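The observability point is the easiest of the three to show in code. What follows is a minimal sketch, not a description of any particular tracing product. It assumes every subsystem writes JSON log lines and carries a shared `action_id` field. The field name, the `source` labels, and the helper functions are all hypothetical.

```python
import json
import uuid
from collections import defaultdict


def new_action_id() -> str:
    """Create one ID per user action; every subsystem must propagate it."""
    return uuid.uuid4().hex


def make_event(action_id: str, source: str, message: str, **fields) -> str:
    """Serialize one structured log line tagged with the shared action ID."""
    record = {"action_id": action_id, "source": source, "message": message}
    record.update(fields)
    return json.dumps(record)


def group_by_action(lines):
    """Join log lines from different sources back to their user action."""
    grouped = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        grouped[event["action_id"]].append(event)
    return dict(grouped)


# Illustrative events from three subsystems handling the same user action.
action = new_action_id()
lines = [
    make_event(action, "app", "draft requested", user="u-123"),
    make_event(action, "retrieval", "chunks fetched", chunk_count=8),
    make_event(action, "llm", "completion returned", input_tokens=1200),
]

timeline = group_by_action(lines)[action]
print([event["source"] for event in timeline])
```

If any one of the three sources drops the ID, its events fall out of the timeline, which is the gap the runbook hits during an incident.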
Decisions we tend to make
- Keep customer credentials and data stores under the customer's account from day one.
- Put ingestion, extraction, and export behind explicit queues instead of long request paths.
- Separate model routing from product logic so a cheaper model can win when it passes the eval.
- Treat security review artifacts as product deliverables, not launch-week paperwork.
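The routing decision can be sketched as a small function that product code calls without knowing which model answers. This is an illustration under stated assumptions: the model names, prices, and eval scores are made up, and `pick_model` is a hypothetical helper, not a provider API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelCandidate:
    name: str                  # hypothetical identifier, not a real model ID
    cost_per_1k_tokens: float  # illustrative price in USD
    eval_score: float          # pass rate on the task's eval set, 0.0 to 1.0


def pick_model(
    candidates: Sequence[ModelCandidate], min_eval_score: float
) -> Optional[ModelCandidate]:
    """Return the cheapest candidate that meets the eval bar, or None."""
    passing = [c for c in candidates if c.eval_score >= min_eval_score]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_1k_tokens)


# Made-up numbers for illustration only.
CANDIDATES = [
    ModelCandidate("large-model", 0.0150, 0.96),
    ModelCandidate("medium-model", 0.0030, 0.93),
    ModelCandidate("small-model", 0.0005, 0.81),
]

choice = pick_model(CANDIDATES, min_eval_score=0.90)
print(choice.name if choice else "no model passes the eval")
```

Because the product only calls `pick_model`, swapping in a cheaper model is a change to the candidate table and the eval results, not to product logic.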
What we include in handover
- IAM map for model routes, retrievers, workers, export paths, and human operators.
- Queue and retry policy for long-running ingestion, extraction, and agent tasks.
- Cost dashboard separating model tokens, embeddings, logs, queues, storage, and egress.
- Runbook entries for provider outage, cost spike, retrieval miss, malformed tool call, and rollback.
- Security-review notes that explain data flow, credential ownership, and audit logs.
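As one example of a handover artifact, a queue and retry policy can be written down as a small, testable object instead of a paragraph in a wiki. The sketch below assumes capped exponential backoff and a dead-letter outcome after a fixed number of attempts. `RetryPolicy`, `run_with_policy`, and the numbers are hypothetical; in a real deployment the queue service's own redelivery and dead-letter settings would usually enforce the same limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 5
    base_delay_s: float = 2.0
    max_delay_s: float = 60.0

    def delay_for(self, attempt: int) -> float:
        """Delay after the given failed attempt (1-based): exponential, capped."""
        return min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))

    def should_dead_letter(self, attempt: int) -> bool:
        """True once the attempt budget is spent."""
        return attempt >= self.max_attempts


def run_with_policy(task, policy, sleep):
    """Run task until it succeeds or the policy gives up.

    Returns ("ok", result) or ("dead_letter", last_error).
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return "ok", task()
        except Exception as error:
            if policy.should_dead_letter(attempt):
                return "dead_letter", error
            sleep(policy.delay_for(attempt))


# Illustrative task: fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_extraction():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "extracted"


policy = RetryPolicy(max_attempts=4, base_delay_s=1.0, max_delay_s=30.0)
delays = []  # record delays instead of sleeping so the example runs instantly
status, result = run_with_policy(flaky_extraction, policy, sleep=delays.append)
print(status, result, delays)
```

Passing `sleep` in as a parameter keeps the policy testable: a worker would pass `time.sleep`, while a test records the delays.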
When we avoid it
If the team does not already operate AWS and the project does not need AWS-specific governance, starting there can slow the build without improving the product. We do not choose AWS because it sounds enterprise. We choose it when the operating model already lives there.
Related work
BidGenie is the closest public pattern: document ingestion, retrieval-grounded drafting, human review, export, and audited-provider deployment.