A team building on Azure has the pieces in front of them (Azure AI Foundry, Semantic Kernel, Azure OpenAI models) and a working agent that does the right thing in a demo. The question that decides whether it survives contact with production is not which model they picked. It is whether, when three agents coordinate and the output is quietly wrong, anyone can open the trace and see why. On the Microsoft stack the answer can be yes, if you wire it deliberately. Here is how the pieces fit.
This is the Azure-specific companion to the general argument that OpenTelemetry is the tracing contract. The same instincts apply on Google — we wrote that one up separately — and the whole point of leaning on OTel is that the two stay comparable.
The shape of the stack
Four layers, and it helps to keep them distinct:
- Platform — Azure AI Foundry (the former Azure AI Studio), including its agent service, is where agents are built, deployed, and run. Foundry has first-class tracing and evaluation features; treat them as the front door, not the whole house.
- Framework — Semantic Kernel and AutoGen, now converging into the Microsoft Agent Framework, are where the orchestration logic lives. This is the layer that decides what becomes a span.
- Models — Azure OpenAI (the GPT and o-series) plus the model catalog. The model is one component; the eval picks it, the trace watches it.
- Telemetry backend — Azure Monitor and Application Insights, where traces, metrics, and logs land and where you actually query them.
Make it speak OpenTelemetry, not Azure-only
The single decision that keeps this stack healthy is to instrument in OpenTelemetry with the GenAI semantic conventions, and let Application Insights be the backend rather than the format. Semantic Kernel and the Agent Framework emit OTel; Azure Monitor accepts it through its OpenTelemetry distro and exporters. So you can have the full Azure-native experience (Foundry tracing, App Insights queries, the integrated evaluation tooling) while the spans themselves remain standard, portable telemetry.
Why insist on this when you are "on Azure anyway"? Because the regulated customer who wants telemetry kept in their own store, the eval platform you want to run LLM-as-judge on, and the multi-cloud future you have not committed to yet are all cheap if the trace is OTel and expensive if it is an Application Insights-shaped blob. Use Azure's backend fully; do not let it become the only thing that can read your traces.
What to capture on the span
The Azure tooling will happily record model calls and latency. The spans that actually save the 3 a.m. shift carry more:
- Agent boundaries and reasoning — which agent acted, why it chose the tool it chose, the confidence it attached. Foundry's agent service gives you the boundaries; you add the semantic context.
- Versions — prompt, tool config, and agent revision on every span, so an adaptive agent's drift is attributable.
- Token and cost attributes — gen_ai.usage.* on each generation, because cost in a ten-step agent compounds and Azure bills you for every recomputed token.
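The three kinds of context in the list above can be sketched as a flat attribute dictionary. This is a standard-library-only illustration, not Foundry's or Semantic Kernel's API. The AgentStep type, both helper functions, and every key under the app.* prefix are hypothetical names chosen for this example. The gen_ai.* keys are taken from the GenAI semantic conventions and should be checked against the current spec. The second helper assumes a simple cost model in which every step re-sends the whole growing context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStep:
    """One agent action, with the context a later reader of the trace needs."""
    agent_name: str
    tool_name: str
    tool_choice_reason: str
    confidence: float
    prompt_version: str
    tool_config_version: str
    agent_revision: str
    input_tokens: int
    output_tokens: int


def span_attributes(step: AgentStep) -> dict:
    """Flatten a step into span attributes (string and number values only)."""
    return {
        # Names from the GenAI semantic conventions.
        "gen_ai.agent.name": step.agent_name,
        "gen_ai.tool.name": step.tool_name,
        "gen_ai.usage.input_tokens": step.input_tokens,
        "gen_ai.usage.output_tokens": step.output_tokens,
        # Hypothetical custom attributes; the app.* prefix is an assumption.
        "app.agent.tool_choice_reason": step.tool_choice_reason,
        "app.agent.confidence": step.confidence,
        "app.prompt.version": step.prompt_version,
        "app.tool_config.version": step.tool_config_version,
        "app.agent.revision": step.agent_revision,
    }


def cumulative_input_tokens(base_prompt: int, added_per_step: int, steps: int) -> int:
    """Input tokens billed when every step re-sends the growing context."""
    return sum(base_prompt + k * added_per_step for k in range(steps))


step = AgentStep(
    agent_name="triage",
    tool_name="search_tickets",
    tool_choice_reason="user asked about an open incident",
    confidence=0.82,
    prompt_version="v12",
    tool_config_version="2024-06",
    agent_revision="r7",
    input_tokens=1000,
    output_tokens=300,
)
attrs = span_attributes(step)
total = cumulative_input_tokens(base_prompt=1000, added_per_step=300, steps=10)
```

With a 1,000-token base prompt and 300 tokens added per step, ten steps bill 23,500 input tokens under this model, against 10,000 if the context never grew. With the OpenTelemetry SDK, a dictionary like attrs can be attached to a span in one call through span.set_attributes(attrs).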
The Collector and Purview do the governance
For regulated work — and a lot of Azure shops are regulated — put an OTel Collector between the agents and Application Insights and do PII redaction there, once, before telemetry leaves the boundary. Pair it with Microsoft Purview for the governance and lineage story the auditors ask about. The pattern matters more than the product names: redact at the Collector, govern at the platform, and the prompt text and tool arguments that reach the backend are already clean.
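The redaction step itself is usually Collector configuration, not application code, but the logic is easy to show in Python. The sketch below is an assumption-laden illustration: the two patterns, the placeholder strings, and the function names are hypothetical, and a real deployment would need a reviewed pattern set covering far more than emails and card-like numbers.

```python
import re

# Illustrative patterns only; not a complete PII definition.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact_value(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    text = CARD_LIKE.sub("[REDACTED_NUMBER]", text)
    return text


def redact_attributes(attrs: dict) -> dict:
    """Redact every string attribute; leave numbers and booleans untouched."""
    return {
        key: redact_value(value) if isinstance(value, str) else value
        for key, value in attrs.items()
    }


# Hypothetical span attributes as they might look before leaving the boundary.
raw = {
    "app.tool.arguments": "lookup jane@example.com card 4111 1111 1111 1111",
    "gen_ai.usage.input_tokens": 1200,
}
clean = redact_attributes(raw)
```

Doing this once at the Collector, and not in each agent, means a new agent or tool cannot forget to redact. The trade-off is that pattern-based redaction only catches what the patterns describe, so it complements the governance layer without replacing it.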
The honest assessment
The Microsoft stack is a strong place to run agents with real observability, and the integrated Foundry-plus-App-Insights path is genuinely convenient — its main risk is exactly that convenience. It is easy to end up Azure-shaped all the way down, traces included, and discover the lock-in only when you want out. Lean on the integration for the operator experience; keep the spans OTel so the experience is a choice and not a cage. Do that, and you get the best of the stack — Foundry's tracing, App Insights' query power, the built-in evals — on telemetry you can still pick up and carry.
Closing
On Azure, the model is the easy decision and the trace is the one that decides whether you can operate the thing. Foundry, Semantic Kernel, and Application Insights give you a real observability story out of the box — take it, and keep the spans in OpenTelemetry so the story stays yours. Use the Microsoft stack fully. Just make sure your traces could leave it.