AutoGen Tutorial (Python): adding observability for advanced developers
This tutorial shows how to add structured observability to an AutoGen multi-agent workflow in Python using OpenTelemetry. You need this when your agent system is doing real work and you want traces, spans, and logs that tell you which agent did what, how long each step took, and where failures started.
What You'll Need
- Python 3.10+
- autogen-agentchat
- autogen-ext
- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp
- An OpenAI-compatible API key
- A running OpenTelemetry collector, or a local console exporter for development
Install the packages:
pip install autogen-agentchat autogen-ext opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
Step-by-Step
- Start by wiring up OpenTelemetry before you create any agents. If you initialize tracing late, you miss the startup path and the first model calls.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Name the service so your tracing backend can group its spans.
resource = Resource.create({"service.name": "autogen-observability-demo"})
provider = TracerProvider(resource=resource)

# ConsoleSpanExporter prints spans to stdout; swap in an OTLP exporter for production.
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
- Build your model client and agents with the current AutoGen APIs. This example uses a user proxy plus an assistant, which is enough to show where observability hooks fit in production workflows.
import asyncio

from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key="YOUR_OPENAI_API_KEY",
)

assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a precise assistant.",
)

# input_func replaces interactive input so the example runs unattended.
user = UserProxyAgent(
    name="user",
    input_func=lambda prompt: "Summarize the risk of delayed payments in one sentence.",
)
- Wrap each agent turn in spans so you can see latency per step. The important bit is to attach useful attributes like agent name, task type, and result size instead of dumping raw prompts into traces.
async def run_turn():
    with tracer.start_as_current_span("agent.workflow") as span:
        span.set_attribute("workflow.name", "payment-risk-summary")
        with tracer.start_as_current_span("agent.user_input") as user_span:
            user_span.set_attribute("agent.name", "user")
            # run() returns a TaskResult; the user's reply is its last message.
            user_result = await user.run(task="Start the task")
            message = user_result.messages[-1]
            user_span.set_attribute("message.type", type(message).__name__)
        with tracer.start_as_current_span("agent.assistant_reply") as assistant_span:
            assistant_span.set_attribute("agent.name", "assistant")
            result = await assistant.run(task=message)
            assistant_span.set_attribute("result.type", type(result).__name__)
        return result

if __name__ == "__main__":
    asyncio.run(run_turn())
- Add error capture so failed runs still produce useful telemetry. In practice, this is where observability pays off: timeouts, bad tool output, malformed responses, and upstream API issues become visible instead of disappearing into stack traces.
async def run_with_observability():
    with tracer.start_as_current_span("agent.workflow") as span:
        try:
            result = await assistant.run(task="Explain why claims processing can fail.")
            span.set_attribute("workflow.status", "ok")
            return result
        except Exception as exc:
            # Record the exception on the span, then re-raise so callers still see it.
            span.record_exception(exc)
            span.set_attribute("workflow.status", "error")
            raise

if __name__ == "__main__":
    asyncio.run(run_with_observability())
- If you want production-grade visibility, standardize the attributes you attach on every run. Keep them small, consistent, and query-friendly so your tracing backend can group failures by workflow, agent, or customer journey.
def set_common_attributes(span, workflow_name: str, agent_name: str, tenant_id: str):
    span.set_attribute("workflow.name", workflow_name)
    span.set_attribute("agent.name", agent_name)
    span.set_attribute("tenant.id", tenant_id)

async def traced_assistant_task():
    with tracer.start_as_current_span("assistant.task") as span:
        set_common_attributes(span, "payment-risk-summary", "assistant", "tenant-a")
        response = await assistant.run(task="Give me a concise answer about payment delays.")
        span.set_attribute("response.kind", type(response).__name__)
        return response

if __name__ == "__main__":
    asyncio.run(traced_assistant_task())
Testing It
Run the script locally and confirm that spans print to stdout through ConsoleSpanExporter. You should see a parent workflow span plus child spans for each logical step you wrapped.
If the model call fails because the API key is wrong or missing, the exception should be recorded on the active span before it bubbles up. That tells you your tracing path is working even on failure cases.
For a deeper check, replace the console exporter with OTLP export to your collector and inspect whether attributes like workflow.name and agent.name are searchable. If they are not showing up, your tracing provider is being initialized too late or spans are ending before attributes are set.
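For reference, here is a minimal sketch of that swap, assuming a collector listening on the default OTLP/gRPC port. The endpoint value is an assumption, so point it at your own collector:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "autogen-observability-demo"})
provider = TracerProvider(resource=resource)
# Assumed endpoint: an OTLP/gRPC collector on localhost:4317.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)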
Next Steps
- Add tool-call spans around every external dependency: databases, HTTP APIs, queues (see the sketch after this list).
- Propagate a request ID from your web layer into tenant.id, session.id, or correlation.id.
- Move from console export to OTLP + Grafana Tempo or Jaeger for real dashboards and distributed traces.
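To make the first two items concrete, here is a hedged sketch of a tool-call span around an external HTTP dependency. httpx, the example.com URL, and the correlation_id parameter are illustrative assumptions, not part of the tutorial's stack:

import httpx
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Hypothetical tool: fetch a payment record over HTTP, traced end to end.
async def fetch_payment_record(record_id: str, correlation_id: str) -> dict:
    url = f"https://api.example.com/payments/{record_id}"  # placeholder endpoint
    with tracer.start_as_current_span("tool.fetch_payment_record") as span:
        span.set_attribute("tool.name", "fetch_payment_record")
        span.set_attribute("correlation.id", correlation_id)  # propagated from the web layer
        span.set_attribute("http.url", url)
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            span.set_attribute("http.status_code", response.status_code)
            response.raise_for_status()
            return response.json()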
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.