AutoGen Tutorial (Python): adding observability for intermediate developers
This tutorial shows how to add practical observability to an AutoGen Python app using structured logging, trace IDs, and event hooks. You need this when your agent workflow stops being a toy and starts making multi-step decisions that you need to debug, audit, or explain later.
What You'll Need
- Python 3.10+
- `autogen-agentchat`
- `autogen-ext`
- An OpenAI API key set as the `OPENAI_API_KEY` environment variable
- `python-dotenv` if you want local `.env` loading
- A terminal where you can run the script and inspect logs
Install the packages:

```shell
pip install autogen-agentchat autogen-ext openai python-dotenv
```
Step-by-Step
- Start with a minimal agent setup and a logger that emits JSON-like records. The goal is to capture each run with a correlation ID so you can tie together prompts, tool calls, and final output.
```python
import asyncio
import logging
import os
import uuid

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Give every record a default trace_id so log lines from third-party
# loggers (which lack the extra field) do not break the formatter.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s trace_id=%(trace_id)s %(message)s",
        defaults={"trace_id": "-"},
    )
)
logging.basicConfig(level=logging.INFO, handlers=[handler])

class TraceLoggerAdapter(logging.LoggerAdapter):
    """Inject the current run's trace_id into every record logged through it."""

    def process(self, msg, kwargs):
        kwargs.setdefault("extra", {})
        kwargs["extra"]["trace_id"] = self.extra["trace_id"]
        return msg, kwargs

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)
```
- Create the agent and wrap the execution in a function that assigns a trace ID per request. This is the simplest useful observability boundary: one user request equals one trace.
```python
async def run_once(user_input: str) -> None:
    trace_id = str(uuid.uuid4())
    logger = TraceLoggerAdapter(logging.getLogger("autogen.demo"), {"trace_id": trace_id})
    agent = AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message="You are a concise support assistant.",
    )
    logger.info("starting_run")
    result = await agent.run(task=user_input)
    logger.info("finished_run")
    print(f"\nTRACE_ID={trace_id}")
    print(result.messages[-1].content)

if __name__ == "__main__":
    asyncio.run(run_once("Summarize the key risks in an AI agent rollout for banking."))
```
- Add message-level logging so you can see what went into and came out of the agent. In production, this is where you redact sensitive fields before writing logs.
```python
async def run_with_message_logging(user_input: str) -> None:
    trace_id = str(uuid.uuid4())
    logger = TraceLoggerAdapter(logging.getLogger("autogen.messages"), {"trace_id": trace_id})
    agent = AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message="You are a concise support assistant.",
    )
    logger.info("user_input=%r", user_input)
    result = await agent.run(task=user_input)
    for i, message in enumerate(result.messages):
        role = getattr(message, "source", "unknown")
        content = getattr(message, "content", "")
        # content can be a list (e.g. tool-call messages), so stringify before truncating
        logger.info("message_%d role=%s content=%r", i, role, str(content)[:500])

if __name__ == "__main__":
    asyncio.run(run_with_message_logging("Explain how observability helps in regulated workflows."))
```
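The step above mentions redacting sensitive fields before message content reaches the logs. A minimal sketch of such a redaction pass, assuming simple regex patterns for email addresses and long digit runs (the `redact` helper and its patterns are illustrative, not part of AutoGen; real systems need patterns tuned to their own PII):

```python
import re

# Hypothetical redaction helper: masks email addresses and runs of 8+ digits
# (account or card numbers) before content is written to logs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DIGITS = re.compile(r"\b\d{8,}\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return DIGITS.sub("[NUMBER]", text)

print(redact("Contact jane@bank.com about account 12345678901."))
# → Contact [EMAIL] about account [NUMBER].
```

Apply `redact(str(content))` in the message loop before the `logger.info` call so the raw values never leave the process.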
- If you use tools, log tool inputs and outputs separately from model text. That gives you the split you actually need during incident review: model reasoning versus external side effects.
```python
from autogen_core.tools import FunctionTool

def lookup_policy(policy_id: str) -> str:
    return f"Policy {policy_id}: active, premium paid, no claims in last 12 months."

tool = FunctionTool(lookup_policy, name="lookup_policy", description="Look up policy status by policy ID.")

async def run_with_tool_logging() -> None:
    trace_id = str(uuid.uuid4())
    logger = TraceLoggerAdapter(logging.getLogger("autogen.tools"), {"trace_id": trace_id})
    agent = AssistantAgent(
        name="insurance_agent",
        model_client=client,
        tools=[tool],
        system_message="Use tools when needed and be precise.",
    )
    logger.info("starting_tool_run")
    result = await agent.run(task="Check policy P-10021 and summarize its status.")
    logger.info("tool_run_complete")
    print(result.messages[-1].content)

if __name__ == "__main__":
    asyncio.run(run_with_tool_logging())
```
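The run above only logs run boundaries. To capture tool inputs and outputs as separate records, one option is to wrap the tool function in a logging decorator before handing it to `FunctionTool`; the `logged_tool` decorator below is an illustrative helper, not part of AutoGen (`functools.wraps` preserves the signature that `FunctionTool` introspects):

```python
import functools
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
tool_logger = logging.getLogger("autogen.tools.calls")

def logged_tool(fn):
    """Wrap a tool function so every call emits its inputs and outputs
    as distinct log records, separate from model text."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_id = uuid.uuid4().hex[:8]
        tool_logger.info("tool_call id=%s name=%s args=%r kwargs=%r",
                         call_id, fn.__name__, args, kwargs)
        result = fn(*args, **kwargs)
        tool_logger.info("tool_result id=%s name=%s result=%r",
                         call_id, fn.__name__, result)
        return result
    return wrapper

@logged_tool
def lookup_policy(policy_id: str) -> str:
    return f"Policy {policy_id}: active, premium paid, no claims in last 12 months."

print(lookup_policy("P-10021"))
```

Register the wrapped function as before: `FunctionTool(lookup_policy, name="lookup_policy", ...)`. The per-call `call_id` lets you pair each `tool_call` record with its `tool_result` during incident review.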
- Persist traces to disk so you can inspect them after the fact instead of only watching terminal output. For real systems, send the same events to OpenTelemetry or your log pipeline.
```python
import json
from datetime import datetime, timezone

from autogen_core.models import UserMessage

def write_trace_event(path: str, trace_id: str, event: str, payload: dict) -> None:
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "trace_id": trace_id,
        "event": event,
        "payload": payload,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

async def run_with_file_tracing(user_input: str) -> None:
    trace_id = str(uuid.uuid4())
    write_trace_event("agent-traces.jsonl", trace_id, "start", {"input": user_input})
    # The model client expects typed messages, not raw role/content dicts
    result = await client.create([UserMessage(content=user_input, source="user")])
    write_trace_event(
        "agent-traces.jsonl",
        trace_id,
        "model_response",
        {"content": result.content},
    )

if __name__ == "__main__":
    asyncio.run(run_with_file_tracing("List three observability signals for AutoGen agents."))
```
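Once events land in `agent-traces.jsonl`, you will want to read a whole run back in one piece. A minimal sketch that groups records by `trace_id` (the `load_traces` helper is illustrative; the demo writes a synthetic temp file shaped exactly like the real one):

```python
import json
import tempfile
from collections import defaultdict

def load_traces(path: str) -> dict:
    """Group newline-delimited trace records by trace_id for inspection."""
    traces = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            traces[record["trace_id"]].append(record)
    return dict(traces)

# Demo against a synthetic file with the same record shape as agent-traces.jsonl:
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps({"ts": "2024-01-01T00:00:00+00:00", "trace_id": "abc",
                        "event": "start", "payload": {}}) + "\n")
    f.write(json.dumps({"ts": "2024-01-01T00:00:01+00:00", "trace_id": "abc",
                        "event": "model_response", "payload": {}}) + "\n")
    path = f.name

traces = load_traces(path)
print([r["event"] for r in traces["abc"]])  # → ['start', 'model_response']
```

Point `load_traces` at the real `agent-traces.jsonl` after a run to replay one request end to end.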
Testing It
Run each script with a valid `OPENAI_API_KEY` exported in your shell. You should see a unique `TRACE_ID` per execution plus structured log lines that let you connect input, intermediate messages, and final output.
For the file-based version, inspect `agent-traces.jsonl` after one run and confirm it contains newline-delimited JSON records with timestamps and matching `trace_id` values. If you add tools later, verify tool events show up as separate records rather than being buried inside raw assistant text.
A good sanity check is to intentionally change the prompt and confirm the logs make it obvious which request produced which output. If two runs look identical in logs when they should not be, your correlation layer is too weak.
Next Steps
- Add OpenTelemetry spans around `agent.run()` and tool functions.
- Redact PII before writing message content to logs.
- Export traces to Grafana Loki or Datadog instead of local files.
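The first item, spans around `agent.run()` and tool calls, can be prototyped without wiring up OpenTelemetry. A stdlib sketch of the same idea, where the `span` context manager and `spans` list are illustrative stand-ins for a real tracer and exporter (swap in `opentelemetry-sdk` for production):

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # collected span records; a real exporter would ship these

@contextmanager
def span(name: str, trace_id: str):
    """Minimal stand-in for an OpenTelemetry span: records the operation
    name, the run's trace_id, and wall-clock duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "name": name,
            "trace_id": trace_id,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })

trace_id = uuid.uuid4().hex
with span("agent.run", trace_id):
    time.sleep(0.01)  # stands in for the awaited agent call

print(spans[0]["name"])  # → agent.run
```

The `try/finally` matters: the span is recorded even when the wrapped call raises, which is exactly when you need the timing data most.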
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.