AutoGen Tutorial (Python): adding observability for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add basic observability to an AutoGen Python app by logging agent messages, tool calls, and turn-by-turn execution. You need this when your agent starts doing non-trivial work and you want to debug failures, trace decisions, and understand what happened without reading raw console noise.

What You'll Need

  • Python 3.10+
  • An OpenAI API key
  • autogen-agentchat
  • autogen-ext
  • python-dotenv for local env loading
  • A terminal and a text editor

Install the packages:

pip install "autogen-agentchat" "autogen-ext[openai]" python-dotenv

Set your API key in the environment:

export OPENAI_API_KEY="your-key-here"
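The requirements list python-dotenv, so if you prefer keeping the key in a .env file, load it before creating the model client. A minimal sketch, assuming a .env file in the working directory that contains OPENAI_API_KEY=...:

import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the current directory into os.environ
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"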

Step-by-Step

  1. Start with a minimal AutoGen agent setup. The key observability move is to wrap execution with structured logging, so you can see each message and tool invocation instead of guessing from the final output.
import asyncio
import logging

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("autogen-observability")

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        system_message="You are a concise assistant.",
    )

    result = await agent.on_messages(
        [TextMessage(content="Explain observability in one sentence.", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(result.chat_message.content)

if __name__ == "__main__":
    asyncio.run(main())
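AutoGen also emits its own structured events through the standard logging module, which you can surface alongside your logger. A minimal sketch, assuming the 0.4+ API where autogen_core exports these logger-name constants:

import logging

from autogen_core import EVENT_LOGGER_NAME, TRACE_LOGGER_NAME

# Raise the framework's own loggers so agent events and traces
# appear next to the application logs configured above.
logging.getLogger(EVENT_LOGGER_NAME).setLevel(logging.INFO)
logging.getLogger(TRACE_LOGGER_NAME).setLevel(logging.DEBUG)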
  2. Add explicit event logging around each request. This gives you a simple trace of inputs and outputs, which is enough for beginners to debug most issues before introducing full tracing infrastructure.
import asyncio
import logging

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("autogen-observability")

async def traced_call(agent: AssistantAgent, user_text: str) -> str:
    logger.info("request.start user_text=%r", user_text)
    result = await agent.on_messages(
        [TextMessage(content=user_text, source="user")],
        cancellation_token=CancellationToken(),
    )
    logger.info("request.end assistant_text=%r", result.chat_message.content)
    return result.chat_message.content

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(name="assistant", model_client=model_client)
    await traced_call(agent, "Give me three benefits of logs.")

if __name__ == "__main__":
    asyncio.run(main())
  3. If your agent uses tools, log the tool boundary too. In production, tool calls are where most surprises happen: bad inputs, slow APIs, or unexpected empty responses.
import asyncio
import logging

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("autogen-observability")

def get_policy_status(policy_id: str) -> str:
    """Return the current status of the policy with the given ID."""
    # AutoGen uses the docstring as the tool description shown to the model.
    logger.info("tool.start name=get_policy_status policy_id=%s", policy_id)
    status = f"Policy {policy_id} is ACTIVE"
    logger.info("tool.end name=get_policy_status result=%s", status)
    return status

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        tools=[get_policy_status],
        system_message="Use tools when needed.",
    )

    result = await agent.on_messages(
        [TextMessage(content="Check policy 12345.", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(result.chat_message.content)

if __name__ == "__main__":
    asyncio.run(main())
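Rather than hand-writing start/end logs inside every tool, you can factor the boundary into a decorator. A sketch (observed_tool is a name invented here; functools.wraps preserves the name, docstring, and annotations that AutoGen inspects when building the tool schema, but verify against your version):

import functools
import logging
import time

logger = logging.getLogger("autogen-observability")

def observed_tool(fn):
    """Wrap a tool function with start/end/error logs plus timing."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("tool.start name=%s kwargs=%r", fn.__name__, kwargs)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Record the traceback at the tool boundary before re-raising.
            logger.exception("tool.error name=%s", fn.__name__)
            raise
        elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info("tool.end name=%s elapsed_ms=%s", fn.__name__, elapsed_ms)
        return result
    return wrapper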
  4. Capture latency and token usage from each run. This is the minimum useful telemetry for cost control and performance debugging, especially once multiple agents start talking to each other.
import asyncio
import logging
import time

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("autogen-observability")

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(name="assistant", model_client=model_client)

    start = time.perf_counter()
    result = await agent.on_messages(
        [TextMessage(content="Summarize why observability matters.", source="user")],
        cancellation_token=CancellationToken(),
    )
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)

    # models_usage carries prompt/completion token counts for the call;
    # it can be None for messages that did not hit the model.
    usage = result.chat_message.models_usage
    if usage is not None:
        logger.info(
            "run.metrics elapsed_ms=%s prompt_tokens=%s completion_tokens=%s",
            elapsed_ms,
            usage.prompt_tokens,
            usage.completion_tokens,
        )
    else:
        logger.info("run.metrics elapsed_ms=%s", elapsed_ms)
    logger.info("run.output %s", result.chat_message.content)

if __name__ == "__main__":
    asyncio.run(main())
  5. Put the logs into a format your future self can search. Plain text is fine for learning, but structured fields make it much easier to ship logs to Datadog, CloudWatch, or ELK later.
import asyncio
import json
import logging
import time

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("autogen-observability")

def log_event(event: str, **fields) -> None:
    payload = {"event": event, **fields}
    logger.info(json.dumps(payload))

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(name="assistant", model_client=model_client)

    user_text = "List two reasons to trace AI agents."
    log_event("request.start", user_text=user_text)

    start = time.perf_counter()
    result = await agent.on_messages(
        [TextMessage(content=user_text, source="user")],
        cancellation_token=CancellationToken(),
    )
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)

    log_event(
        "request.end",
        elapsed_ms=elapsed_ms,
        assistant_text=result.chat_message.content,
        agent_name=agent.name,
    )

if __name__ == "__main__":
    asyncio.run(main())
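To get these JSON events onto disk where a shipper for Datadog, CloudWatch, or ELK can pick them up, one option is a plain FileHandler that writes one JSON object per line. A standard-library sketch (the agent.jsonl filename is arbitrary):

import logging

logger = logging.getLogger("autogen-observability")

# Append each JSON event as a single line; most log shippers can tail
# a JSONL file like this without extra parsing configuration.
file_handler = logging.FileHandler("agent.jsonl")
file_handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(file_handler)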

Testing It

Run the script from your terminal and confirm you see three things: the request start log, the request end log, and the assistant response. If you added a tool function, trigger a prompt that forces tool usage and verify both tool.start and tool.end appear.

A good smoke test is to intentionally break something small, like making the tool raise an exception or changing its output format. Your logs should show where the failure happened instead of leaving you with only a generic exception at the top level.
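One way to make that happen is to log the exception at the request boundary before re-raising. A sketch that extends traced_call from step 2 (same imports and logger, plus CancellationToken from autogen_core):

async def traced_call(agent: AssistantAgent, user_text: str) -> str:
    logger.info("request.start user_text=%r", user_text)
    try:
        result = await agent.on_messages(
            [TextMessage(content=user_text, source="user")],
            cancellation_token=CancellationToken(),
        )
    except Exception:
        # logger.exception records the traceback at ERROR level, tied to
        # the request that caused it.
        logger.exception("request.error user_text=%r", user_text)
        raise
    logger.info("request.end assistant_text=%r", result.chat_message.content)
    return result.chat_message.content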

If you want stronger validation, redirect stdout/stderr to a file and grep for request.start, tool.start, and run.metrics. That gives you a lightweight trace without introducing extra infrastructure.
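For example, assuming the script is saved as app.py (a filename used here for illustration):

python app.py > run.log 2>&1
grep -E "request\.start|tool\.start|run\.metrics" run.log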

Next Steps

  • Add correlation IDs so every request can be traced across multiple agents and tools (see the sketch after this list).
  • Export these JSON logs into OpenTelemetry or your platform’s native tracing stack.
  • Wrap your AutoGen runs in FastAPI middleware so observability works across HTTP requests too.
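A minimal sketch of the correlation-ID idea, reusing the log_event helper from step 5 (the run_id field name is just a convention):

import json
import logging
import uuid

logger = logging.getLogger("autogen-observability")

def log_event(event: str, **fields) -> None:
    logger.info(json.dumps({"event": event, **fields}))

run_id = str(uuid.uuid4())  # one ID per user request
log_event("request.start", run_id=run_id, user_text="Check policy 12345.")
log_event("tool.start", run_id=run_id, name="get_policy_status")
log_event("request.end", run_id=run_id)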

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

