AutoGen Tutorial (Python): adding cost tracking for beginners
This tutorial shows you how to add token and dollar cost tracking to a basic AutoGen Python setup. You need this when you want visibility into LLM spend per agent, per run, or per request before you ship an assistant into production.
What You'll Need
- Python 3.10+
- `autogen-agentchat`
- `autogen-ext`
- An OpenAI API key
- A terminal with `pip`
- Basic familiarity with creating `AssistantAgent` and `UserProxyAgent` in AutoGen
Install the packages first:
```shell
pip install autogen-agentchat "autogen-ext[openai]"
```
Set your API key in the environment:
```shell
export OPENAI_API_KEY="your-api-key-here"
```
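It can help to fail fast when the key is missing rather than hit an opaque authentication error mid-run. A small sketch (the `require_api_key` helper is ours, not part of AutoGen):

```python
import os


def require_api_key(env=None) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty in the given mapping."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY"))


# Warn early instead of failing mid-run with an opaque auth error.
if not require_api_key():
    print("Warning: OPENAI_API_KEY is not set; the examples below will fail.")
```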
Step-by-Step
1. Start with a minimal AutoGen agent setup.

This example uses a single assistant. The important part is that we keep the chat simple so the cost-tracking output is easy to understand.

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
    )
    assistant = AssistantAgent(
        name="assistant",
        model_client=model_client,
        system_message="You are a concise assistant.",
    )
    result = await assistant.run(task="Explain what token cost tracking means in one sentence.")
    print(result.messages[-1].content)
    await model_client.close()


if __name__ == "__main__":
    asyncio.run(main())
```
2. Read usage data from the run result.

AutoGen attaches usage data to the messages an agent produces: each message carries a models_usage field, which is a RequestUsage with prompt and completion token counts (or None for messages that did not call the model). Reading it after a run gives you structured reporting without changing your agent logic.

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    assistant = AssistantAgent(
        name="assistant",
        model_client=model_client,
        system_message="You are a concise assistant.",
    )
    try:
        result = await assistant.run(task="Summarize cost tracking in AutoGen.")
        print(result.messages[-1].content)
        # models_usage is a RequestUsage (prompt_tokens, completion_tokens) or None.
        print("Usage:", result.messages[-1].models_usage)
    finally:
        await model_client.close()


if __name__ == "__main__":
    asyncio.run(main())
```
3. Print token usage and estimate dollar cost.

The exact fields depend on the model client version, but in current releases each message's models_usage carries prompt_tokens and completion_tokens; total tokens and dollar cost are derived from those. Log them in one place so you can see the whole picture per request.

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


def print_usage(usage):
    """Log token counts from a RequestUsage-style object, or note their absence."""
    if usage is None:
        print("No usage data returned.")
        return
    total = usage.prompt_tokens + usage.completion_tokens
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens: {total}")


async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    assistant = AssistantAgent(name="assistant", model_client=model_client)
    try:
        result = await assistant.run(task="Give me three ways to reduce LLM spend.")
        print(result.messages[-1].content)
        print_usage(result.messages[-1].models_usage)
    finally:
        await model_client.close()


if __name__ == "__main__":
    asyncio.run(main())
```
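Since RequestUsage only reports token counts, any dollar figure is an estimate you compute yourself. A minimal sketch, with illustrative per-million-token prices (verify against your provider's pricing page before relying on the numbers):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_m: float = 0.15,
                  completion_price_per_m: float = 0.60) -> float:
    """Estimate dollar cost from token counts and per-million-token prices."""
    return (prompt_tokens * prompt_price_per_m
            + completion_tokens * completion_price_per_m) / 1_000_000


# Example: 1,000 prompt tokens and 500 completion tokens at the default prices.
print(round(estimate_cost(1000, 500), 6))  # → 0.00045
```

The defaults here mirror gpt-4o-mini's published rates at the time of writing, but pricing changes, so treat them as placeholders to be configured per model.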
4. Track costs across multiple runs.

If you are building a workflow or multi-turn chat, single-run logging is not enough. Aggregate totals across requests so you can report spend per session or per customer interaction.

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    total_tokens = 0
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    assistant = AssistantAgent(name="assistant", model_client=model_client)
    try:
        for task in [
            "What is token usage?",
            "Why does it matter for billing?",
            "How can I reduce it?",
        ]:
            result = await assistant.run(task=task)
            usage = result.messages[-1].models_usage
            run_tokens = usage.prompt_tokens + usage.completion_tokens if usage else 0
            total_tokens += run_tokens
            print(result.messages[-1].content)
            print("Run tokens:", run_tokens if usage else "n/a")
        print("Session total tokens:", total_tokens)
    finally:
        await model_client.close()


if __name__ == "__main__":
    asyncio.run(main())
```
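The aggregation loop can be factored into a small tracker so the same totals work in any workflow. This is a hypothetical helper, not an AutoGen API; only the prompt_tokens/completion_tokens field names are taken from autogen-core's RequestUsage, and the FakeUsage records below stand in for the real usage objects returned by the model client:

```python
from dataclasses import dataclass


@dataclass
class SessionUsage:
    """Aggregates per-run token counts across a session."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    runs: int = 0

    def add(self, usage) -> None:
        # Skip runs where the client returned no usage data.
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.runs += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


# Simulated usage records standing in for real per-run usage values.
@dataclass
class FakeUsage:
    prompt_tokens: int
    completion_tokens: int


session = SessionUsage()
for u in [FakeUsage(120, 40), FakeUsage(80, 60), None]:
    session.add(u)
print(session.runs, session.total_tokens)  # → 2 300
```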
5. Store usage in your app logs.

For production, printing to stdout is not enough. Persist usage alongside request IDs so finance, ops, and engineering can trace spend back to specific workflows.

```python
import asyncio
import json
from datetime import datetime, timezone

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


def log_usage(request_id: str, usage):
    """Emit one JSON record per request with its token counts."""
    record = {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_tokens": usage.prompt_tokens if usage else None,
        "completion_tokens": usage.completion_tokens if usage else None,
        "total_tokens": usage.prompt_tokens + usage.completion_tokens if usage else None,
    }
    print(json.dumps(record))


async def main():
    request_id = "req_001"
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    assistant = AssistantAgent(name="assistant", model_client=model_client)
    try:
        result = await assistant.run(task="Write one sentence about auditability.")
        log_usage(request_id, result.messages[-1].models_usage)
        print(result.messages[-1].content)
    finally:
        await model_client.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Testing It
Run the script and confirm that you get both an answer from the agent and a usage line printed after it. If models_usage is None, check that your model client version supports usage reporting and that your API key is valid.
Then run the same task twice and compare the token counts; they should be similar but not always identical because of generation variance and hidden prompt differences. If you want to verify logging end-to-end, redirect stdout to a file and confirm each request writes one JSON record.
If you are using this in a larger app, test one short prompt and one long prompt. The long prompt should show higher prompt token counts and usually higher total cost.
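To check the logging end to end without eyeballing the file, you can parse the captured records. A sketch, assuming one JSON object per line with the fields produced by log_usage (summarize_log is our helper, not an AutoGen API):

```python
import json


def summarize_log(lines):
    """Sum total_tokens across JSON-lines usage records, treating nulls as zero."""
    records = [json.loads(line) for line in lines if line.strip()]
    total = sum(r.get("total_tokens") or 0 for r in records)
    return len(records), total


# Two captured records, as they would appear in the redirected log file.
log_lines = [
    '{"request_id": "req_001", "total_tokens": 180}',
    '{"request_id": "req_002", "total_tokens": 220}',
]
print(summarize_log(log_lines))  # → (2, 400)
```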
Next Steps
- Add per-agent cost attribution so you can see which agent spends the most.
- Push usage records into Prometheus, Datadog, or your SIEM instead of stdout.
- Add budget guards that stop a workflow when estimated spend crosses a threshold.
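The budget-guard idea can be sketched as a small check you run before or after each request; this is hypothetical helper code, not an AutoGen feature:

```python
class BudgetExceeded(RuntimeError):
    """Raised when estimated session spend crosses the configured limit."""


class BudgetGuard:
    """Stops a workflow once estimated spend crosses a dollar threshold."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"Spent ${self.spent_usd:.4f}, limit ${self.limit_usd:.4f}"
            )


guard = BudgetGuard(limit_usd=0.001)
guard.charge(0.0004)      # under budget, continues
try:
    guard.charge(0.0008)  # pushes the total past the limit
except BudgetExceeded as e:
    print("Stopped:", e)  # → Stopped: Spent $0.0012, limit $0.0010
```

In a real workflow you would call charge() with each run's estimated cost and let the exception terminate the loop before the next model call.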
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.