AutoGen Tutorial (Python): adding cost tracking for advanced developers
This tutorial shows you how to add token and dollar-cost tracking to an AutoGen Python workflow, so every agent run can be measured and budgeted. You need this when you’re moving from demos to production and have to answer basic questions like “which agent is expensive?” and “what did that conversation cost?”
What You'll Need

- Python 3.10+
- autogen-agentchat
- autogen-ext
- An OpenAI API key in OPENAI_API_KEY
- Basic familiarity with AutoGen agents, models, and async execution
- A project where you can run asyncio scripts
Install the packages (the openai extra pulls in the OpenAI SDK the model client needs):

```shell
pip install -U autogen-agentchat "autogen-ext[openai]"
```
Step-by-Step
- Start with a model client that exposes usage data.
AutoGen’s OpenAI model client returns token usage in the response metadata, which is the raw input you need for cost tracking.
```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a concise assistant.",
)
```
- Wrap execution in a small cost tracker.
Keep the accounting outside the agent so you can reuse it across teams, workflows, and model providers. For production systems, this is easier to test than burying logic inside prompts or tools.
```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost_usd: float = 0.0

    def add(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.total_cost_usd += cost_usd


def estimate_cost_gpt_4o_mini(input_tokens: int, output_tokens: int) -> float:
    # Example pricing only; verify against your provider's current rates.
    input_rate = 0.15 / 1_000_000   # USD per input token
    output_rate = 0.60 / 1_000_000  # USD per output token
    return (input_tokens * input_rate) + (output_tokens * output_rate)
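If you run more than one model, a single hard-coded estimator gets awkward. One option is a small pricing table keyed by model name; the rates below are illustrative only and must be verified against your provider's current price list:

```python
# Illustrative pricing table: model name -> (input rate, output rate),
# both in USD per token. Verify every figure before relying on it.
PRICING_USD_PER_TOKEN = {
    "gpt-4o-mini": (0.15 / 1_000_000, 0.60 / 1_000_000),
    "gpt-4o": (2.50 / 1_000_000, 10.00 / 1_000_000),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Raises KeyError for unknown models, which is safer than silently
    # charging zero for a model you forgot to price.
    input_rate, output_rate = PRICING_USD_PER_TOKEN[model]
    return input_tokens * input_rate + output_tokens * output_rate
```

Keeping the table in one place also gives you a single spot to update when prices change.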
- Run the agent and extract usage from the result.

The exact result object depends on the agent type, but TaskResult includes the full message history, whose last entry is the final output. For usage tracking, inspect the last assistant message's models_usage metadata after each run.
```python
async def main() -> None:
    tracker = CostTracker()

    result = await agent.run(task="Write one sentence about why cost tracking matters.")
    print("Final answer:", result.messages[-1].content)

    last_message = result.messages[-1]
    usage = getattr(last_message, "models_usage", None)
    if usage is not None:
        input_tokens = getattr(usage, "prompt_tokens", 0)
        output_tokens = getattr(usage, "completion_tokens", 0)
        cost_usd = estimate_cost_gpt_4o_mini(input_tokens, output_tokens)
        tracker.add(input_tokens, output_tokens, cost_usd)

    print("Input tokens:", tracker.input_tokens)
    print("Output tokens:", tracker.output_tokens)
    print("Estimated cost:", f"${tracker.total_cost_usd:.6f}")


asyncio.run(main())
```
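Be aware that reading only the last message can undercount multi-turn or tool-using runs, where several model calls each contribute a message. A sketch that totals usage across every message in the result, assuming the same models_usage / prompt_tokens / completion_tokens attributes as above (the SimpleNamespace objects are just offline stand-ins for real messages):

```python
from types import SimpleNamespace


def sum_usage(messages) -> tuple[int, int]:
    """Sum prompt/completion tokens over every message that carries usage."""
    total_in = total_out = 0
    for message in messages:
        usage = getattr(message, "models_usage", None)
        if usage is not None:
            total_in += getattr(usage, "prompt_tokens", 0)
            total_out += getattr(usage, "completion_tokens", 0)
    return total_in, total_out


# Offline demo with stand-in messages shaped like AutoGen's:
fake_messages = [
    SimpleNamespace(models_usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)),
    SimpleNamespace(models_usage=None),  # e.g. the user's task message
    SimpleNamespace(models_usage=SimpleNamespace(prompt_tokens=50, completion_tokens=8)),
]
print(sum_usage(fake_messages))  # (62, 38)
```

In the tracking code, you can swap `result.messages[-1]` for `sum_usage(result.messages)` to capture every call in the run.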
- Make it reusable across multiple runs.
In real systems you usually care about session-level or request-level totals, not one-off calls. The pattern below accumulates spend across multiple tasks using the same tracker.
```python
async def tracked_run(task: str, tracker: CostTracker) -> str:
    result = await agent.run(task=task)
    final_message = result.messages[-1]

    usage = getattr(final_message, "models_usage", None)
    if usage is not None:
        prompt_tokens = getattr(usage, "prompt_tokens", 0)
        completion_tokens = getattr(usage, "completion_tokens", 0)
        estimated_cost = estimate_cost_gpt_4o_mini(prompt_tokens, completion_tokens)
        tracker.add(prompt_tokens, completion_tokens, estimated_cost)

    return final_message.content


async def batch_main() -> None:
    tracker = CostTracker()
    for task in [
        "Summarize why budgets matter in one line.",
        "Give one risk of untracked LLM usage.",
    ]:
        answer = await tracked_run(task, tracker)
        print(answer)

    print("Session tokens:", tracker.input_tokens + tracker.output_tokens)
    print("Session cost:", f"${tracker.total_cost_usd:.6f}")


asyncio.run(batch_main())
```
- Add a hard budget guard before you go to production.
Once you can measure spend, enforce it. That lets you fail fast when a conversation crosses your threshold instead of discovering it later in billing.
```python
BUDGET_USD = 0.01


async def guarded_run(task: str, tracker: CostTracker) -> str:
    if tracker.total_cost_usd >= BUDGET_USD:
        raise RuntimeError(f"Budget exceeded: ${tracker.total_cost_usd:.6f}")

    result = await agent.run(task=task)
    final_message = result.messages[-1]

    usage = getattr(final_message, "models_usage", None)
    if usage is not None:
        prompt_tokens = getattr(usage, "prompt_tokens", 0)
        completion_tokens = getattr(usage, "completion_tokens", 0)
        estimated_cost = estimate_cost_gpt_4o_mini(prompt_tokens, completion_tokens)
        tracker.add(prompt_tokens, completion_tokens, estimated_cost)

    return final_message.content
```
Testing It
Run the script with a valid OPENAI_API_KEY and confirm that each response prints both content and non-zero usage fields when the provider returns them. Then run two or three tasks back-to-back and verify that your totals increase monotonically across calls.
If your token counts stay at zero, inspect the returned message object and log its attributes; different AutoGen versions may attach usage metadata slightly differently depending on model client and response path. Also confirm your pricing function matches the exact model you configured.
For a production check, set BUDGET_USD very low and make sure the code raises once accumulated spend crosses that threshold.
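Because the guard runs before any model call, you can exercise it without an API key at all. A minimal standalone check, with the agent call stubbed out (guarded_run here repeats only the guard logic from the step above, so nothing in this snippet touches the network):

```python
import asyncio
from dataclasses import dataclass

BUDGET_USD = 0.01


@dataclass
class CostTracker:  # repeated here so the check runs standalone
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost_usd: float = 0.0


async def guarded_run(task: str, tracker: CostTracker) -> str:
    # Only the guard matters for this test; the real agent call is stubbed.
    if tracker.total_cost_usd >= BUDGET_USD:
        raise RuntimeError(f"Budget exceeded: ${tracker.total_cost_usd:.6f}")
    return "stubbed answer"


async def check_guard() -> None:
    over_budget = CostTracker(total_cost_usd=0.02)  # already past the limit
    try:
        await guarded_run("any task", over_budget)
    except RuntimeError as exc:
        print("Guard fired:", exc)
    else:
        raise AssertionError("guard did not fire")


asyncio.run(check_guard())
```

If this check passes locally, the only remaining production question is whether your cost estimates feed the tracker accurately.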
Next Steps
- Persist per-request cost data to Postgres or OpenSearch for reporting and chargeback.
- Add per-agent labels so you can compare planner vs. executor spend.
- Wrap this into middleware so every AutoGen workflow in your codebase gets tracking by default.
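The per-agent labeling idea can start as small as a dict of running totals keyed by label. LabeledCostTracker below is a hypothetical sketch, not an AutoGen API; you would call add() with the agent's name wherever the earlier trackers recorded spend:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LabeledCostTracker:
    # Hypothetical extension: one running dollar total per agent label.
    per_agent: dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def add(self, label: str, cost_usd: float) -> None:
        self.per_agent[label] += cost_usd

    def report(self) -> dict[str, float]:
        # Plain dict copy, sorted by spend, highest first.
        return dict(sorted(self.per_agent.items(), key=lambda kv: kv[1], reverse=True))


tracker = LabeledCostTracker()
tracker.add("planner", 0.004)
tracker.add("executor", 0.011)
tracker.add("planner", 0.002)
print(tracker.report())  # executor first, since it spent the most
```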
By Cyprian Aarons, AI Consultant at Topiax.