AutoGen Tutorial (Python): adding cost tracking for advanced developers
This tutorial shows you how to add token and dollar-cost tracking to an AutoGen Python workflow, so every agent run can be measured and budgeted. You need this when you’re moving from demos to production and have to answer basic questions like “which agent is expensive?” and “what did that conversation cost?”
What You'll Need

- Python 3.10+
- autogen-agentchat
- autogen-ext
- An OpenAI API key in OPENAI_API_KEY
- Basic familiarity with AutoGen agents, models, and async execution
- A project where you can run asyncio scripts
Install the packages (the openai extra pulls in the OpenAI SDK the model client needs):

```shell
pip install -U autogen-agentchat "autogen-ext[openai]"
```
Step-by-Step
- Start with a model client that exposes usage data.
AutoGen’s OpenAI model client returns token usage in the response metadata, which is the raw input you need for cost tracking.
```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a concise assistant.",
)
```
- Wrap execution in a small cost tracker.
Keep the accounting outside the agent so you can reuse it across teams, workflows, and model providers. For production systems, this is easier to test than burying logic inside prompts or tools.
```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost_usd: float = 0.0

    def add(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.total_cost_usd += cost_usd


def estimate_cost_gpt_4o_mini(input_tokens: int, output_tokens: int) -> float:
    # Example pricing only; verify against your provider's current rates.
    input_rate = 0.15 / 1_000_000   # USD per input token
    output_rate = 0.60 / 1_000_000  # USD per output token
    return (input_tokens * input_rate) + (output_tokens * output_rate)
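If you run more than one model, a single hard-coded estimator gets awkward. One option is a small pricing table keyed by model name; the rates below are illustrative only and must be verified against your provider's current price list:

```python
# Illustrative pricing table: model name -> (input rate, output rate),
# both in USD per token. Verify every figure before relying on it.
PRICING_USD_PER_TOKEN = {
    "gpt-4o-mini": (0.15 / 1_000_000, 0.60 / 1_000_000),
    "gpt-4o": (2.50 / 1_000_000, 10.00 / 1_000_000),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Raises KeyError for unknown models, which is safer than silently
    # charging zero for a model you forgot to price.
    input_rate, output_rate = PRICING_USD_PER_TOKEN[model]
    return input_tokens * input_rate + output_tokens * output_rate
```

Keeping the table in one place also gives you a single spot to update when prices change.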
- Run the agent and extract usage from the result.

The exact result object depends on the agent type, but TaskResult includes the full message history, whose last entry is the final output. For usage tracking, inspect the last assistant message's models_usage metadata after each run.
```python
async def main() -> None:
    tracker = CostTracker()

    result = await agent.run(task="Write one sentence about why cost tracking matters.")
    print("Final answer:", result.messages[-1].content)

    last_message = result.messages[-1]
    usage = getattr(last_message, "models_usage", None)
    if usage is not None:
        input_tokens = getattr(usage, "prompt_tokens", 0)
        output_tokens = getattr(usage, "completion_tokens", 0)
        cost_usd = estimate_cost_gpt_4o_mini(input_tokens, output_tokens)
        tracker.add(input_tokens, output_tokens, cost_usd)

    print("Input tokens:", tracker.input_tokens)
    print("Output tokens:", tracker.output_tokens)
    print("Estimated cost:", f"${tracker.total_cost_usd:.6f}")


asyncio.run(main())
```
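Be aware that reading only the last message can undercount multi-turn or tool-using runs, where several model calls each contribute a message. A sketch that totals usage across every message in the result, assuming the same models_usage / prompt_tokens / completion_tokens attributes as above (the SimpleNamespace objects are just offline stand-ins for real messages):

```python
from types import SimpleNamespace


def sum_usage(messages) -> tuple[int, int]:
    """Sum prompt/completion tokens over every message that carries usage."""
    total_in = total_out = 0
    for message in messages:
        usage = getattr(message, "models_usage", None)
        if usage is not None:
            total_in += getattr(usage, "prompt_tokens", 0)
            total_out += getattr(usage, "completion_tokens", 0)
    return total_in, total_out


# Offline demo with stand-in messages shaped like AutoGen's:
fake_messages = [
    SimpleNamespace(models_usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)),
    SimpleNamespace(models_usage=None),  # e.g. the user's task message
    SimpleNamespace(models_usage=SimpleNamespace(prompt_tokens=50, completion_tokens=8)),
]
print(sum_usage(fake_messages))  # (62, 38)
```

In the tracking code, you can swap `result.messages[-1]` for `sum_usage(result.messages)` to capture every call in the run.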
- Make it reusable across multiple runs.
In real systems you usually care about session-level or request-level totals, not one-off calls. The pattern below accumulates spend across multiple tasks using the same tracker.
```python
async def tracked_run(task: str, tracker: CostTracker) -> str:
    result = await agent.run(task=task)
    final_message = result.messages[-1]

    usage = getattr(final_message, "models_usage", None)
    if usage is not None:
        prompt_tokens = getattr(usage, "prompt_tokens", 0)
        completion_tokens = getattr(usage, "completion_tokens", 0)
        estimated_cost = estimate_cost_gpt_4o_mini(prompt_tokens, completion_tokens)
        tracker.add(prompt_tokens, completion_tokens, estimated_cost)

    return final_message.content


async def batch_main() -> None:
    tracker = CostTracker()
    for task in [
        "Summarize why budgets matter in one line.",
        "Give one risk of untracked LLM usage.",
    ]:
        answer = await tracked_run(task, tracker)
        print(answer)

    print("Session tokens:", tracker.input_tokens + tracker.output_tokens)
    print("Session cost:", f"${tracker.total_cost_usd:.6f}")


asyncio.run(batch_main())
```
- Add a hard budget guard before you go to production.
Once you can measure spend, enforce it. That lets you fail fast when a conversation crosses your threshold instead of discovering it later in billing.
```python
BUDGET_USD = 0.01


async def guarded_run(task: str, tracker: CostTracker) -> str:
    if tracker.total_cost_usd >= BUDGET_USD:
        raise RuntimeError(f"Budget exceeded: ${tracker.total_cost_usd:.6f}")

    result = await agent.run(task=task)
    final_message = result.messages[-1]

    usage = getattr(final_message, "models_usage", None)
    if usage is not None:
        prompt_tokens = getattr(usage, "prompt_tokens", 0)
        completion_tokens = getattr(usage, "completion_tokens", 0)
        estimated_cost = estimate_cost_gpt_4o_mini(prompt_tokens, completion_tokens)
        tracker.add(prompt_tokens, completion_tokens, estimated_cost)

    return final_message.content
```
Testing It
Run the script with a valid OPENAI_API_KEY and confirm that each response prints both content and non-zero usage fields when the provider returns them. Then run two or three tasks back-to-back and verify that your totals increase monotonically across calls.
If your token counts stay at zero, inspect the returned message object and log its attributes; different AutoGen versions may attach usage metadata slightly differently depending on model client and response path. Also confirm your pricing function matches the exact model you configured.
For a production check, set BUDGET_USD very low and make sure the code raises once accumulated spend crosses that threshold.
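Because the guard runs before any model call, you can exercise it without an API key at all. A minimal standalone check, with the agent call stubbed out (guarded_run here repeats only the guard logic from the step above, so nothing in this snippet touches the network):

```python
import asyncio
from dataclasses import dataclass

BUDGET_USD = 0.01


@dataclass
class CostTracker:  # repeated here so the check runs standalone
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost_usd: float = 0.0


async def guarded_run(task: str, tracker: CostTracker) -> str:
    # Only the guard matters for this test; the real agent call is stubbed.
    if tracker.total_cost_usd >= BUDGET_USD:
        raise RuntimeError(f"Budget exceeded: ${tracker.total_cost_usd:.6f}")
    return "stubbed answer"


async def check_guard() -> None:
    over_budget = CostTracker(total_cost_usd=0.02)  # already past the limit
    try:
        await guarded_run("any task", over_budget)
    except RuntimeError as exc:
        print("Guard fired:", exc)
    else:
        raise AssertionError("guard did not fire")


asyncio.run(check_guard())
```

If this check passes locally, the only remaining production question is whether your cost estimates feed the tracker accurately.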
Next Steps
- Persist per-request cost data to Postgres or OpenSearch for reporting and chargeback.
- Add per-agent labels so you can compare planner vs. executor spend.
- Wrap this into middleware so every AutoGen workflow in your codebase gets tracking by default.
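The per-agent labeling idea can start as small as a dict of running totals keyed by label. LabeledCostTracker below is a hypothetical sketch, not an AutoGen API; you would call add() with the agent's name wherever the earlier trackers recorded spend:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LabeledCostTracker:
    # Hypothetical extension: one running dollar total per agent label.
    per_agent: dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def add(self, label: str, cost_usd: float) -> None:
        self.per_agent[label] += cost_usd

    def report(self) -> dict[str, float]:
        # Plain dict copy, sorted by spend, highest first.
        return dict(sorted(self.per_agent.items(), key=lambda kv: kv[1], reverse=True))


tracker = LabeledCostTracker()
tracker.add("planner", 0.004)
tracker.add("executor", 0.011)
tracker.add("planner", 0.002)
print(tracker.report())  # executor first, since it spent the most
```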
By Cyprian Aarons, AI Consultant at Topiax.