CrewAI Tutorial (Python): adding cost tracking for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add per-run cost tracking to a CrewAI project in Python, using real model usage data and a small accounting layer you control. You need this when you want to monitor agent spend by task, compare model choices, or enforce budget limits before your crew burns through API credits.

What You'll Need

  • Python 3.10+
  • crewai
  • python-dotenv
  • An OpenAI API key set as OPENAI_API_KEY
  • Optional but useful:
    • pandas for reporting
    • a database or log sink if you want persistent cost history
  • A CrewAI project with at least one agent, one task, and one kickoff path
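
Before wiring anything into your crew, it can help to run a small preflight check like the sketch below. It assumes nothing beyond the prerequisites above: it only confirms that crewai imports and that OPENAI_API_KEY is actually visible to your process.
import os

from dotenv import load_dotenv

load_dotenv()

try:
    import crewai  # noqa: F401
except ImportError as exc:
    raise SystemExit("crewai is not installed; try `pip install crewai`") from exc

if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; add it to your .env file")

print("Environment looks ready.")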

Step-by-Step

  1. Start with a minimal CrewAI crew that uses a real LLM and returns usage metadata. The key point is that you must keep the kickoff result object around so you can inspect token usage after execution.
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process, LLM

load_dotenv()

llm = LLM(model="gpt-4o-mini")

researcher = Agent(
    role="Researcher",
    goal="Summarize the business impact of a feature request",
    backstory="You write concise internal memos for product teams.",
    llm=llm,
    verbose=True,
)

task = Task(
    description="Write a 5-bullet summary of why cost tracking matters in AI agents.",
    expected_output="A concise bullet list with practical reasons.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)
  2. Run the crew and inspect the output object. In CrewAI, the exact shape can vary by version, so the safest pattern is to print the result and probe for usage fields instead of assuming one fixed attribute.
result = crew.kickoff()

print("\n=== RAW RESULT ===")
print(result)

print("\n=== TYPE ===")
print(type(result))

for attr in ["usage_metrics", "token_usage", "usage", "raw"]:
    if hasattr(result, attr):
        print(f"\n=== {attr.upper()} ===")
        print(getattr(result, attr))
  3. Add a small cost calculator that converts token counts into dollars. For production use, keep rates in config so you can change models without editing code.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelPricing:
    input_per_1m: float
    output_per_1m: float

PRICING = {
    "gpt-4o-mini": ModelPricing(input_per_1m=0.15, output_per_1m=0.60),
}

def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING.get(model_name)
    if pricing is None:
        raise ValueError(f"No pricing configured for model: {model_name}")
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_1m
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_1m
    return round(input_cost + output_cost, 6)
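As a quick sanity check on the math, a hypothetical run of 120,000 input tokens and 30,000 output tokens on gpt-4o-mini works out to (0.12 * 0.15) + (0.03 * 0.60) = 0.018 + 0.018 = 0.036 USD:
print(estimate_cost("gpt-4o-mini", 120_000, 30_000))  # 0.036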
  4. Wrap kickoff in a helper that extracts usage metrics and emits a structured record. This is the part you can send to logs, OpenTelemetry, S3, Postgres, or whatever accounting system your team already uses.
from datetime import datetime, timezone

def run_with_cost_tracking(crew: Crew, model_name: str) -> dict:
    result = crew.kickoff()

    usage = None
    for attr in ["usage_metrics", "token_usage", "usage"]:
        if hasattr(result, attr):
            usage = getattr(result, attr)
            if usage:
                break

    if isinstance(usage, dict):
        # Some CrewAI versions expose usage as a plain dict rather than an object.
        input_tokens = usage.get("prompt_tokens", 0) or 0
        output_tokens = usage.get("completion_tokens", 0) or 0
    else:
        input_tokens = getattr(usage, "prompt_tokens", 0) if usage else 0
        output_tokens = getattr(usage, "completion_tokens", 0) if usage else 0

    cost_usd = estimate_cost(model_name, input_tokens, output_tokens)

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost_usd,
        "output": str(result),
    }
    return record

record = run_with_cost_tracking(crew, "gpt-4o-mini")
print(record)
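If you want persistent cost history without standing up a database yet, one lightweight option is to append each record to a JSON Lines file. This is just a sketch; the agent_costs.jsonl filename is an arbitrary choice, and in production you would likely replace it with your log pipeline or a database insert.
import json
from pathlib import Path

def append_record(record: dict, path: str = "agent_costs.jsonl") -> None:
    # One JSON object per line keeps the file easy to tail, grep, and load into pandas.
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

append_record(record)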
  5. If you want per-task tracking instead of just per-run tracking, split the work into multiple tasks. The multi-task crew below still produces one combined record; the sketch after it shows one way to capture a separate record for each task, which gives you cleaner attribution when one agent is expensive and another is cheap.
tasks = [
    Task(
        description="List 3 reasons cost tracking matters for AI agents.",
        expected_output="Three short reasons.",
        agent=researcher,
    ),
    Task(
        description="Turn those reasons into an executive-friendly summary.",
        expected_output="A short executive summary.",
        agent=researcher,
    ),
]

crew = Crew(
    agents=[researcher],
    tasks=tasks,
    process=Process.sequential,
    verbose=True,
)

record = run_with_cost_tracking(crew, "gpt-4o-mini")
print(record["estimated_cost_usd"])
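The multi-task crew above still reports usage for the whole kickoff. For true per-task attribution, one approach (a sketch, not the only option) is to wrap each task in its own single-task crew and record each kickoff separately. The trade-off is that separate crews do not share task context automatically, so the sketch carries the previous task's output forward in the next description by hand.
task_specs = [
    ("List 3 reasons cost tracking matters for AI agents.", "Three short reasons."),
    ("Turn those reasons into an executive-friendly summary.", "A short executive summary."),
]

per_task_records = []
previous_output = ""

for description, expected_output in task_specs:
    # Inject the previous task's output manually, since separate crews
    # do not pass context between tasks the way one crew does.
    if previous_output:
        description = f"{description}\n\nContext from the previous task:\n{previous_output}"

    single_crew = Crew(
        agents=[researcher],
        tasks=[Task(description=description, expected_output=expected_output, agent=researcher)],
        process=Process.sequential,
        verbose=True,
    )
    record = run_with_cost_tracking(single_crew, "gpt-4o-mini")
    per_task_records.append(record)
    previous_output = record["output"]

for i, rec in enumerate(per_task_records, start=1):
    print(f"Task {i}: {rec['estimated_cost_usd']} USD")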

Testing It

Run the script once and confirm two things: the crew returns content successfully and your tracking record prints non-zero token counts when CrewAI exposes them. If token fields come back as zero on your version, print the raw result object and inspect its attributes; different releases expose usage slightly differently.

For a real test suite, mock crew.kickoff() and feed your wrapper an object with prompt_tokens and completion_tokens. That lets you validate your cost math without paying for live API calls every time CI runs.
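
A minimal pytest-style version of that idea is sketched below. It assumes your wrapper lives in a module you can import (here called cost_tracking, an arbitrary name) and fakes the kickoff result with a plain object that exposes usage_metrics.
from types import SimpleNamespace

from cost_tracking import run_with_cost_tracking  # hypothetical module holding the code above

def test_cost_math_without_live_calls():
    # Fake kickoff result exposing the usage fields the wrapper probes for.
    fake_result = SimpleNamespace(
        usage_metrics=SimpleNamespace(prompt_tokens=120_000, completion_tokens=30_000)
    )
    fake_crew = SimpleNamespace(kickoff=lambda: fake_result)

    record = run_with_cost_tracking(fake_crew, "gpt-4o-mini")

    assert record["input_tokens"] == 120_000
    assert record["output_tokens"] == 30_000
    assert record["estimated_cost_usd"] == 0.036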

If you are using multiple models across agents, verify the recorded model name matches the actual LLM assigned to each agent. Otherwise your cost estimates will drift fast because pricing is model-specific.

Next Steps

  • Store each run record in Postgres or SQLite so you can query spend by date, team, or workflow.
  • Add budget guards that stop execution when estimated spend crosses a threshold (a minimal sketch follows this list).
  • Extend the tracker to capture latency alongside tokens so you can compare cost vs performance across models.
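
For the budget-guard idea, one minimal sketch is shown below. It builds on run_with_cost_tracking from step 4; the threshold, the exception type, and the decision to check before each run (token counts are only known after a run completes) are all choices to adapt to your own setup. Here spent_so_far would come from summing estimated_cost_usd over the records you have already stored.
MAX_SPEND_USD = 1.00  # example threshold, not a recommendation

class BudgetExceededError(RuntimeError):
    pass

def kickoff_with_budget(crew: Crew, model_name: str, spent_so_far: float) -> dict:
    # Refuse to start another run once cumulative estimated spend reaches the cap.
    if spent_so_far >= MAX_SPEND_USD:
        raise BudgetExceededError(
            f"Estimated spend {spent_so_far:.4f} USD has reached the {MAX_SPEND_USD:.2f} USD cap"
        )
    return run_with_cost_tracking(crew, model_name)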

By Cyprian Aarons, AI Consultant at Topiax.
