CrewAI Tutorial (Python): adding cost tracking for intermediate developers
This tutorial shows how to add cost tracking to a CrewAI Python project so you can see token usage and estimated spend per run, agent, and task. You need this when you move past prototypes and want basic observability for budgets, client billing, or just keeping LLM usage under control.
What You'll Need
- Python 3.10+
- A working CrewAI project
- The crewai package
- The python-dotenv package
- An OpenAI API key in your environment
- Optional: a logging sink such as stdout, file logs, or your observability platform
Install the packages:
pip install crewai python-dotenv
Set your API key:
export OPENAI_API_KEY="your-key-here"
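If you keep the key in a .env file instead, python-dotenv loads it at startup. A quick sanity check before wiring up any agents (a minimal sketch):

import os
from dotenv import load_dotenv

load_dotenv()  # reads a .env file from the working directory, if one exists
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"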
Step-by-Step
- Start with a minimal CrewAI setup that uses environment variables and a couple of agents. Cost tracking only becomes useful when every task runs through the same Crew object, so usage is captured consistently in one place.
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
load_dotenv()
# Two agents: one gathers facts, one writes them up.
researcher = Agent(
    role="Researcher",
    goal="Find accurate business facts",
    backstory="You are careful with sources and concise with answers.",
)

writer = Agent(
    role="Writer",
    goal="Turn findings into a clear summary",
    backstory="You write short, structured internal notes.",
)

# Two sequential tasks, one per agent.
task_1 = Task(
    description="Summarize the main risks of using AI in insurance claims processing.",
    expected_output="A concise risk summary with bullet points.",
    agent=researcher,
)

task_2 = Task(
    description="Rewrite the research into an executive-friendly memo.",
    expected_output="A short memo with clear headings.",
    agent=writer,
)
- Add a callback that records token usage after each task. Crew accepts a task_callback that fires after each task completes, which is the cleanest place to collect usage without modifying your agent logic.
from pathlib import Path
import json

USAGE_FILE = Path("usage_log.jsonl")

def track_usage(task_output):
    # CrewAI passes a TaskOutput to the callback; token_usage is only
    # populated when the provider returns usage metadata, so guard for None.
    usage = getattr(task_output, "token_usage", None)
    if not usage:
        return
    record = {
        "task": getattr(task_output, "description", "unknown"),
        # Agent role, if your CrewAI version exposes it on the task output.
        "agent": getattr(task_output, "agent", "unknown"),
        "prompt_tokens": getattr(usage, "prompt_tokens", None),
        "completion_tokens": getattr(usage, "completion_tokens", None),
        "total_tokens": getattr(usage, "total_tokens", None),
        "cost": getattr(usage, "cost", None),
    }
    with USAGE_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
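Each completed task appends one JSON line, which makes the log easy to tail and to ship to any pipeline. A record will look roughly like the following; the token counts here are hypothetical, and cost is often null unless your provider integration populates it:

{"task": "Summarize the main risks of using AI in insurance claims processing.", "agent": "Researcher", "prompt_tokens": 812, "completion_tokens": 304, "total_tokens": 1116, "cost": null}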
- Attach the callback to your crew and run it. If your model/provider returns usage metadata, CrewAI will pass it through and your logger will capture it per task.
crew = Crew(
    agents=[researcher, writer],
    tasks=[task_1, task_2],
    process=Process.sequential,
    task_callback=track_usage,
)
result = crew.kickoff()
print(result)
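Recent CrewAI versions also expose aggregate usage on the object returned by kickoff(). If yours does, this prints run-level totals you can cross-check against the per-task log; the getattr guard is there because the attribute may be absent in older versions:

run_usage = getattr(result, "token_usage", None)  # absent on some CrewAI versions
if run_usage:
    print("Run totals:", run_usage)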
- Add a small reporting function so you can summarize costs after each run. In production, this is where you would send metrics to Datadog, CloudWatch, PostHog, or a billing table.
import json

def summarize_usage(path: str = "usage_log.jsonl"):
    total_tokens = 0
    total_cost = 0.0
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                row = json.loads(line)
                total_tokens += row.get("total_tokens") or 0
                total_cost += row.get("cost") or 0.0
    except FileNotFoundError:
        print("No usage file found yet.")
        return
    print(f"Total tokens: {total_tokens}")
    print(f"Estimated cost: ${total_cost:.6f}")

summarize_usage()
- If you want more reliable accounting, add explicit model settings and keep one model per tier of work. That makes cost attribution easier because research tasks and writing tasks can be priced differently in your own reporting layer.
# Pin models explicitly. This example uses one model for both agents;
# in practice you might assign a cheaper model to one tier of work.
researcher.llm = "gpt-4o-mini"
writer.llm = "gpt-4o-mini"

crew = Crew(
    agents=[researcher, writer],
    tasks=[task_1, task_2],
    process=Process.sequential,
    task_callback=track_usage,
)

result = crew.kickoff()
print(result)
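If your integration never populates a cost field, you can estimate spend yourself from token counts. A minimal sketch; the rates below are placeholders, so check your provider's current price sheet before trusting the numbers:

# Hypothetical USD rates per 1M tokens; verify against your provider's pricing page.
PRICES = {
    "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    rates = PRICES.get(model)
    if not rates:
        return 0.0
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1_000_000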
Testing It
Run the script once and check that usage_log.jsonl gets created in your working directory. If your provider returns token metadata through CrewAI for the model you selected, each completed task should append one JSON line.
If the file stays empty, verify three things: your API key is loaded correctly, the model actually supports usage reporting through the current provider integration, and your tasks are running through kickoff() rather than being executed manually outside the crew. For a quick sanity check, print result and confirm the crew completes without errors.
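A quick way to inspect what actually got logged (a minimal sketch using only the standard library):

import json

with open("usage_log.jsonl", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} task records logged")
if records:
    print("Most recent:", records[-1])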
For production use, compare the recorded totals against your provider dashboard for a few runs. The numbers should be close enough to use for internal budgeting even if they are not identical to invoice-level billing.
Next Steps
- Add per-agent cost aggregation so you can see which role burns budget fastest (a sketch follows this list).
- Send usage records to structured logging instead of flat files.
- Build budget guards that stop execution when estimated spend crosses a threshold.
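Here is a minimal sketch of the first item, per-agent aggregation over the JSONL log. It assumes the agent field written by track_usage above; everything else is standard library:

import json
from collections import defaultdict

def usage_by_agent(path: str = "usage_log.jsonl") -> dict:
    totals = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            agent = row.get("agent") or "unknown"
            totals[agent]["tokens"] += row.get("total_tokens") or 0
            totals[agent]["cost"] += row.get("cost") or 0.0
    return dict(totals)

for agent, stats in usage_by_agent().items():
    print(f"{agent}: {stats['tokens']} tokens, ~${stats['cost']:.6f}")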
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.