CrewAI Tutorial (Python): optimizing token usage for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a small CrewAI workflow in Python that uses fewer tokens by controlling agent scope, limiting context growth, and keeping task outputs tight. You need this when your crews start getting expensive or slow because every agent keeps rereading too much history.

What You'll Need

  • Python 3.10+
  • crewai
  • python-dotenv
  • An LLM API key, such as:
    • OPENAI_API_KEY, or
    • another provider supported by your CrewAI setup
  • A terminal and a virtual environment
  • Basic familiarity with:
    • Agent
    • Task
    • Crew
    • Process

Step-by-Step

  1. Start with a clean project and install only what you need. Token optimization begins before the first agent runs, because extra packages and messy environments make it harder to isolate what is actually being sent to the model.
mkdir crewai-token-optimization
cd crewai-token-optimization
python -m venv .venv
source .venv/bin/activate

pip install crewai python-dotenv
  2. Put your API key in a .env file and keep your model choice explicit. For beginners, the easiest win is to use a smaller model for routine tasks instead of defaulting everything to the biggest one available.
# .env
OPENAI_API_KEY=your_key_here

# main.py
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY is missing")
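The model choice can live in the environment as well, with a smaller model as the default for routine tasks. A minimal sketch; the CREWAI_MODEL variable name is an assumption for this example, not a CrewAI convention:

```python
import os

# Read the model name from the environment, falling back to a smaller,
# cheaper model for routine tasks. CREWAI_MODEL is a hypothetical
# variable name used only for this sketch.
model_name = os.getenv("CREWAI_MODEL", "gpt-4o-mini")
print(model_name)
```

Keeping the fallback small means you opt in to the expensive model deliberately rather than paying for it by default.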
  3. Define agents with narrow roles and short backstories. Long role descriptions and vague goals waste tokens because the model has to parse more instructions than necessary.
from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Extract only the relevant facts from short input.",
    backstory="You summarize information for internal operations teams.",
    verbose=False,
    allow_delegation=False,
)

writer = Agent(
    role="Report Writer",
    goal="Turn extracted facts into a concise answer.",
    backstory="You write short operational summaries.",
    verbose=False,
    allow_delegation=False,
)
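Short backstories matter because an agent's role, goal, and backstory travel with every request it makes. A rough sketch of the cost, using the common ~1.3 tokens-per-word heuristic (an approximation only; use a real tokenizer such as tiktoken when you need exact counts):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: English prose averages about 1.3 tokens per word.
    # This is an approximation, not a tokenizer.
    return int(len(text.split()) * 1.3)

short_backstory = "You summarize information for internal operations teams."
long_backstory = short_backstory * 20  # a padded, repetitive backstory

print(estimate_tokens(short_backstory))  # small
print(estimate_tokens(long_backstory))   # ~20x larger, paid on every call
```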
  4. Keep tasks specific, set output expectations, and avoid asking for long prose unless you need it. The main token saver here is forcing the model to produce compact output instead of open-ended explanations.
from crewai import Task

# {request} is filled in from crew.kickoff(inputs=...) at runtime.
research_task = Task(
    description=(
        "Read the user request below and extract exactly 5 bullet points "
        "with the most relevant operational details.\n"
        "Request: {request}"
    ),
    expected_output="Exactly 5 bullets, no intro, no conclusion.",
    agent=researcher,
)

write_task = Task(
    description=(
        "Use the extracted bullets to write a final answer in under 120 words."
    ),
    expected_output="A short paragraph with no more than 120 words.",
    agent=writer,
)
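Note that expected_output is an instruction to the model, not a guarantee, so it is worth verifying the length yourself after the crew runs. A minimal sketch:

```python
def within_word_limit(text: str, max_words: int = 120) -> bool:
    # Check the final answer against the budget promised in expected_output.
    return len(text.split()) <= max_words

# Example: an answer of 150 words fails the check.
answer = "word " * 150
print(within_word_limit(answer))  # False
```

If the check fails often, tighten the task description rather than silently truncating the output.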
  5. Run the crew with minimal context growth. For beginner workflows, keep the process simple and avoid unnecessary delegation so each agent only sees what it needs.
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
    verbose=False,
)

result = crew.kickoff(
    inputs={"request": "Summarize a customer complaint about delayed claims processing."}
)
print(result)
  6. Add a simple token-saving pattern: pre-trim inputs before they reach CrewAI. This is one of the highest-value habits because you control prompt size before any agent starts reasoning over it.
def trim_request(text: str, max_words: int = 30) -> str:
    words = text.split()
    return " ".join(words[:max_words])

raw_request = """
Summarize a customer complaint about delayed claims processing.
The customer submitted documents twice.
They want an update on status and next steps.
"""

inputs = {"request": trim_request(raw_request)}
print(inputs["request"])
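Word-based trimming is crude; if you would rather budget in tokens, a character-based approximation (~4 characters per token for English) gets closer without pulling in a tokenizer. A sketch under that assumption:

```python
def trim_to_tokens(text: str, max_tokens: int = 40) -> str:
    # Approximate: English text averages about 4 characters per token.
    max_chars = max_tokens * 4
    if len(text) <= max_chars:
        return text
    # Cut on a word boundary so the trimmed text ends cleanly.
    return text[:max_chars].rsplit(" ", 1)[0]

sample = "Summarize a customer complaint about delayed claims processing. " * 5
print(trim_to_tokens(sample))
```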

Testing It

Run the script and check that it returns a short response without long chain-of-thought style output or repeated context. If you want to verify token savings more directly, compare this version against a version where you give agents long backstories and ask for detailed essays.

You should also watch for two failure modes: oversized inputs and overly broad task descriptions. If either one grows, token usage grows with it.

For a practical check, print your trimmed input before kickoff and confirm it stays small. Then tighten the expected_output field until the response length matches what your app actually needs.
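As a concrete pre-flight check, you can fail fast if the trimmed input grew past budget before any tokens are spent. A sketch; the 30-word limit matches the trim_request default above:

```python
def assert_small_input(text: str, max_words: int = 30) -> None:
    # Fail before kickoff if the input grew past budget,
    # so no tokens are spent on an oversized prompt.
    n = len(text.split())
    if n > max_words:
        raise ValueError(f"input is {n} words, budget is {max_words}")

assert_small_input("Summarize a customer complaint about delayed claims processing.")
```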

Next Steps

  • Add memory only where state truly matters; do not turn it on by default.
  • Learn how to use tools sparingly so agents do not pull in unnecessary context.
  • Build a prompt budget checklist for every new task:
    • input length
    • output length
    • number of agents
    • delegation settings
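The checklist above can be sketched as a small helper you run before kickoff. The limits shown are placeholders for illustration, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class PromptBudget:
    # Placeholder limits; tune these to your own workload.
    max_input_words: int = 30
    max_output_words: int = 120
    max_agents: int = 2
    allow_delegation: bool = False

def check_budget(budget: PromptBudget, input_text: str,
                 n_agents: int, delegation: bool) -> list[str]:
    # Return a list of violations; an empty list means the task fits.
    problems = []
    if len(input_text.split()) > budget.max_input_words:
        problems.append("input too long")
    if n_agents > budget.max_agents:
        problems.append("too many agents")
    if delegation and not budget.allow_delegation:
        problems.append("delegation enabled")
    return problems

print(check_budget(PromptBudget(), "Summarize a short complaint.",
                   n_agents=2, delegation=False))  # []
```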

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
