CrewAI Tutorial (Python): optimizing token usage for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to reduce token spend in a CrewAI project without breaking task quality. You'll wire in cheaper models, tighter task prompts, controlled memory, and structured output shaping so your agents stop burning tokens on unnecessary context.

What You'll Need

  • Python 3.10+
  • crewai
  • python-dotenv
  • An LLM API key for the model provider you use with CrewAI
  • Basic familiarity with Agent, Task, and Crew
  • A terminal and a virtual environment

Install the packages:

pip install crewai python-dotenv

Set your API key in .env:

OPENAI_API_KEY=your_key_here

Step-by-Step

  1. Start with a smaller, cheaper model for routine work and reserve the bigger models for steps that genuinely need them. Token optimization starts with model choice, not prompt tricks.
from dotenv import load_dotenv
load_dotenv()

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Research Analyst",
    goal="Summarize customer support issues from short notes",
    backstory="You are concise and factual.",
    llm="gpt-4o-mini",
    verbose=False,
)

writer = Agent(
    role="Summary Writer",
    goal="Turn notes into a short executive summary",
    backstory="You write tight summaries with no filler.",
    llm="gpt-4o-mini",
    verbose=False,
)
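
If one step genuinely needs a stronger model, scope it to a single agent rather than upgrading the whole crew. A minimal sketch, assuming gpt-4o is the larger tier available from your provider (the specialist agent here is hypothetical):

from crewai import Agent

specialist = Agent(
    role="Escalation Specialist",
    goal="Resolve complex billing and login failures",
    backstory="You handle only the cases a cheap triage pass flags.",
    llm="gpt-4o",  # reserved for high-value cases; everything else stays on gpt-4o-mini
    verbose=False,
)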
  2. Keep task instructions short and specific. Long prompts increase input tokens on every run, so define exactly what the agent should return and what it should ignore.
research_task = Task(
    description=(
        "Read the notes below and extract the top 3 recurring support issues.\n"
        "Return only bullet points.\n"
        "Ignore greetings, signatures, and duplicated text.\n"
        "Notes: {notes}"  # filled in from crew.kickoff(inputs=...)
    ),
    expected_output="Three bullet points with issue name and short explanation.",
    agent=researcher,
)

summary_task = Task(
    description=(
        "Write a 4-sentence executive summary from the extracted issues.\n"
        "Use plain English.\n"
        "Do not repeat the raw notes."
    ),
    expected_output="A concise 4-sentence summary.",
    agent=writer,
)
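
Tight prompts pay off most when an agent cannot loop indefinitely, because every retry re-sends the full instruction set. A sketch using Agent's max_iter cap (2 is an illustrative value, not a recommendation):

focused_writer = Agent(
    role="Summary Writer",
    goal="Turn notes into a short executive summary",
    backstory="You write tight summaries with no filler.",
    llm="gpt-4o-mini",
    max_iter=2,  # stop after two reasoning iterations instead of retrying at length
)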
  3. Control context flow so agents do not keep dragging irrelevant history into every step. For intermediate CrewAI projects, this is usually where token waste starts to creep in.
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, summary_task],
    process=Process.sequential,
    memory=False,
    verbose=True,
)

result = crew.kickoff(inputs={
    "notes": (
        "Customer reports login failures after password reset. "
        "Several users mention slow response times in chat support. "
        "One customer says the refund status page is confusing."
    )
})

print(result)
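
When a workflow grows beyond two tasks, you can pin down exactly what flows between steps instead of relying on the sequential defaults. Task accepts a context list; a sketch that limits the summary step to the research output only:

summary_task = Task(
    description=(
        "Write a 4-sentence executive summary from the extracted issues.\n"
        "Use plain English.\n"
        "Do not repeat the raw notes."
    ),
    expected_output="A concise 4-sentence summary.",
    agent=writer,
    context=[research_task],  # only this task's output is passed in, nothing else upstream
)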
  4. Reduce output size by forcing structured responses. If you ask for a paragraph when you only need fields, you pay for extra tokens on both output generation and downstream parsing.
from crewai import Agent, Task

triage_agent = Agent(
    role="Support Triage",
    goal="Classify support issues with minimal text",
    backstory="You return compact structured output.",
    llm="gpt-4o-mini",
)

triage_task = Task(
    description=(
        "Classify the issue into one of: login, billing, performance, ui.\n"
        "Return JSON with keys: category, severity, next_action.\n"
        "Issue: {notes}"  # filled in from kickoff inputs
    ),
    expected_output='JSON object like {"category":"login","severity":"high","next_action":"reset flow review"}',
    agent=triage_agent,
)
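
You can also hard-cap output length at the model level rather than trusting the prompt alone. Recent CrewAI releases ship an LLM class with a max_tokens parameter; a sketch, with 150 as an illustrative limit:

from crewai import LLM

compact_llm = LLM(model="gpt-4o-mini", temperature=0, max_tokens=150)

triage_agent = Agent(
    role="Support Triage",
    goal="Classify support issues with minimal text",
    backstory="You return compact structured output.",
    llm=compact_llm,  # completions are cut off at 150 tokens regardless of the prompt
)

Leave some headroom in the cap: a limit that is too tight will truncate the JSON mid-object.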
  5. Add a lightweight gate before calling higher-cost agents. In production, this pattern saves tokens by sending only high-value cases to expensive reasoning steps.
import json

# Run the triage step on its own lightweight crew.
triage_crew = Crew(agents=[triage_agent], tasks=[triage_task])

def needs_escalation(category: str) -> bool:
    return category in {"billing", "login"}

if __name__ == "__main__":
    result = triage_crew.kickoff(inputs={"notes": "User cannot log in after password reset."})
    classification = json.loads(result.raw)  # .raw holds the final task's text output; assumes bare JSON, as instructed

    if needs_escalation(classification["category"]):
        print("Escalate to specialist agent")

Testing It

Run the script twice: once with short input and once with noisy input that includes signatures or repeated text. The output should stay compact if your task instructions are tight enough.

Watch the console logs from verbose=True and compare how much text each task receives and returns. If you see long repeated context between tasks, turn off memory or split the workflow into smaller crews.
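
For a harder number than reading logs, recent CrewAI versions record token counts on the crew after kickoff; the exact attribute has shifted between releases, so treat this as version-dependent:

result = crew.kickoff(inputs={"notes": "User cannot log in after password reset."})
print(crew.usage_metrics)  # prompt, completion, and total token counts for the run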

A practical check is to compare token-heavy prompts against trimmed prompts using the same input data. You should see lower latency and less model output drift when your prompts are constrained properly.
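
One way to run that comparison offline is to count tokens in both prompt variants with tiktoken (pip install tiktoken; o200k_base is the encoding used by the gpt-4o family). The two prompt strings below are placeholders for your own variants:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

verbose_prompt = "Please carefully read all of the notes and consider every detail..."
trimmed_prompt = "Extract the top 3 recurring support issues as bullet points."

print(len(enc.encode(verbose_prompt)), "vs", len(enc.encode(trimmed_prompt)))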

Next Steps

  • Add explicit output schemas using Pydantic-style validation around CrewAI task results (see the sketch after this list)
  • Split large crews into separate stages so only relevant data moves forward
  • Learn when to use caching and when to disable memory for stateless workflows
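
A starting point for the first item, assuming your CrewAI version supports Task's output_pydantic parameter (TriageResult is a schema invented for this example):

from pydantic import BaseModel
from crewai import Task

class TriageResult(BaseModel):
    category: str
    severity: str
    next_action: str

validated_task = Task(
    description="Classify the issue into one of: login, billing, performance, ui.\nIssue: {notes}",
    expected_output="A TriageResult with category, severity, and next_action.",
    agent=triage_agent,
    output_pydantic=TriageResult,  # CrewAI validates the final output against this schema
)

With a schema attached, downstream code can read result.pydantic instead of parsing free text.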

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
