CrewAI Tutorial (Python): optimizing token usage for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to build a CrewAI workflow that spends fewer tokens without breaking task quality. You’ll use tighter agent instructions, smaller context windows, explicit task outputs, and short-lived memory patterns so your crew stops paying for unnecessary prompt bloat.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • crewai
  • python-dotenv
  • An LLM API key, such as:
    • OPENAI_API_KEY
    • or another provider supported by your CrewAI setup
  • Basic familiarity with:
    • Agent
    • Task
    • Crew
    • kickoff()

Install the packages:

pip install crewai python-dotenv

Step-by-Step

  1. Start by setting up a minimal environment and loading secrets from .env. Token optimization begins here: if you let prompts grow uncontrolled across experiments, you’ll never know what changed.
import os

from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY is missing from the environment")
  2. Define agents with narrow responsibilities and strict output limits. The main trick is to reduce prompt surface area: short role descriptions, clear constraints, and no extra backstory.
from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Extract only the facts needed for the task",
    backstory="You summarize source material in compact bullet points.",
    verbose=False,
    allow_delegation=False,
)

writer = Agent(
    role="Technical Writer",
    goal="Produce concise implementation notes",
    backstory="You write direct, production-focused explanations.",
    verbose=False,
    allow_delegation=False,
)
  3. Make tasks force compact outputs. Use explicit formatting rules so the model doesn’t wander into long explanations that inflate token usage downstream.
from crewai import Task

research_task = Task(
    description=(
        "Review the input topic and return exactly 5 bullets.\n"
        "Each bullet must be under 12 words.\n"
        "Focus only on token-saving tactics for CrewAI."
    ),
    expected_output="5 short bullets with concrete token optimization tactics",
    agent=researcher,
)

write_task = Task(
    description=(
        "Turn the research bullets into a practical implementation plan.\n"
        "Use short paragraphs and include one code-oriented recommendation per section."
    ),
    expected_output="A concise implementation plan with actionable steps",
    agent=writer,
)
  4. Keep context tight when creating the crew. For advanced workflows, avoid handing every agent everything; pass only what each step needs and keep verbosity off unless you are debugging.
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
    verbose=False,
)

result = crew.kickoff(inputs={
    "topic": "optimizing token usage in CrewAI for advanced Python developers"
})

print(result)
  5. Add a small pre-processing layer before CrewAI sees the request. This is where you cut token waste most effectively: trim user input, normalize it, and remove repeated phrasing before it becomes part of the prompt.
def compress_topic(topic: str) -> str:
    topic = topic.strip()
    replacements = {
        "advanced developers": "advanced devs",
        "token usage": "token spend",
        "CrewAI Tutorial (Python)": "CrewAI Python tutorial",
    }
    for old, new in replacements.items():
        topic = topic.replace(old, new)
    return topic

topic = compress_topic("CrewAI Tutorial (Python): optimizing token usage for advanced developers")
print(topic)
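The same layer can also handle the trimming and de-duplication side: collapse runs of whitespace and drop verbatim repeated sentences before the text ever reaches a prompt. A minimal sketch, assuming plain-text input (`normalize_input` is a hypothetical helper, not part of CrewAI):

```python
import re

def normalize_input(text: str) -> str:
    """Collapse whitespace and drop verbatim repeated sentences."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    seen = set()
    kept = []
    # Split on sentence-ending punctuation; skip exact (case-insensitive) repeats.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = sentence.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)

print(normalize_input("Cut tokens.  Cut tokens. Keep   quality."))  # → Cut tokens. Keep quality.
```

Chain it with compress_topic so every request is cleaned once, in one place, before kickoff.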
  6. If you need even tighter control, split work into stages and only pass forward the final distilled output. That prevents long intermediate reasoning chains from being reintroduced into later prompts.
def build_crew(topic: str):
    return Crew(
        agents=[researcher, writer],
        tasks=[
            Task(
                description=f"Return 5 compact bullets about: {topic}",
                expected_output="5 short bullets",
                agent=researcher,
            ),
            Task(
                description="Rewrite those bullets into an implementation checklist.",
                expected_output="A short checklist",
                agent=writer,
            ),
        ],
        process=Process.sequential,
        verbose=False,
    )

final_result = build_crew(compress_topic("optimizing token usage for advanced developers")).kickoff()
print(final_result)

Testing It

Run the script and inspect both the length and usefulness of the output. You want fewer tokens spent on setup and intermediate steps, but still enough detail to produce a usable result.

A good test is to compare this version against a looser version with verbose agents and long backstories. If your optimized version returns shorter, more focused outputs without losing key implementation details, you’re on the right track.

Also check whether your tasks are producing deterministic structure: bullets should stay bullets, checklists should stay checklists. If outputs drift into essays, tighten expected_output and shorten task descriptions again.
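A lightweight way to catch that drift automatically is to validate the output shape in code rather than by eye. A sketch, assuming the research task’s result arrives as plain text with one bullet per line (`check_bullets` is a hypothetical helper, not a CrewAI API):

```python
def check_bullets(output: str, expected: int = 5, max_words: int = 12) -> bool:
    """Return True if the output is exactly `expected` short bullet lines."""
    lines = [l.strip() for l in output.splitlines() if l.strip()]
    if len(lines) != expected:
        return False
    for line in lines:
        if not line.startswith(("-", "*", "•")):
            return False  # drifted out of bullet format
        if len(line.lstrip("-*• ").split()) >= max_words:
            return False  # bullet too long; prompt needs tightening
    return True

sample = "\n".join(f"- Tactic {i}: trim prompts early" for i in range(5))
print(check_bullets(sample))  # → True
```

Run it against `str(result)` after each kickoff; a failing check is your signal to shorten the task description or sharpen expected_output.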

Next Steps

  • Add response caching so repeated prompts don’t hit the model twice
  • Introduce model routing: cheap model for extraction, stronger model for final synthesis
  • Measure prompt size per task before each kickoff and log it alongside latency
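The last bullet can be prototyped with no extra dependencies: estimate prompt size with a rough characters-per-token heuristic and log it alongside latency around each kickoff. A sketch, where `estimate_tokens` and the ~4 characters-per-token ratio are assumptions rather than an exact tokenizer (swap in a real tokenizer such as tiktoken for accuracy):

```python
import time

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def logged_kickoff(crew, inputs: dict):
    """Log approximate input size and latency around a kickoff call."""
    prompt_size = sum(estimate_tokens(str(v)) for v in inputs.values())
    start = time.perf_counter()
    result = crew.kickoff(inputs=inputs)
    latency = time.perf_counter() - start
    print(f"~{prompt_size} input tokens, {latency:.2f}s")
    return result
```

Tracking this per task makes regressions obvious: if a prompt’s estimated size jumps between experiments, you know exactly which change caused it.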

By Cyprian Aarons, AI Consultant at Topiax.
