How to Fix 'token limit exceeded during development' in CrewAI (Python)
What the error means
"token limit exceeded during development" usually means one of your CrewAI agents is being fed more text than the model can handle in a single request. It shows up when you pass too much context into an agent: long task descriptions, huge tool outputs, or accumulated chat history.
In CrewAI, this often happens during iterative development because you keep adding more instructions, more tools, and more memory without checking what actually gets sent to the LLM.
The Most Common Cause
The #1 cause is dumping large raw data into the task prompt or agent context instead of trimming it first.
A common pattern is reading an entire file, API response, or database dump and passing it straight into Task(description=...). That works until the prompt crosses the model’s token limit and CrewAI surfaces errors like:
- `ValueError: token limit exceeded during development`
- `litellm.BadRequestError: This model's maximum context length is ...`
- `openai.BadRequestError: This model's maximum context length is ...`
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Pass full raw payload into the task | Pre-summarize or chunk before task creation |
| Let the agent see everything | Give the agent only what it needs |
| No size guardrails | Enforce max input length |
```python
# BROKEN
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

analyst = Agent(
    role="Policy Analyst",
    goal="Review customer policy data",
    backstory="You analyze insurance policy records.",
    llm=llm,
)

with open("large_policy_export.txt", "r", encoding="utf-8") as f:
    raw_policy_data = f.read()

task = Task(
    description=f"""
    Analyze this policy export and find anomalies.
    DATA:
    {raw_policy_data}
    """,
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()
```
```python
# FIXED
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

def trim_text(text: str, max_chars: int = 12000) -> str:
    return text[:max_chars]

llm = ChatOpenAI(model="gpt-4o-mini")

analyst = Agent(
    role="Policy Analyst",
    goal="Review customer policy data",
    backstory="You analyze insurance policy records.",
    llm=llm,
)

with open("large_policy_export.txt", "r", encoding="utf-8") as f:
    raw_policy_data = f.read()

safe_policy_data = trim_text(raw_policy_data)

task = Task(
    description=f"""
    Analyze this policy export and find anomalies.
    DATA:
    {safe_policy_data}
    """,
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()
```
If the file is truly large, truncation is only a temporary fix. The production fix is chunking plus summarization before the agent sees anything.
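A minimal chunk-and-summarize sketch of that idea. The chunk size and overlap are illustrative, and `summarize_chunk` is a placeholder for a real LLM call (e.g. `llm.invoke(...)`), not a CrewAI built-in:

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so no single prompt is oversized."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

def summarize_chunk(chunk: str) -> str:
    # Placeholder: in a real pipeline, call your LLM here, e.g.
    # llm.invoke(f"Summarize the key anomalies in:\n{chunk}").content
    return chunk[:300]

def condense(text: str) -> str:
    """Summarize each chunk, then join the summaries for the agent."""
    return "\n---\n".join(summarize_chunk(c) for c in chunk_text(text))
```

The agent then receives `condense(raw_policy_data)` instead of the raw export, so its input stays bounded no matter how large the file grows.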
Other Possible Causes
1. Memory is accumulating too much conversation
If you enabled memory and keep running multi-step crews, older messages can pile up fast.
```python
crew = Crew(
    agents=[analyst],
    tasks=[task],
    memory=True,
)
```
If you don’t need long-running conversational state, disable memory. If you do need it, use a smaller window or summarize prior turns before reusing them.
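One illustrative way to bound that state (the helper below is my own sketch, not a CrewAI API): keep only a recent window of turns, and shrink it further if it still exceeds a size budget before reusing it in a prompt.

```python
def windowed_history(messages: list[str], max_messages: int = 6,
                     max_chars: int = 4000) -> list[str]:
    """Keep only the most recent turns, bounded by count and total size."""
    recent = messages[-max_messages:]
    # Drop the oldest remaining turn until the window fits the char budget.
    while len(recent) > 1 and sum(len(m) for m in recent) > max_chars:
        recent = recent[1:]
    return recent
```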
2. Tool output is too verbose
A tool that returns full HTML pages, logs, PDFs converted to text, or large JSON blobs can blow up your context.
```python
# BAD: returning entire response body
import requests

def fetch_claim_history(claim_id: str):
    return requests.get(f"https://api.example.com/claims/{claim_id}").text
```
Fix it by returning only the fields the agent needs.
```python
# BETTER: return compact structured data
import requests

def fetch_claim_history(claim_id: str):
    data = requests.get(f"https://api.example.com/claims/{claim_id}").json()
    return {
        "claim_id": data["claim_id"],
        "status": data["status"],
        "last_updated": data["last_updated"],
        "flags": data.get("flags", [])[:5],
    }
```
3. Too many agents/tasks are chained with oversized outputs
Each task output becomes input for another task if you wire crews carelessly. Three small tasks are fine; three tasks each emitting 20k tokens are not.
```python
crew = Crew(
    agents=[researcher, analyst, reviewer],
    tasks=[research_task, analysis_task, review_task],
)
```
Make intermediate outputs concise. Tell each agent to produce bullet points or structured JSON instead of long prose.
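A hedged sketch of enforcing that: cap each intermediate output before it feeds the next step. `compact_output` is my own helper, not a CrewAI built-in, and the limits are arbitrary examples; you would apply it between tasks or in a task callback, depending on how your pipeline is wired.

```python
def compact_output(text: str, max_bullets: int = 8,
                   max_chars_per_bullet: int = 200) -> str:
    """Reduce a verbose task output to a bounded bullet list."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    kept = lines[:max_bullets]
    return "\n".join(line[:max_chars_per_bullet] for line in kept)
```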
4. Your model context window is smaller than your prompt assumptions
Some models tolerate far less context than others. If you’re using a smaller model through ChatOpenAI, LiteLLM, or another provider wrapper, your prompt may exceed limits even if it looked fine in testing.
```python
llm = ChatOpenAI(model="gpt-4o-mini")  # smaller context than larger GPT variants
```
If your workload requires long context, move to a larger-context model or redesign the pipeline so the agent never receives full raw inputs.
How to Debug It
1. Print the exact prompt size before kickoff
   - Log task descriptions, tool outputs, and any memory content.
   - If one string is huge, that's your first suspect.
2. Disable features one at a time
   - Turn off memory.
   - Remove tools.
   - Replace multi-agent flows with a single task.
   - Re-run until the error disappears.
3. Inspect tool return values
   - Add logging inside every custom tool.
   - Check whether a tool returns full documents instead of summaries.
   - A bad tool output often looks harmless until it gets injected into the next step.
4. Compare against model limits
   - Check the token window for your provider/model pair.
   - Remember that system prompts + task text + tool output + history all count together.
   - If you're close to the limit in logs, reduce input size by half and test again.
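A rough way to do that logging before kickoff. The 4-characters-per-token ratio is a heuristic for typical English text, not an exact tokenizer; use your provider's tokenizer if you need precise counts.

```python
def approx_tokens(text: str) -> int:
    # Heuristic: roughly 4 characters per token for English prose.
    return max(1, len(text) // 4) if text else 0

def log_prompt_budget(parts: dict[str, str], limit_tokens: int) -> int:
    """Print an approximate token count per prompt component and the total."""
    total = 0
    for name, text in parts.items():
        tokens = approx_tokens(text)
        total += tokens
        print(f"{name}: ~{tokens} tokens")
    print(f"total: ~{total} / {limit_tokens} tokens")
    return total
```

Run it on the task description, each tool output, and any memory content just before `crew.kickoff()`; the component with the biggest count is usually your culprit.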
Prevention
- Keep task descriptions short and specific.
  - Don't paste source files or API payloads into prompts.
  - Send only relevant excerpts.
- Make tools return structured summaries.
  - Prefer JSON with a few fields over raw text dumps.
  - Cap arrays and truncate long strings.
- Add token guards early.
  - Reject oversized inputs before they reach Task.
  - Log approximate size in chars or tokens during development.
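A minimal guard along those lines, failing fast before the input ever reaches a Task (the 12,000-character cap is an arbitrary example; tune it to your model's context window):

```python
MAX_INPUT_CHARS = 12_000  # example budget; adjust per model

def guard_input(text: str, name: str = "input") -> str:
    """Raise early on oversized inputs instead of letting the LLM call fail."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"{name} is {len(text)} chars (limit {MAX_INPUT_CHARS}); "
            "summarize or chunk it before building the Task"
        )
    return text
```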
A good rule in CrewAI projects: if an agent doesn’t need to read it to complete the job, don’t put it in context. That one habit prevents most token limit exceeded during development failures before they start.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.