How to Fix 'token limit exceeded in production' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21

What the Error Means

A 'token limit exceeded in production' error usually means one of your CrewAI agents sent too much context to the model. In practice, this happens when task history, tool output, or long prompts keep getting appended until the provider rejects the request.

You’ll see it most often in multi-step crews, recursive task flows, or agents that keep re-reading large documents without trimming context.

The Most Common Cause

The #1 cause is uncontrolled context growth across tasks. In CrewAI, this usually happens when you pass large outputs from one task into the next, then let the agent keep full history enabled.

Here’s the broken pattern:

# broken.py
from crewai import Agent, Task, Crew
from crewai_tools import FileReadTool

researcher = Agent(
    role="Researcher",
    goal="Summarize the document",
    backstory="You analyze financial reports.",
    tools=[FileReadTool()],
    verbose=True,
)

writer = Agent(
    role="Writer",
    goal="Write a client-ready summary",
    backstory="You write concise reports.",
    verbose=True,
)

read_task = Task(
    description="Read this entire 80-page report and extract all relevant details: {report_text}",
    expected_output="Every relevant detail from the report",
    agent=researcher,
)

write_task = Task(
    description="Use the full research output and produce a final memo.",
    expected_output="A client-ready memo",
    agent=writer,
    context=[read_task],  # the researcher's full, untrimmed output is injected here
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[read_task, write_task],
    verbose=True,
)

result = crew.kickoff(inputs={"report_text": huge_report_text})  # huge_report_text: the raw 80-page report, loaded elsewhere

And here’s the fixed pattern:

# fixed.py
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Extract only decision-grade facts",
    backstory="You analyze financial reports.",
    verbose=True,
)

writer = Agent(
    role="Writer",
    goal="Write a concise client memo",
    backstory="You write concise reports.",
    verbose=True,
)

read_task = Task(
    description=(
        "Summarize this report in 10 bullets max. "
        "Include only material risks, key figures, and recommendations. "
        "Do not quote long sections.\n\nReport:\n{report_text}"
    ),
    expected_output="At most 10 bullets of decision-grade facts",
    agent=researcher,
)

write_task = Task(
    description=(
        "Use the summarized bullets only. "
        "Write a 1-page memo with executive summary and action items."
    ),
    expected_output="A 1-page memo with executive summary and action items",
    agent=writer,
    context=[read_task],  # only the 10 bullets flow forward, never the raw report
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[read_task, write_task],
    verbose=True,
)

result = crew.kickoff(inputs={"report_text": huge_report_text[:12000]})  # coarse hard cap on the raw input

The point is simple: don’t feed raw documents into every step. Extract once, compress hard, then pass forward only the minimum useful output.
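If even a single pass over the raw document is too big, pre-compress it before any crew sees it. Here is a minimal sketch of that idea; chunk and summarize_chunk are illustrative helpers, and summarize_chunk stands in for whatever cheap single-shot LLM call you already use:

# sketch: compress the raw report once, before any crew sees it
def chunk(text: str, size: int = 8000) -> list[str]:
    """Split the source into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_chunk(piece: str) -> str:
    """Stand-in for one cheap LLM call per chunk; swap in your own client."""
    raise NotImplementedError

bullets = [summarize_chunk(piece) for piece in chunk(huge_report_text)]
compressed = "\n".join(bullets)

# only the compressed bullets enter the crew, never the raw report
result = crew.kickoff(inputs={"report_text": compressed})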

Other Possible Causes

1) Tool output is too large

If your tool returns entire files, logs, or database dumps, CrewAI will happily stuff that into the prompt unless you trim it.

# bad
def fetch_policy_docs():
    return open("all_policies.txt").read()  # the entire file lands in the prompt

# better
def fetch_policy_docs():
    with open("all_policies.txt") as f:
        text = f.read()
    return text[:4000]  # cap at the source, before the LLM ever sees it

If you’re using a custom tool class:

from crewai_tools import BaseTool

class PolicyTool(BaseTool):
    name: str = "policy_tool"
    description: str = "Fetch policy snippets"

    def _run(self) -> str:
        docs = load_docs()  # your own loader; returns a list of snippets
        return "\n".join(docs[:5])  # top 5 snippets only, never the full corpus
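Capping inside the tool means every agent that uses it inherits the bound, instead of each individual prompt needing its own guard.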

2) Your prompt is doing too much work

A giant Task.description can burn tokens before the model even starts reasoning.

# bad
Task(
    description=f"""
    Analyze all of this:
    {huge_policy}
    {huge_contract}
    {huge_email_thread}
    {huge_customer_history}
    """
)

Use references instead of embedding everything:

# better
Task(
    description="""
    Analyze the provided summary and identify:
    - compliance risk
    - missing fields
    - recommended next action
    Use no more than 8 bullets.
    """
)
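CrewAI's Task context parameter gives you the same reference-instead-of-embed behavior between tasks: a later task receives an earlier task's output without you pasting it into the description. A minimal sketch, with illustrative task names:

summary_task = Task(
    description="Summarize the claim file in 8 bullets max.",
    expected_output="At most 8 plain-text bullets",
    agent=researcher,
)

analyze_task = Task(
    description=(
        "Analyze the provided summary and identify compliance risk, "
        "missing fields, and the recommended next action."
    ),
    expected_output="A short risk assessment",
    agent=writer,
    context=[summary_task],  # only the bullets are injected, not the raw documents
)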

3) Memory is accumulating across runs

If you use memory-enabled agents or reuse objects across requests in production, old context can pile up.

agent = Agent(
    role="Analyst",
    goal="Review claims",
    backstory="...",
    memory=True,  # context accumulates for as long as this object lives
)

If you don’t need persistent memory for that workflow, disable it or scope it per request. In stateless APIs, fresh agent instances per job are safer.
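A simple way to scope context per request is a factory that builds a fresh crew for every job, so nothing survives between API calls. A minimal sketch, assuming a typical request-handler shape:

from crewai import Agent, Task, Crew

def build_crew() -> Crew:
    """Fresh agents and tasks per request: no carried-over memory or history."""
    analyst = Agent(
        role="Analyst",
        goal="Review claims",
        backstory="You review insurance claims.",
        memory=False,  # explicit: no persistence for this workflow
    )
    review_task = Task(
        description="Review the claim summary: {claim_summary}",
        expected_output="Approve, deny, or escalate, with a one-line reason",
        agent=analyst,
    )
    return Crew(agents=[analyst], tasks=[review_task])

def handle_request(claim_summary: str) -> str:
    crew = build_crew()  # new instances on every call
    return str(crew.kickoff(inputs={"claim_summary": claim_summary}))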

4) You are chaining too many tasks without compression

A six-task crew that passes verbose outputs between each step will hit limits fast.

Pattern                                     Result
Raw output from each task passed to next    Token growth explodes
Summarize after each task                   Context stays bounded
Final synthesis only uses compressed notes  Stable in production

A practical fix is to insert a compression task:

compress_task = Task(
    description="Compress previous findings into a strict JSON summary with max 15 fields.",
    expected_output="A JSON object with at most 15 fields",
    agent=researcher,
)
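Wired into the crew, the compression step sits between research and writing, so with the default sequential process the writer only ever sees the compact JSON:

crew = Crew(
    agents=[researcher, writer],
    tasks=[read_task, compress_task, write_task],  # compression between steps
    verbose=True,
)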

How to Debug It

  1. Print the exact payload size

    • Log len(prompt) for your task descriptions.
    • Log tool outputs before returning them.
    • If one output is huge, that’s your first suspect (see the sketch after this list).
  2. Turn on verbose mode

    • Set verbose=True on agents and crews.
    • Look for repeated context being re-sent across steps.
    • Watch for long tool responses being injected into later prompts.
  3. Binary search the workflow

    • Remove half your tasks.
    • If the error disappears, add them back one by one.
    • Do the same with tools and memory until you find the trigger.
  4. Check provider-side limits

    • The error may show up as:
      • context_length_exceeded
      • BadRequestError: This model's maximum context length is...
      • token limit exceeded in production
    • Verify model context window and compare it to your estimated input size plus expected output size.
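A quick way to do the size check in step 1 and the comparison in step 4 is a rough character-based estimate; four characters per token is a common heuristic for English text, not an exact count, and the window and output budget below are illustrative:

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

MODEL_CONTEXT_WINDOW = 128_000   # use your model's real limit
EXPECTED_OUTPUT_TOKENS = 2_000   # reserve room for the completion

prompt = read_task.description  # or any tool output / assembled prompt you log
used = estimate_tokens(prompt)
budget = MODEL_CONTEXT_WINDOW - EXPECTED_OUTPUT_TOKENS
print(f"~{used} tokens in prompt, ~{budget} available")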

Prevention

  • Keep every task output short and structured.
  • Summarize before passing data to another agent.
  • Use stateless crews for API workloads unless you truly need memory.
  • Cap tool responses at source instead of relying on the LLM to ignore excess text.
  • Set hard limits in code for prompt size and document chunking (see the sketch below).
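That last bullet can be as small as a guard applied everywhere text enters a prompt; a minimal sketch, with an illustrative limit:

MAX_PROMPT_CHARS = 24_000  # tune to your model's context window

def bounded(text: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Truncate at the source instead of hoping the model ignores the excess."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated]"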

If you want CrewAI to survive production traffic, treat tokens like memory in a backend service: bounded inputs, bounded outputs, no surprise accumulation. That’s what keeps this error from coming back.

