How to Fix 'token limit exceeded when scaling' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

A "token limit exceeded when scaling" error in CrewAI usually means one of your agents is passing too much context into a model call while the framework expands tasks, tool outputs, or conversation history. It shows up most often when you scale from a single task to multiple agents, longer task chains, or verbose tool responses.

In practice, this is not a CrewAI bug in isolation. It’s usually a context management problem: too much text is being carried forward into Agent, Task, or Crew execution.

The Most Common Cause

The #1 cause is feeding large intermediate outputs back into later tasks without trimming them first. In CrewAI, this happens when one agent returns a huge markdown blob, JSON dump, or full document, and the next agent gets that entire payload as context.

Broken vs fixed pattern

Broken pattern → Fixed pattern

  • Pass full raw output from one task into the next → Extract only the fields the next task needs
  • Let agents summarize long documents repeatedly → Pre-chunk or pre-summarize before handing off
  • Use verbose expected_output that encourages giant responses → Constrain output shape explicitly

# BROKEN
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Research the policy",
    backstory="You analyze insurance policy documents."
)

writer = Agent(
    role="Writer",
    goal="Write a concise summary",
    backstory="You write short executive summaries."
)

research_task = Task(
    description="Read this full policy and extract everything useful:\n\n" + open("policy.txt").read(),
    expected_output="A complete detailed analysis of the policy.",
    agent=researcher
)

write_task = Task(
    description="Use the research output to write a summary.",
    expected_output="A polished summary.",
    agent=writer,
    context=[research_task]   # full output gets carried forward
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential
)

result = crew.kickoff()

# FIXED
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Extract only key policy facts",
    backstory="You analyze insurance policy documents."
)

writer = Agent(
    role="Writer",
    goal="Write a concise summary",
    backstory="You write short executive summaries."
)

research_task = Task(
    description=(
        "Read the policy and return ONLY these fields:\n"
        "- coverage_limit\n"
        "- exclusions\n"
        "- renewal_terms\n"
        "- cancellation_terms"
    ),
    expected_output="A short structured JSON object with four keys.",
    agent=researcher
)

write_task = Task(
    description=(
        "Write a 5-bullet summary using only the structured fields below.\n"
        "Do not restate raw policy text."
    ),
    expected_output="Exactly 5 bullets.",
    agent=writer,
    context=[research_task]
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential
)

result = crew.kickoff()

The fix is simple: reduce what gets forwarded. If you need more detail later, store it outside the prompt and retrieve only what’s relevant.
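
If you need the full detail later, one option is a side store populated by a task callback, as in this minimal sketch. It reuses Task and researcher from the example above; the callback parameter exists on Task in recent CrewAI releases (verify against yours), while side_store and stash_full_output are illustrative names, not CrewAI APIs.

side_store = {}

def stash_full_output(output):
    # Keep the complete analysis out of band instead of forwarding it.
    side_store["full_analysis"] = str(output)

deep_dive = Task(
    description="Analyze the policy in detail.",
    expected_output="A detailed analysis.",
    agent=researcher,
    callback=stash_full_output,  # runs after the task finishes
)

# Downstream, skip context=[deep_dive] and pass only the slice you need:
relevant_slice = side_store["full_analysis"][:1000]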

Other Possible Causes

1. Tool output is too large

A common trigger is an agent tool returning an entire file, HTML page, API payload, or database dump.

# Problematic tool result
import requests

def fetch_claims():
    # Returns the entire response body, however large it is.
    return requests.get("https://api.example.com/claims").text  # huge payload

Fix it by filtering at source:

def fetch_claims():
    data = requests.get("https://api.example.com/claims").json()
    # Forward only the fields downstream agents actually use.
    return {
        "claim_id": data["claim_id"],
        "status": data["status"],
        "amount": data["amount"],
    }

2. Recursive or multi-step crews keep appending history

If you run many sequential tasks with long outputs, each step compounds token usage.

crew = Crew(
    agents=[a1, a2, a3, a4],
    tasks=[t1, t2, t3, t4],
    process=Process.sequential,
)

If each task emits long text, later tasks inherit too much context. Shorten outputs or split crews into smaller stages.
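
One way to split stages, as a sketch: run the first crew, hard-cap what crosses the boundary, and feed it to a second crew through input interpolation. This assumes your CrewAI version supports kickoff(inputs=...) with {placeholder} interpolation in task descriptions (recent releases do; verify yours) and reuses a1–a4 and t1–t4 from the snippet above; the 2,000-character cap is an arbitrary example value.

stage_one = Crew(agents=[a1, a2], tasks=[t1, t2], process=Process.sequential)
summary = str(stage_one.kickoff())[:2000]  # cap what crosses the stage boundary

t3 = Task(
    description="Continue the analysis using only this summary:\n{summary}",
    expected_output="A short structured result.",
    agent=a3,
)
stage_two = Crew(agents=[a3, a4], tasks=[t3, t4], process=Process.sequential)
result = stage_two.kickoff(inputs={"summary": summary})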

3. expected_output is too open-ended

This encourages verbose completions that balloon context.

Task(
    description="Analyze this underwriting document.",
    expected_output="A comprehensive analysis covering all possible details.",
)

Use constrained outputs instead:

Task(
    description="Analyze this underwriting document.",
    expected_output="Return only: risk_level, top_3_risks, recommendation."
)

4. Large files are injected directly into prompts

This is common when developers concatenate text extracted from PDFs directly into the prompt string.

description = f"Review this document:\n{big_document_text}"

Instead (see the sketch after this list):

  • chunk the document
  • summarize chunks first
  • pass only retrieved chunks relevant to the question
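
A minimal chunking sketch in plain Python (no retrieval library; max_chars and overlap are arbitrary example values to tune against your model's context window):

def chunk_text(text, max_chars=4000, overlap=200):
    # Fixed-size windows with a small overlap so facts that straddle
    # a boundary are not lost.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

chunks = chunk_text(big_document_text)
# Summarize chunks first, then hand the next task only the relevant pieces:
description = "Review these excerpts:\n" + "\n---\n".join(chunks[:3])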

How to Debug It

  1. Print every task input and output size (a logging sketch follows this list)

    • Log character count before each Task.
    • If one output jumps from 2 KB to 80 KB, that’s your culprit.
  2. Inspect tool responses

    • Check whether any custom tool returns raw JSON arrays, HTML pages, or full tables.
    • Trim at the tool boundary before CrewAI sees it.
  3. Run tasks individually

    • Execute each Task outside the full Crew.
    • If one task works alone but fails in sequence, your issue is accumulated context.
  4. Reduce context aggressively

    • Remove context=[...] temporarily.
    • Replace long descriptions with short structured instructions.
    • If the error disappears, you’ve confirmed prompt bloat.
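
A minimal size logger for step 1, as a sketch. It reuses Task and writer from the fixed example, assumes the Task callback parameter receives the task's output (verify against your CrewAI version), and uses the rough four-characters-per-token heuristic:

def log_size(output):
    text = str(output)
    # ~4 chars per token is a rough heuristic, not an exact count.
    print(f"[size-check] {len(text):,} chars (~{len(text) // 4:,} tokens)")

write_task = Task(
    description="Use the research output to write a summary.",
    expected_output="A polished summary.",
    agent=writer,
    callback=log_size,  # flags the step whose output balloons
)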

A useful rule: if you see BadRequestError, a "context length exceeded" message, or CrewAI surfaces an OpenAI/Anthropic token-limit error during kickoff(), start by shrinking what flows between tasks.

Prevention

  • Keep task outputs structured and small.
    • Prefer JSON with a few keys over free-form essays.
  • Never pass raw documents through multiple agents.
    • Chunk first, retrieve second.
  • Put size limits on tools.
    • Truncate large responses before returning them to an agent (see the sketch after this list).
  • Design crews so each step has one job.
    • One extractor, one summarizer, one decision-maker beats one giant chain.
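
A tool-boundary guard, as a sketch (MAX_TOOL_CHARS and clamp are illustrative names, and the 8,000-character cap is an example value, not a CrewAI constant):

import requests

MAX_TOOL_CHARS = 8000  # example cap; tune to your model's context window

def clamp(text, limit=MAX_TOOL_CHARS):
    # Truncate oversized tool output before any agent sees it.
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[...truncated]"

def fetch_claims_safe():
    raw = requests.get("https://api.example.com/claims").text
    return clamp(raw)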

If you’re seeing token limit exceeded when scaling in CrewAI with Python, treat it as a context design issue first. In most cases, trimming task handoffs and tightening tool outputs fixes it fast.

