How to Fix 'context length exceeded when scaling' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

context length exceeded when scaling usually means one of your CrewAI agents tried to send too much text into the model at once. In practice, this happens when task outputs, memory, tool results, or repeated agent chatter keep accumulating until the prompt crosses the model’s token limit.

You’ll usually see it during multi-agent runs, long task chains, or when an agent keeps reusing full transcripts instead of compact summaries. The failure often surfaces as a model-side error wrapped by CrewAI during Agent.execute_task() or inside a Crew.kickoff() run.

The Most Common Cause

The #1 cause is unbounded context growth: you keep passing full outputs from one task to the next, then add memory on top. In CrewAI, this is easy to do when tasks are chained and each task output is stored verbatim in downstream prompts.

Here’s the broken pattern versus the fixed pattern:

Broken | Fixed
Passes full raw output into every next task | Summarizes or extracts only the needed fields
Keeps memory enabled without trimming | Limits memory or uses compact state
Reuses entire transcripts as context | Uses structured outputs

# BROKEN
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect everything",
    backstory="You are thorough.",
    verbose=True,
)

writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write concise reports.",
    verbose=True,
)

task1 = Task(
    description="Research the customer complaint and return all findings.",
    expected_output="A detailed report with all notes.",
    agent=researcher,
)

task2 = Task(
    description="Write a summary using the full research output.",
    expected_output="A short summary.",
    agent=writer,
    context=[task1],  # injects task1's entire raw output into this prompt
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    verbose=True,
)
result = crew.kickoff()

# FIXED
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect only relevant facts",
    backstory="You extract structured facts.",
    verbose=True,
)

writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write concise reports.",
    verbose=True,
)

task1 = Task(
    description=(
        "Research the customer complaint and return ONLY:\n"
        "- issue_summary\n"
        "- root_cause\n"
        "- recommended_action\n"
        "No extra commentary."
    ),
    expected_output="Structured JSON-like fields with three keys.",
    agent=researcher,
)

task2 = Task(
    description=(
        "Write a summary using only the issue_summary, root_cause, and "
        "recommended_action fields from the research output."
    ),
    expected_output="A short summary under 150 words.",
    agent=writer,
    context=[task1],  # task1 now returns only three compact fields
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    verbose=True,
)
result = crew.kickoff()

The key change is not “make the model smarter.” It’s “stop feeding it garbage volume.” If you need downstream work, pass only extracted fields, not raw transcripts.
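One way to make "pass only extracted fields" concrete is a small extraction step between tasks. This is a sketch, not a CrewAI API — `extract_fields` is a hypothetical helper that parses a structured task output and keeps only the keys the next task needs:

```python
import json

def extract_fields(raw_output: str, keys: list[str]) -> dict:
    """Pull only the named fields out of a structured task output.

    Falls back to line-based "key: value" parsing when the output
    is not valid JSON (LLMs don't always emit clean JSON).
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        data = {}
        for line in raw_output.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                data[key.strip().strip('"-')] = value.strip()
    return {k: data[k] for k in keys if k in data}

raw = (
    '{"issue_summary": "Late delivery", "root_cause": "Carrier delay", '
    '"recommended_action": "Refund shipping", "notes": "...long transcript..."}'
)
fields = extract_fields(raw, ["issue_summary", "root_cause", "recommended_action"])
# Only the three compact fields travel downstream; "notes" is dropped.
```

Run the extraction on `task1.output.raw` after kickoff (or inside a wrapper) and feed only `fields` into the next prompt.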

Other Possible Causes

1. Memory is accumulating too much conversation

If you enabled memory and your agents keep long chat histories, token usage grows on every step.

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    memory=True,
)

Fix it by disabling memory where it’s not needed or by summarizing state before reuse.

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    memory=False,
)
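If you genuinely need memory, the middle ground mentioned above — summarizing state before reuse — can be sketched as a budget-based trim. `compact_history` is a hypothetical helper, not a CrewAI feature; in production you would replace the drop marker with a real summarization call:

```python
def compact_history(messages: list[str], max_chars: int = 6000) -> list[str]:
    """Keep the most recent messages that fit within max_chars.

    Older messages are collapsed into a one-line marker so the agent
    still knows context was dropped. A stand-in for real summarization.
    """
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):      # walk newest-first
        if total + len(msg) > max_chars:
            break
        kept.append(msg)
        total += len(msg)
    dropped = len(messages) - len(kept)
    kept.reverse()                      # restore chronological order
    if dropped:
        kept.insert(0, f"[{dropped} earlier messages summarized away]")
    return kept

history = [f"turn {i}: " + "x" * 1000 for i in range(20)]
trimmed = compact_history(history, max_chars=5000)
```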

2. Tool output is huge

A tool that returns entire PDFs, logs, HTML pages, or database dumps can blow up context immediately.

from crewai.tools import tool

@tool("fetch_case_file")
def fetch_case_file(case_id: str) -> str:
    with open(f"/data/cases/{case_id}.txt") as f:
        return f.read()  # dumps the entire file into context
Trim the result before returning it to the agent.

@tool("fetch_case_file")
def fetch_case_file(case_id: str) -> str:
    with open(f"/data/cases/{case_id}.txt") as f:
        text = f.read()
    return text[:4000]  # return only what the agent needs
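Blunt head-only truncation can lose a document's conclusion, which is often where the answer lives. A head-and-tail variant (again a hypothetical helper, not a CrewAI utility) keeps both ends:

```python
def clip_middle(text: str, max_chars: int = 4000) -> str:
    """Keep the start and end of a long document, dropping the middle.

    Useful for tool outputs where both the header and the
    conclusion matter (logs, reports, case files).
    """
    if len(text) <= max_chars:
        return text
    half = (max_chars - 20) // 2        # reserve room for the marker
    return text[:half] + "\n[...truncated...]\n" + text[-half:]

clipped = clip_middle("A" * 3000 + "B" * 3000, max_chars=1000)
```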

3. Verbose prompts are stacking instructions

Long role, goal, backstory, and task descriptions all count toward context.

agent = Agent(
    role="Senior compliance analyst with 20 years of experience...",
    goal="Review every possible edge case...",
)

Keep prompts tight and specific.

agent = Agent(
    role="Compliance analyst",
    goal="Classify claims documents for escalation",
)

4. Context window is too small for your workload

Some models have smaller limits than you expect. A setup that works on one provider can fail on another.

from crewai import LLM

llm = LLM(model="gpt-3.5-turbo")  # ~16k-token context; easy to overflow with long chains

Switch to a larger-context model if your workflow genuinely needs more room.

llm = LLM(model="gpt-4o")  # 128k-token context

How to Debug It

  1. Print every task input and output size

    • Log character counts before each Task runs.
    • If one step jumps from small to massive, that’s your culprit.
  2. Disable memory first

    • Run the same crew with memory=False.
    • If the error disappears, you’re dealing with accumulated conversation state.
  3. Inspect tool payloads

    • Log raw tool responses.
    • If a tool returns thousands of lines or full documents, truncate or summarize at source.
  4. Reduce the pipeline to two tasks

    • Remove extra agents/tasks until it works.
    • Add them back one by one until context explodes again.

A practical logging helper looks like this:

def log_size(label: str, text: str) -> None:
    print(f"{label}: chars={len(text)}")

log_size("task1_description", task1.description)
log_size("task2_description", task2.description)

# after crew.kickoff(), actual task outputs are available too:
log_size("task1_output", task1.output.raw)

If you want better signal than character counts, log approximate tokens using your tokenizer of choice before calling Crew.kickoff().
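If pulling in a tokenizer library is overkill, a rough stdlib-only stand-in is the "~4 characters per token" rule of thumb for English text. This is an approximation for budget checks, not an exact count:

```python
def approx_tokens(text: str) -> int:
    # Rule of thumb: English text averages roughly 4 characters per token.
    # Use a real tokenizer (e.g. tiktoken) when you need exact counts.
    return max(1, len(text) // 4)

def report_budget(label: str, text: str, limit: int = 128_000) -> None:
    tokens = approx_tokens(text)
    pct = 100 * tokens / limit
    print(f"{label}: ~{tokens} tokens ({pct:.1f}% of a {limit}-token window)")
```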

Prevention

  • Keep task outputs structured:

    • Return JSON-like fields or short summaries.
    • Don’t pass raw transcripts between agents unless you absolutely need them.
  • Treat tools as bounded interfaces:

    • Truncate large files.
    • Paginate search results.
    • Summarize documents before handing them to an agent.
  • Set hard limits early:

    • Cap memory usage.
    • Use concise prompts.
    • Prefer smaller intermediate artifacts over “everything in one prompt.”
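The "set hard limits early" advice can be enforced with a small guard before kickoff. This is a sketch, assuming you collect each task's description (and any context you plan to inject) as plain strings; the helper and its name are hypothetical:

```python
class ContextBudgetError(RuntimeError):
    """Raised when prompt material exceeds the configured budget."""

def assert_within_budget(prompt_parts: list[str], max_chars: int = 400_000) -> int:
    """Fail fast if combined prompt material is too large to ever fit."""
    total = sum(len(p) for p in prompt_parts)
    if total > max_chars:
        raise ContextBudgetError(
            f"Prompt material is {total} chars (budget {max_chars}); "
            "summarize or truncate before kickoff."
        )
    return total

# usage (hypothetical sizes):
# assert_within_budget([task1.description, task2.description])
```

Failing before the run starts is cheaper than watching a multi-agent chain die mid-pipeline on the provider's token error.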

If you’re seeing context length exceeded when scaling in CrewAI Python code, start by looking at what gets passed from one step to the next. In most cases, the fix is not in CrewAI itself — it’s in how much text you’re feeding into each agent run.


By Cyprian Aarons, AI Consultant at Topiax.
