How to Fix 'context length exceeded during development' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

context length exceeded during development usually means CrewAI sent too much text to the model in one call. In practice, this happens when task outputs, tool results, chat history, or agent memory keep growing until the LLM’s token limit is hit.

You’ll see this most often in multi-step crews where agents pass large outputs to each other, or when a tool returns a huge payload and CrewAI blindly includes it in the next prompt.

The Most Common Cause

The #1 cause is passing full, untrimmed task output into the next task or agent context.

CrewAI doesn’t care that your data is “just JSON” or “just logs”. If you stuff 20 KB of raw text into description, expected_output, memory, or tool output, the next LLM call can blow past the model’s context window.

Broken vs fixed pattern

  • Broken: passes full, verbose output to the next task. Fixed: summarize or extract only what the next step needs.
  • Broken: uses raw tool output directly. Fixed: truncate, filter, or store raw data outside the prompt.
  • Broken: lets memory accumulate everything. Fixed: keep memory scoped and minimal.
# BROKEN
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect all relevant data",
    backstory="You are thorough."
)

writer = Agent(
    role="Writer",
    goal="Write a concise report",
    backstory="You turn research into clean summaries."
)

task_1 = Task(
    description="Search the web and return everything you find about ACME Corp.",
    expected_output="A full dump of all findings.",
    agent=researcher
)

task_2 = Task(
    description="Use the research output below and write a report:\n\n{task_1_output}",
    expected_output="A concise report.",
    agent=writer
)

crew = Crew(agents=[researcher, writer], tasks=[task_1, task_2])
result = crew.kickoff()
# FIXED
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts only",
    backstory="You extract signal from noise."
)

writer = Agent(
    role="Writer",
    goal="Write a concise report",
    backstory="You turn facts into summaries."
)

task_1 = Task(
    description=(
        "Search the web for ACME Corp and return only:\n"
        "- company overview\n"
        "- revenue\n"
        "- recent news\n"
        "- 5 bullet insights max"
    ),
    expected_output="A structured summary under 500 words.",
    agent=researcher
)

task_2 = Task(
    description=(
        "Use only the summarized research output. "
        "Do not include raw source text."
    ),
    expected_output="A concise report under 300 words.",
    agent=writer,
    context=[task_1]  # only task_1's (already summarized) output flows in
)

crew = Crew(agents=[researcher, writer], tasks=[task_1, task_2])
result = crew.kickoff()

If you need raw data for auditing, store it in S3, a database, or local files. Don’t inject it into every downstream prompt.
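A minimal sketch of that pattern, using local files for simplicity (the helper name, directory, and run ID are hypothetical; swap in S3 or a database client the same way):

```python
import json
from pathlib import Path

def store_raw_findings(findings: str, run_id: str) -> str:
    """Persist the full research dump outside the prompt and return its path."""
    path = Path("artifacts") / f"{run_id}_research.json"
    path.parent.mkdir(exist_ok=True)
    path.write_text(json.dumps({"raw": findings}))
    return str(path)

# Downstream tasks get a pointer, not the payload.
artifact_path = store_raw_findings("...20 KB of raw research...", run_id="acme-2024")
note_for_next_task = f"Full research stored at {artifact_path}; work from the summary only."
```

The key point is that only the short path string ever reaches the next prompt; auditors can read the artifact later without the LLM ever seeing it again.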

Other Possible Causes

1) Tool output is too large

A common trap is returning an entire API response or dataframe from a tool.

from crewai.tools import tool
import requests
import json

# BAD: returns huge payload
@tool("fetch_customer_history")
def fetch_customer_history(customer_id: str) -> str:
    return requests.get(f"https://api.example.com/customers/{customer_id}/history", timeout=10).text

# BETTER: return only what the agent needs
@tool("fetch_customer_history")
def fetch_customer_history(customer_id: str) -> str:
    data = requests.get(f"https://api.example.com/customers/{customer_id}/history", timeout=10).json()
    recent = data["events"][:10]  # keep only the ten most recent events
    return json.dumps({"recent_events": recent})

2) Memory is turned on for long-running crews

CrewAI memory can be useful, but if every turn gets appended forever, you’ll eventually hit token limits.

crew = Crew(
    agents=[researcher, writer],
    tasks=[task_1, task_2],
    memory=True  # can grow too large in long runs
)

If you don’t need persistent conversational state, disable it. If you do need it, summarize older turns before reusing them.
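One way to do that summarization is to manage conversational state yourself and feed it back through kickoff inputs instead of `memory=True`. A minimal sketch, assuming your task descriptions contain a `{history_summary}` placeholder that CrewAI interpolates from `kickoff(inputs=...)` (the `roll_summary` helper and character budget are illustrative, not CrewAI API):

```python
MAX_SUMMARY_CHARS = 2000  # hard cap so state can never grow unbounded

def roll_summary(old_summary: str, new_turn: str) -> str:
    """Append the latest turn, then clip from the front so old turns age out."""
    combined = f"{old_summary}\n{new_turn}".strip()
    return combined[-MAX_SUMMARY_CHARS:]

history = ""
for user_turn in ["Q1: refund policy?", "Q2: shipping times?"]:
    # result = crew.kickoff(inputs={"history_summary": history, "question": user_turn})
    result_text = f"answered {user_turn}"  # stand-in for result.raw
    history = roll_summary(history, f"{user_turn} -> {result_text}")
```

Clipping by characters is crude; a better version would ask a cheap model to compress `combined` when it exceeds the budget, but the cap alone already prevents unbounded growth.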

3) Prompts include too much static context

People often paste policies, schemas, logs, and examples into every Task.description.

task = Task(
    description=f"""
Company policy:
{very_large_policy_text}

Customer history:
{huge_customer_blob}

Now answer the user question.
""",
    agent=agent
)

Move large reference material out of the prompt. Use retrieval, file lookup, or precomputed summaries.
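A hypothetical sketch of the file-lookup approach: keep the policy on disk or in a dict, and expose a lookup function that returns only the requested section (wrap it with CrewAI's `@tool` decorator to hand it to an agent). The section names and contents below are invented:

```python
POLICY_SECTIONS = {
    "refunds": "Refunds are issued within 14 days of a valid return request.",
    "privacy": "Customer data is retained for 90 days, then anonymized.",
}

def lookup_policy(topic: str) -> str:
    """Return only the matching policy section; never the whole document."""
    section = POLICY_SECTIONS.get(topic.lower().strip())
    return section if section else f"No policy section found for '{topic}'."
```

Now a prompt only ever contains the one section the agent asked for, instead of `{very_large_policy_text}` on every call.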

4) Model context window is smaller than your workload

Sometimes the code is fine and the model choice is wrong. A smaller model will fail sooner with messages like:

  • Context length exceeded
  • This model's maximum context length is ...
  • openai.BadRequestError: This model's maximum context length...

Fix by switching to a larger-context model or reducing prompt size.

from crewai import LLM

llm = LLM(model="gpt-4o-mini")  # may be too small for your workload
llm = LLM(model="gpt-4o")  # larger context window for heavier tasks

How to Debug It

  1. Find which task fails

    • Wrap crew.kickoff() in logging.
    • Check whether it dies on Task 1, Task 2, or during tool execution.
    • In many cases you’ll see an exception like openai.BadRequestError bubbling up from a specific agent call.
  2. Print token-heavy inputs before execution

    • Log Task.description, tool outputs, and any memory state.
    • Look for giant strings, repeated content, or nested JSON blobs.
    • If one field is several thousand words long, that’s your culprit.
  3. Disable features one at a time

    • Turn off memory first.
    • Replace tools with stubbed responses.
    • Remove previous task references like {task_1_output}.
    • Re-run until the error disappears.
  4. Measure prompt size explicitly

    • Count characters as a rough proxy.
    • Better: estimate tokens with your tokenizer library before sending.
    • If your combined prompt is close to model limits, trim aggressively.
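Step 4 can be as simple as this sketch. The 4-characters-per-token heuristic is a rough rule of thumb for English text, and the context size shown is an assumption; for real limits, run your prompt through the model's actual tokenizer (e.g. tiktoken for OpenAI models) and check the documented window:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

MODEL_CONTEXT = 128_000  # assumed window; check your model's documentation

def is_near_limit(prompt: str, safety_margin: float = 0.8) -> bool:
    """Flag prompts that consume more than safety_margin of the window."""
    return estimate_tokens(prompt) > MODEL_CONTEXT * safety_margin

if is_near_limit(combined_prompt := "task description + tool output + memory ..."):
    print("Prompt is close to the context window; trim before sending.")
```

Run this check right before each agent call; if it ever trips, you have found the call that is about to fail.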

Prevention

  • Keep task outputs structured and short:

    • bullets instead of essays
    • summaries instead of raw dumps
    • top-N results instead of full lists
  • Treat tool output as untrusted prompt input:

    • truncate long responses
    • filter fields before returning them to CrewAI
    • store raw artifacts outside the LLM loop
  • Set hard boundaries in prompts:

    • “max 200 words”
    • “return only JSON with these fields”
    • “do not repeat source text”
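The "treat tool output as untrusted" bullet can be enforced with one defensive helper that every tool routes its return value through. A minimal sketch (the character budget and function name are choices, not a CrewAI API):

```python
import json

MAX_TOOL_CHARS = 4000  # tune to your model's token budget

def clamp_tool_output(payload: dict, keep_fields: list[str]) -> str:
    """Keep only whitelisted fields, then hard-truncate the serialized result."""
    filtered = {k: payload[k] for k in keep_fields if k in payload}
    text = json.dumps(filtered)
    if len(text) > MAX_TOOL_CHARS:
        text = text[:MAX_TOOL_CHARS] + "... [truncated]"
    return text
```

Whitelisting fields first means a surprise 2 MB debug blob in the API response never even reaches the truncation step, let alone the prompt.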

If you’re seeing context length exceeded during development in CrewAI Python code, start by shrinking what flows between tasks. In most cases, that fixes it faster than changing models or rewriting the whole crew.



By Cyprian Aarons, AI Consultant at Topiax.
