# How to Fix 'context length exceeded during development' in CrewAI (Python)

## What the error means
A `context length exceeded` error during development usually means CrewAI sent too much text to the model in one call. In practice, this happens when task outputs, tool results, chat history, or agent memory keep growing until the LLM’s token limit is hit.
You’ll see this most often in multi-step crews where agents pass large outputs to each other, or when a tool returns a huge payload and CrewAI blindly includes it in the next prompt.
## The Most Common Cause
The #1 cause is passing full, untrimmed task output into the next task or agent context.
CrewAI doesn’t care that your data is “just JSON” or “just logs”. If you stuff 20 KB of raw text into `description`, `expected_output`, memory, or tool output, the next LLM call can blow past the model’s context window.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Passes full verbose output to the next task | Summarizes or extracts only what the next step needs |
| Uses raw tool output directly | Truncates, filters, or stores raw data outside the prompt |
| Lets memory accumulate everything | Keeps memory scoped and minimal |
```python
# BROKEN
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect all relevant data",
    backstory="You are thorough.",
)
writer = Agent(
    role="Writer",
    goal="Write a concise report",
    backstory="You turn research into clean summaries.",
)
task_1 = Task(
    description="Search the web and return everything you find about ACME Corp.",
    expected_output="A full dump of all findings.",
    agent=researcher,
)
task_2 = Task(
    description="Use the research output and write a report.",
    expected_output="A concise report.",
    agent=writer,
    context=[task_1],  # injects task_1's entire raw output into this prompt
)

crew = Crew(agents=[researcher, writer], tasks=[task_1, task_2])
result = crew.kickoff()
```

Note that CrewAI passes earlier task outputs through the `context` parameter; a `{task_1_output}` placeholder in the description is not interpolated with task results (only `kickoff(inputs=...)` values are).
```python
# FIXED
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts only",
    backstory="You extract signal from noise.",
)
writer = Agent(
    role="Writer",
    goal="Write a concise report",
    backstory="You turn facts into summaries.",
)
task_1 = Task(
    description=(
        "Search the web for ACME Corp and return only:\n"
        "- company overview\n"
        "- revenue\n"
        "- recent news\n"
        "- 5 bullet insights max"
    ),
    expected_output="A structured summary under 500 words.",
    agent=researcher,
)
task_2 = Task(
    description=(
        "Use only the summarized research output. "
        "Do not include raw source text."
    ),
    expected_output="A concise report under 300 words.",
    agent=writer,
    context=[task_1],  # now passes only task_1's short summary downstream
)

crew = Crew(agents=[researcher, writer], tasks=[task_1, task_2])
result = crew.kickoff()
```
If you need raw data for auditing, store it in S3, a database, or local files. Don’t inject it into every downstream prompt.
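As a minimal sketch of that pattern: persist the full payload to disk and hand the next task only a compact reference. The helper name `stash_raw_output` and the `artifacts/` directory are illustrative, not part of CrewAI.

```python
import json
from pathlib import Path

def stash_raw_output(raw_text: str, run_id: str, summary: str,
                     artifact_dir: str = "artifacts") -> str:
    """Write the full raw payload to a local file and return only a small
    JSON reference (path + short summary) for the downstream prompt."""
    path = Path(artifact_dir)
    path.mkdir(parents=True, exist_ok=True)
    artifact = path / f"{run_id}.txt"
    artifact.write_text(raw_text, encoding="utf-8")
    # Only this compact string travels into the next LLM call.
    return json.dumps({"artifact_path": str(artifact), "summary": summary})
```

The same idea works with S3 or a database: the key point is that the audit trail lives outside the prompt, and only the pointer flows between tasks.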
## Other Possible Causes

### 1) Tool output is too large
A common trap is returning an entire API response or dataframe from a tool.
```python
# BAD: returns the entire API payload
import requests
from crewai.tools import tool

@tool("fetch_customer_history")
def fetch_customer_history(customer_id: str) -> str:
    return requests.get(
        f"https://api.example.com/customers/{customer_id}/history"
    ).text
```

```python
# BETTER: return only what the agent needs
import json
import requests
from crewai.tools import tool

@tool("fetch_customer_history")
def fetch_customer_history(customer_id: str) -> str:
    data = requests.get(
        f"https://api.example.com/customers/{customer_id}/history"
    ).json()
    recent = data["events"][:10]  # keep only the 10 most recent events
    return json.dumps({"recent_events": recent})
```
### 2) Memory is turned on for long-running crews
CrewAI memory can be useful, but if every turn gets appended forever, you’ll eventually hit token limits.
```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[task_1, task_2],
    memory=True,  # can grow too large in long runs
)
```
If you don’t need persistent conversational state, disable it. If you do need it, summarize older turns before reusing them.
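CrewAI’s memory internals vary by version, so as a version-agnostic sketch, you can maintain your own history window and fold older turns into one condensed line before reusing them. The function name and parameters here are illustrative:

```python
def compact_history(turns: list[str], keep_recent: int = 4,
                    max_summary_chars: int = 600) -> list[str]:
    """Keep the last few turns verbatim and collapse everything older into
    a single truncated summary line, so history stops growing unboundedly."""
    if len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = " | ".join(older)[:max_summary_chars]
    return [f"[earlier turns, condensed] {summary}"] + recent
```

In production you would likely replace the naive string join with an LLM-generated summary, but the shape is the same: old turns get compressed, recent turns stay exact.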
### 3) Prompts include too much static context

People often paste policies, schemas, logs, and examples into every `Task.description`.
```python
task = Task(
    description=f"""
    Company policy:
    {very_large_policy_text}

    Customer history:
    {huge_customer_blob}

    Now answer the user question.
    """,
    agent=agent,
)
```
Move large reference material out of the prompt. Use retrieval, file lookup, or precomputed summaries.
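A lightweight version of that idea, without any vector database: pre-chunk the reference material and expose a lookup that returns only the matching section. The `POLICY_SECTIONS` dict and keyword matching below are a simplified stand-in for real retrieval:

```python
# Hypothetical pre-chunked policy document; in practice this would be
# loaded from files or a retrieval index.
POLICY_SECTIONS = {
    "refunds": "Refunds are issued within 14 days of purchase...",
    "privacy": "Customer data is retained for 90 days, then deleted...",
}

def lookup_policy(question: str) -> str:
    """Return only the policy sections whose keywords appear in the
    question, instead of pasting the whole document into every prompt."""
    hits = [text for key, text in POLICY_SECTIONS.items()
            if key in question.lower()]
    return "\n\n".join(hits) or "No matching policy section found."
```

Wrapped as a CrewAI tool, this means each prompt carries one relevant paragraph instead of the entire policy manual.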
### 4) Model context window is smaller than your workload

Sometimes the code is fine and the model choice is wrong. A smaller model will fail sooner with messages like:

- `Context length exceeded`
- `This model's maximum context length is ...`
- `openai.BadRequestError: This model's maximum context length...`

Fix by switching to a larger-context model or reducing prompt size.
```python
from crewai import LLM

llm = LLM(model="gpt-4o-mini")  # may be too small for your workload
llm = LLM(model="gpt-4o")       # larger context window for heavier tasks
```
## How to Debug It

1. **Find which task fails.**
   - Wrap `crew.kickoff()` in logging.
   - Check whether it dies on Task 1, Task 2, or during tool execution.
   - In many cases you’ll see an exception like `openai.BadRequestError` bubbling up from a specific agent call.
2. **Print token-heavy inputs before execution.**
   - Log `Task.description`, tool outputs, and any memory state.
   - Look for giant strings, repeated content, or nested JSON blobs.
   - If one field is several thousand words long, that’s your culprit.
3. **Disable features one at a time.**
   - Turn off memory first.
   - Replace tools with stubbed responses.
   - Remove previous-task references (e.g. `context=[task_1]` or interpolated outputs like `{task_1_output}`).
   - Re-run until the error disappears.
4. **Measure prompt size explicitly.**
   - Count characters as a rough proxy.
   - Better: estimate tokens with a tokenizer library before sending.
   - If your combined prompt is close to the model’s limit, trim aggressively.
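Step 4 can be sketched as a small pre-flight check. This uses `tiktoken` when it’s installed and otherwise falls back to the ~4 characters-per-token heuristic; the function names and the 128k default window are illustrative assumptions, so match them to your actual model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: tiktoken when available, else ~4 chars/token."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return max(1, len(text) // 4)

def check_budget(prompt: str, context_window: int = 128_000,
                 reserve_for_output: int = 4_000) -> None:
    """Fail fast, before the API call, if the prompt won't fit once room
    is reserved for the model's response."""
    used = estimate_tokens(prompt)
    budget = context_window - reserve_for_output
    if used > budget:
        raise ValueError(f"Prompt ~{used} tokens exceeds budget of {budget}")
```

Calling `check_budget` on each assembled prompt turns a vague provider-side error into a precise, local one that names the oversized input.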
## Prevention

- **Keep task outputs structured and short:**
  - bullets instead of essays
  - summaries instead of raw dumps
  - top-N results instead of full lists
- **Treat tool output as untrusted prompt input:**
  - truncate long responses
  - filter fields before returning them to CrewAI
  - store raw artifacts outside the LLM loop
- **Set hard boundaries in prompts:**
  - “max 200 words”
  - “return only JSON with these fields”
  - “do not repeat source text”
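Prompt instructions like “max 200 words” are requests, not guarantees, so it can help to enforce the same boundary in code on whatever flows between tasks. A minimal sketch, with a hypothetical helper name:

```python
def enforce_word_limit(text: str, max_words: int = 200) -> str:
    """Hard cap on inter-task payloads: keep the first max_words words and
    flag the cut, mirroring a 'max 200 words' prompt instruction in code."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [trimmed]"
```

Running each task’s output through a guard like this before it becomes the next task’s context means a model that ignores the word limit can’t silently inflate every downstream prompt.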
If you’re seeing `context length exceeded` during development in CrewAI Python code, start by shrinking what flows between tasks. In most cases, that fixes it faster than switching models or rewriting the whole crew.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.