How to Fix 'context length exceeded' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded, crewai, python

If you hit a 'context length exceeded' error in CrewAI, the model is receiving more tokens than its context window allows. In practice, this shows up when agents keep accumulating task history, tool outputs, or large documents until the LLM refuses the request.

This usually happens in longer multi-agent runs, especially when you pass full files, verbose tool output, or a growing conversation state into every step.

The Most Common Cause

The #1 cause is stuffing too much text into a single agent/task prompt. In CrewAI, this often happens when you pass an entire document, a log dump, or a previous agent's output directly into Task(description=...) or Agent(goal=...).

Here’s the broken pattern:

Broken                             | Fixed
-----------------------------------|-----------------------------------
Passes huge raw text into the task | Preprocesses and chunks the input
Reuses full history every run      | Keeps only relevant context
Lets tool output grow unbounded    | Summarizes before passing forward

# BROKEN
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Analyze this document",
    backstory="You are a careful analyst."
)

long_text = open("policy_document.txt").read()  # entire document, unbounded

task = Task(
    description=f"""
    Analyze this document and extract all risks:
    {long_text}
    """,
    expected_output="A list of risks found in the document",
    agent=researcher
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
# FIXED
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Analyze a bounded excerpt of the document",
    backstory="You are a careful analyst."
)

def chunk_text(text: str, chunk_size: int = 4000):
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

with open("policy_document.txt") as f:
    doc = f.read()
first_chunk = next(chunk_text(doc))

task = Task(
    description=f"""
    Analyze this excerpt and extract risks:
    {first_chunk}

    If more context is needed, request the next chunk.
    """,
    expected_output="A list of risks found in the excerpt",
    agent=researcher
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
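The fixed example analyzes only the first chunk. When every chunk matters, a map-reduce style loop over the same chunker covers the whole document. This is a hedged sketch: run_on_chunk is a hypothetical stand-in for whatever builds a per-chunk Task and calls crew.kickoff() on it.

```python
def chunk_text(text: str, chunk_size: int = 4000):
    # Same chunker as above: fixed-size character windows.
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def analyze_document(text: str, run_on_chunk, chunk_size: int = 4000):
    """Map step: run the agent on each bounded excerpt.
    Reduce step: merge the per-chunk findings into one list."""
    findings = []
    for chunk in chunk_text(text, chunk_size):
        findings.extend(run_on_chunk(chunk))  # e.g. a per-chunk kickoff
    return findings
```

In practice run_on_chunk would create a Task from the chunk and return the parsed result; keeping the merge outside the LLM call means no single prompt ever sees the full document.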

If you see errors like:

  • litellm.BadRequestError: context_length_exceeded
  • openai.BadRequestError: This model's maximum context length is ...
  • BadRequestError: input too long

then your prompt payload is too large before the model even starts reasoning.

Other Possible Causes

1) Tool output is too verbose

A common failure mode is a tool returning pages of JSON or logs that get appended to the next LLM call.

# Problematic tool output
def fetch_records():
    return huge_json_blob  # a list with thousands of records, sent verbatim

Fix it by trimming at the tool boundary:

def fetch_records():
    data = huge_json_blob  # same list of records
    return {
        "count": len(data),
        "sample": data[:5],  # first five records only
        "summary": "Use pagination for more results"
    }
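A quick stdlib-only comparison shows how much the trimmed shape shrinks the payload; the record list here is synthetic stand-in data.

```python
import json

def trim_records(data, sample_size=5):
    # Bounded summary instead of the full record list.
    return {
        "count": len(data),
        "sample": data[:sample_size],
        "summary": "Use pagination for more results",
    }

# Synthetic stand-in for a huge tool response.
records = [{"id": i, "note": "lorem ipsum " * 20} for i in range(2000)]

full_size = len(json.dumps(records))
trimmed_size = len(json.dumps(trim_records(records)))
print(full_size, trimmed_size)  # the trimmed payload is far smaller
```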

2) Memory is retaining too much conversation

If you enable memory or keep passing prior messages between tasks, the prompt grows on every step.

# Example pattern that can balloon context
agent = Agent(
    role="Support Analyst",
    goal="Resolve customer issue",
    memory=True
)

If your workflow doesn’t need long-term memory, disable it. If it does, store summaries instead of raw transcripts.

agent = Agent(
    role="Support Analyst",
    goal="Resolve customer issue",
    memory=False
)
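If the workflow genuinely needs memory, keep it bounded with a rolling summary instead of raw transcripts. A minimal sketch of the pattern, outside any CrewAI API: here the "summary" is just truncated concatenation, where in practice you would have an LLM write it.

```python
class RollingMemory:
    """Keep a capped summary plus only the most recent raw messages."""

    def __init__(self, keep_last: int = 3, summary_limit: int = 500):
        self.keep_last = keep_last
        self.summary_limit = summary_limit
        self.summary = ""
        self.recent = []

    def add(self, message: str):
        self.recent.append(message)
        while len(self.recent) > self.keep_last:
            # Fold the oldest message into the summary, then cap its size.
            oldest = self.recent.pop(0)
            self.summary = (self.summary + " " + oldest)[-self.summary_limit:]

    def context(self) -> str:
        # This is what you would inject into the next task description.
        return self.summary + "\n" + "\n".join(self.recent)

memory = RollingMemory(keep_last=2, summary_limit=100)
for i in range(50):
    memory.add(f"step {i}: " + "details " * 30)
print(len(memory.context()))  # stays bounded no matter how many steps run
```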

3) You’re using a small-context model

Some models have tight limits. If you send long inputs to something with an 8k or 16k window, you’ll hit the wall quickly.

llm_config = {
    "model": "gpt-4o-mini"  # may still fail if your prompt is huge
}

Switch to a larger-context model when the workflow genuinely needs it:

llm_config = {
    "model": "gpt-4.1"
}
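Before switching models blindly, it helps to check whether the payload fits at all. A sketch of a pre-flight check; the window sizes below are illustrative assumptions, so verify the real limits in your provider's documentation.

```python
# Illustrative context windows (tokens) -- confirm against provider docs,
# since these change between model versions.
CONTEXT_WINDOWS = {
    "gpt-4o-mini": 128_000,
    "gpt-4.1": 1_000_000,
}

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude chars-per-token estimate

def fits(model: str, prompt: str, reply_budget: int = 4_000) -> bool:
    """Pre-flight check: estimated prompt tokens plus reserved reply
    tokens must fit inside the model's window."""
    window = CONTEXT_WINDOWS.get(model, 8_000)  # pessimistic default
    return rough_tokens(prompt) + reply_budget <= window
```

Calling fits(model, task.description) before kickoff turns a provider-side BadRequestError into a decision you make in your own code.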

4) Task chaining duplicates context

In multi-agent pipelines, one task’s full output often becomes another task’s input. If each stage adds more text instead of reducing it, context explodes.

analysis_task = Task(...)
review_task = Task(
    description=f"Review this analysis:\n{analysis_result}",
    agent=reviewer
)

Make the intermediate step produce structured summaries instead:

review_task = Task(
    description="""
    Review the analysis summary below.
    Return only:
    - issues_found
    - confidence_score
    - recommended_next_step
    
    Summary:
    {analysis_summary}
    """,
    agent=reviewer
)
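The handoff itself can be serialized into fixed fields so the downstream prompt size is predictable regardless of how long the analysis ran. A sketch with stdlib json; the field names and caps are illustrative, not a CrewAI API.

```python
import json

def to_handoff(analysis_text: str, issues, confidence: float) -> str:
    """Condense a long analysis into a fixed-shape JSON summary
    for the next task, instead of forwarding the raw text."""
    summary = {
        "issues_found": issues[:10],     # cap the list length
        "confidence_score": round(confidence, 2),
        "excerpt": analysis_text[:300],  # bounded evidence snippet
    }
    return json.dumps(summary)

long_analysis = "finding " * 5_000
handoff = to_handoff(long_analysis, ["missing rate limits"], 0.82)
print(len(handoff))  # a few hundred characters, regardless of input size
```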

How to Debug It

  1. Print the actual payload size

    • Log lengths of task descriptions, tool outputs, and memory content before kickoff.
    • If one string is massive, that’s your culprit.
  2. Isolate the failing step

    • Run each Task independently.
    • In CrewAI workflows, the failure often happens at one specific handoff where output was duplicated.
  3. Disable memory and tools temporarily

    • Set memory=False.
    • Remove tools from Agent(tools=[...]).
    • If the error disappears, reintroduce components one by one.
  4. Check your model limit

    • Confirm which provider/model is actually being used.
    • Compare estimated tokens against that model’s max context window.

A rough token-estimate helper is enough for this:

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude but useful for debugging

print("task tokens:", rough_tokens(task.description))
print("tool tokens:", rough_tokens(tool_output))
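Extending that helper, you can account for every component of the payload at once and spot the culprit at a glance; the component names in the example call are illustrative.

```python
def rough_tokens(text: str) -> int:
    return len(text) // 4  # same crude estimate as above

def token_report(components: dict) -> dict:
    """Estimate tokens per named payload component, largest first."""
    sizes = {name: rough_tokens(text) for name, text in components.items()}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))

report = token_report({
    "task_description": "Analyze the excerpt below...",
    "tool_output": "log line\n" * 20_000,
    "memory": "prior summary",
})
print(report)  # the oversized component sorts to the top
```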

Prevention

  • Keep agent prompts short and task-specific.
  • Summarize tool output before passing it to another agent.
  • Use chunking for documents, transcripts, and logs instead of dumping raw text into one task.
  • Add token-size checks in your pipeline before calling crew.kickoff().
  • Prefer structured outputs like JSON with fixed fields over free-form prose when chaining tasks.
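The token-size check from the list above can be a hard guard in the pipeline rather than a log line. A sketch; the 100k default budget is an arbitrary assumption you should tune to your model's window.

```python
class ContextBudgetError(ValueError):
    """Raised when a prompt would blow the model's context budget."""

def guard_context(prompt: str, budget_tokens: int = 100_000) -> str:
    """Fail fast in your own code instead of letting the provider
    reject the call mid-run."""
    estimated = len(prompt) // 4  # crude chars-per-token estimate
    if estimated > budget_tokens:
        raise ContextBudgetError(
            f"estimated {estimated} tokens exceeds budget {budget_tokens}"
        )
    return prompt
```

Wrapping the payload as guard_context(task.description) right before crew.kickoff() gives a clear, local error message instead of a provider-side BadRequestError.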

If you’re building production CrewAI workflows in Python, treat context as a budget. Once you stop feeding agents unbounded text, this error usually disappears fast.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

