How to Fix 'OOM error during inference' in CrewAI (Python)
What the error means
OOM error during inference means your process ran out of memory while an LLM call was being executed. In CrewAI, this usually shows up when an agent, tool, or task pushes too much context into the model, or when you try to run too many heavy inference jobs at once.
You’ll typically see it during long task chains, large document processing, or when multiple agents share a bloated prompt history.
The Most Common Cause
The #1 cause is oversized context being sent into the model. In CrewAI, this often happens when you keep appending full tool outputs, full documents, or long chat history into Task.description, Agent.goal, or memory-backed conversations.
Here’s the broken pattern and the fixed pattern side by side:
| Broken | Fixed |
|---|---|
| Sends raw document text into every task | Passes only the relevant chunk |
| Keeps accumulating history in memory | Truncates or summarizes state |
| Reuses huge outputs as prompts | Stores outputs externally and references them |
```python
# BROKEN
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

researcher = Agent(
    role="Researcher",
    goal="Analyze the following report in full detail: " + open("claims_report.txt").read(),
    backstory="You are a senior claims analyst.",
    llm=llm,
)

task = Task(
    description="Read this entire file and extract all risks: " + open("claims_report.txt").read(),
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
crew.kickoff()
```
```python
# FIXED
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

def load_chunk(path: str, start: int = 0, size: int = 4000) -> str:
    """Read a bounded slice of the file instead of the whole document."""
    with open(path, "r", encoding="utf-8") as f:
        f.seek(start)
        return f.read(size)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chunk = load_chunk("claims_report.txt")

researcher = Agent(
    role="Researcher",
    goal="Extract risks from the provided excerpt only.",
    backstory="You are a senior claims analyst.",
    llm=llm,
)

task = Task(
    description=f"Analyze this excerpt and return only high-signal risks:\n\n{chunk}",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
crew.kickoff()
```
The mistake is not “using CrewAI wrong.” It’s feeding the pipeline more tokens than your runtime can handle. With hosted models like gpt-4o you’ll usually hit a context-length error first, but a local backend, or even the Python process assembling a massive prompt, can genuinely run out of memory.
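A cheap guard before `kickoff()` catches this early. Here’s a minimal sketch using the rough ~4-characters-per-token heuristic; the 12,000-token budget is an arbitrary example, not a CrewAI setting:

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English prose
    return len(text) // 4

prompt_payload = task.description + researcher.goal + researcher.backstory
TOKEN_BUDGET = 12_000  # arbitrary example; tune to your model and hardware

if rough_token_count(prompt_payload) > TOKEN_BUDGET:
    raise ValueError(
        f"Prompt is roughly {rough_token_count(prompt_payload)} tokens; "
        "shrink inputs before calling crew.kickoff()"
    )
```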
Other Possible Causes
1) Running too many agents in parallel
If you fan out multiple inference calls at once, each one needs its own memory footprint. Note that CrewAI has no `process="parallel"` mode (its `Process` enum offers `sequential` and `hierarchical`); fan-out comes from marking tasks with `async_execution=True` or kicking off several crews at the same time.

```python
# Problematic fan-out: async tasks run concurrently, so each
# in-flight inference call holds its own memory
task1 = Task(description="...", agent=agent1, async_execution=True)
task2 = Task(description="...", agent=agent2, async_execution=True)
task3 = Task(description="...", agent=agent3)  # final task gathers results

crew = Crew(
    agents=[agent1, agent2, agent3],
    tasks=[task1, task2, task3],
)
```
Fix it by running sequentially when memory is tight; sequential is CrewAI’s default process:

```python
from crewai import Process

crew = Crew(
    agents=[agent1, agent2, agent3],
    tasks=[task1, task2, task3],  # async_execution left at its default of False
    process=Process.sequential,
)
```
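If you still need to cover many inputs, a plain loop that reuses one crew keeps a single inference in flight at a time. A minimal sketch, assuming your task description uses CrewAI’s `{placeholder}` interpolation via `kickoff(inputs=...)`:

```python
# One inference in flight at a time: bounded, predictable memory use
task = Task(
    description="Extract risks from this excerpt:\n\n{excerpt}",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])

# e.g. chunks produced by load_chunk() from the earlier example
excerpts = [load_chunk("claims_report.txt", start=i * 4000) for i in range(5)]
results = [crew.kickoff(inputs={"excerpt": e}) for e in excerpts]
```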
2) Using a large local model on limited RAM/VRAM
If you’re using a local backend through Ollama, vLLM, llama.cpp, or Transformers under the hood, the model itself may be too large for your machine.
```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3:70b")  # likely to OOM on modest hardware
```
Use a smaller model:
```python
llm = Ollama(model="llama3:8b")
```
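If you’re unsure what your machine can hold, checking free memory before choosing a model avoids the crash entirely. A minimal sketch using `psutil` (a third-party package); the size thresholds are rough rules of thumb for quantized weights, not exact figures, and this checks system RAM, not GPU VRAM:

```python
import psutil

from langchain_community.llms import Ollama

def pick_model() -> str:
    # Rough rule of thumb: a 4-bit 70B model wants ~40+ GB free,
    # a 4-bit 8B model closer to 5-6 GB. Treat these as estimates.
    free_gb = psutil.virtual_memory().available / 1e9
    return "llama3:70b" if free_gb > 48 else "llama3:8b"

llm = Ollama(model=pick_model())
```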
3) Long-running memory accumulation across tasks
CrewAI memory can help with continuity, but it can also grow without bound if you keep storing every intermediate result.
```python
crew = Crew(
    agents=[agent],
    tasks=[task1, task2, task3],
    memory=True,
)
```
If you don’t need persistent memory across steps, disable it:
```python
crew = Crew(
    agents=[agent],
    tasks=[task1, task2, task3],
    memory=False,
)
```
Or summarize state before passing it forward:
```python
summary_task = Task(
    description="Summarize prior findings in under 200 words.",
    agent=agent,
)
```
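One way to wire that in is CrewAI’s `context` parameter on `Task`, which controls whose output a task receives. A minimal sketch in which only the 200-word summary travels forward:

```python
analysis_task = Task(
    description="Extract all risks from the excerpt.",
    agent=agent,
)

summary_task = Task(
    description="Summarize prior findings in under 200 words.",
    agent=agent,
    context=[analysis_task],  # receives the analysis output
)

final_task = Task(
    description="Draft recommendations based on the summary.",
    agent=agent,
    context=[summary_task],  # sees the short summary, not the raw analysis
)

crew = Crew(agents=[agent], tasks=[analysis_task, summary_task, final_task])
```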
4) Oversized tool outputs being injected back into prompts
A common trap is returning huge JSON blobs from tools and then feeding them straight into another agent.
@tool("fetch_claims")
def fetch_claims():
return open("all_claims.json").read() # huge payload
Return a compact result instead:
@tool("fetch_claims")
def fetch_claims():
data = open("all_claims.json").read()
return data[:5000] # or better: filter before returning
Better still: store large results in S3/database and pass a pointer:
return {"artifact_id": "claims_2024_q1", "location": "s3://bucket/claims_2024_q1.json"}
How to Debug It
1. Check whether the prompt is exploding
   - Log `Task.description`, tool outputs, and any memory payload being passed between tasks.
   - If you see multi-page text dumps or giant JSON objects, that’s your first suspect.
2. Reduce to one agent and one small task
   - Run a minimal crew with a tiny input.
   - If the error disappears, add complexity back one piece at a time until it returns.
3. Switch to a smaller model
   - Move from `gpt-4o` to `gpt-4o-mini`, or from a local 70B model to an 8B model.
   - If OOM disappears immediately, your issue is capacity-related rather than logic-related.
4. Disable memory and parallel execution
   - Set `memory=False`.
   - Use sequential processing.
   - If that fixes it, your problem is accumulation or concurrency pressure.
A good diagnostic workflow looks like this:
```python
from crewai import Crew, Process

crew = Crew(
    agents=[agent],
    tasks=[small_task],
    memory=False,
    process=Process.sequential,
)
result = crew.kickoff()
print(result)
```
If that works but your real pipeline fails, reintroduce one variable at a time:
- bigger input
- memory enabled
- multiple tasks
- parallel execution
- larger model
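If you want to make that reintroduction systematic, a tiny harness can walk through configurations one variable at a time. In this sketch, `build_crew` and `document_text` are hypothetical stand-ins for your own agent/task setup and input data:

```python
from crewai import Process

def run_trial(input_size: int, use_memory: bool) -> None:
    """Hypothetical harness: run one configuration and note the result."""
    crew = build_crew(  # stand-in factory for your own agents and tasks
        excerpt=document_text[:input_size],
        memory=use_memory,
        process=Process.sequential,
    )
    result = crew.kickoff()
    print(f"input={input_size} memory={use_memory} -> {len(str(result))} chars")

# Grow one variable at a time and stop at the first failure
for size in (2_000, 8_000, 32_000):
    run_trial(input_size=size, use_memory=False)
```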
Prevention
- Keep task inputs small and specific. Pass excerpts, IDs, summaries, or retrieved chunks instead of full documents.
- Prefer sequential execution unless you’ve measured that your runtime can handle parallel inference safely.
- Put hard limits on tool output size and summarize before handing results to another agent; a simple output cap is sketched after this list.
- Match model size to hardware. A local 70B model on a laptop is not an optimization; it’s an OOM ticket.
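As a concrete guardrail for that third point, here’s a minimal sketch of a decorator that caps any function’s return size before it reaches a prompt; the 4,000-character limit is an arbitrary example:

```python
import functools

def cap_output(max_chars: int = 4_000):
    """Truncate a function's return value so it can't flood the next prompt."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            out = str(fn(*args, **kwargs))
            if len(out) > max_chars:
                out = out[:max_chars] + "\n[truncated: output exceeded cap]"
            return out
        return wrapper
    return decorator

@cap_output(max_chars=4_000)
def fetch_claims_raw() -> str:
    return open("all_claims.json").read()
```

You can then expose the capped function as a tool with the `@tool` decorator from the earlier examples.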
If you’re seeing OOM error during inference in CrewAI Python code right now, start by shrinking context. In practice that fixes most cases faster than changing frameworks or rewriting agents.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.