How to Fix 'OOM error during inference' in CrewAI (TypeScript)
When CrewAI throws an OOM error during inference, it means the model process ran out of memory while generating a response. In TypeScript projects, this usually shows up during long agent runs, large context windows, or when you accidentally keep feeding the model more history than it can handle.
In practice, this is rarely a “CrewAI bug.” It’s usually a prompt size problem, a runaway loop, or an oversized local model config.
The Most Common Cause
The #1 cause is unbounded conversation growth. You keep appending every tool result, every agent message, and every intermediate step into the next inference call until the model runtime blows up.
Here’s the broken pattern:
```typescript
// Broken: keeps growing the same messages array forever
import { Agent } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: {
    provider: "ollama",
    model: "llama3.1",
  },
});

declare const documentChunks: string[]; // your source documents, pre-split

const messages: any[] = [];
for (const chunk of documentChunks) {
  // Every iteration re-sends the entire history plus one more chunk
  messages.push({
    role: "user",
    content: `Analyze this chunk:\n${chunk}`,
  });
  const result = await agent.execute(messages);
  messages.push({
    role: "assistant",
    content: result,
  });
}
```
And here’s the fixed pattern:
```typescript
// Fixed: keep each inference bounded and summarize state explicitly
import { Agent } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: {
    provider: "ollama",
    model: "llama3.1",
  },
});

declare const documentChunks: string[]; // your source documents, pre-split

let runningSummary = "";
for (const chunk of documentChunks) {
  // The prompt only ever contains the current summary plus one new chunk
  const prompt = `
Current summary:
${runningSummary}
New chunk:
${chunk}
Update the summary in under 200 words.
`;
  const result = await agent.execute([
    { role: "user", content: prompt },
  ]);
  runningSummary = result;
}
```
The difference is simple:

- The broken code grows context on every loop.
- The fixed code replaces raw history with a compact working summary.

Each inference now costs roughly one summary plus one chunk, so prompt size stays flat no matter how many chunks you process.
If you’re using Crew, Task, or an agent executor wrapper, the same rule applies. Do not keep re-sending full transcripts unless you actually need them.
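If you genuinely do need some raw history, cap it before every call. Here is a minimal sketch reusing the `agent` and `messages` from the example above; `trimToBudget` and `MAX_PROMPT_CHARS` are illustrative helpers, not CrewAI APIs:

```typescript
// Minimal sketch: enforce a character budget on every inference call.
// MAX_PROMPT_CHARS is an assumed budget; tune it to your model's context window.
const MAX_PROMPT_CHARS = 8000;

type Message = { role: "user" | "assistant"; content: string };

function trimToBudget(history: Message[], budget: number): Message[] {
  // Walk backwards so the most recent messages survive the cut.
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    used += history[i].content.length;
    if (used > budget) break;
    kept.unshift(history[i]);
  }
  return kept;
}

const result = await agent.execute(trimToBudget(messages, MAX_PROMPT_CHARS));
```

Character counts are a rough proxy for tokens, but they are cheap to compute and good enough to stop unbounded growth.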
Other Possible Causes
Oversized local model settings
If you’re running a local model through Ollama, LM Studio, or a similar backend, your context window may be too large for available RAM.
```typescript
// Problematic for low-memory machines
llm: {
  provider: "ollama",
  model: "llama3.1",
  options: {
    num_ctx: 32768,
    temperature: 0.2,
  },
}
```
Try lowering context first:
```typescript
// Safer starting point for constrained hardware
llm: {
  provider: "ollama",
  model: "llama3.1",
  options: {
    num_ctx: 4096,
    temperature: 0.2,
  },
}
```
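If you are unsure how low to go, one pragmatic option is to try the larger window and fall back to a conservative one on failure. A sketch, reusing the same `Agent` shape as the earlier examples; the error-message check is an assumption you should match to what your backend actually reports:

```typescript
import { Agent } from "crewai";

// Two configs: one large window, one conservative fallback.
const makeAgent = (numCtx: number) =>
  new Agent({
    role: "Claims analyst",
    goal: "Summarize claim documents",
    backstory: "You analyze insurance claims.",
    llm: { provider: "ollama", model: "llama3.1", options: { num_ctx: numCtx } },
  });

const bigAgent = makeAgent(32768);
const smallAgent = makeAgent(4096);

async function executeWithFallback(prompt: string) {
  try {
    return await bigAgent.execute([{ role: "user", content: prompt }]);
  } catch (err) {
    // Assumed heuristic: retry smaller only when the error looks memory-related.
    if (String(err).toLowerCase().includes("memory")) {
      return await smallAgent.execute([{ role: "user", content: prompt }]);
    }
    throw err;
  }
}
```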
Tool outputs are too large
A common failure mode in CrewAI is returning full PDFs, HTML pages, or giant JSON blobs from tools and feeding them straight back into inference.
```typescript
// Bad tool output shape
return {
  rawHtml,
  extractedText,
  allHeaders,
};
```
Return only what the agent needs:
```typescript
// Better tool output shape
return {
  title,
  summary,
  keyFields,
};
```
If you need full artifacts, store them outside the prompt and pass references instead.
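For example, a tool can write the raw page to disk and hand back an ID. This is a minimal sketch assuming a local `./artifacts` directory; in a real project the store might be S3, a database, or a document store:

```typescript
import { writeFile } from "node:fs/promises";
import { randomUUID } from "node:crypto";

// Persist the raw artifact outside the prompt and return only a reference.
async function storeArtifact(rawHtml: string): Promise<string> {
  const id = randomUUID();
  await writeFile(`./artifacts/${id}.html`, rawHtml); // assumes ./artifacts exists
  return id;
}

// Tool output stays small and reference-based.
const artifactId = await storeArtifact(rawHtml);
return {
  title,
  summary,     // a few hundred characters, not the full page
  artifactId,  // the agent can fetch the full artifact through another tool
};
```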
Recursive task loops
If an agent keeps retrying the same task or calling another agent without a stop condition, memory usage climbs until inference fails with something like:
- `OOM error during inference`
- `CUDA out of memory`
- `llama runner process exited with code ...`
Example of a bad retry loop:
```typescript
// No exit condition: `done` is never updated, so this runs until the backend dies
while (!done) {
  const result = await agent.execute([{ role: "user", content: prompt }]);
}
```
Add a hard attempt cap and an explicit exit condition:

```typescript
// At most three attempts; `isAcceptable` is a placeholder for your own check.
for (let attempt = 0; attempt < 3; attempt++) {
  const result = await agent.execute([{ role: "user", content: prompt }]);
  if (isAcceptable(result)) break; // stop as soon as the output passes
}
```
Too many agents sharing one huge context
In multi-agent flows, each handoff can accumulate state if you pass full histories between agents.
| Pattern | Risk |
|---|---|
| Pass full transcript to every agent | High memory growth |
| Pass only task-specific summary | Stable |
| Persist raw data externally and reference it | Best |
Keep handoff payloads small. Give each agent only what it needs to finish its task.
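As a sketch, using the same `execute` API as above: the second agent only ever receives the first agent's compact summary, never the raw documents. The two agents and `claimText` here are illustrative:

```typescript
// Hypothetical two-agent handoff; only a bounded summary crosses the boundary.
const analysis = await analystAgent.execute([
  { role: "user", content: `Summarize the claim facts in under 150 words:\n${claimText}` },
]);

const letter = await writerAgent.execute([
  { role: "user", content: `Draft a decision letter based on this summary:\n${analysis}` },
]);
```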
How to Debug It
1. Check whether the failure happens on the first call or after several iterations.
   - First-call failure usually means your prompt or model settings are too large.
   - Failure after repeated steps usually means context accumulation or recursion.

2. Log prompt size before each inference:

   ```typescript
   console.log("Prompt chars:", prompt.length);
   console.log("Messages:", messages.length);
   ```

   If these numbers keep climbing across iterations, you found your problem.

3. Reduce everything to a minimal run (see the sketch after this list):
   - One agent
   - One task
   - One short input
   - No tools

   If that works, add pieces back one at a time until it breaks.

4. Inspect your backend logs. Look for related errors:
   - `CUDA out of memory`
   - `Killed`
   - `exit code 137`
   - `model runner terminated`

   Those usually mean the crash is below CrewAI, inside the model server or container.
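A minimal run for step 3 might look like this, reusing the same `Agent` shape as the earlier examples:

```typescript
import { Agent } from "crewai";

// One agent, one task, one short input, no tools.
const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: { provider: "ollama", model: "llama3.1", options: { num_ctx: 4096 } },
});

const result = await agent.execute([
  { role: "user", content: "Summarize: water damage claim, kitchen, $4,200 estimate." },
]);
console.log(result);
```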
Prevention
- Keep prompts bounded. Summarize state instead of replaying entire histories.
- Cap retries and loops. Every task should have a max attempt count and an exit condition.
- Trim tool outputs aggressively. Return summaries, IDs, and references, not raw dumps.
- Match model size to hardware:
| Hardware | Safer starting point |
|---|---|
| Laptop / small VM | Smaller model + num_ctx around 4096 |
| Medium server | Moderate context + controlled tool output |
| GPU box with plenty of VRAM | Larger context, but still bounded |
If you’re seeing an OOM error during inference in a CrewAI TypeScript project, look at message growth first. In real projects that’s the culprit most of the time, and fixing it usually removes the error without changing your architecture.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit