How to Fix 'OOM error during inference' in CrewAI (TypeScript)
When CrewAI throws an OOM error during inference, it means the model process ran out of memory while generating a response. In TypeScript projects, this usually shows up during long agent runs, large context windows, or when you accidentally keep feeding the model more history than it can handle.
In practice, this is rarely a “CrewAI bug.” It’s usually a prompt size problem, a runaway loop, or an oversized local model config.
The Most Common Cause
The #1 cause is unbounded conversation growth. You keep appending every tool result, every agent message, and every intermediate step into the next inference call until the model runtime blows up.
Here’s the broken pattern:
```typescript
// Broken: keeps growing the same messages array forever
import { Agent } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: {
    provider: "ollama",
    model: "llama3.1",
  },
});

declare const documentChunks: string[]; // your source documents, pre-split

const messages: any[] = [];
for (const chunk of documentChunks) {
  // Every iteration re-sends the entire history plus one more chunk
  messages.push({
    role: "user",
    content: `Analyze this chunk:\n${chunk}`,
  });
  const result = await agent.execute(messages);
  messages.push({
    role: "assistant",
    content: result,
  });
}
```
And here’s the fixed pattern:
```typescript
// Fixed: keep each inference bounded and summarize state explicitly
import { Agent } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: {
    provider: "ollama",
    model: "llama3.1",
  },
});

declare const documentChunks: string[]; // your source documents, pre-split

let runningSummary = "";
for (const chunk of documentChunks) {
  // The prompt only ever contains the current summary plus one new chunk
  const prompt = `
Current summary:
${runningSummary}
New chunk:
${chunk}
Update the summary in under 200 words.
`;
  const result = await agent.execute([
    { role: "user", content: prompt },
  ]);
  runningSummary = result;
}
```
The difference is simple:

- The broken code grows context on every loop.
- The fixed code replaces raw history with a compact working summary.

Each inference now costs roughly one summary plus one chunk, so prompt size stays flat no matter how many chunks you process.
If you’re using Crew, Task, or an agent executor wrapper, the same rule applies. Do not keep re-sending full transcripts unless you actually need them.
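If you genuinely do need some raw history, cap it before every call. Here is a minimal sketch reusing the `agent` and `messages` from the example above; `trimToBudget` and `MAX_PROMPT_CHARS` are illustrative helpers, not CrewAI APIs:

```typescript
// Minimal sketch: enforce a character budget on every inference call.
// MAX_PROMPT_CHARS is an assumed budget; tune it to your model's context window.
const MAX_PROMPT_CHARS = 8000;

type Message = { role: "user" | "assistant"; content: string };

function trimToBudget(history: Message[], budget: number): Message[] {
  // Walk backwards so the most recent messages survive the cut.
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    used += history[i].content.length;
    if (used > budget) break;
    kept.unshift(history[i]);
  }
  return kept;
}

const result = await agent.execute(trimToBudget(messages, MAX_PROMPT_CHARS));
```

Character counts are a rough proxy for tokens, but they are cheap to compute and good enough to stop unbounded growth.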
Other Possible Causes
Oversized local model settings
If you’re running a local model through Ollama, LM Studio, or a similar backend, your context window may be too large for available RAM.
```typescript
// Problematic for low-memory machines
llm: {
  provider: "ollama",
  model: "llama3.1",
  options: {
    num_ctx: 32768,
    temperature: 0.2,
  },
}
```
Try lowering context first:
```typescript
// Safer starting point for constrained hardware
llm: {
  provider: "ollama",
  model: "llama3.1",
  options: {
    num_ctx: 4096,
    temperature: 0.2,
  },
}
```
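If you are unsure how low to go, one pragmatic option is to try the larger window and fall back to a conservative one on failure. A sketch, reusing the same `Agent` shape as the earlier examples; the error-message check is an assumption you should match to what your backend actually reports:

```typescript
import { Agent } from "crewai";

// Two configs: one large window, one conservative fallback.
const makeAgent = (numCtx: number) =>
  new Agent({
    role: "Claims analyst",
    goal: "Summarize claim documents",
    backstory: "You analyze insurance claims.",
    llm: { provider: "ollama", model: "llama3.1", options: { num_ctx: numCtx } },
  });

const bigAgent = makeAgent(32768);
const smallAgent = makeAgent(4096);

async function executeWithFallback(prompt: string) {
  try {
    return await bigAgent.execute([{ role: "user", content: prompt }]);
  } catch (err) {
    // Assumed heuristic: retry smaller only when the error looks memory-related.
    if (String(err).toLowerCase().includes("memory")) {
      return await smallAgent.execute([{ role: "user", content: prompt }]);
    }
    throw err;
  }
}
```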
Tool outputs are too large
A common failure mode in CrewAI is returning full PDFs, HTML pages, or giant JSON blobs from tools and feeding them straight back into inference.
```typescript
// Bad tool output shape
return {
  rawHtml,
  extractedText,
  allHeaders,
};
```
Return only what the agent needs:
```typescript
// Better tool output shape
return {
  title,
  summary,
  keyFields,
};
```
If you need full artifacts, store them outside the prompt and pass references instead.
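For example, a tool can write the raw page to disk and hand back an ID. This is a minimal sketch assuming a local `./artifacts` directory; in a real project the store might be S3, a database, or a document store:

```typescript
import { writeFile } from "node:fs/promises";
import { randomUUID } from "node:crypto";

// Persist the raw artifact outside the prompt and return only a reference.
async function storeArtifact(rawHtml: string): Promise<string> {
  const id = randomUUID();
  await writeFile(`./artifacts/${id}.html`, rawHtml); // assumes ./artifacts exists
  return id;
}

// Tool output stays small and reference-based.
const artifactId = await storeArtifact(rawHtml);
return {
  title,
  summary,     // a few hundred characters, not the full page
  artifactId,  // the agent can fetch the full artifact through another tool
};
```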
Recursive task loops
If an agent keeps retrying the same task or calling another agent without a stop condition, memory usage climbs until inference fails with something like:
- `OOM error during inference`
- `CUDA out of memory`
- `llama runner process exited with code ...`
Example of a bad retry loop:
```typescript
// No exit condition: `done` is never updated, so this runs until the backend dies
while (!done) {
  const result = await agent.execute([{ role: "user", content: prompt }]);
}
```
Add a hard attempt cap and an explicit exit condition:

```typescript
// At most three attempts; `isAcceptable` is a placeholder for your own check.
for (let attempt = 0; attempt < 3; attempt++) {
  const result = await agent.execute([{ role: "user", content: prompt }]);
  if (isAcceptable(result)) break; // stop as soon as the output passes
}
```
Too many agents sharing one huge context
In multi-agent flows, each handoff can accumulate state if you pass full histories between agents.
| Pattern | Risk |
|---|---|
| Pass full transcript to every agent | High memory growth |
| Pass only task-specific summary | Stable |
| Persist raw data externally and reference it | Best |
Keep handoff payloads small. Give each agent only what it needs to finish its task.
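As a sketch, using the same `execute` API as above: the second agent only ever receives the first agent's compact summary, never the raw documents. The two agents and `claimText` here are illustrative:

```typescript
// Hypothetical two-agent handoff; only a bounded summary crosses the boundary.
const analysis = await analystAgent.execute([
  { role: "user", content: `Summarize the claim facts in under 150 words:\n${claimText}` },
]);

const letter = await writerAgent.execute([
  { role: "user", content: `Draft a decision letter based on this summary:\n${analysis}` },
]);
```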
How to Debug It
1. Check whether the failure happens on the first call or after several iterations.
   - First-call failure usually means your prompt or model settings are too large.
   - Failure after repeated steps usually means context accumulation or recursion.

2. Log prompt size before each inference:

   ```typescript
   console.log("Prompt chars:", prompt.length);
   console.log("Messages:", messages.length);
   ```

   If these numbers keep climbing across iterations, you found your problem.

3. Reduce everything to a minimal run (see the sketch after this list):
   - One agent
   - One task
   - One short input
   - No tools

   If that works, add pieces back one at a time until it breaks.

4. Inspect your backend logs. Look for related errors:
   - `CUDA out of memory`
   - `Killed`
   - `exit code 137`
   - `model runner terminated`

   Those usually mean the crash is below CrewAI, inside the model server or container.
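A minimal run for step 3 might look like this, reusing the same `Agent` shape as the earlier examples:

```typescript
import { Agent } from "crewai";

// One agent, one task, one short input, no tools.
const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You analyze insurance claims.",
  llm: { provider: "ollama", model: "llama3.1", options: { num_ctx: 4096 } },
});

const result = await agent.execute([
  { role: "user", content: "Summarize: water damage claim, kitchen, $4,200 estimate." },
]);
console.log(result);
```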
Prevention
- Keep prompts bounded. Summarize state instead of replaying entire histories.
- Cap retries and loops. Every task should have a max attempt count and an exit condition.
- Trim tool outputs aggressively. Return summaries, IDs, and references, not raw dumps.
- Match model size to hardware:
| Hardware | Safer starting point |
|---|---|
| Laptop / small VM | Smaller model + num_ctx around 4096 |
| Medium server | Moderate context + controlled tool output |
| GPU box with plenty of VRAM | Larger context, but still bounded |
If you’re seeing an OOM error during inference in a CrewAI TypeScript project, look at message growth first. In real projects that’s the culprit most of the time, and fixing it usually removes the error without changing your architecture.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit