How to Fix 'OOM error during inference in production' in CrewAI (TypeScript)
OOM in CrewAI means your process ran out of memory while the agent was building context, running tools, or holding too much state during inference. In production, this usually shows up under real traffic when an agent loop grows too large, a tool returns massive payloads, or you run too many concurrent tasks on a small container.
The important part: this is usually not an LLM problem. It’s almost always a memory management problem in your agent pipeline.
The Most Common Cause
The #1 cause is unbounded context growth inside CrewAgentExecutor or AgentExecutor. In TypeScript, this often happens when you keep appending full tool outputs, chat history, and intermediate reasoning into every turn.
Here’s the broken pattern:
import { Agent, Task, Crew } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You work for an insurance ops team.",
});

const task = new Task({
  description: "Analyze all uploaded claim notes and produce a summary.",
  agent,
});

const crew = new Crew({
  agents: [agent],
  tasks: [task],
});

// ❌ Broken: huge document blob gets injected directly into the prompt/context
const result = await crew.kickoff({
  documents: largeClaimBundle, // multi-MB JSON/PDF text
});
And here’s the fixed pattern:
import { Agent, Task, Crew } from "crewai";

const agent = new Agent({
  role: "Claims analyst",
  goal: "Summarize claim documents",
  backstory: "You work for an insurance ops team.",
});

const task = new Task({
  description: "Analyze claim notes and produce a summary using only retrieved excerpts.",
  agent,
});

const crew = new Crew({
  agents: [agent],
  tasks: [task],
});

// ✅ Fixed: pass only a small slice of relevant data
const result = await crew.kickoff({
  documents: extractedRelevantChunks.slice(0, 5),
});
If you are using memory-enabled agents, the same issue appears when conversation history is never trimmed.
| Broken | Fixed |
|---|---|
| messages.push(...fullToolOutput) | Keep only the last N turns |
| context += entireDocumentText | Chunk + retrieve top-k passages |
| memory.store(allResponses) | Summarize older turns |
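The fixes in that table come down to one habit: bound the history yourself before each call. Here is a minimal sketch, assuming your pipeline keeps its own history array of chat turns outside CrewAI; the turn type and the limits are placeholders to tune.

type ChatTurn = { role: "user" | "assistant" | "tool"; content: string };

const MAX_TURNS = 8;         // keep only the most recent turns verbatim
const MAX_TOOL_CHARS = 4000; // hard cap on any single tool output kept in context

function trimHistory(history: ChatTurn[]): ChatTurn[] {
  return history
    .slice(-MAX_TURNS) // drop everything older than the last N turns
    .map((turn) =>
      turn.role === "tool" && turn.content.length > MAX_TOOL_CHARS
        ? { ...turn, content: turn.content.slice(0, MAX_TOOL_CHARS) + " …[truncated]" }
        : turn
    );
}

// Feed the trimmed slice, never the raw array, into whatever builds the next prompt.

Summarizing the dropped turns (instead of discarding them) follows the same pattern: replace the older slice with one short summary turn before prepending the recent ones.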
A production-safe rule: never feed raw PDFs, full JSON payloads, or entire ticket histories directly into an agent loop.
Other Possible Causes
1) Tool output is too large
A search or database tool may return thousands of rows, and CrewAI then tries to stuff that entire output into the next LLM call.
// ❌ Broken
const rows = await db.query("SELECT * FROM claims");
return JSON.stringify(rows);
// ✅ Fixed
const rows = await db.query("SELECT id, status, amount FROM claims LIMIT 20");
return JSON.stringify(rows);
If you need more data, paginate it and retrieve only what the agent actually needs.
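One way to do that is a paginated tool call: the agent requests one page at a time instead of the whole table. A sketch, assuming a generic db.query(sql, params) client; the function name and SQL are illustrative.

// Hypothetical paginated tool: "db" stands in for whatever query client you already use.
async function fetchClaimsPage(
  db: { query: (sql: string, params: unknown[]) => Promise<unknown[]> },
  page: number,
  pageSize = 20
): Promise<string> {
  const rows = await db.query(
    "SELECT id, status, amount FROM claims ORDER BY id LIMIT $1 OFFSET $2",
    [pageSize, page * pageSize]
  );
  return JSON.stringify(rows); // one small, bounded payload per tool call
}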
2) Too much concurrency in worker processes
If you run multiple crews in parallel on a small pod, memory spikes fast. This is common with Node.js services behind queue workers.
// ❌ Broken
await Promise.all(jobs.map((job) => runCrew(job)));
// ✅ Fixed
for (const job of jobs) {
  await runCrew(job);
}
If you must parallelize, cap concurrency with a limiter.
import pLimit from "p-limit";
const limit = pLimit(2);
await Promise.all(jobs.map((job) => limit(() => runCrew(job))));
3) Long-running agent loops without stop conditions
An agent that keeps retrying tool calls or self-correcting can build huge internal state before crashing with something like:
- JavaScript heap out of memory
- FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
- OOMKilled in Kubernetes events
Make sure your task has hard limits.
const task = new Task({
  description: "Try up to 3 times to extract policy fields.",
  agent,
});
Also set runtime limits at the orchestration layer:
- max iterations
- max tool calls
- max response tokens
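If the CrewAI version you run does not expose these limits directly, you can enforce them around the loop yourself. A minimal sketch, where runAgentStep is a placeholder for however your pipeline advances the agent by one iteration and reports what happened:

// Hypothetical guard around the agent loop; the limits are placeholders to tune.
const MAX_ITERATIONS = 5;
const MAX_TOOL_CALLS = 10;

async function runWithLimits(
  runAgentStep: () => Promise<{ done: boolean; toolCalls: number }>
): Promise<void> {
  let iterations = 0;
  let toolCalls = 0;
  while (iterations < MAX_ITERATIONS && toolCalls < MAX_TOOL_CALLS) {
    const step = await runAgentStep();
    iterations += 1;
    toolCalls += step.toolCalls;
    if (step.done) return; // finished within budget
  }
  throw new Error(`Agent aborted after ${iterations} iterations and ${toolCalls} tool calls`);
}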
4) Container memory limit is too low
Sometimes the code is fine and the pod is just undersized. A Node.js process can OOM even with moderate prompts if your container limit is too aggressive.
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
For inference workloads with document processing, 512Mi is often not enough. Increase it and watch whether failures disappear under load.
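It also helps to compare the Node-side ceiling with the pod-side ceiling, because the two fail differently. A quick startup check using Node's built-in v8 module; the heap ceiling itself is tunable with --max-old-space-size.

import { getHeapStatistics } from "node:v8";

// If this number is higher than the pod's memory limit, Kubernetes will usually
// OOMKill the container before Node ever reaches its own heap ceiling.
const heapLimitMiB = getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`V8 heap limit: ${heapLimitMiB.toFixed(0)} MiB`);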
How to Debug It
1) Check whether the crash is Node-level or Kubernetes-level
- A Node-level crash looks like: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
- A Kubernetes-level crash looks like: OOMKilled in the pod events, with exit code 137
2) Log payload sizes before every CrewAI call
- Measure prompt size, tool output size, and message count.
- If one request has a much larger payload than the others, that is your culprit.
console.log({
  messages: messages.length,
  promptBytes: Buffer.byteLength(prompt, "utf8"),
});
3) Disable one feature at a time
- Turn off memory.
- Replace tools with stubs.
- Reduce concurrency to 1.
- Remove document attachments.
The feature that makes the OOM disappear is where the leak lives.
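"Replace tools with stubs" can be as blunt as returning a fixed tiny payload. A sketch; the object shape here is illustrative, not CrewAI's actual tool interface:

// Hypothetical stub: swap it in for the real search tool during debugging.
// If the OOM stops, the real tool's payload size is the problem, not the model.
const stubSearchTool = {
  name: "search_claims",
  description: "Search claim notes (stubbed for debugging)",
  run: async (_query: string): Promise<string> =>
    JSON.stringify([{ id: "claim-1", status: "open", amount: 1200 }]),
};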
4) Watch RSS during execution
- Use process metrics from Prometheus or simple runtime logging.
- If memory climbs steadily per step inside CrewAgentExecutor, you are accumulating state instead of replacing it.
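For the simple-logging option, Node's built-in process.memoryUsage() is enough to see the trend:

// Log resident set size after each agent step, e.g. logRss("after-tool-call").
// A steady climb across steps means state is accumulating instead of being replaced.
function logRss(label: string): void {
  const rssMiB = process.memoryUsage().rss / 1024 / 1024;
  console.log(`[mem] ${label}: rss=${rssMiB.toFixed(1)} MiB`);
}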
Prevention
- Keep prompts small and deterministic.
  - Chunk documents.
  - Retrieve top-k matches.
  - Summarize old turns before reusing them.
- Put hard caps on everything.
  - Max tool output size.
  - Max iterations.
  - Max concurrent crews per pod.
- Treat agent I/O like API contracts.
  - Return structured fields only.
  - Never dump whole tables or raw logs back into context.
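"Return structured fields only" can be a small mapping layer at the tool boundary. A sketch; the field names follow the claims example above and are illustrative:

// Return a small, typed shape from every tool instead of raw rows or logs.
type ClaimSummary = { id: string; status: string; amount: number };

function toClaimSummaries(rows: Array<Record<string, unknown>>): ClaimSummary[] {
  // Keep only the fields the agent actually reasons about, and cap the count.
  return rows.slice(0, 20).map((row) => ({
    id: String(row.id),
    status: String(row.status),
    amount: Number(row.amount),
  }));
}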
If you are seeing OOM in production with CrewAI TypeScript, start by inspecting what gets added to context on every turn. In most cases, fixing that one path removes the crash without touching the model at all.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.