How to Fix 'OOM error during inference during development' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see "OOM error during inference during development" in CrewAI TypeScript, it usually means your process is exhausting memory while the model is generating a response. In practice, this shows up during local development when you run long agent loops, pass huge prompts, or keep too much conversation state in memory.

The key thing: this is usually not a “CrewAI is broken” problem. It’s almost always a prompt-size, context-growth, or runaway-loop problem.

The Most Common Cause

The #1 cause is uncontrolled context growth inside an agent loop.

In CrewAI TypeScript, developers often keep appending full task outputs back into the next prompt. That works for one or two iterations, then memory spikes and inference blows up with errors like:

  • OOM error during inference during development
  • JavaScript heap out of memory
  • CrewAIError: Inference failed due to insufficient memory

Here’s the broken pattern versus the fixed pattern.

Broken                                    | Fixed
------------------------------------------|---------------------------------------------
Reuses full history on every iteration    | Trims history to only what the model needs
Appends raw tool output back into prompt  | Summarizes or extracts only relevant fields
Lets agent loop run indefinitely          | Caps iterations and output size
// ❌ Broken: context grows every iteration
import { Agent } from "crewai";

const agent = new Agent({
  name: "ClaimsAgent",
  role: "Claims triage",
  goal: "Classify claims documents",
  instructions: [
    "Read the document and classify it.",
    "Use previous outputs as context.",
  ],
});

let context = "";

for (const chunk of largeDocumentChunks) {
  const result = await agent.run(`
    Previous context:
    ${context}

    New chunk:
    ${chunk}
  `);

  // This keeps growing without bound
  context += `\n${result.output}`;
}

// ✅ Fixed: keep only bounded state
import { Agent } from "crewai";

const agent = new Agent({
  name: "ClaimsAgent",
  role: "Claims triage",
  goal: "Classify claims documents",
  instructions: [
    "Read the document and classify it.",
    "Return only JSON with classification and reason.",
  ],
});

let summary = "";

for (const chunk of largeDocumentChunks.slice(0, 10)) {
  const result = await agent.run(`
    Current summary:
    ${summary}

    New chunk:
    ${chunk}
  `);

  // Keep only a short, bounded summary instead of the full output.
  // (Assumes the agent returns the JSON it was instructed to produce.)
  const parsed =
    typeof result.output === "string" ? JSON.parse(result.output) : result.output;
  summary = JSON.stringify({
    classification: parsed.classification,
    reason: parsed.reason,
  });
}

If you are passing entire PDFs, chat logs, or tool traces into every call, this is almost certainly your issue.

Other Possible Causes

1) Huge tool outputs being injected into the prompt

A common mistake is returning massive JSON from a tool and feeding it directly into the next agent step.

// Bad: returns everything
const searchToolResult = await searchTool.run(query);
await agent.run(`Analyze this data:\n${JSON.stringify(searchToolResult)}`);

Fix it by slicing to only what matters:

const searchToolResult = await searchTool.run(query);

await agent.run(`
Analyze these top matches only:
${JSON.stringify(searchToolResult.items.slice(0, 5))}
`);

2) Too many concurrent inference calls

If you fan out dozens of agents at once, local dev machines can run out of RAM fast.

// Bad: unbounded concurrency
await Promise.all(tasks.map((task) => crew.execute(task)));

Use a concurrency limit:

import pLimit from "p-limit";

const limit = pLimit(2);

await Promise.all(
  tasks.map((task) => limit(() => crew.execute(task)))
);
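
In practice, a limit of 2 or 3 is usually enough to keep a dev laptop responsive. If you want to confirm that concurrency is the pressure point, compare process.memoryUsage().rss between runs with different limits.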

3) Model/context window too large for your machine

Some model and context configurations are memory-hungry before you ever hit token limits: a large local model must hold its weights in RAM, and a high-context configuration buffers far more tokens per request, so memory usage climbs quickly on a dev machine.

const agent = new Agent({
  name: "Analyst",
  model: "gpt-4.1", // may be fine remotely, heavy locally depending on setup
});

Try a smaller model or lower context settings in dev:

const agent = new Agent({
  name: "Analyst",
  model: "gpt-4o-mini",
});

4) Recursive task delegation without stop conditions

If an agent keeps delegating to itself or another agent without a hard cap, memory usage grows until inference fails.

// Bad: no stop condition
while (true) {
  const result = await agent.run(prompt);
  prompt += result.output;
}

Add explicit bounds:

for (let i = 0; i < 3; i++) {
  const result = await agent.run(prompt);
  prompt = buildNextPrompt(result.output);
}

How to Debug It

  1. Check whether prompt size grows every iteration
    • Log the length of every prompt before calling agent.run() (a minimal logging sketch follows this list).
    • If it keeps increasing, you have context bloat.
  2. Inspect tool outputs
    • Print the raw output from tools.
    • If one tool returns megabytes of data, truncate or summarize it before passing it to CrewAI.
  3. Disable concurrency
    • Run one task at a time.
    • If the OOM disappears, your issue is parallel execution pressure rather than prompt size.
  4. Swap to a smaller model
    • Test with a lighter model like gpt-4o-mini.
    • If the problem goes away, your current model/context settings are too heavy for your dev environment.
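
For step 1, a small wrapper makes the context-bloat check concrete. This is a minimal sketch in plain TypeScript: agent.run() follows the usage shown earlier in this guide, and the warning threshold is an arbitrary example value, not a CrewAI limit.

// Minimal sketch: log prompt size before every inference call.
// The 50,000-character threshold is an arbitrary example, not a CrewAI limit.
const MAX_PROMPT_CHARS = 50_000;

async function runWithPromptLogging(
  agent: { run: (prompt: string) => Promise<{ output: unknown }> },
  prompt: string,
  label: string
) {
  console.log(`[${label}] prompt length: ${prompt.length} chars`);

  if (prompt.length > MAX_PROMPT_CHARS) {
    console.warn(`[${label}] prompt is unusually large; likely context bloat`);
  }

  // If the logged length climbs every iteration, you have found the leak.
  return agent.run(prompt);
}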

Prevention

  • Keep agent memory bounded.
    • Store summaries, not full transcripts.
  • Cap loops and delegation depth.
    • Every iterative workflow should have a hard stop.
  • Truncate tool results before sending them back into inference.
    • Pass top-N rows, extracted fields, or compressed summaries only (a small helper sketch follows this list).
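
As a concrete version of that last rule, here is a small helper. The names and limits are illustrative, not part of the CrewAI API; the point is that only a bounded string can ever reach a prompt.

// Sketch: cap a tool result to the first few items and a fixed character budget
// before it is interpolated into a prompt. Names and limits are illustrative.
function truncateToolResult(items: unknown[], maxItems = 5, maxChars = 4_000): string {
  const sliced = JSON.stringify(items.slice(0, maxItems), null, 2);
  return sliced.length > maxChars ? `${sliced.slice(0, maxChars)}\n/* truncated */` : sliced;
}

// Usage: the injected text can never grow past the configured budget.
// await agent.run(`Analyze these top matches only:\n${truncateToolResult(searchToolResult.items)}`);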

If you want this class of bug to disappear permanently in CrewAI TypeScript, treat every agent call like an expensive bounded resource. The moment you let prompts accumulate unchecked, you are building an OOM incident generator.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

