How to Fix 'OOM error during inference during development' in LangChain (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

What the error means

OOM error during inference during development usually means your Node.js process ran out of memory while LangChain was building prompts, loading documents, or sending a large payload through an LLM call. In TypeScript projects, this often shows up during local dev because you’re re-running chains with hot reload, huge .env-driven context, or unbounded document ingestion.

The actual runtime symptom is usually one of these:

  • FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
  • RangeError: Invalid string length
  • Error: Request body too large
  • OpenAIError: 413 Request Entity Too Large

The Most Common Cause

The #1 cause is stuffing too much data into a single prompt. In LangChain terms, this happens when you use StuffDocumentsChain, createStuffDocumentsChain, or a naive RunnableSequence that concatenates all documents into one giant context window.

Broken vs fixed pattern

Broken                                      | Fixed
Concatenates every document into one prompt | Splits docs and processes in batches
No token control                            | Uses chunking + retrieval
Easy to trigger OOM locally                 | Keeps memory and request size bounded
// BROKEN: loads all docs into one prompt
import { ChatOpenAI } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { Document } from "@langchain/core/documents";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const docs: Document[] = await loadAllDocsSomehow(); // hundreds/thousands of docs

const chain = await createStuffDocumentsChain({
  llm,
  prompt: myPrompt, // prompt template must include a {context} placeholder
});

const result = await chain.invoke({
  context: docs, // the documents key is "context"; everything gets stuffed into one request
});

// FIXED: chunk first, retrieve relevant docs only
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createRetrievalChain } from "langchain/chains/retrieval";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const embeddings = new OpenAIEmbeddings();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});

const chunks = await splitter.createDocuments([bigText]); // bigText: your raw source text
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);

const retriever = vectorStore.asRetriever(4); // only the top 4 chunks reach the prompt

const chain = await createRetrievalChain({
  retriever,
  combineDocsChain: myCombineDocsChain, // e.g. createStuffDocumentsChain over just the retrieved chunks
});

const result = await chain.invoke({
  input: "Summarize the policy exceptions",
});

If you’re using createStuffDocumentsChain, treat it as a last resort for small inputs only. For anything that can grow in dev, switch to retrieval or map-reduce style processing.
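
If you genuinely need to process an entire corpus (say, whole-corpus summarization), a map-reduce pass keeps each individual request small. Here is a minimal sketch, reusing the chunks array from the fixed example above; the prompt wording and strictly sequential processing are simplifications, not a prescribed pattern:

import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

async function mapReduceSummarize(chunks: Document[]): Promise<string> {
  // Map step: summarize one chunk at a time, so memory and request size stay bounded
  const partials: string[] = [];
  for (const chunk of chunks) {
    const res = await llm.invoke(
      `Summarize this excerpt in 3 bullet points:\n\n${chunk.pageContent}`
    );
    partials.push(String(res.content));
  }
  // Reduce step: combine the short partial summaries in one final call
  const final = await llm.invoke(
    `Combine these partial summaries into one coherent summary:\n\n${partials.join("\n\n")}`
  );
  return String(final.content);
}

If the combined partial summaries themselves grow too large, reduce them hierarchically in groups instead of all at once.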

Other Possible Causes

1) Loading too many documents into memory at once

This is common with file loaders like DirectoryLoader, PDFLoader, or custom ingestion scripts.

// BAD
const docs = await loader.load(); // loads entire corpus
// BETTER: stream documents one at a time, if your loader exposes a lazy API
for await (const doc of loader.lazyLoad()) {
  await processDoc(doc);
}

If your loader doesn’t support streaming, batch the work yourself.
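
For example, here is a sketch that walks a directory yourself and loads one file at a time instead of materializing the whole corpus; processDoc is a hypothetical stand-in for your per-document work:

import { readdir } from "node:fs/promises";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

async function ingestDirectory(dir: string): Promise<void> {
  const files = (await readdir(dir)).filter((f) => f.endsWith(".pdf"));
  for (const [i, file] of files.entries()) {
    // Only one file's documents are in memory at a time
    const docs = await new PDFLoader(`${dir}/${file}`).load();
    for (const doc of docs) {
      await processDoc(doc); // hypothetical: split, embed, store
    }
    console.log(`ingested ${i + 1}/${files.length}: ${file}`);
  }
}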


2) Embedding or vectorizing huge batches

Calling vectorStore.addDocuments(allDocs) with thousands of chunks can spike memory.

// BAD
await vectorStore.addDocuments(allChunks);
// BETTER
const batchSize = 50;
for (let i = 0; i < allChunks.length; i += batchSize) {
  await vectorStore.addDocuments(allChunks.slice(i, i + batchSize));
}

This matters more in dev because hot reload can rerun the same ingestion path multiple times.
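
One way to avoid re-running ingestion on every reload is to cache the store at global scope. This sketch assumes a dev setup that preserves globalThis across reloads (Next.js-style HMR; a full process restart like tsx watch won't benefit) and a hypothetical buildVectorStore() ingestion function:

import { MemoryVectorStore } from "langchain/vectorstores/memory";

const g = globalThis as typeof globalThis & { __devVectorStore?: MemoryVectorStore };

export async function getVectorStore(): Promise<MemoryVectorStore> {
  // Reuse the store built on a previous reload instead of re-ingesting
  if (!g.__devVectorStore) {
    g.__devVectorStore = await buildVectorStore(); // hypothetical: load, split, embed once
  }
  return g.__devVectorStore;
}

If your dev loop restarts the whole process, persist embeddings to a local store between runs instead.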


3) Recursive agent loops or runaway tool calls

Agents can keep calling tools until memory grows and the process dies. This often shows up with AgentExecutor when stop conditions are weak.

// BAD
const agentExecutor = new AgentExecutor({
  agent,
  tools,
});
// BETTER
const agentExecutor = new AgentExecutor({
  agent,
  tools,
  maxIterations: 5,
});

Also watch for tools that return massive payloads. A single tool response can balloon the conversation state.
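
One way to enforce that is to truncate tool output before it enters the agent's scratchpad. A minimal sketch using DynamicTool, where fetchReport is a hypothetical underlying call and the 4,000-character cap is an arbitrary budget:

import { DynamicTool } from "@langchain/core/tools";

const MAX_TOOL_CHARS = 4_000;

const reportTool = new DynamicTool({
  name: "fetch_report",
  description: "Fetches a report by id and returns its text.",
  func: async (id: string) => {
    const raw = await fetchReport(id); // hypothetical: may return a huge string
    // Truncate so a single tool call can't balloon the conversation state
    return raw.length > MAX_TOOL_CHARS
      ? raw.slice(0, MAX_TOOL_CHARS) + "\n[truncated]"
      : raw;
  },
});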


4) Node heap is too small for local development

Sometimes the code is fine, but your dev machine is running Node with the default heap ceiling.

node --max-old-space-size=4096 dist/index.js

For TypeScript dev scripts:

{
  "scripts": {
    "dev": "NODE_OPTIONS=--max-old-space-size=4096 tsx watch src/index.ts"
  }
}

This is a mitigation, not a fix. If you need to double the heap just to run a single request, your chain design still needs work.

How to Debug It

  1. Find the exact LangChain step that spikes memory

    • Add logs before and after each major step (see the memory-logging sketch after this list):
      • document loading
      • splitting
      • embedding
      • retriever calls
      • LLM invocation
  2. Check whether input size explodes

    • Log token-ish proxies:
      console.log("docs:", docs.length);
      console.log("first doc chars:", docs[0]?.pageContent.length);
      console.log("prompt chars:", promptValue.toString().length);
      
  3. Disable parallelism

    • If you’re using Promise.all() over many files or requests, replace it with sequential processing.
    • Parallel ingestion often looks fine until local dev hits memory pressure.
  4. Run with heap diagnostics

    • Start Node with:
      NODE_OPTIONS="--trace-gc --max-old-space-size=4096" npm run dev
      
    • If GC thrashes before the crash, you’re overloading memory rather than hitting a logic bug.
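
For step 1, process.memoryUsage() is usually enough to spot which stage spikes. A minimal sketch, where logHeap is a hypothetical helper and loader/splitter stand in for the earlier snippets:

function logHeap(label: string): void {
  const mb = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log(`[heap] ${label}: ${mb.toFixed(1)} MB`);
}

logHeap("before load");
const docs = await loader.load();
logHeap("after load");

const chunks = await splitter.splitDocuments(docs);
logHeap("after split");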

Prevention

  • Use retrieval-first patterns (Retriever, createRetrievalChain) instead of stuffing full corpora into prompts.
  • Batch ingestion and embedding jobs; never assume local dev can handle production-sized inputs in one call.
  • Put hard limits on agents:
    • maxIterations
    • max tool output size
    • max document count per request

If you want a simple rule: if your LangChain code ever builds one giant string from unbounded data, you’ve built an OOM bug waiting to happen.
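
A cheap way to enforce that rule in dev is a guard that fails fast before an oversized request ever goes out. assertPromptBudget below is a hypothetical helper, and the character count is only a rough proxy for real token counting:

function assertPromptBudget(prompt: string, maxChars = 48_000): void {
  if (prompt.length > maxChars) {
    throw new Error(
      `Prompt is ${prompt.length} chars (budget ${maxChars}); ` +
        "check for unbounded document stuffing."
    );
  }
}

assertPromptBudget(finalPromptString); // the string you're about to send to the LLM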


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

