How to Fix 'OOM error during inference during development' in LangChain (TypeScript)
What the error means
OOM error during inference during development usually means your Node.js process ran out of memory while LangChain was building prompts, loading documents, or sending a large payload through an LLM call. In TypeScript projects, this often shows up during local dev because you’re re-running chains with hot reload, huge .env-driven context, or unbounded document ingestion.
The actual runtime symptom is usually one of these:
- `FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory`
- `RangeError: Invalid string length`
- `Error: Request body too large`
- `OpenAIError: 413 Request Entity Too Large`
The Most Common Cause
The #1 cause is stuffing too much data into a single prompt. In LangChain terms, this happens when you use StuffDocumentsChain, createStuffDocumentsChain, or a naive RunnableSequence that concatenates all documents into one giant context window.
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Concatenates every document into one prompt | Splits docs and processes in batches |
| No token control | Uses chunking + retrieval |
| Easy to trigger OOM locally | Keeps memory and request size bounded |
```ts
// BROKEN: loads all docs into one prompt
import { ChatOpenAI } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { Document } from "@langchain/core/documents";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const docs: Document[] = await loadAllDocsSomehow(); // hundreds/thousands of docs

const chain = await createStuffDocumentsChain({
  llm,
  prompt: myPrompt,
});

const result = await chain.invoke({
  context: docs, // everything gets stuffed into one request
});
```
```ts
// FIXED: chunk first, retrieve relevant docs only
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createRetrievalChain } from "langchain/chains/retrieval";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const embeddings = new OpenAIEmbeddings();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.createDocuments([bigText]);

const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);
const retriever = vectorStore.asRetriever(4);

const chain = await createRetrievalChain({
  retriever,
  combineDocsChain: myCombineDocsChain,
});

const result = await chain.invoke({
  input: "Summarize the policy exceptions",
});
```
If you’re using createStuffDocumentsChain, treat it as a last resort for small inputs only. For anything that can grow in dev, switch to retrieval or map-reduce style processing.
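The map-reduce alternative can be sketched without any LangChain-specific API. In the sketch below, `summarize` is a hypothetical stand-in for an LLM call (for example, a small chain's `invoke`); the point is that no single call ever sees more than one bounded batch:

```ts
// Sketch of map-reduce style processing. `Summarize` stands in for any
// LLM call that turns a list of texts into one summary string.
type Summarize = (texts: string[]) => Promise<string>;

async function mapReduceSummarize(
  docs: string[],
  summarize: Summarize,
  batchSize = 8,
): Promise<string> {
  // Map step: each call sees at most `batchSize` documents.
  const partials: string[] = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    partials.push(await summarize(docs.slice(i, i + batchSize)));
  }
  // Reduce step: combine the short partial summaries in one final call.
  return summarize(partials);
}
```

Because each map call is bounded by `batchSize`, peak memory and request size stay flat no matter how many documents you feed in.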
Other Possible Causes
1) Loading too many documents into memory at once
This is common with file loaders like DirectoryLoader, PDFLoader, or custom ingestion scripts.
```ts
// BAD
const docs = await loader.load(); // loads entire corpus

// BETTER (when the loader exposes a lazy/streaming API)
for await (const doc of loader.lazyLoad()) {
  await processDoc(doc);
}
```
If your loader doesn’t support streaming, batch the work yourself.
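Batching it yourself can look like the sketch below, where `loadOne` and `processDoc` are hypothetical placeholders for your loader and handler; only one batch of documents is alive at a time:

```ts
// Sketch: manual batching when a loader has no streaming API.
// `loadOne` and `processDoc` are assumed placeholders, not LangChain APIs.
async function ingestInBatches<T>(
  paths: string[],
  loadOne: (path: string) => Promise<T>,
  processDoc: (doc: T) => Promise<void>,
  batchSize = 10,
): Promise<void> {
  for (let i = 0; i < paths.length; i += batchSize) {
    // At most `batchSize` documents are in memory at once; earlier
    // batches become eligible for garbage collection before the next load.
    const docs = await Promise.all(paths.slice(i, i + batchSize).map(loadOne));
    for (const doc of docs) await processDoc(doc);
  }
}
```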
2) Embedding or vectorizing huge batches
Calling vectorStore.addDocuments(allDocs) with thousands of chunks can spike memory.
```ts
// BAD
await vectorStore.addDocuments(allChunks);

// BETTER
const batchSize = 50;
for (let i = 0; i < allChunks.length; i += batchSize) {
  await vectorStore.addDocuments(allChunks.slice(i, i + batchSize));
}
```
This matters more in dev because hot reload can rerun the same ingestion path multiple times.
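One way to stop hot reload from repeating ingestion is to memoize the built store on `globalThis`. This is a sketch of an assumed pattern: it only helps in HMR-style dev setups (e.g. Next.js dev) where the process survives module reloads; a watcher that restarts the whole process, like `tsx watch`, rebuilds regardless:

```ts
// Sketch: cache the expensive ingestion result across module reloads.
// The `__vectorStore` slot name is an arbitrary assumption.
const g = globalThis as typeof globalThis & { __vectorStore?: unknown };

async function getVectorStore<T>(build: () => Promise<T>): Promise<T> {
  if (g.__vectorStore === undefined) {
    g.__vectorStore = await build(); // expensive ingestion runs once
  }
  return g.__vectorStore as T;
}
```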
3) Recursive agent loops or runaway tool calls
Agents can keep calling tools until memory grows and the process dies. This often shows up with AgentExecutor when stop conditions are weak.
```ts
// BAD
const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

// BETTER
const agentExecutor = new AgentExecutor({
  agent,
  tools,
  maxIterations: 5,
});
```
Also watch for tools that return massive payloads. A single tool response can balloon the conversation state.
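A cheap defense is to wrap tools so oversized payloads are truncated before they enter the agent's state. `capToolOutput` and the 4,000-character cap below are illustrative assumptions, not a LangChain API:

```ts
// Sketch: wrap a tool function so its output can never exceed `maxChars`.
function capToolOutput(
  tool: (input: string) => Promise<string>,
  maxChars = 4_000,
): (input: string) => Promise<string> {
  return async (input) => {
    const out = await tool(input);
    if (out.length <= maxChars) return out;
    // Keep a marker so the model knows the payload was cut.
    return out.slice(0, maxChars) + `\n[truncated ${out.length - maxChars} chars]`;
  };
}
```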
4) Node heap is too small for local development
Sometimes the code is fine, but your dev machine is running Node with the default heap ceiling.
```sh
node --max-old-space-size=4096 dist/index.js
```
For TypeScript dev scripts:
```json
{
  "scripts": {
    "dev": "NODE_OPTIONS=--max-old-space-size=4096 tsx watch src/index.ts"
  }
}
```
This is a mitigation, not a fix. If you need to double the heap just to run a single request, your chain design still needs work.
How to Debug It
- Find the exact LangChain step that spikes memory. Add logs before and after each major step:
  - document loading
  - splitting
  - embedding
  - retriever calls
  - LLM invocation
- Check whether input size explodes. Log token-ish proxies:

  ```ts
  console.log("docs:", docs.length);
  console.log("first doc chars:", docs[0]?.pageContent.length);
  console.log("prompt chars:", promptValue.toString().length);
  ```

- Disable parallelism. If you’re using `Promise.all()` over many files or requests, replace it with sequential processing. Parallel ingestion often looks fine until local dev hits memory pressure.
- Run with heap diagnostics. Start Node with:

  ```sh
  NODE_OPTIONS="--trace-gc --max-old-space-size=4096" npm run dev
  ```

  If GC thrashes before the crash, you’re overloading memory rather than hitting a logic bug.
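The before-and-after logging can be wrapped into a small helper using Node's `process.memoryUsage()`. This is a sketch, not a LangChain utility:

```ts
// Sketch: wrap any pipeline stage and log heap usage around it,
// so you can see which step spikes memory.
async function withHeapLog<T>(label: string, step: () => Promise<T>): Promise<T> {
  const mb = () => Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
  const before = mb();
  const result = await step();
  console.log(`${label}: heapUsed ${before}MB -> ${mb()}MB`);
  return result;
}
```

Wrap each stage (loading, splitting, embedding, retrieval, the LLM call) and the culprit usually jumps out of the logs.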
Prevention
- Use retrieval-first patterns (`Retriever`, `createRetrievalChain`) instead of stuffing full corpora into prompts.
- Batch ingestion and embedding jobs; never assume local dev can handle production-sized inputs in one call.
- Put hard limits on agents:
  - `maxIterations`
  - max tool output size
  - max document count per request

If you want a simple rule: if your LangChain code ever builds one giant string from unbounded data, you’ve built an OOM bug waiting to happen.
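That rule can even be enforced mechanically: a guard that fails fast on oversized prompts turns a silent OOM into a loud, debuggable error. The helper and the 100,000-character budget below are assumptions, not LangChain APIs:

```ts
// Sketch: reject any prompt whose size exceeds a rough character budget,
// instead of letting it grow until the process runs out of memory.
function assertPromptBudget(prompt: string, maxChars = 100_000): string {
  if (prompt.length > maxChars) {
    throw new Error(
      `Prompt is ${prompt.length} chars (budget ${maxChars}); ` +
        "chunk, retrieve, or summarize instead of stuffing.",
    );
  }
  return prompt;
}
```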
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit