How to Fix 'OOM error during inference during development' in LangGraph (TypeScript)
What this error actually means
'OOM error during inference during development' usually means your LangGraph app is building up too much state, too many tokens, or too many concurrent calls before the model can finish a step. In TypeScript projects, this often shows up during local dev when you loop on the same graph state, keep full message history in memory, or accidentally fan out multiple model calls at once.
The key point: this is usually not a LangGraph bug. It’s almost always a graph design problem, a memory-growth problem, or an inference payload that keeps getting bigger on every node execution.
The Most Common Cause
The #1 cause is unbounded message accumulation in graph state.
If you keep appending full chat history on every node run, your messages state (whether MessagesAnnotation or a custom messages field) grows until the next LLM call hits token or memory limits. In LangGraph JS/TS, this often surfaces as a runtime failure around the model call, sometimes alongside provider errors like:
- Error: OOM error during inference during development
- BadRequestError: Request too large
- context_length_exceeded
- RangeError: Invalid string length
Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Appends every message forever | Keeps only the last N messages or summarizes |
| Passes full state into every node | Passes only the minimal slice needed |
| Reuses raw history without trimming | Uses a reducer or checkpointed summary |
// ❌ Broken: state grows on every turn
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
const State = Annotation.Root({
messages: Annotation<string[]>({
reducer: (prev, next) => [...prev, ...next],
default: () => [],
}),
});
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const graph = new StateGraph(State)
.addNode("chat", async (state) => {
const response = await llm.invoke(state.messages);
return { messages: [response.content as string] };
})
.addEdge("__start__", "chat")
.addEdge("chat", "__end__")
.compile();
// ✅ Fixed: trim history before inference
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
const State = Annotation.Root({
messages: Annotation<string[]>({
reducer: (prev, next) => [...prev, ...next].slice(-12),
default: () => [],
}),
});
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const graph = new StateGraph(State)
.addNode("chat", async (state) => {
const trimmed = state.messages.slice(-12);
const response = await llm.invoke(trimmed);
return { messages: [response.content as string] };
})
.addEdge("__start__", "chat")
.addEdge("chat", "__end__")
.compile();
If you need long-term context, don’t keep everything in prompt state. Summarize older turns into a compact memory field and keep recent turns only.
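Here is a minimal sketch of that pattern; the summary field, node names, and the six-message window are illustrative choices, not built-in LangGraph behavior:
// ✅ Sketch: compact summary + recent turns only (illustrative field names)
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const SummaryState = Annotation.Root({
  summary: Annotation<string>({
    reducer: (_prev, next) => next,
    default: () => "",
  }),
  messages: Annotation<string[]>({
    reducer: (prev, next) => [...prev, ...next].slice(-6),
    default: () => [],
  }),
});
const summarizingGraph = new StateGraph(SummaryState)
  .addNode("chat", async (state) => {
    // Prepend the compact summary so the model keeps long-term context
    // without carrying the full transcript.
    const prompt = [`Conversation so far: ${state.summary}`, ...state.messages];
    const response = await llm.invoke(prompt);
    return { messages: [response.content as string] };
  })
  .addNode("summarize", async (state) => {
    // Fold the old summary plus the recent turns into a new short summary.
    const response = await llm.invoke(
      `Summarize this conversation in a few sentences:\n${state.summary}\n${state.messages.join("\n")}`
    );
    return { summary: response.content as string };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "summarize")
  .addEdge("summarize", "__end__")
  .compile();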
Other Possible Causes
1) Recursive graph loops with no stop condition
A conditional edge that keeps routing back into the same node can blow up memory fast.
// ❌ Broken
graph.addConditionalEdges("router", (state) => "router");
// ✅ Fixed: exit after a bounded number of passes (assumes state tracks an
// iteration counter that the routed node increments)
graph.addConditionalEdges("router", (state) =>
state.iteration > 3 ? "__end__" : "router"
);
If you see repeated node execution in logs before the crash, this is likely it.
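As a backstop, compiled LangGraph graphs also accept a recursionLimit in the run config (25 steps by default), so a runaway loop fails fast instead of quietly eating memory. A quick sketch, reusing the compiled graph from the earlier example:
// Lower the recursion limit while debugging so runaway loops fail fast
const result = await graph.invoke(
  { messages: ["hello"] },
  { recursionLimit: 10 } // the run errors out if the loop never exits
);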
2) Parallel branches duplicating large payloads
If you fan out to multiple nodes and each branch receives the full transcript or document set, memory usage multiplies quickly.
// ❌ Broken: huge shared payload copied into every branch
graph.addNode("a", handlerA);
graph.addNode("b", handlerB);
graph.addEdge("__start__", "a");
graph.addEdge("__start__", "b");
Fix it by sending only branch-specific inputs:
// ✅ Fixed: split minimal inputs per branch
const docsForA = docs.slice(0, 3);
const docsForB = docs.slice(3, 6);
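A fuller sketch of that idea: give each branch its own small state field so neither node ever receives the full document set. The field and node names here are illustrative:
// ✅ Sketch: one small field per branch instead of one huge shared payload
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const BranchState = Annotation.Root({
  docsForA: Annotation<string[]>({ reducer: (_prev, next) => next, default: () => [] }),
  docsForB: Annotation<string[]>({ reducer: (_prev, next) => next, default: () => [] }),
  results: Annotation<string[]>({
    reducer: (prev, next) => [...prev, ...next],
    default: () => [],
  }),
});
const branchGraph = new StateGraph(BranchState)
  .addNode("a", async (state) => {
    // Branch A only reads its own slice of the documents.
    const response = await llm.invoke(state.docsForA.join("\n\n"));
    return { results: [response.content as string] };
  })
  .addNode("b", async (state) => {
    const response = await llm.invoke(state.docsForB.join("\n\n"));
    return { results: [response.content as string] };
  })
  .addEdge("__start__", "a")
  .addEdge("__start__", "b")
  .addEdge("a", "__end__")
  .addEdge("b", "__end__")
  .compile();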
3) Tool outputs returning massive blobs
A tool that returns entire HTML pages, PDFs converted to text, or large JSON objects can explode your prompt size on the next step.
// ❌ Broken
return { content: JSON.stringify(hugeApiResponse) };
// ✅ Fixed
return {
content: JSON.stringify({
id: hugeApiResponse.id,
status: hugeApiResponse.status,
summary: hugeApiResponse.summary,
}),
};
Keep tool results small. If you need details later, store them externally and pass back references.
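One way to do that is a small store that lives outside the graph; the names below are illustrative, not a LangGraph API:
// Sketch: keep the full payload outside graph state and return a reference
const toolResultStore = new Map<string, unknown>();
function compactToolResult(id: string, fullResult: { status: string; summary: string }) {
  toolResultStore.set(id, fullResult); // the big blob never enters the prompt
  return JSON.stringify({
    resultId: id,                              // the model can cite this later
    status: fullResult.status,
    summary: fullResult.summary.slice(0, 500), // hard cap on prompt growth
  });
}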
4) Using an oversized model payload in dev
Sometimes the issue is not LangGraph state but the request itself. A large system prompt plus long history plus tool schema can exceed provider limits.
// Example of a risky setup: the constructor looks harmless, but a long system
// prompt, many bound tools, and a growing history attached to every call are not
const llm = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0,
});
Check (a rough size check is sketched after this list):
- system prompt length
- number of tools registered
- function schema size
- message count before each invoke
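A rough pre-flight check covers all four at once. It counts characters rather than tokens, and the function name and threshold are illustrative:
// Rough character-based size check before each invoke (not a real tokenizer)
function checkInvokeSize(opts: {
  systemPrompt: string;
  messages: string[];
  toolSchemas: object[];
  maxChars?: number;
}) {
  const { systemPrompt, messages, toolSchemas, maxChars = 60_000 } = opts;
  const total =
    systemPrompt.length +
    messages.reduce((n, m) => n + m.length, 0) +
    JSON.stringify(toolSchemas).length;
  console.log("approx request chars:", total, "messages:", messages.length);
  if (total > maxChars) {
    throw new Error(`Request payload too large before invoke: ${total} chars`);
  }
}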
How to Debug It
- Log state size before every model call
  - Print messages.length, total character count, and any large fields.
  - If size climbs each turn, you found the leak.
- Turn off branches and tools
  - Run only one node with one LLM call.
  - If the error disappears, re-enable components one at a time.
- Inspect graph execution traces (a streaming sketch follows this list)
  - Look for repeated visits to the same node.
  - A loop like agent -> tool -> agent -> tool with no exit condition is a common culprit.
- Reduce input aggressively
  - Temporarily cap messages to the last 3 or 5.
  - Replace tool outputs with tiny summaries.
  - If the error stops immediately, your fix is in state trimming.
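To get those traces locally, you can stream the graph in updates mode; each chunk is keyed by the node that just ran, so repeated keys make a loop obvious. A sketch, reusing the compiled graph from the fixed example above:
// Stream per-node updates so repeated node names reveal loops
const updates = await graph.stream(
  { messages: ["hello"] },
  { streamMode: "updates" }
);
for await (const update of updates) {
  console.log("node ran:", Object.keys(update));
}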
A quick diagnostic helper:
function logPayload(state: { messages?: unknown[] }) {
const json = JSON.stringify(state);
console.log("messages:", state.messages?.length ?? 0);
console.log("payload bytes:", Buffer.byteLength(json, "utf8"));
}
If payload bytes keep increasing across turns without bound, don’t keep debugging the inference call. Fix state growth first.
Prevention
- Keep graph state small.
  - Store only recent messages and compact summaries.
- Put hard caps on loops and retries.
  - Every recursive route should have an exit condition.
- Treat tool outputs as untrusted payloads.
  - Return IDs and summaries, not raw dumps.
If you’re building production LangGraph agents in TypeScript for regulated environments like banking or insurance, this is one of those bugs that gets expensive fast. The fix is usually simple once you stop feeding the model more context than it needs.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.