How to Fix 'OOM error during inference' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: oom-error-during-inference, langgraph, typescript

What the error means

OOM error during inference means your process ran out of memory while LangGraph was executing a node, typically during model invocation or while carrying too much state between nodes. In TypeScript projects, it usually shows up when a graph keeps appending messages or large tool outputs until the runtime is killed by Node, V8, or the container.

The failure often appears as one of these:

  • FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
  • Error: OOM error during inference
  • Node.js process exited with signal SIGKILL
  • LangGraphError: Failed to execute node "agent"

The Most Common Cause

The #1 cause is unbounded state growth in your MessagesAnnotation or custom state. You keep returning the full message history on every step, and each loop adds more tokens, more tool output, and more serialized objects until inference blows up.

This is common in agent loops with ToolNode, retries, or recursive graphs.

Broken vs fixed pattern

Broken pattern                            | Fixed pattern
------------------------------------------|------------------------------------
Appends everything forever                | Trims or replaces state
Stores raw tool payloads in graph state   | Stores only compact summaries/IDs
Re-sends full history to every LLM call   | Uses bounded message window

// BROKEN: unbounded message growth
import { StateGraph, MessagesAnnotation } from "@langchain/langgraph";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

// `llm` is assumed to be an initialized chat model (e.g. ChatOpenAI)
const graph = new StateGraph(MessagesAnnotation)
  .addNode("agent", async (state) => {
    const response = await llm.invoke(state.messages); // prompt grows every turn
    return {
      messages: [...state.messages, response], // appends forever
    };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "agent"); // unconditional loop

// FIXED: bounded history + compact state updates
import { trimMessages, RemoveMessage } from "@langchain/core/messages";

const graphFixed = new StateGraph(MessagesAnnotation)
  .addNode("agent", async (state) => {
    // Bound the prompt: send only a token-limited window to the model
    const recentMessages = await trimMessages(state.messages, {
      maxTokens: 4000,
      strategy: "last",
      tokenCounter: llm,
    });

    const response = await llm.invoke(recentMessages);

    // Bound the stored state: MessagesAnnotation appends whatever you return,
    // so explicitly remove everything except the most recent messages
    const stale = state.messages
      .slice(0, -6)
      .map((m) => new RemoveMessage({ id: m.id! }));

    return {
      messages: [...stale, response],
    };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "agent"); // still loops; see "Recursive edges" below for a stop condition

If you are using tools, the same mistake happens when a tool returns a giant JSON blob and you push it directly into state. Store a summary instead:

// BAD: the entire payload lives in graph state from now on
return {
  messages: [
    ...state.messages,
    // toolCallId: the id of the tool call this message answers (required by ToolMessage)
    new ToolMessage(JSON.stringify(hugeResult), toolCallId),
  ],
};

// GOOD: keep state compact and store a reference to the payload
return {
  messages: [
    ...state.messages,
    new ToolMessage("Tool completed successfully.", toolCallId),
  ],
  toolResultRef: hugeResult.id, // custom state channel (see the sketch below)
};
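
Note that toolResultRef is not part of MessagesAnnotation; it assumes you have extended the state with your own channel. A minimal sketch of such an extension (the names AgentState and toolResultRef are illustrative):

import { Annotation, MessagesAnnotation, StateGraph } from "@langchain/langgraph";

// Illustrative state: the standard messages channel plus one compact reference field
const AgentState = Annotation.Root({
  ...MessagesAnnotation.spec,
  toolResultRef: Annotation<string | null>({
    reducer: (_prev, next) => next, // keep only the latest reference, never accumulate
    default: () => null,
  }),
});

const graphWithRef = new StateGraph(AgentState); // nodes can now return { toolResultRef: ... }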

Other Possible Causes

1) Your model context window is too small for the prompt size

If your prompt plus history exceeds the model limit, the runtime may spike memory while tokenizing and assembling inputs.

// Bad: sending everything to a small-context model
await llm.invoke(allMessages);

// Better: trim before invoking (trimMessages is async when given messages)
await llm.invoke(
  await trimMessages(allMessages, { maxTokens: 3000, strategy: "last", tokenCounter: llm })
);
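
Passing the model as tokenCounter is accurate but calls its tokenizer on every trim. If that overhead matters, a rough character-based counter is often enough; the heuristic below (about four characters per token) is an approximation, not an exact count:

import { BaseMessage, trimMessages } from "@langchain/core/messages";

// Rough heuristic: ~4 characters per token, avoids calling the model's tokenizer
const approxTokenCounter = (msgs: BaseMessage[]) =>
  Math.ceil(
    msgs
      .map((m) => (typeof m.content === "string" ? m.content : JSON.stringify(m.content)))
      .join(" ").length / 4
  );

const bounded = await trimMessages(allMessages, {
  maxTokens: 3000,
  strategy: "last",
  tokenCounter: approxTokenCounter,
});
await llm.invoke(bounded);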

2) A tool is returning massive payloads

This happens with search results, PDF extraction, database dumps, or API responses that get stored as-is.

// Bad
const result = await fetchBigPayload();
return { messages: [...state.messages, new AIMessage(JSON.stringify(result))] };

// Better
const result = await fetchBigPayload();
return {
  messages: [...state.messages, new AIMessage(`Fetched ${result.items.length} items.`)],
};
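
If a downstream node still needs the full payload, keep it outside graph state and pass only a reference. A minimal sketch, assuming an in-process cache is acceptable for your deployment (anything long-lived or multi-instance would want an external store such as Redis):

import { randomUUID } from "node:crypto";
import { AIMessage } from "@langchain/core/messages";

// Illustrative out-of-band store: the heavy payload never enters graph state
const payloadCache = new Map<string, unknown>();

const result = await fetchBigPayload();
const ref = randomUUID();
payloadCache.set(ref, result);

return {
  messages: [
    ...state.messages,
    new AIMessage(`Fetched ${result.items.length} items (ref: ${ref}).`),
  ],
};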

3) Recursive edges are causing runaway execution

A graph that loops without a hard stop will keep allocating memory until it dies.

// Bad: loops with no termination condition
graph.addEdge("agent", "agent");

// Better: stop after a bounded number of steps
// (`steps` assumes a custom counter channel in your state annotation)
graph.addConditionalEdges("agent", (state) => {
  if (state.steps >= 5) return "__end__";
  return "agent";
});
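
Independently of a step counter, LangGraph enforces a recursion limit on compiled graphs and throws a GraphRecursionError when it is exceeded, which fails fast instead of allocating until the process dies. You can tighten it per invocation:

import { HumanMessage } from "@langchain/core/messages";

const app = graph.compile();

// Default limit is 25 super-steps; lower it so runaway loops surface quickly
await app.invoke(
  { messages: [new HumanMessage("Start the task")] },
  { recursionLimit: 10 }
);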

4) Your runtime/container memory limit is too low

Sometimes the code is fine, but the deployment is capped at a tiny heap.

node --max-old-space-size=4096 dist/index.js

For Docker:

ENV NODE_OPTIONS="--max-old-space-size=4096"
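
To confirm what limit the process actually ends up with, log V8's heap ceiling at startup; a quick sketch:

import v8 from "node:v8";

// heap_size_limit reflects --max-old-space-size plus V8's other reserved space
const heapLimitMb = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log(`V8 heap limit: ${Math.round(heapLimitMb)} MB`);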

How to Debug It

  1. Log state size at every node

    • Print state.messages.length
    • Print approximate serialized size with Buffer.byteLength(JSON.stringify(state))
    • If it grows every turn, you found the leak (see the sketch after this list)
  2. Isolate the failing node

    • Comment out tools first
    • Then comment out recursion/loops
    • Then test a single llm.invoke() call with static input
      This tells you whether the issue is prompt size, tool output, or graph structure.
  3. Check your LangGraph update logic

    • Look for patterns like [...state.messages, ...newMessages]
    • Look for returning entire API responses into state
    • Look for reducers that never discard old entries
  4. Run with more heap and compare

    • If increasing memory fixes it temporarily, you have a growth problem rather than a one-off spike.
    • If it still fails immediately, inspect prompt construction and tool payload size.
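
For step 1, a small wrapper around each node makes growth obvious without touching node logic. A sketch; withSizeLogging is an illustrative helper, not a LangGraph API:

import { MessagesAnnotation } from "@langchain/langgraph";

type AgentState = typeof MessagesAnnotation.State;

// Illustrative helper: logs message count and rough serialized size before the node runs
const withSizeLogging =
  (name: string, node: (state: AgentState) => Promise<Partial<AgentState>>) =>
  async (state: AgentState) => {
    const bytes = Buffer.byteLength(JSON.stringify(state.messages));
    console.log(
      `[${name}] messages=${state.messages.length} approxSize=${(bytes / 1024).toFixed(1)} KB`
    );
    return node(state);
  };

// Usage (agentNode is your existing node function):
// graph.addNode("agent", withSizeLogging("agent", agentNode));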

Prevention

  • Keep graph state small.
    • Store references, IDs, and summaries, not raw documents or full API payloads.
  • Trim message history before every model call.
    • Use trimMessages() or explicit slicing on long-running agents.
  • Put hard limits on loops.
    • Every agent cycle should have a max step count or termination condition.
  • Treat tool output as untrusted input.
    • Serialize only what the next node actually needs.

If you hit OOM error during inference in LangGraph TypeScript, start by checking state growth. In practice, that’s where most of these failures come from.

