How to Fix 'OOM error during inference in production' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

What the error means

An OOM error during inference in production usually means your process ran out of memory while LangGraph was building state, holding message history, or keeping too many large objects alive across nodes. In TypeScript apps, this often shows up under real traffic when a graph that looks fine in local tests starts accumulating tokens, documents, tool outputs, or retries.

In LangGraph terms, the failure is rarely just “the model is too big.” It’s usually your graph state, checkpointing strategy, or node output shape causing memory growth until the Node.js process hits its heap limit or the container’s OOM killer terminates it.

The Most Common Cause — unbounded state growth

The #1 cause is returning the full conversation or full tool payload into graph state on every node execution. With StateGraph, that means each step keeps appending to messages, documents, or custom arrays without trimming.

A common pattern looks harmless until production traffic hits it.

  • Broken: keep appending full messages and tool output into state. Fixed: store only the last N messages or a compact summary.
  • Broken: return raw documents from retrieval nodes. Fixed: return IDs, snippets, or compressed context.
  • Broken: let reducers concatenate forever. Fixed: use bounded reducers.
// ❌ Broken: state grows without bound
import { StateGraph, Annotation } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    // Append-only reducer: the array gets longer on every step, forever.
    reducer: (left = [], right = []) => [...left, ...right],
    default: () => [],
  }),
});

const graph = new StateGraph(GraphState)
  .addNode("chat", async (state) => {
    // Sends the full, ever-growing history to the model on every turn.
    const response = await llm.invoke(state.messages);
    return {
      messages: [response],
    };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();

// ✅ Fixed: keep only bounded context
import { StateGraph, Annotation } from "@langchain/langgraph";
import { BaseMessage, trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    // Bounded reducer: merge, then keep only the most recent messages.
    // Reducers run synchronously, so do the cheap slice here and leave
    // token-aware trimming to the node.
    reducer: (left = [], right = []) => [...left, ...right].slice(-20),
    default: () => [],
  }),
});

const graph = new StateGraph(GraphState)
  .addNode("chat", async (state) => {
    // trimMessages is async, so call it in the node, not in the reducer,
    // and enforce a token budget before invoking the model.
    const trimmed = await trimMessages(state.messages, {
      maxTokens: 4000,
      strategy: "last",
      tokenCounter: llm, // a chat model instance can act as the token counter
    });
    const response = await llm.invoke(trimmed);
    return { messages: [response] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();

If you’re using a retrieval node, the same issue appears when you dump full Document[] objects into state. Keep the retrieved text small and deterministic.
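
For instance, a retrieval node can map each document down to an ID and a short excerpt before writing to state. A minimal sketch, assuming a LangChain retriever instance named `retriever` is in scope and that your Annotation.Root defines a `context` channel:

// ✅ Retrieval node that stores compact snippets instead of full Document[]
const retrieveNode = async (state: { query: string }) => {
  const docs = await retriever.invoke(state.query);
  return {
    context: docs.map((doc) => ({
      id: doc.metadata.id,                    // pointer back to the source
      snippet: doc.pageContent.slice(0, 500), // short excerpt, not the full doc
    })),
  };
};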

Other Possible Causes

1. Recursive loops with no hard stop

A cycle in your graph can keep allocating memory until Node dies.

// ❌ Missing stop condition
graph.addConditionalEdges("agent", (state) => "tools");
graph.addEdge("tools", "agent");

// ✅ Add a max-iteration guard
graph.addConditionalEdges("agent", (state) =>
  state.iterations >= 5 ? "__end__" : "tools"
);

If you see repeated node execution in logs before the crash, this is likely it.
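
For the guard above to actually fire, something has to increment `iterations`. A minimal sketch, reusing the Annotation pattern and the `llm` from the examples above (the channel and node names are illustrative):

// Track the loop count in a dedicated state channel
import { Annotation } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";

const AgentState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left = [], right = []) => [...left, ...right].slice(-20),
    default: () => [],
  }),
  iterations: Annotation<number>({
    // Nodes return an increment; the reducer accumulates it.
    reducer: (left = 0, right = 0) => left + right,
    default: () => 0,
  }),
});

// The agent node bumps the counter each time the loop comes back around
const agentNode = async (state: typeof AgentState.State) => {
  const response = await llm.invoke(state.messages);
  return { messages: [response], iterations: 1 };
};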

2. Large tool outputs stored in state

Tool calls that return HTML pages, PDFs converted to text, or giant JSON blobs can blow up memory fast.

// ❌ Store raw tool output
return { toolResult: hugeJsonResponse };

// ✅ Store only what the LLM needs
return {
  toolResult: {
    id: hugeJsonResponse.id,
    summary: hugeJsonResponse.summary,
  },
};

If a tool returns megabytes of data, write it to object storage and store a pointer in LangGraph state.
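
A sketch of that pattern, where `putObject` stands in for your S3/GCS/Azure client call (both it and `fetchHugeResult` are hypothetical helpers):

import { randomUUID } from "node:crypto";

// ✅ Offload the heavy payload, keep only a pointer in graph state
const searchNode = async (state: { query: string }) => {
  const hugeJsonResponse = await fetchHugeResult(state.query); // hypothetical
  const key = `tool-results/${randomUUID()}.json`;
  await putObject(key, JSON.stringify(hugeJsonResponse)); // hypothetical
  return {
    toolResult: {
      storageKey: key,                   // pointer, not payload
      summary: hugeJsonResponse.summary, // just enough for the LLM
    },
  };
};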

3. Checkpointer retaining too much per thread

When using a checkpointer like MemorySaver, every thread’s history stays resident in RAM. That’s fine for local dev; it’s a bad default for production.

import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver(); // ❌ not for high-traffic prod

Use a persistent store instead:

// ✅ Use durable storage-backed checkpointing
const checkpointer = yourPostgresCheckpointer;
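
At the time of writing, LangGraph ships a Postgres checkpointer as @langchain/langgraph-checkpoint-postgres. A sketch (the connection string is a placeholder, and `workflow` is your uncompiled StateGraph):

// ✅ Durable, Postgres-backed checkpointing
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";

const checkpointer = PostgresSaver.fromConnString(
  "postgresql://user:password@host:5432/db" // placeholder
);
await checkpointer.setup(); // creates the checkpoint tables on first run

const app = workflow.compile({ checkpointer });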

4. Parallel branches duplicating large payloads

If multiple branches each receive the same large state object, you multiply memory use by branch count.

// ❌ Fan-out with heavy shared state
graph.addNode("branchA", handlerA);
graph.addNode("branchB", handlerB);

Before branching, slim the state:

// ✅ Pass minimal branch input
return {
  query: state.query,
  messageIds: state.messages.map((m) => m.id).slice(-20),
};
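
If you want each branch to receive only its own compact payload rather than the shared state object, LangGraph's Send API supports that from a conditional edge. A sketch (node names are illustrative):

import { Send } from "@langchain/langgraph";

// ✅ Hand each branch a small, purpose-built payload
workflow.addConditionalEdges("plan", (state) => [
  new Send("branchA", { query: state.query }),
  new Send("branchB", {
    messageIds: state.messages.map((m) => m.id).slice(-20),
  }),
]);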

How to Debug It

  1. Check whether memory grows with each graph step

    • Add logs around every node (see the wrapper sketch after this list).
    • If RSS climbs after each invocation of agent, tools, or retriever, you have an accumulation problem.
  2. Inspect your reducer functions

    • Look for patterns like [...left, ...right].
    • Any reducer that always appends without trimming is suspect.
  3. Measure payload size at node boundaries

    • Log JSON.stringify(state).length or use a serializer-safe size estimate.
    • If one node suddenly returns multi-MB objects, that’s your trigger.
  4. Run with smaller concurrency

    • If OOM disappears when concurrency drops to 1, parallel fan-out is amplifying memory pressure.
    • Also check whether multiple threads share an in-memory checkpointer like MemorySaver.
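
A sketch that covers steps 1 and 3 together: a wrapper that logs RSS and the serialized size of each node's output (JSON.stringify over-counts shared references, but it is a useful relative signal):

// Wrap any node handler to log memory and output size per step
const withMemoryLog =
  <S>(name: string, node: (state: S) => Promise<Partial<S>>) =>
  async (state: S): Promise<Partial<S>> => {
    const before = process.memoryUsage().rss;
    const result = await node(state);
    const after = process.memoryUsage().rss;
    const deltaMb = ((after - before) / 1024 / 1024).toFixed(1);
    console.log(
      `[${name}] rss=${(after / 1024 / 1024).toFixed(1)}MB (Δ ${deltaMb}MB) ` +
        `outputBytes≈${JSON.stringify(result).length}`
    );
    return result;
  };

// Usage: workflow.addNode("chat", withMemoryLog("chat", chatNode));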

Prevention

  • Keep LangGraph state small and explicit.
    • Store IDs, summaries, and short excerpts instead of full documents or raw tool output.
  • Put hard limits on loops and message history.
    • Use max iteration counters and trim chat history before invoking the model.
  • Don’t use in-memory checkpointing in production.
    • Back checkpoints with Postgres or another durable store so thread history doesn’t sit in RAM forever.

If you want a quick rule: if a node returns something you wouldn’t want to duplicate ten times in memory, don’t put it directly into LangGraph state.



By Cyprian Aarons, AI Consultant at Topiax.
