How to Fix 'OOM error during inference when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see an “OOM error during inference when scaling” in LangGraph, it usually means your graph is holding too much state, too many tokens, or too many concurrent runs in memory. In TypeScript projects, this shows up most often when you scale from one test conversation to multiple parallel sessions or long-running agent workflows.

The error is rarely “LangGraph is broken.” It’s usually your state shape, checkpointing strategy, or model invocation pattern causing memory to balloon until Node gets killed.

The Most Common Cause

The #1 cause is passing the full conversation history through every node and re-attaching large message arrays on each step. In LangGraph, that often happens when developers store messages in state and keep appending without trimming or summarizing.

Broken vs. fixed at a glance:

Broken                                          Fixed
Keeps growing messages forever                  Trims or summarizes messages
Re-invokes model with full history every node   Sends only the recent context needed
Copies large objects into state                 Stores references/IDs instead

Here’s the broken pattern:
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, type BaseMessage } from "@langchain/core/messages";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const GraphState = Annotation.Root({
  // Every update is appended, so this array only ever grows.
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => [...left, ...right],
    default: () => [],
  }),
});

const app = new StateGraph(GraphState)
  .addNode("agent", async (state) => {
    // Sends the entire accumulated history to the model on every step.
    const response = await model.invoke(state.messages);
    return { messages: [response] };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "__end__")
  .compile();

// This looks fine until the same thread runs for a while.
await app.invoke({
  messages: [
    new HumanMessage("Start the case"),
    // ... hundreds of prior turns
  ],
});

The fixed version trims context before calling the model. If you need long-term memory, persist summaries or externalize raw history to a store.

import { trimMessages, type BaseMessage } from "@langchain/core/messages";

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => [...left, ...right],
    default: () => [],
  }),
});

const app = new StateGraph(GraphState)
  .addNode("agent", async (state) => {
    // Only the most recent messages that fit the token budget reach the model.
    const recentMessages = await trimMessages(state.messages, {
      maxTokens: 4000,
      // Crude estimate: ~200 tokens per message; swap in a real counter if you need accuracy.
      tokenCounter: (msgs) => msgs.length * 200,
      strategy: "last",
      includeSystem: true,
    });

    const response = await model.invoke(recentMessages);
    return { messages: [response] };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "__end__")
  .compile();

If you’re using MemorySaver, this matters even more. Checkpointing preserves state between runs, so an oversized messages array gets replayed again and again.
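
To make that concrete: with a checkpointer attached, every call on the same thread reloads the accumulated state before running. A minimal sketch, assuming the agent node from the fixed example has been pulled out into a standalone agentNode function and the thread ID is illustrative:

import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();

const persistentApp = new StateGraph(GraphState)
  .addNode("agent", agentNode) // the trimming node from the fixed example
  .addEdge("__start__", "agent")
  .addEdge("agent", "__end__")
  .compile({ checkpointer });

// Each call on thread "case-1" restores the checkpointed messages array,
// appends the new turns, and writes the larger snapshot back into memory.
await persistentApp.invoke(
  { messages: [new HumanMessage("Next question")] },
  { configurable: { thread_id: "case-1" } }
);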

Other Possible Causes

1. Too much concurrency

If you fan out many graph executions at once, Node can OOM even if each run is individually reasonable.

// Bad: unbounded parallelism
await Promise.all(cases.map((c) => app.invoke(c)));

Use a concurrency limit:

import pLimit from "p-limit";

const limit = pLimit(4);

await Promise.all(
  cases.map((c) => limit(() => app.invoke(c)))
);

2. Large binary or JSON payloads inside state

Do not put PDFs, base64 images, full OCR output, or huge API responses directly into graph state.

// Bad
return {
  documentText: hugePdfExtract,
};

Store a pointer instead:

// Good
return {
  documentId: "doc_123",
  documentSummary: summary,
};
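
If a node genuinely has to produce the large payload, write it to external storage inside that node and pass only the pointer forward. A minimal sketch, where extractPdf, documentStore, and summarize are hypothetical helpers:

.addNode("ingest", async (state) => {
  // Hypothetical helpers: extractPdf pulls the raw text, documentStore persists it externally.
  const rawText = await extractPdf(state.uploadPath);
  const documentId = await documentStore.save(rawText); // raw payload never enters graph state
  const documentSummary = await summarize(rawText);     // a short summary is cheap to carry around
  return { documentId, documentSummary };
})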

3. Recursive loops with no stop condition

A cycle that keeps calling the LLM will inflate both state and heap usage.

// Bad: no termination guard
graph.addConditionalEdges("agent", (state) => "agent");

Add a hard stop:

graph.addConditionalEdges("agent", (state) =>
  state.iterationCount > 5 ? "__end__" : "agent"
);
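
That guard only works if something in state actually counts the loop iterations. A minimal sketch of the counter channel, assuming each looping node reports one iteration per pass:

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => [...left, ...right],
    default: () => [],
  }),
  // Looping nodes return { iterationCount: 1 }; the reducer sums the updates.
  iterationCount: Annotation<number>({
    reducer: (left, right) => left + right,
    default: () => 0,
  }),
});

// Inside the looping node:
// return { messages: [response], iterationCount: 1 };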

4. Checkpointer storing oversized snapshots

With MemorySaver, every checkpoint stays in RAM. Under load, that becomes expensive fast.

import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver(); // risky for high-scale production

Use a persistent backend for real traffic:

// Example pattern only — use your DB-backed checkpointer implementation
const checkpointer = yourPostgresCheckpointer;
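
One concrete option, assuming the @langchain/langgraph-checkpoint-postgres package and a reachable Postgres instance (the connection string below is illustrative):

import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";

const checkpointer = PostgresSaver.fromConnString(
  "postgresql://user:pass@localhost:5432/langgraph"
);
await checkpointer.setup(); // creates the checkpoint tables on first run

// Pass it to .compile({ checkpointer }) exactly as you would MemorySaver.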

How to Debug It

  1. Measure state size at each node

    • Log JSON.stringify(state).length before and after every node (a minimal logging sketch follows this list).
    • If one field keeps growing, that’s your culprit.
  2. Inspect message growth

    • Print state.messages.length.
    • If it climbs on every turn without trimming, fix that first.
  3. Check process memory during load

    • Watch process.memoryUsage().heapUsed.
    • If heap spikes with concurrency, reduce parallel invocations before touching graph logic.
  4. Disable checkpoints temporarily

    • Run once without a checkpointer.
    • If OOM disappears, your snapshot strategy is the problem.
    • If it persists, the issue is likely state growth or concurrency.
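
A minimal logging wrapper covering steps 1–3, assuming you wrap node functions by hand (the logNode name is illustrative):

type NodeFn<S> = (state: S) => Promise<Partial<S>>;

// Wraps a node so every call reports serialized state size, message count, and heap usage.
function logNode<S extends { messages?: unknown[] }>(name: string, fn: NodeFn<S>): NodeFn<S> {
  return async (state) => {
    const stateBytes = JSON.stringify(state).length;
    const heapMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
    console.log(`[${name}] state≈${stateBytes}B messages=${state.messages?.length ?? 0} heap=${heapMb}MB`);
    return fn(state);
  };
}

// Usage: .addNode("agent", logNode("agent", agentNode))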

Prevention

  • Keep graph state small.
    • Store IDs, summaries, and pointers instead of raw payloads.
  • Trim messages before every model call.
    • Use recent context only unless the full transcript is truly required.
  • Put concurrency limits around .invoke() and .stream().
    • Unbounded parallelism will take down Node faster than bad prompts will.

If you want a simple rule: in LangGraph TypeScript apps, never let raw conversation history become your permanent runtime state unless you’ve explicitly designed for it. That single mistake causes most “OOM during inference when scaling” failures I see in production.


By Cyprian Aarons, AI Consultant at Topiax.