How to Fix 'token limit exceeded when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

What the error means

A 'token limit exceeded when scaling' error usually means your graph is sending too much conversation state to the model as the workflow grows. In LangGraph, this typically shows up after a few turns, after branching, or when you keep appending messages without trimming state.

The failure is usually not in the model itself. It’s in how you persist and pass messages, context, or tool outputs between nodes.

The Most Common Cause

The #1 cause is unbounded message accumulation in graph state. In TypeScript, people often use messages: [...state.messages, newMessage] in every node and then send that full array back into the next LLM call.

That works for 2-3 turns. Then the prompt grows until you hit OpenAI/Anthropic token limits, or LangGraph starts failing during a larger fan-out/fan-in path.

Broken pattern vs fixed pattern

Broken | Fixed
Appends every message forever | Keeps only the last N messages or summarizes old ones
Sends full state into every node | Sends only the slice needed for that step
No token budget check | Enforces a hard cap before model calls
// BROKEN: unbounded growth
import { Annotation, StateGraph } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => [...left, ...right], // appends forever, never trims
    default: () => [],
  }),
});

async function chatNode(state: typeof GraphState.State) {
  // llm is a chat model instance (e.g. new ChatOpenAI(...)) defined elsewhere
  const response = await llm.invoke(state.messages); // prompt keeps growing every turn
  return {
    messages: [response],
  };
}
// FIXED: trim before invoking the model
import { BaseMessage } from "@langchain/core/messages";

const MAX_MESSAGES = 12;

function trimMessages(messages: BaseMessage[]) {
  return messages.slice(-MAX_MESSAGES); // keep only the most recent turns
}

async function chatNode(state: typeof GraphState.State) {
  const trimmed = trimMessages(state.messages);

  const response = await llm.invoke(trimmed); // prompt size is now bounded
  return {
    messages: [response],
  };
}

If you need long-running memory, don’t keep everything in prompt state. Store raw history outside the graph and summarize it into a smaller working set.
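For example, here is a minimal sketch of that split, assuming the full transcript lives in an external store (an in-memory array here, a database in practice) and only a rolling summary plus the most recent turns ever reach the model. The summarizeHistory and buildWorkingSet helpers and the prompt wording are illustrative, not LangGraph APIs.

// Sketch: full history lives outside the graph; the model only sees a summary + recent turns.
// Helper names and the summary prompt are illustrative.
import { BaseMessage, HumanMessage, SystemMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const fullHistory: BaseMessage[] = []; // external store; swap for a DB or cache in practice

async function summarizeHistory(history: BaseMessage[]): Promise<string> {
  const text = history.map((m) => `${m._getType()}: ${m.content}`).join("\n");
  const response = await llm.invoke([
    new HumanMessage(`Summarize this conversation in under 150 words:\n\n${text}`),
  ]);
  return typeof response.content === "string"
    ? response.content
    : JSON.stringify(response.content);
}

function buildWorkingSet(summary: string, recent: BaseMessage[]): BaseMessage[] {
  // Only the compressed summary plus the last few turns go into the prompt.
  return [
    new SystemMessage(`Conversation so far (summarized): ${summary}`),
    ...recent.slice(-6),
  ];
}

// In a node: append new turns to fullHistory, then send only the working set to the model.
// const summary = await summarizeHistory(fullHistory.slice(0, -6));
// const response = await llm.invoke(buildWorkingSet(summary, fullHistory));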

Other Possible Causes

1. Tool output is being stuffed into state verbatim

A large JSON payload from a tool can blow up token count fast. This happens with search results, PDFs, database rows, or OCR output.

// BAD
return {
  messages: [
    new AIMessage(`Tool result:\n${JSON.stringify(toolResult)}`),
  ],
};

Fix it by extracting only what the next step needs.

// GOOD
return {
  messages: [
    new AIMessage(
      `Top matches: ${toolResult.items.slice(0, 3).map(i => i.title).join(", ")}`
    ),
  ],
};

2. Recursive routing loops without a stop condition

A bad conditional edge can keep sending the same large state back through the graph until scaling fails.

builder.addConditionalEdges("router", (state) => {
  if (state.needsMoreWork) return "router"; // loop risk
  return "done";
});

Add a depth counter or iteration cap.

if ((state.iteration ?? 0) > 5) return "done";
return state.needsMoreWork ? "worker" : "done";
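Here is a minimal sketch of wiring that cap into graph state, assuming an iteration channel that the worker node increments on every pass. The node names, channel names, and the placeholder completion check are illustrative.

// Sketch: track loop depth in state so the conditional edge can bail out.
// Node names, channel names, and the completion check are illustrative.
import { Annotation, StateGraph, START, END } from "@langchain/langgraph";

const LoopState = Annotation.Root({
  needsMoreWork: Annotation<boolean>({ reducer: (_left, right) => right, default: () => true }),
  iteration: Annotation<number>({ reducer: (_left, right) => right, default: () => 0 }),
});

const builder = new StateGraph(LoopState)
  .addNode("worker", async (state) => ({
    iteration: state.iteration + 1, // bump the counter on every pass
    needsMoreWork: state.iteration + 1 < 3, // placeholder completion check
  }))
  .addEdge(START, "worker")
  .addConditionalEdges("worker", (state) => {
    if (state.iteration > 5) return END; // hard cap, regardless of needsMoreWork
    return state.needsMoreWork ? "worker" : END;
  });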

3. Parallel branches each carry the full conversation

When you fan out to multiple nodes, each branch may inherit the entire message history. If each branch also adds more content, token usage multiplies quickly.

builder.addEdge("start", ["branchA", "branchB", "branchC"]);

Pass only branch-specific context into each node instead of cloning all messages everywhere.
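One way to do that, sketched below, is to give each branch its own small state channel and have the branch node read only that channel. The researchQuery channel and the branch node are illustrative, and the model setup mirrors the snippets above.

// Sketch: a branch reads its own small channel instead of the full transcript.
// The researchQuery channel and branchA node are illustrative.
import { Annotation } from "@langchain/langgraph";
import { BaseMessage, HumanMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const FanOutState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => [...left, ...right],
    default: () => [],
  }),
  researchQuery: Annotation<string>({ reducer: (_left, right) => right, default: () => "" }),
});

async function branchA(state: typeof FanOutState.State) {
  // Only the branch-specific query is sent to the model, not state.messages.
  const response = await llm.invoke([new HumanMessage(state.researchQuery)]);
  return { messages: [response] };
}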

4. Model config is too aggressive for your prompt size

Sometimes the issue is not graph logic but model settings. A model with a smaller context window fails sooner than expected, and a large maxTokens reservation can leave even less room for input.

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxTokens: 2000,
});

If your prompt is already huge, lowering maxTokens will not fix input overflow, since maxTokens caps the completion rather than the prompt. You still need to reduce input size before invocation.

How to Debug It

  1. Log token estimates at every LLM boundary

    • Print message count and approximate tokens before llm.invoke() (see the sketch after this list).
    • If you’re using LangChain utilities, measure serialized prompt length too.
  2. Inspect state growth per node

    • Add logging inside each node:
      console.log("node=chat", { messages: state.messages.length });
      
    • Find the first node where size jumps unexpectedly.
  3. Check for large tool payloads

    • Log JSON.stringify(toolResult).length.
    • If one tool returns megabytes of text, truncate or summarize before adding it to state.
  4. Reproduce with one branch at a time

    • Disable parallel edges and recursive routes.
    • If the error disappears, your issue is branch amplification or looped routing.
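The sketch below expands on step 1: a rough logging helper you can call before every llm.invoke(). The chars/4 ratio is a heuristic for English text, not an exact tokenizer, and the helper names are illustrative.

// Sketch: log message count and a rough token estimate at every LLM boundary.
// The chars/4 ratio is a heuristic, not an exact tokenizer.
import { BaseMessage } from "@langchain/core/messages";

function estimateTokens(messages: BaseMessage[]): number {
  const chars = messages.reduce((sum, m) => {
    const text = typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    return sum + text.length;
  }, 0);
  return Math.ceil(chars / 4); // roughly 4 characters per token for English text
}

function logLlmBoundary(node: string, messages: BaseMessage[]) {
  console.log(`node=${node}`, {
    messageCount: messages.length,
    approxTokens: estimateTokens(messages),
  });
}

// Usage inside a node, right before the model call:
// logLlmBoundary("chat", trimmed);
// const response = await llm.invoke(trimmed);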

Prevention

  • Keep graph state small.

    • Store only active working context in messages.
    • Put long-term history in external storage and summarize it back into state.
  • Add a hard token budget.

    • Before every model call, trim messages to fit under a fixed ceiling.
    • Make this a utility used by all nodes, not ad hoc logic per route (see the sketch after this list).
  • Treat tool output as untrusted prompt material.

    • Never dump raw JSON, HTML, logs, or documents into state unless you explicitly compress them first.
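Here is a minimal sketch of such a shared utility, reusing the same chars/4 heuristic from the debugging section. The 6,000-token ceiling is an arbitrary placeholder, not a recommended value.

// Sketch: one shared token-budget trimmer that every node calls before llm.invoke().
// The chars/4 estimate is a heuristic and TOKEN_BUDGET is a placeholder value.
import { BaseMessage } from "@langchain/core/messages";

const TOKEN_BUDGET = 6000;

function messageTokens(message: BaseMessage): number {
  const text = typeof message.content === "string" ? message.content : JSON.stringify(message.content);
  return Math.ceil(text.length / 4);
}

function fitToBudget(messages: BaseMessage[], budget = TOKEN_BUDGET): BaseMessage[] {
  const kept: BaseMessage[] = [];
  let total = 0;
  // Walk newest-to-oldest so the most recent turns survive the cut.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = messageTokens(messages[i]);
    if (total + cost > budget) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}

// In every node: const trimmed = fitToBudget(state.messages); await llm.invoke(trimmed);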

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

