How to Fix 'context length exceeded when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded-when-scaling, langgraph, typescript

If you see a context length exceeded error when scaling in LangGraph, you’re usually feeding the model more messages than the underlying LLM can accept. It shows up when your graph accumulates state across turns, tool calls, retries, or parallel branches, and then one node finally sends the whole history to the model.

In TypeScript LangGraph apps, this almost always means your state is growing faster than you trim it. The failure usually surfaces as a provider error like 400 Bad Request: This model's maximum context length is ... or a LangChain/LangGraph node failure when ChatOpenAI.invoke() tries to send an oversized message list.

The Most Common Cause

The #1 cause is storing the full conversation in graph state and re-sending it on every step without trimming. In LangGraph, this often happens when you use a messages array in state and append forever inside a looping graph.

Here’s the broken pattern:

Broken                               Fixed
Keeps all messages forever           Trims messages before each model call
Replays full history on every loop   Uses a bounded window or summary
Scales linearly until it breaks      Keeps prompt size under control
// Broken: unbounded message growth
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const State = Annotation.Root({
  messages: Annotation<any[]>({
    reducer: (state, update) => [...state, ...update],
    default: () => [],
  }),
});

async function callModel(state: typeof State.State) {
  const response = await llm.invoke(state.messages); // sends everything
  return { messages: [response] };
}
// Fixed: trim before invoking the model
import { trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

async function callModel(state: { messages: any[] }) {
  const trimmed = await trimMessages(state.messages, {
    maxTokens: 6000,
    strategy: "last",
    includeSystem: true,
    tokenCounter: llm, // required: counts tokens; a chat model instance works
  });

  const response = await llm.invoke(trimmed);
  return { messages: [response] };
}

If you are using StateGraph, the real fix is to stop treating messages as an infinite log. Use a reducer for accumulation only when you actually need it, then trim or summarize before the next LLM call.
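
If you are wiring this into a graph, here is a minimal sketch using LangGraph's built-in MessagesAnnotation; the node name "agent" and the 6000-token budget are illustrative choices, not requirements:

// Sketch: a single-node graph that trims history before every model call.
// MessagesAnnotation ships with LangGraph and appends returned messages to state.
import { StateGraph, MessagesAnnotation, START, END } from "@langchain/langgraph";
import { trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

async function callModel(state: typeof MessagesAnnotation.State) {
  // Bound the prompt before every call, no matter how large state has grown.
  const trimmed = await trimMessages(state.messages, {
    maxTokens: 6000,
    strategy: "last",
    includeSystem: true,
    tokenCounter: llm,
  });
  const response = await llm.invoke(trimmed);
  return { messages: [response] };
}

const graph = new StateGraph(MessagesAnnotation)
  .addNode("agent", callModel) // node name "agent" is illustrative
  .addEdge(START, "agent")
  .addEdge("agent", END)
  .compile();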

Other Possible Causes

1. Tool outputs are too large

A common mistake is returning raw API payloads, PDFs, HTML pages, or database dumps from tools directly into state.

// Bad: tool returns huge payload
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify(apiResponse),
    },
  ],
};

Fix it by extracting only the fields the model needs.

return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify({
        id: apiResponse.id,
        status: apiResponse.status,
        summary: apiResponse.summary,
      }),
    },
  ],
};
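
If you cannot predict the payload shape, a defensive size cap also helps. This is a hypothetical helper, not a LangGraph API:

// Hypothetical helper: hard-cap how much text a tool can put into state.
const MAX_TOOL_CHARS = 4000;

function capToolOutput(raw: unknown): string {
  const text = typeof raw === "string" ? raw : JSON.stringify(raw);
  return text.length > MAX_TOOL_CHARS
    ? text.slice(0, MAX_TOOL_CHARS) + "...[truncated]"
    : text;
}

// Usage inside a tool node:
// return { messages: [{ role: "tool", content: capToolOutput(apiResponse) }] };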

2. Parallel branches merge too much state

In LangGraph, multiple branches can append to the same state key. If each branch returns large text blobs, the merged state explodes.

// Problematic reducer pattern (inside Annotation.Root({ ... }))
messages: Annotation<any[]>({
  reducer: (state, update) => [...state, ...update],
  default: () => [],
}),

If several nodes write large outputs into messages, your context grows fast. Use separate keys for structured data and keep messages for chat text only.
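
One way to keep them apart is to give structured data its own state channel. A sketch; the documents key name is illustrative:

// Sketch: keep structured branch outputs out of the chat history.
import { Annotation } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";

const State = Annotation.Root({
  // Chat text only; this is what eventually reaches the model.
  messages: Annotation<BaseMessage[]>({
    reducer: (state, update) => [...state, ...update],
    default: () => [],
  }),
  // Large structured results live here and are never sent to the LLM directly.
  documents: Annotation<Record<string, string>>({
    reducer: (state, update) => ({ ...state, ...update }),
    default: () => ({}),
  }),
});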

3. Recursive loops never hit a stop condition

A conditional edge that keeps cycling back to an LLM node can silently build massive history.

builder.addConditionalEdges("agent", (state) => {
  if (state.done) return END;
  return "agent"; // loops forever if done never flips
});

Add hard limits:

if (state.iterationCount > 8) return END;
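
A fuller version of that guard keeps the counter in state. A sketch; the iterationCount key and the limit of 8 are illustrative choices:

// Sketch: bound the loop explicitly.
import { Annotation, END } from "@langchain/langgraph";

const LoopState = Annotation.Root({
  done: Annotation<boolean>({ reducer: (_s, update) => update, default: () => false }),
  iterationCount: Annotation<number>({
    reducer: (_s, update) => update,
    default: () => 0,
  }),
});

// The agent node increments the counter on every pass:
// return { iterationCount: state.iterationCount + 1, messages: [...] };

// The routing function then has two exits: normal completion and a hard cap.
function route(state: typeof LoopState.State) {
  if (state.done) return END;
  if (state.iterationCount > 8) return END; // hard stop even if done never flips
  return "agent";
}

// builder.addConditionalEdges("agent", route);

LangGraph also applies a recursion limit to graph runs (configurable via the invoke options) as a last-resort backstop, but an explicit counter lets the graph finish cleanly instead of throwing.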

4. You are counting tokens incorrectly

Character count is not token count. A message buffer that looks small in TypeScript can still exceed model limits after tool output and formatting overhead.

Use token-aware trimming instead of slicing strings manually:

const trimmed = await trimMessages(messages, {
  maxTokens: 8000,
  strategy: "last",
  tokenCounter: llm, // or a counting function; see the sketch below
});
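
The tokenCounter can also be a plain function. A rough character-based estimate (about four characters per token for English text) is often enough to enforce a safety margin; this helper is a sketch, not part of LangChain:

// Sketch: cheap approximate token counter. Good enough for a safety margin,
// not an exact count.
import { BaseMessage, trimMessages } from "@langchain/core/messages";

function approxTokens(messages: BaseMessage[]): number {
  return messages.reduce((sum, m) => {
    const text = typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    return sum + Math.ceil(text.length / 4);
  }, 0);
}

const trimmed = await trimMessages(messages, {
  maxTokens: 8000,
  strategy: "last",
  tokenCounter: approxTokens,
});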

How to Debug It

  1. Log message count and approximate token size before every LLM call
    Print state.messages.length and estimate token usage (see the logging sketch after this list). If it spikes after tool calls or branch merges, you found the source.

  2. Inspect which node last mutated state
    In LangGraph, trace the node path that led to failure. The offending node is often not the one throwing; it’s the one that added too much data earlier.

  3. Temporarily disable tools and loops
    Run only the base chat path with one turn. If the error disappears, re-enable nodes one at a time until context growth returns.

  4. Check provider error details
    Look for errors like:

    • 400 Bad Request
    • This model's maximum context length is X tokens
    • prompt too long
    • context_length_exceeded

    Those tell you this is not a LangGraph runtime bug; it’s a prompt size problem.
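
For step 1, a minimal logging helper might look like this; the name logPromptSize is illustrative:

// Sketch: log message count and rough token size before every model call.
function logPromptSize(node: string, messages: any[]) {
  const chars = messages.reduce(
    (sum, m) =>
      sum + (typeof m.content === "string" ? m.content.length : JSON.stringify(m.content).length),
    0
  );
  // About 4 characters per token is a rough heuristic, not an exact count.
  console.log(`[${node}] messages=${messages.length} approxTokens=${Math.ceil(chars / 4)}`);
}

// Call it at the top of each node that invokes the model:
// logPromptSize("agent", state.messages);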

Prevention

  • Keep raw conversation history outside graph state unless you truly need it.
  • Use summary + recent window patterns for long-running agents (see the sketch after this list).
  • Store large artifacts in external storage and pass references into the graph.
  • Add a token budget check before every model invocation.
  • Treat every looping or branching graph as unbounded until you prove otherwise.
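
For the summary + recent window pattern, here is a sketch; the window of 10 messages and the summarization prompt are illustrative choices:

// Sketch: fold older messages into one summary, keep a recent window.
import { SystemMessage, BaseMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

async function compactHistory(messages: BaseMessage[]): Promise<BaseMessage[]> {
  if (messages.length <= 10) return messages;

  const older = messages.slice(0, -10);
  const recent = messages.slice(-10);

  // Summarize everything older than the window into one short message.
  const summary = await llm.invoke([
    new SystemMessage("Summarize this conversation in a few sentences, keeping decisions and open tasks."),
    ...older,
  ]);

  return [new SystemMessage(`Conversation summary: ${summary.content}`), ...recent];
}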

The practical fix is simple: stop sending everything to the model. In LangGraph TypeScript apps, stable agents keep short working memory in-state and push everything else into summaries, stores, or external systems.


By Cyprian Aarons, AI Consultant at Topiax.
