How to Fix 'context length exceeded when scaling' in LangGraph (TypeScript)
If you see context length exceeded when scaling in LangGraph, you’re usually feeding the model more messages than the underlying LLM can accept. It shows up when your graph starts accumulating state across turns, tool calls, retries, or parallel branches, and then one node finally sends the whole history to the model.
In TypeScript LangGraph apps, this almost always means your state is growing faster than you trim it. The failure usually surfaces as a provider error like 400 Bad Request: This model's maximum context length is ... or a LangChain/LangGraph node failure when ChatOpenAI.invoke() tries to send an oversized message list.
The Most Common Cause
The #1 cause is storing the full conversation in graph state and re-sending it on every step without trimming. In LangGraph, this often happens when you use a messages array in state and append forever inside a looping graph.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Keeps all messages forever | Trims messages before each model call |
| Replays full history on every loop | Uses a bounded window or summary |
| Scales linearly until it breaks | Keeps prompt size under control |
```typescript
// Broken: unbounded message growth
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const State = Annotation.Root({
  messages: Annotation<any[]>({
    reducer: (state, update) => [...state, ...update],
    default: () => [],
  }),
});

async function callModel(state: typeof State.State) {
  const response = await llm.invoke(state.messages); // sends everything
  return { messages: [response] };
}
```
```typescript
// Fixed: trim before invoking the model
import { trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

async function callModel(state: { messages: any[] }) {
  const trimmed = await trimMessages(state.messages, {
    maxTokens: 6000,
    strategy: "last",    // keep the most recent messages
    includeSystem: true, // never drop the system prompt
    tokenCounter: llm,   // required: count tokens with the model's tokenizer
  });
  const response = await llm.invoke(trimmed);
  return { messages: [response] };
}
```

Note that `trimMessages` needs a `tokenCounter` (a counting function or the model itself) and returns a promise, so it must be awaited.
If you are using StateGraph, the real fix is to stop treating messages as an infinite log. Use a reducer for accumulation only when you actually need it, then trim or summarize before the next LLM call.
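If `trimMessages` does not fit your setup, the same idea can be sketched by hand: keep the system prompt plus a bounded window of recent turns. This is a minimal illustration using a simplified `Msg` type, not LangChain's real message classes:

```typescript
// Simplified stand-in for LangChain's message types.
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Keep the system prompt(s) and only the most recent `keepLast` turns.
function windowMessages(messages: Msg[], keepLast: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keepLast)];
}
```

In a real graph you would call this inside your model node, right before `llm.invoke()`. Pair it with a summarization step if the dropped turns still matter.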
Other Possible Causes
1. Tool outputs are too large
A common mistake is returning raw API payloads, PDFs, HTML pages, or database dumps from tools directly into state.
```typescript
// Bad: tool returns huge payload
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify(apiResponse),
    },
  ],
};
```
Fix it by extracting only the fields the model needs.
```typescript
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify({
        id: apiResponse.id,
        status: apiResponse.status,
        summary: apiResponse.summary,
      }),
    },
  ],
};
```
2. Parallel branches merge too much state
In LangGraph, multiple branches can append to the same state key. If each branch returns large text blobs, the merged state explodes.
```typescript
// Problematic reducer pattern: every branch appends, nothing bounds growth
messages: Annotation<any[]>({
  reducer: (state, update) => [...state, ...update],
  default: () => [],
}),
```
If several nodes write large outputs into messages, your context grows fast. Use separate keys for structured data and keep messages for chat text only.
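One way to bound growth at the reducer level is to cap the array as you merge, so parallel branches can never push it past a fixed size. A sketch (the cap of 50 is arbitrary; tune it to your model's context window):

```typescript
const MAX_MESSAGES = 50; // arbitrary cap; tune to your model's context window

// Reducer sketch: append the update, then drop the oldest entries beyond the cap.
function boundedReducer<T>(state: T[], update: T[]): T[] {
  const merged = [...state, ...update];
  return merged.length > MAX_MESSAGES ? merged.slice(-MAX_MESSAGES) : merged;
}
```

You can pass a function like this as the `reducer` in your `Annotation`. Be aware it drops the oldest messages silently; if older context matters, summarize before discarding.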
3. Recursive loops never hit a stop condition
A conditional edge that keeps cycling back to an LLM node can silently build massive history.
```typescript
builder.addConditionalEdges("agent", (state) => {
  if (state.done) return END;
  return "agent"; // loops forever if done never flips
});
```
Add hard limits:
```typescript
if (state.iterationCount > 8) return END;
```
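The full guard can be sketched as a self-contained routing function. Here `iterationCount` is a hypothetical counter you track in state, and `END` stands in for LangGraph's real `END` constant:

```typescript
const MAX_ITERATIONS = 8; // hard ceiling on agent loops
const END = "__end__";    // stand-in for LangGraph's END constant

type AgentState = { done: boolean; iterationCount: number };

// Routing sketch: stop on success OR when the loop budget runs out.
function route(state: AgentState): string {
  if (state.done) return END;
  if (state.iterationCount >= MAX_ITERATIONS) return END;
  return "agent";
}
```

Remember to increment `iterationCount` inside the agent node on every pass; if nothing updates it, the guard never fires.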
4. You are counting tokens incorrectly
Character count is not token count. A message buffer that looks small in TypeScript can still exceed model limits after tool output and formatting overhead.
Use token-aware trimming instead of slicing strings manually:
```typescript
const trimmed = await trimMessages(messages, {
  maxTokens: 8000,
  strategy: "last",
  tokenCounter: llm, // token-aware counting, not character counting
});
```
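When you only need a quick sanity check rather than exact counts, a common rough heuristic is about 4 characters per token for English-like text. This is an approximation, not the model's real tokenizer, and it undercounts formatting overhead:

```typescript
// Rough heuristic: ~4 characters per token for English-like text.
// Real counts come from the provider's tokenizer (e.g. tiktoken for OpenAI).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Ignores per-message formatting overhead, so treat the result as a lower bound.
function estimateMessageTokens(messages: { content: string }[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}
```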
How to Debug It
- **Log message count and approximate token size before every LLM call.** Print `state.messages.length` and estimate token usage. If it spikes after tool calls or branch merges, you found the source.
- **Inspect which node last mutated state.** In LangGraph, trace the node path that led to the failure. The offending node is often not the one throwing; it's the one that added too much data earlier.
- **Temporarily disable tools and loops.** Run only the base chat path with one turn. If the error disappears, re-enable nodes one at a time until context growth returns.
- **Check provider error details.** Look for errors like `400 Bad Request`, `This model's maximum context length is X tokens`, `prompt too long`, or `context_length_exceeded`. Those tell you this is not a LangGraph runtime bug; it's a prompt size problem.
Prevention
- Keep raw conversation history outside graph state unless you truly need it.
- Use summary + recent window patterns for long-running agents.
- Store large artifacts in external storage and pass references into the graph.
- Add a token budget check before every model invocation.
- Treat every looping or branching graph as unbounded until you prove otherwise.
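The token budget check can be as small as a guard that runs before every `invoke` call. A sketch, using a rough character-based estimate as a placeholder for a real tokenizer (the budget value is an assumption; set it below your model's actual limit):

```typescript
const TOKEN_BUDGET = 8000; // assumed budget; keep it under your model's real limit

// Guard sketch: fail fast before the provider rejects the request.
function checkTokenBudget(
  messages: { content: string }[],
  budget: number = TOKEN_BUDGET
): void {
  // Rough estimate: ~4 chars per token. Swap in a real tokenizer in production.
  const estimated = messages.reduce(
    (sum, m) => sum + Math.ceil(m.content.length / 4),
    0
  );
  if (estimated > budget) {
    throw new Error(
      `Estimated ${estimated} tokens exceeds budget of ${budget}; trim before invoking`
    );
  }
}
```

Throwing is the simplest policy; in practice you would usually trim or summarize instead of failing, but an explicit error is far easier to debug than a provider 400.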
The practical fix is simple: stop sending everything to the model. In LangGraph TypeScript apps, stable agents keep short working memory in-state and push everything else into summaries, stores, or external systems.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.