How to Fix 'token limit exceeded during development' in LangGraph (TypeScript)
If you’re seeing token limit exceeded during development in LangGraph, it usually means your graph is feeding too much conversation history or tool output into the model call. In practice, this shows up when state keeps growing across nodes, and every turn re-sends the full transcript to the LLM.
The fix is usually not “raise the limit.” It’s to stop unbounded state growth, trim what gets passed into the model, and make sure you’re using the right message reducer in your graph state.
The Most Common Cause
The #1 cause is appending messages forever in MessagesAnnotation-style state and then passing the entire history into every node. In TypeScript LangGraph apps, this often happens when developers store all messages in state and never trim them before calling model.invoke().
Here’s the broken pattern versus the fixed pattern:
| Broken | Fixed |
|---|---|
| Reuses full message history on every step | Trims or summarizes before model call |
| Lets tool output accumulate indefinitely | Keeps only recent context |
| Sends raw state directly to LLM | Builds a bounded prompt payload |
```ts
// BROKEN
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import type { BaseMessage } from "@langchain/core/messages";

const State = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    // Unbounded: every update appends and nothing is ever dropped
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
});

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const graph = new StateGraph(State)
  .addNode("chat", async (state) => {
    // Every run sends the entire accumulated history
    const response = await model.invoke(state.messages);
    return { messages: [response] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();
```
```ts
// FIXED
import { Annotation, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages, type BaseMessage } from "@langchain/core/messages";

const State = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
});

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const graph = new StateGraph(State)
  .addNode("chat", async (state) => {
    // trimMessages is async, so await it before the model call
    const trimmed = await trimMessages(state.messages, {
      maxTokens: 3000,
      strategy: "last",
      tokenCounter: (msgs) => msgs.length * 100, // rough proxy; replace with a real counter in prod
    });
    const response = await model.invoke(trimmed);
    return { messages: [response] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();
```
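The `msgs.length * 100` token counter above is only a placeholder. For counts that track the model's real tokenizer, a sketch using the js-tiktoken package (an assumed dependency; verify the encoding name for your model) might look like this:

```ts
// Sketch: a more realistic tokenCounter built on js-tiktoken (assumed dependency).
// The "gpt-4o" encoding name is an assumption -- check the library's model list.
import { encodingForModel } from "js-tiktoken";
import type { BaseMessage } from "@langchain/core/messages";

const enc = encodingForModel("gpt-4o");

const tokenCounter = (msgs: BaseMessage[]): number =>
  msgs.reduce((total, m) => {
    const text =
      typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    // A small constant per message approximates role/formatting overhead
    return total + enc.encode(text).length + 4;
  }, 0);
```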
The important detail is that LangGraph itself does not magically cap your prompt size. If your reducer keeps concatenating arrays, your state grows until the next LLM call blows up with a context-length error.
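If you would rather enforce the bound at the state level, one option is to cap history inside the reducer itself, so no node can ever accumulate more than the last N messages. This is a minimal sketch; the cap of 20 is an arbitrary illustration:

```ts
// Sketch: a reducer that keeps only the most recent messages.
import { Annotation } from "@langchain/langgraph";
import type { BaseMessage } from "@langchain/core/messages";

const MAX_MESSAGES = 20; // arbitrary cap for illustration

const BoundedState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    // Concatenate as usual, then drop everything but the tail
    reducer: (left, right) => left.concat(right).slice(-MAX_MESSAGES),
    default: () => [],
  }),
});
```

A count-based cap is cruder than token-based trimming, and slicing can orphan a tool call from its tool result, so treat it as a backstop rather than a replacement for trimMessages.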
Other Possible Causes
1. Tool results are too large
A common failure mode is returning full JSON payloads from tools and storing them back into messages. One API response can add tens of thousands of tokens.
```ts
// Bad: returning the raw payload
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify(bigApiResponse),
    },
  ],
};
```
Fix it by extracting only what the model needs:
```ts
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify({
        accountId: bigApiResponse.accountId,
        status: bigApiResponse.status,
        balance: bigApiResponse.balance,
      }),
    },
  ],
};
```
2. You are using a long system prompt plus long history
A huge system message combined with chat history can push you over the edge fast.
```ts
const systemPrompt = `
You are an assistant for banking operations.
${veryLongPolicyDocument}
${anotherLongPolicyDocument}
`;
```
Move policy text out of the prompt where possible. Use retrieval or a compact policy summary instead.
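A minimal sketch of the retrieval approach, where `retrievePolicySnippets` is a hypothetical helper you would wire to your own vector store or search index; only the few most relevant policy passages reach the prompt:

```ts
// Hypothetical retrieval helper -- replace with your vector store or search index.
declare function retrievePolicySnippets(
  query: string,
  opts: { topK: number }
): Promise<string[]>;

// Sketch: build a compact system prompt from only the relevant policy passages
async function buildSystemPrompt(userQuestion: string): Promise<string> {
  const snippets = await retrievePolicySnippets(userQuestion, { topK: 3 });
  return [
    "You are an assistant for banking operations.",
    "Relevant policy excerpts:",
    ...snippets,
  ].join("\n");
}
```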
3. Recursive graphs keep re-entering nodes
If a conditional edge loops without a stop condition, you can keep appending state until the model fails.
```ts
.addConditionalEdges("router", (state) => {
  if (state.needsMoreWork) return "router"; // risky if never flips false
  return "__end__";
});
```
Add hard limits:
```ts
if ((state.iteration ?? 0) > 5) return "__end__";
```
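For that cap to work, the counter has to live in state and be incremented on every pass. A sketch, with the channel name `iteration` chosen for illustration:

```ts
// Sketch: an iteration counter channel plus a router that enforces the cap.
import { Annotation } from "@langchain/langgraph";

const LoopState = Annotation.Root({
  needsMoreWork: Annotation<boolean>({
    reducer: (_prev, next) => next, // last write wins
    default: () => true,
  }),
  iteration: Annotation<number>({
    // Nodes return a delta (e.g. { iteration: 1 }); the reducer accumulates it
    reducer: (prev, next) => prev + next,
    default: () => 0,
  }),
});

const route = (state: typeof LoopState.State) => {
  if ((state.iteration ?? 0) > 5) return "__end__"; // hard stop
  return state.needsMoreWork ? "router" : "__end__";
};
```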
4. Streaming debug logs are being treated as prompt content
I’ve seen teams accidentally append internal traces or debug text into messages.
```ts
// Bad
return {
  messages: [{ role: "user", content: JSON.stringify(debugState) }],
};
```
Keep logs out of conversation state. Store them separately in tracing or application telemetry.
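If you do want traces to travel with a run, one option is a separate state channel that the model never sees. This is a sketch; the `trace` channel name is made up for illustration:

```ts
// Sketch: keep debug traces in their own channel, outside `messages`.
import { Annotation } from "@langchain/langgraph";
import type { BaseMessage } from "@langchain/core/messages";

const TracedState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // Never passed to model.invoke(); read it from checkpoints or app logs instead
  trace: Annotation<string[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
});

// In a node: return { messages: [response], trace: ["chat node finished"] };
```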
How to Debug It
- Print token growth per node
  - Log `state.messages.length` and an approximate token count after each node.
  - If one node causes a spike, that's your culprit.
- Inspect what gets sent to `model.invoke()`
  - Don't guess. Log the exact array or string you pass into `ChatOpenAI.invoke()` or your custom LLM wrapper.
- Check reducers in your graph state
  - Look for reducers like `left.concat(right)` on unbounded arrays. That pattern is fine only if you trim later.
- Run with a tiny test transcript
  - Start with one user message.
  - Add turns until failure.
  - If it fails after a tool call or loop iteration, you've isolated the growth source.
A practical trick is to measure serialized size before each invocation:
```ts
const approxSize = JSON.stringify(state.messages).length;
console.log({ approxSize, messageCount: state.messages.length });
```
That won’t give exact token counts, but it will show runaway growth quickly.
Prevention
- Trim or summarize message history before every model call (a summarization sketch follows this list).
- Keep tool outputs small and structured; never dump raw API responses into chat state.
- Add hard caps on loop iterations and conversation length in graph nodes.
- Use separate storage for logs, traces, and audit data instead of putting them in LangGraph state.
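Trimming drops older context outright; summarizing compresses it instead. Here is a sketch of a summarization pass you could run before the main model call; the 20-message threshold and the prompt wording are illustrative choices:

```ts
// Sketch: collapse older history into one summary message once it grows too long.
import { ChatOpenAI } from "@langchain/openai";
import {
  HumanMessage,
  SystemMessage,
  type BaseMessage,
} from "@langchain/core/messages";

const summarizer = new ChatOpenAI({ model: "gpt-4o-mini" });

async function compactHistory(messages: BaseMessage[]): Promise<BaseMessage[]> {
  if (messages.length <= 20) return messages; // illustrative threshold

  const older = messages.slice(0, -10);
  const recent = messages.slice(-10);

  const transcript = older
    .map((m) =>
      typeof m.content === "string" ? m.content : JSON.stringify(m.content)
    )
    .join("\n");

  const summary = await summarizer.invoke([
    new SystemMessage("Summarize this conversation in one short paragraph."),
    new HumanMessage(transcript),
  ]);

  return [new SystemMessage(`Conversation so far: ${summary.content}`), ...recent];
}
```

As with count-based caps, be careful that the split does not separate a tool call from its tool result.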
If you want one rule to remember: LangGraph state should be bounded. Once your graph starts treating every intermediate artifact as conversation history, "token limit exceeded during development" becomes inevitable.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.