How to Fix 'token limit exceeded in production' in LangGraph (TypeScript)
A "token limit exceeded" error in a production LangGraph app usually means your graph is carrying too much conversation history or tool output into the next LLM call. In practice, it shows up after a few turns, after a long tool response, or when you stream state without trimming messages.
In TypeScript, this is almost always a state management problem, not a LangGraph bug.
The Most Common Cause
The #1 cause is appending every message forever and passing the full messages array back into the model on every node execution. LangGraph keeps state unless you explicitly trim it, so your prompt grows until the provider rejects it with errors like:
- 400 Bad Request: This model's maximum context length is 128000 tokens. However, your messages resulted in 131245 tokens.
- BadRequestError: token limit exceeded
- OpenAIError: The request exceeds the maximum context length
Here’s the broken pattern versus the fixed pattern.
| Broken pattern | Fixed pattern |
|---|---|
| Keep all messages in state forever | Trim or summarize before invoking the model |
| Pass raw tool output directly into chat history | Store only what the model needs |
| Let each node append unbounded context | Cap state size with reducers or pruning |
```typescript
// BROKEN
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import type { BaseMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right), // appends forever, never trims
    default: () => [],
  }),
});

async function callModel(state: typeof GraphState.State) {
  const response = await llm.invoke(state.messages); // prompt grows on every turn
  return { messages: [response] };
}
```
```typescript
// FIXED
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages, type BaseMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
});

async function callModel(state: typeof GraphState.State) {
  // trimMessages is async, so await it before invoking the model
  const trimmed = await trimMessages(state.messages, {
    maxTokens: 6000,
    strategy: "last", // keep the most recent messages within the budget
    tokenCounter: (msgs) => msgs.length * 200, // rough estimate; replace with a real tokenizer if needed
  });
  const response = await llm.invoke(trimmed);
  return { messages: [response] };
}
```
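For completeness, here is one way the trimmed node can be wired into a graph. This is a minimal sketch; the node name "model" and the sample input are illustrative.

```typescript
import { START, END } from "@langchain/langgraph";
import { HumanMessage } from "@langchain/core/messages";

// Single-node graph: every turn goes through the trimming callModel node.
const app = new StateGraph(GraphState)
  .addNode("model", callModel)
  .addEdge(START, "model")
  .addEdge("model", END)
  .compile();

const result = await app.invoke({
  messages: [new HumanMessage("What is my card limit?")],
});
```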
If you are using tool calls, this gets worse fast. A single PDF extraction or database dump can add tens of thousands of tokens to messages.
Other Possible Causes
1. Tool output is being stuffed into chat history
A common mistake is returning full JSON payloads from tools and then appending them to messages. That works in dev with small samples and fails in production when real data arrives.
```typescript
// BAD
return {
  messages: [
    new ToolMessage({
      content: JSON.stringify(hugeResult), // too large
      tool_call_id,
    }),
  ],
};
```
Fix it by storing only a compact summary.
```typescript
// GOOD
return {
  messages: [
    new ToolMessage({
      content: JSON.stringify({
        status: "ok",
        rows: hugeResult.rows.length,
        summary: hugeResult.summary,
      }),
      tool_call_id,
    }),
  ],
};
```
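If other nodes still need the full payload, one option is to keep it in a separate state channel that never gets concatenated into the prompt, while the ToolMessage carries only the summary. A self-contained sketch, assuming a hypothetical toolResults channel keyed by tool_call_id:

```typescript
import { Annotation } from "@langchain/langgraph";
import { ToolMessage, type BaseMessage } from "@langchain/core/messages";

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // Hypothetical channel: raw tool payloads live here and never reach the model.
  toolResults: Annotation<Record<string, unknown>>({
    reducer: (left, right) => ({ ...left, ...right }),
    default: () => ({}),
  }),
});

function handleToolResult(
  hugeResult: { rows: unknown[]; summary: string },
  tool_call_id: string
) {
  return {
    // The model only ever sees the compact summary...
    messages: [
      new ToolMessage({
        content: JSON.stringify({
          status: "ok",
          rows: hugeResult.rows.length,
          summary: hugeResult.summary,
        }),
        tool_call_id,
      }),
    ],
    // ...while the full payload stays in state for downstream nodes.
    toolResults: { [tool_call_id]: hugeResult },
  };
}
```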
2. You are using a large system prompt plus long user context
This happens when teams paste policy docs, product specs, and routing rules into every request. The graph looks fine until one long user message pushes it over the edge.
```typescript
const systemPrompt = `
You are an assistant.
${veryLongPolicyDoc}
${veryLongProductSpec}
${veryLongRoutingRules}
`;
```
Split static instructions from runtime context, and keep the system prompt tight.
```typescript
const systemPrompt = `
You are an assistant for bank support.
Follow policy and ask for missing account identifiers.
`;
```
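Runtime context can then be attached per request, only when a turn actually needs it. A minimal sketch, assuming a hypothetical getRelevantPolicySection lookup that returns just the excerpt relevant to the current question:

```typescript
import {
  SystemMessage,
  HumanMessage,
  type BaseMessage,
} from "@langchain/core/messages";

// Hypothetical helper: fetches only the policy excerpt relevant to this request.
declare function getRelevantPolicySection(query: string): Promise<string>;

async function buildMessages(userInput: string): Promise<BaseMessage[]> {
  const messages: BaseMessage[] = [
    new SystemMessage(systemPrompt), // short, static instructions only
  ];

  const policyExcerpt = await getRelevantPolicySection(userInput);
  if (policyExcerpt) {
    // Attach only the excerpt this turn needs, not the whole policy document.
    messages.push(new SystemMessage(`Relevant policy:\n${policyExcerpt}`));
  }

  messages.push(new HumanMessage(userInput));
  return messages;
}
```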
3. Memory/checkpointing is persisting too much state
If you use a checkpointer and never prune old turns, every thread accumulates history across runs. This is especially common with MemorySaver in long-lived sessions.
```typescript
import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();
```
Use pruning before checkpointing or store only structured summaries for older turns.
```typescript
import type { BaseMessage } from "@langchain/core/messages";

function compactMessages(messages: BaseMessage[]) {
  return messages.slice(-12); // keep only the last 12 messages
}
```
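Note that slicing like this only shrinks what you send to the model; the checkpointer still persists every message. If your messages channel uses LangGraph's MessagesAnnotation (or messagesStateReducer), a node can also return RemoveMessage entries to delete old turns from checkpointed state. A sketch of that pattern:

```typescript
import { RemoveMessage } from "@langchain/core/messages";
import { MessagesAnnotation } from "@langchain/langgraph";

// Deletes everything except the last 12 messages from persisted state.
// This only works with a reducer that understands RemoveMessage, such as
// the one used by MessagesAnnotation.
function pruneOldTurns(state: typeof MessagesAnnotation.State) {
  const stale = state.messages.slice(0, -12);
  return {
    messages: stale
      .filter((m) => m.id != null)
      .map((m) => new RemoveMessage({ id: m.id as string })),
  };
}
```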
4. Your retrieval step returns too many documents
RAG graphs often fail because retrieval returns top-10 chunks at full size. The model sees those chunks plus conversation history plus tool output and tips over.
```typescript
const docs = await retriever.getRelevantDocuments(query); // too many / too large
```
Reduce both count and chunk size.
```typescript
const docs = await retriever.getRelevantDocuments(query);
const compactDocs = docs.slice(0, 3).map((d) => ({
  pageContent: d.pageContent.slice(0, 1200),
}));
```
How to Debug It
- Log token growth at each node (a rough logging helper is sketched after this list).
  - Print message count and approximate token count before every llm.invoke().
  - If one node jumps from manageable to huge, that's your culprit.
- Inspect the last payload sent to the model.
  - Log the final messages array shape.
  - Look for giant tool outputs, repeated summaries, or duplicated history.
- Binary search the graph.
  - Disable nodes one by one.
  - If removing retrieval fixes it, your RAG context is too large.
  - If removing tools fixes it, your tool responses are too verbose.
- Check provider error details.
  - OpenAI-style errors usually tell you exact token counts.
  - Anthropic-style errors often say the input exceeded the context window.
  - Match that number against your logged prompt size.
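A rough helper for the first step, called at the top of every node. The four-characters-per-token estimate is an assumption; use your provider's tokenizer when you need exact numbers.

```typescript
import type { BaseMessage } from "@langchain/core/messages";

// Approximate token count (~4 characters per token). Good enough to spot
// which node makes the prompt jump; not a substitute for a real tokenizer.
function logTokenGrowth(node: string, messages: BaseMessage[]) {
  const chars = messages.reduce((sum, m) => {
    const content =
      typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    return sum + content.length;
  }, 0);
  const approxTokens = Math.ceil(chars / 4);
  console.log(`[${node}] messages=${messages.length} approxTokens=${approxTokens}`);
}
```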
Prevention
- Keep a hard cap on conversation history.
- Use trimming or summarization before every LLM call.
- Never store raw tool dumps in chat state.
- Persist structured results elsewhere and pass summaries to the model.
- Add token budget checks in CI or runtime tests.
- Fail fast when prompts cross a threshold like 70–80% of model context (a minimal guard is sketched below).
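A minimal runtime guard, using the same rough character-based estimate as the debugging helper above. The 128,000-token window is an assumption; use your model's actual context size.

```typescript
import type { BaseMessage } from "@langchain/core/messages";

const MODEL_CONTEXT_TOKENS = 128_000; // assumption: adjust to your model
const TOKEN_BUDGET = MODEL_CONTEXT_TOKENS * 0.8; // fail fast at 80% of context

function assertWithinBudget(messages: BaseMessage[]) {
  const chars = messages.reduce((sum, m) => {
    const content =
      typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    return sum + content.length;
  }, 0);
  const approxTokens = Math.ceil(chars / 4);
  if (approxTokens > TOKEN_BUDGET) {
    throw new Error(
      `Prompt budget exceeded: ~${approxTokens} tokens (budget ${TOKEN_BUDGET})`
    );
  }
}
```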
If you want this to stop happening in production, treat prompt size like memory usage in backend code. Measure it, cap it, and do not let state grow unbounded inside LangGraph.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.