# How to Fix 'token limit exceeded when scaling' in LangGraph (Python)
When you see "token limit exceeded when scaling" in LangGraph, it usually means your graph state is growing faster than your model's context window allows. The failure typically shows up after several agent turns, tool calls, or recursive graph steps, when you keep appending full messages to state without trimming.
In practice, this is almost always a state management problem, not a LangGraph bug. The graph is doing exactly what you told it to do: carry forward too much text until the next LLM call blows past the token budget.
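To see why the budget blows up, it helps to watch prompt size grow as turns accumulate. This framework-free sketch uses the common ~4 characters-per-token heuristic (an approximation, not a real tokenizer) and invented turn sizes purely for illustration:

```python
# Rough illustration of why unbounded history exhausts the context window.
# Uses the ~4 chars-per-token heuristic as a cheap token estimate.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

history = []
sizes = []
for turn in range(1, 51):
    history.append("user question " * 30)     # a modest user turn
    history.append("assistant answer " * 60)  # verbose replies dominate
    sizes.append(estimate_tokens("\n".join(history)))

print(sizes[0], sizes[9], sizes[49])  # estimate after 1, 10, and 50 turns
```

The growth is linear per turn, so any fixed context window is eventually exceeded; the only question is which turn crosses the line.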
## The Most Common Cause
The #1 cause is storing the entire conversation history in `MessagesState` and never pruning it before the next node runs.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Appends every message forever | Keeps only the last N messages or a summarized state |
| Reuses full history for every LLM call | Passes a trimmed message window to the model |
| Token count grows on every loop | Token count stays bounded |
```python
# BROKEN
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def assistant_node(state: MessagesState):
    # state["messages"] keeps growing forever
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)
app = graph.compile()
```
```python
# FIXED
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

def trim_messages(messages, keep_last=8):
    # Keep the system message plus a small sliding window
    system = [m for m in messages if isinstance(m, SystemMessage)]
    non_system = [m for m in messages if not isinstance(m, SystemMessage)]
    return system + non_system[-keep_last:]

def assistant_node(state: MessagesState):
    trimmed = trim_messages(state["messages"], keep_last=8)
    response = llm.invoke(trimmed)
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)
app = graph.compile()
```
If you are using a looping agent graph, this matters even more. Each iteration re-injects prior state into the prompt unless you explicitly reduce it.
A better production pattern is to store:
- recent messages in `messages`
- long-term facts in a separate `summary` field
- raw transcripts outside the prompt path
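That split can be sketched without any framework. This is a minimal illustration of the pattern, not LangGraph's API: `compact_state` and the constant are hypothetical names, and a real implementation would summarize the overflow with an LLM instead of just counting it:

```python
# Illustrative bounded-memory state: a sliding window of recent messages,
# a running summary for older turns, and a raw transcript kept out of the
# prompt path. `compact_state` is a hypothetical helper, not a LangGraph API.

KEEP_LAST = 8

def compact_state(state: dict) -> dict:
    messages = state["messages"]
    if len(messages) <= KEEP_LAST:
        return state
    overflow, recent = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    # In production you'd ask an LLM to summarize `overflow`;
    # here we just record how many turns were folded away.
    summary = (state["summary"] + f" [folded {len(overflow)} messages]").strip()
    transcript = state["transcript"] + overflow  # raw log, never sent to the model
    return {"messages": recent, "summary": summary, "transcript": transcript}

state = {"messages": [f"msg-{i}" for i in range(20)], "summary": "", "transcript": []}
state = compact_state(state)
print(len(state["messages"]), state["summary"])  # → 8 [folded 12 messages]
```

The key property: `messages` is the only unbounded-growth field that ever reaches the model, and it is capped on every pass.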
## Other Possible Causes

### 1) Tool outputs are too large
If a tool returns entire documents, logs, HTML pages, or database dumps, those outputs get added to state and explode token usage.
```python
# BAD: returning huge raw output
def search_tool(query: str):
    return open("full_report.txt").read()
```
Fix it by truncating or summarizing before returning:
```python
# GOOD: return compact result
def search_tool(query: str):
    text = open("full_report.txt").read()
    return text[:2000]  # or summarize first
```
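A slightly smarter variant than a hard slice keeps both the start and the end of the output, since reports often put headers at the top and conclusions at the bottom. `head_tail` is an illustrative helper name, not a library function:

```python
# Truncate a large tool result while preserving its head and tail,
# which usually survives better than a plain text[:2000] slice.

def head_tail(text: str, limit: int = 2000) -> str:
    if len(text) <= limit:
        return text
    half = (limit - 20) // 2  # leave room for the marker
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

report = "INTRO " + ("x" * 10_000) + " CONCLUSION"
compact = head_tail(report, limit=2000)
print(len(compact))  # bounded, down from 10,000+ chars
```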
### 2) Recursive edges create repeated context accumulation
A common LangGraph pattern is an agent node that routes back to itself or another node. If each pass appends new messages without cleaning old ones, tokens compound fast.
```python
# Example of a loop that can grow state indefinitely
graph.add_conditional_edges("agent", route_fn, {
    "tools": "tools",
    "done": END,
})
graph.add_edge("tools", "agent")
```
If the `tools` node returns verbose content every time, your prompt size increases on every cycle. Add a reducer or summarize tool results before they go back into `messages`.
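The effect of capping inside the loop can be shown with a pure-Python simulation of the agent → tools → agent cycle. This sketches the idea only; it is not LangGraph's reducer API:

```python
# Each iteration appends a big tool result, then a cap is applied so the
# next "LLM call" sees a bounded message list no matter how long the loop runs.

MAX_MESSAGES = 10

def tool_step(messages: list) -> list:
    messages = messages + ["verbose tool result " * 50]  # large tool output
    return messages[-MAX_MESSAGES:]                      # cap before next call

messages = ["user: do the task"]
for _ in range(25):  # 25 loop iterations
    messages = tool_step(messages)

print(len(messages))  # → 10, bounded regardless of cycle count
```

Without the slice on the return line, `len(messages)` would be 26 after the same run and keep climbing on every cycle.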
### 3) You are passing full retrieved documents into the prompt
Retrieval-Augmented Generation can trigger this error if you stuff top-k chunks directly into the user prompt.
```python
context = "\n\n".join([doc.page_content for doc in docs])
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```
Use smaller chunks and cap the total context size:
```python
context = "\n\n".join([doc.page_content[:500] for doc in docs[:3]])
```
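If you want a budget rather than a fixed document count, a greedy packer works: add ranked chunks until the context budget is hit. `pack_context` and the numbers here are illustrative, not from any library:

```python
# Greedy context packer: include retrieved chunks best-first until a
# character budget is reached, capping each chunk along the way.

def pack_context(chunks: list[str], budget: int = 1500) -> str:
    picked, used = [], 0
    for chunk in chunks:          # chunks assumed ranked best-first
        piece = chunk[:500]       # per-chunk cap
        if used + len(piece) > budget:
            break
        picked.append(piece)
        used += len(piece)
    return "\n\n".join(picked)

docs = ["chunk-%d " % i * 100 for i in range(10)]  # ten ~800-char chunks
context = pack_context(docs, budget=1500)
print(len(context))  # → 1504 (three 500-char pieces + two separators)
```

A character budget translates to tokens only approximately; for exact control, count with the model's tokenizer instead.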
### 4) Checkpointer/state includes fields that should not be sent to the model
Sometimes developers keep debug traces, tool metadata, embeddings payloads, or JSON blobs inside graph state and accidentally include them in prompts.
```python
state = {
    "messages": [...],
    "debug_dump": huge_json_blob,
}
```
Keep model-facing state separate from internal runtime state.
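One way to enforce that separation is a single explicit projection from full graph state to what the model sees. `MODEL_FACING_KEYS` and `to_prompt_state` are illustrative names for the pattern:

```python
# Project full runtime state down to the model-facing subset before any
# LLM call, so internal fields can never leak into the prompt.

MODEL_FACING_KEYS = {"messages", "summary"}

def to_prompt_state(state: dict) -> dict:
    return {k: v for k, v in state.items() if k in MODEL_FACING_KEYS}

state = {
    "messages": ["user: hi"],
    "summary": "",
    "debug_dump": {"trace": ["..."] * 1000},  # internal only
    "embeddings": [0.1] * 1536,               # never belongs in a prompt
}
prompt_state = to_prompt_state(state)
print(sorted(prompt_state))  # → ['messages', 'summary']
```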
## How to Debug It
- **Print token estimates before every LLM call.** Add logging around each node so you can see which step crosses the threshold.

```python
from langchain_core.messages import get_buffer_string

def log_state(state):
    text = get_buffer_string(state["messages"])
    print(f"chars={len(text)}")
```

- **Identify the exact node that fails.** In LangGraph traces, look for the last node executed before the error. The failing node is usually where `state["messages"]` becomes too large.
- **Inspect message growth per turn.** Log message count and approximate size after each edge transition.

```python
print(len(state["messages"]))
print(type(state["messages"][-1]).__name__)
```

- **Reduce one source at a time.** Temporarily disable tools, retrieval, or recursion. If the error disappears when tools are off, your tool output is too large. If it disappears when loops are removed, your state accumulation is the issue.
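To get per-node size logs without editing every node, you can wrap node functions in a small decorator. This is a generic Python sketch using character counts as a cheap token proxy; `logged` is an illustrative name, not a LangGraph API:

```python
# Wrap any node function with a size log so you can see which step's
# state crosses the threshold.

import functools

def logged(name):
    def wrap(node_fn):
        @functools.wraps(node_fn)
        def inner(state):
            size = sum(len(str(m)) for m in state["messages"])
            print(f"[{name}] messages={len(state['messages'])} chars={size}")
            return node_fn(state)
        return inner
    return wrap

@logged("assistant")
def assistant_node(state):
    return {"messages": ["reply"]}

out = assistant_node({"messages": ["q1", "q2"]})
```

Apply it to every node and the log line immediately before the failure tells you which step blew the budget.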
## Prevention
- **Use bounded memory from day one:**
  - sliding window of recent messages
  - summaries for older turns
  - external storage for raw transcripts
- **Keep tool outputs compact:**
  - return IDs, snippets, or structured summaries
  - never dump full documents into chat state
- **Put token checks in CI:**
  - run long conversation tests
  - fail builds when prompts exceed a safe threshold
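The CI check can be an ordinary test: simulate a long conversation through your trimming logic and fail the build if the prompt ever exceeds a safe size. The budget, turn sizes, and `trim` helper below are illustrative stand-ins for your real pipeline:

```python
# CI-style guard: drive 200 simulated turns through the trimming logic
# and assert the resulting prompt never exceeds a character budget
# (a cheap proxy for tokens at roughly 4 chars/token).

PROMPT_CHAR_BUDGET = 24_000  # ~6k tokens

def trim(messages, keep_last=8):
    return messages[-keep_last:]

def test_long_conversation_stays_bounded():
    messages = []
    for turn in range(200):  # long simulated conversation
        messages.append(f"turn {turn}: " + "words " * 100)
        prompt = "\n".join(trim(messages))
        assert len(prompt) <= PROMPT_CHAR_BUDGET, f"blew budget at turn {turn}"

test_long_conversation_stays_bounded()
print("ok")
```

Run it with your actual trimming function substituted in; if someone later removes the trim step, the build fails instead of production.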
The practical fix is simple: stop feeding unbounded history back into the model. In LangGraph Python apps, token limit errors almost always mean your graph state needs trimming, summarization, or both.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.