# How to Fix 'token limit exceeded during development' in LangGraph (Python)
## What the error means
`token limit exceeded during development` in LangGraph usually means your graph is sending too much conversation state to the model. In practice, this happens when you keep appending messages to `state["messages"]` across multiple nodes or turns without trimming, summarizing, or selecting only the relevant context.
You’ll see it most often in agent loops, multi-node workflows, and long-running chats where every node blindly forwards the full message history to `ChatOpenAI`, `ChatAnthropic`, or another chat model.
## The Most Common Cause
The #1 cause is uncontrolled message accumulation in graph state.
In LangGraph, `MessagesState` is convenient, but it also makes it easy to keep passing the entire transcript back into the LLM on every step. If your node returns `{"messages": [response]}` and your reducer appends to the list, the prompt grows until you hit the model’s context window.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Appends every turn forever | Trims or summarizes before model call |
| Passes full state into every node | Passes only needed messages |
| No token budget check | Enforces a max history window |
```python
# BROKEN
from langgraph.graph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def assistant_node(state: MessagesState):
    # state["messages"] keeps growing forever
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
```
```python
# FIXED
from langgraph.graph import StateGraph, MessagesState
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def trim_messages(messages, max_messages=12):
    # Keep only the most recent messages so the prompt stays bounded
    return messages[-max_messages:]

def assistant_node(state: MessagesState):
    messages = trim_messages(state["messages"], max_messages=12)
    response = llm.invoke(
        [SystemMessage(content="You are a concise assistant."), *messages]
    )
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
```
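This count-based trim is simple, but message counts are only a rough proxy for tokens. If you want a real token budget, `langchain_core` ships its own `trim_messages` utility; here is a minimal sketch (parameter names per recent `langchain-core` releases, so verify against your installed version):

```python
from langchain_core.messages import trim_messages as lc_trim_messages

def trim_to_token_budget(messages):
    return lc_trim_messages(
        messages,
        strategy="last",      # keep the most recent messages
        max_tokens=2000,      # budget in tokens, not message count
        token_counter=llm,    # let the chat model count its own tokens
        include_system=True,  # never drop the system message
    )
```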
If you’re using a looping agent, this matters even more. A tool call plus observation plus retry can add several messages per iteration, so a “small” loop becomes a large prompt very quickly.
## Other Possible Causes
### 1) Tool outputs are too large
A common mistake is storing raw tool payloads in messages. If your tool returns a huge JSON blob, HTML page, or search dump, that content gets fed back into the next LLM call.
```python
# BAD: returning full payload
def search_tool(query: str):
    result = expensive_search(query)  # placeholder for your own search call
    return result  # could be thousands of tokens

# BETTER: return a compact summary
def search_tool(query: str):
    result = expensive_search(query)
    return {
        "top_hits": result[:3],               # only the most relevant hits
        "summary": summarize_result(result),  # placeholder summarizer
    }
```
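A blunt but effective complement is a hard character cap on every tool result before it reaches `messages`. `MAX_TOOL_CHARS` and `cap_output` here are illustrative names, not a LangGraph API:

```python
MAX_TOOL_CHARS = 4_000  # tune per model and use case

def cap_output(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    # Truncate oversized tool output and say so, instead of silently
    # feeding the whole payload back into the next LLM call.
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n...[truncated {len(text) - limit} characters]"
```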
### 2) You’re not separating working state from prompt state
LangGraph state can hold everything your app needs, but not everything should go into the model prompt. Keeping raw documents, traces, and debug data in messages is a fast path to token overflow.
```python
from typing import TypedDict

# BAD
class State(TypedDict):
    messages: list
    raw_docs: list  # later accidentally injected into prompt

# BETTER
class State(TypedDict):
    messages: list
    raw_docs: list    # keep for app logic only
    doc_summary: str  # what the model actually sees
```
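A node can then read `raw_docs` for application logic while only `doc_summary` reaches the prompt. A sketch, reusing the `llm` and `SystemMessage` import from earlier:

```python
def answer_node(state: State):
    # raw_docs stays in state for app logic; only the summary hits the prompt
    prompt = [
        SystemMessage(content=f"Context: {state['doc_summary']}"),
        *state["messages"],
    ]
    return {"messages": [llm.invoke(prompt)]}
```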
### 3) Recursive graph loops never terminate early
If your conditional edge keeps routing back to an agent node without a stop condition, each pass adds more context. This is common with ReAct-style graphs and review loops.
```python
# BAD: no practical stop condition
def route(state):
    return "agent"

# BETTER: stop on iteration count or confidence threshold
def route(state):
    if state["iterations"] >= 5:
        return "final"
    return "agent"
```
### 4) Memory checkpointing is storing too much per run
If you use `MemorySaver` or another checkpointer and restore full history each time, your development sessions can balloon. The issue may look like one request failing, but the root cause is accumulated state across runs.
```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
# Fine for dev, but still trim state before model calls.
```
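One easy mitigation during development: keep the checkpointer, but start a fresh `thread_id` whenever you want a clean slate, so old checkpoints stop bleeding into new runs. The id value below is arbitrary:

```python
app = graph.compile(checkpointer=checkpointer)

result = app.invoke(
    {"messages": [("user", "hello")]},
    config={"configurable": {"thread_id": "dev-session-42"}},  # new id = fresh history
)
```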
## How to Debug It
- **Print message counts before every LLM call.** Check how many messages each node sends. If the number keeps increasing without bound, you found the problem:

  ```python
  def assistant_node(state):
      print("message_count=", len(state["messages"]))
      return {"messages": [llm.invoke(state["messages"])]}
  ```

- **Log token estimates for each prompt.** Use a tokenizer or approximate counter on `state["messages"]` (see the `tiktoken` sketch after this list). If one tool output dominates the prompt, trim that first.
- **Inspect which node adds the most context.** Add logs around each node’s return value. In LangGraph terms, look for nodes that repeatedly append to `MessagesState` instead of replacing or summarizing content.
- **Reproduce with one turn and then two turns.** If one turn works and two turns fail with something like `BadRequestError: context_length_exceeded`, your issue is almost always accumulation between steps rather than a single oversized prompt.
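For the token estimates, a minimal sketch using `tiktoken` (assumes `pip install tiktoken`; `o200k_base` is the encoding the gpt-4o family uses, so adjust for other models):

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def estimate_tokens(messages) -> int:
    # Rough estimate: encode each message's text content and sum.
    return sum(len(enc.encode(str(m.content))) for m in messages)
```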
## Prevention
- Trim message history before every model invocation.
- Keep raw tool outputs out of `messages`; store them in separate state fields.
- Add hard limits:
  - max iterations per loop
  - max messages per thread
  - max characters/tokens per tool response
- Prefer summary memory over full transcript memory for long-lived agents (see the sketch after this list).
- Test with worst-case inputs early:
  - long user prompts
  - large documents
  - repeated tool calls
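A minimal summary-memory sketch, assuming the `llm` defined earlier and the `add_messages` reducer that `MessagesState` uses (which interprets `RemoveMessage` as a deletion):

```python
from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.graph import MessagesState

def summarize_history(state: MessagesState):
    messages = state["messages"]
    if len(messages) <= 12:
        return {}  # history is still small; change nothing
    summary = llm.invoke(
        messages[:-6]
        + [HumanMessage(content="Summarize the conversation so far in under 150 words.")]
    )
    # Delete everything except the last 6 messages; the reducer then
    # appends the summary message after the retained tail.
    return {
        "messages": [RemoveMessage(id=m.id) for m in messages[:-6]]
        + [HumanMessage(content=f"Summary of earlier conversation: {summary.content}")]
    }
```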
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.