# How to Fix "token limit exceeded" in LangGraph (Python)
## What the Error Means
A "token limit exceeded" error in LangGraph usually means your graph is sending too much conversation history or tool output into the model context. The failure often shows up after a few agent turns, when a loop keeps appending to state and every node re-sends the full message list.
In practice, this is almost always a state-management problem, not an LLM problem. LangGraph is doing exactly what you told it to do: carry forward more tokens than the model can accept.
## The Most Common Cause
The #1 cause is unbounded message accumulation in graph state. You keep appending messages on every turn, then pass the entire messages list back into the model node.
Here’s the broken pattern versus the fixed pattern.
| Broken | Fixed |
|---|---|
| Appends every message forever | Trims or summarizes state before the model call |
| Re-sends full history on each node run | Keeps only recent context |
| Eventually triggers `InvalidRequestError` / `BadRequestError` from the provider | Stays under the model context window |
```python
# BROKEN
from typing import Annotated, TypedDict
import operator

from langgraph.graph import StateGraph, END

class State(TypedDict):
    # operator.add makes this an append-only reducer: messages grow forever
    messages: Annotated[list, operator.add]

def chatbot_node(state: State):
    # Every call sends the entire accumulated history
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot_node)
graph.set_entry_point("chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()
```
```python
# FIXED
from typing import Annotated, TypedDict
import operator

from langchain_core.messages import trim_messages
from langgraph.graph import StateGraph, END

class State(TypedDict):
    messages: Annotated[list, operator.add]

def chatbot_node(state: State):
    # Trim at the model boundary so only recent context is sent
    trimmed = trim_messages(
        state["messages"],
        max_tokens=3000,
        strategy="last",    # keep the most recent messages
        token_counter=llm,  # let the model count its own tokens
    )
    response = llm.invoke(trimmed)
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot_node)
graph.set_entry_point("chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()
```
If you are using `MessagesState`, the same issue still applies: `MessagesState` standardizes message handling, but it does not cap token growth on its own.
## Other Possible Causes

### 1. Tool output is too large
A single tool result can blow up your prompt faster than chat history. This happens with search results, PDFs, JSON blobs, or database dumps.
```python
# Bad: returning the raw payload into state
return {"messages": [ToolMessage(content=str(huge_json), tool_call_id=tool_call_id)]}

# Better: summarize or truncate before storing
summary = summarize_json(huge_json)
return {"messages": [ToolMessage(content=summary[:2000], tool_call_id=tool_call_id)]}
```
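If you do not have a summarizer handy, a plain truncation guard works as a minimal fallback. This is a dependency-free sketch; `truncate_tool_output` is a hypothetical helper name, and the chars-per-token ratio is only an approximation:

```python
import json

def truncate_tool_output(payload, max_chars: int = 2000) -> str:
    """Serialize a tool result and hard-cap its size before it enters graph state.

    max_chars is a rough budget; ~4 characters per token is a common approximation.
    """
    text = payload if isinstance(payload, str) else json.dumps(payload, default=str)
    if len(text) <= max_chars:
        return text
    # Keep the head of the payload and flag the truncation so the model knows.
    return text[:max_chars] + f"\n...[truncated {len(text) - max_chars} chars]"
```

You would then store `ToolMessage(content=truncate_tool_output(huge_json), tool_call_id=tool_call_id)` instead of the raw payload.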
### 2. You are storing full documents in state

LangGraph state is not a document warehouse. If you stuff entire document chunks into `state["documents"]`, they may get passed into prompts repeatedly.
```python
# Bad
state["documents"] = retrieved_docs  # full text for every doc

# Better: store ids plus short snippets
state["documents"] = [
    {"id": d.metadata["id"], "snippet": d.page_content[:500]}
    for d in retrieved_docs
]
```
### 3. Your loop never terminates

An agent loop that keeps calling tools can grow context until the provider rejects it with something like:

- `openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is ..."}}`
- `google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds...`
- Anthropic-style context overflow errors, depending on the backend
```python
# Bad: no real stop condition
while True:
    state = app.invoke(state)

# Better: explicit guardrail inside a routing function
if state["iterations"] >= 5:
    return {"next": "end"}
```
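The guardrail pattern can be sketched end to end without any LangGraph APIs. `agent_node` and `route_after_agent` below are illustrative names (not library functions); the counter simply lives in plain dict state, the same way it would in a graph's state schema:

```python
MAX_ITERATIONS = 5  # hard stop; tune per workload

def agent_node(state: dict) -> dict:
    # Each pass through the loop increments the counter stored in state.
    return {"iterations": state.get("iterations", 0) + 1}

def route_after_agent(state: dict) -> str:
    # Conditional-edge style router: stop on budget or when the agent is done.
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "end"
    if state.get("done"):
        return "end"
    return "tools"
```

In a real graph, `route_after_agent` would be the function you pass to a conditional edge so the loop cannot run past the budget even when the model never signals completion.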
### 4. Your prompt template includes too much static text
Sometimes the issue is not dynamic history but a huge system prompt, policy blob, or concatenated reference text.
```python
# Bad: dumping an entire policy file into the system prompt
system_prompt = open("all_policies.txt").read()  # too large

# Better: load only what matters for this task
system_prompt = """
You are a claims assistant.
Use only approved policy excerpts.
"""
```
## How to Debug It

1. Check where tokens are growing
   - Log `len(state["messages"])` at each node.
   - Inspect whether tool outputs or retrieved docs are being appended repeatedly.
2. Print the exact payload sent to the model
   - Before `llm.invoke(...)`, dump message roles and approximate sizes.
   - If one tool message is huge, that is your culprit.
3. Look at the provider exception
   - OpenAI usually raises `BadRequestError` with context-length details.
   - Anthropic and Gemini variants often mention request size or maximum tokens.
   - LangGraph is just the wrapper; the real error comes from the model API.
4. Add a hard cap
   - Temporarily trim to the last 5 messages.
   - If the error disappears, you have a growth problem in state management.
```python
from langchain_core.messages import trim_messages

def debug_node(state):
    print("message_count =", len(state["messages"]))
    trimmed = trim_messages(
        state["messages"],
        max_tokens=2000,
        strategy="last",
        token_counter=llm,
    )
    print("trimmed_count =", len(trimmed))
    response = llm.invoke(trimmed)
    # Return a state update, not the raw response object
    return {"messages": [response]}
```
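For step 2 above (dumping roles and sizes), a small dependency-free inspector is enough to make the biggest offender stand out. This sketch assumes messages are either dicts with `role`/`content` keys or objects with a `.content` attribute, and uses the rough chars/4 heuristic; `payload_report` is a hypothetical helper name:

```python
def payload_report(messages) -> list:
    """Return (role, approx_tokens) per message, largest first.

    Approximates tokens as len(content) // 4; works on dicts or
    message objects exposing .content.
    """
    report = []
    for m in messages:
        if isinstance(m, dict):
            role, content = m.get("role", "?"), str(m.get("content", ""))
        else:
            role = getattr(m, "type", m.__class__.__name__)
            content = str(getattr(m, "content", ""))
        report.append((role, len(content) // 4))
    # Sort descending so the huge tool message surfaces immediately
    return sorted(report, key=lambda r: r[1], reverse=True)
```

Printing `payload_report(state["messages"])` right before `llm.invoke(...)` usually points at the oversized tool output or document in one glance.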
Prevention
- •Keep only short-term conversational context in graph state.
- •Store long artifacts outside LangGraph state in object storage, a database, or vector store references.
- •Put token trimming at every model boundary, not just once at startup.
- •Add iteration limits and tool-output size limits to every agent loop.
If you want a clean production rule: never let raw accumulated state hit an LLM call without trimming first. That one change prevents most "token limit exceeded" failures in LangGraph Python apps.
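That rule can be enforced with a tiny guard wrapped around every call site. Here is a pure-Python sketch under the chars/4 token approximation; `trim_to_budget` is a hypothetical name, and in production you would use a real token counter such as `trim_messages`:

```python
def trim_to_budget(messages: list, max_tokens: int = 3000) -> list:
    """Keep the most recent messages that fit a rough token budget.

    Approximates tokens as len(text) // 4 and always keeps at least
    one message. Wrap every model call site with this so no node can
    leak unbounded history into the model.
    """
    budget = max_tokens * 4  # convert the token budget to characters
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        size = len(str(msg))
        if kept and used + size > budget:
            break
        kept.append(msg)
        used += size
    return list(reversed(kept))  # restore chronological order
```

The call then becomes `llm.invoke(trim_to_budget(state["messages"]))` at every model boundary, which is exactly the "trim first" rule stated above.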
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit