How to Fix 'token limit exceeded' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: token-limit-exceeded, langgraph, python

What the Error Means

A 'token limit exceeded' error in LangGraph usually means your graph is sending too much conversation history or tool output into the model's context window. The failure often shows up after a few agent turns, when a loop keeps appending state and every node re-sends the full message list.

In practice, this is almost always a state-management problem, not an LLM problem. LangGraph is doing exactly what you told it to do: carry forward more tokens than the model can accept.

The Most Common Cause

The #1 cause is unbounded message accumulation in graph state. You keep appending messages on every turn, then pass the entire messages list back into the model node.

Here’s the broken pattern versus the fixed pattern.

  • Broken: appends every message forever. Fixed: trims or summarizes state before the model call.
  • Broken: re-sends the full history on each node run. Fixed: keeps only recent context.
  • Broken: eventually triggers InvalidRequestError / BadRequestError from the provider. Fixed: stays under the model context window.
# BROKEN
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

# llm is assumed to be a chat model instance defined elsewhere (e.g., ChatOpenAI)

class State(TypedDict):
    # operator.add means every node's returned messages are appended, forever
    messages: Annotated[list, operator.add]

def chatbot_node(state: State):
    # Every call sends the entire accumulated history
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot_node)
graph.set_entry_point("chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()
# FIXED
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
from langchain_core.messages import trim_messages

# llm is assumed to be a chat model instance defined elsewhere (e.g., ChatOpenAI)

class State(TypedDict):
    messages: Annotated[list, operator.add]

def chatbot_node(state: State):
    # Keep only the most recent messages that fit within ~3000 tokens
    trimmed = trim_messages(
        state["messages"],
        max_tokens=3000,
        strategy="last",
        token_counter=llm,
    )
    response = llm.invoke(trimmed)
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot_node)
graph.set_entry_point("chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

If you are using MessagesState, the same issue still applies. MessagesState helps with message handling, but it does not magically cap token growth.
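
For example, here is a minimal sketch of the same trimming applied in a node that uses the built-in MessagesState (llm is assumed to be an already-configured chat model):

from langgraph.graph import MessagesState
from langchain_core.messages import trim_messages

def chatbot_node(state: MessagesState):
    # MessagesState still accumulates messages on every turn; trim before each model call
    trimmed = trim_messages(
        state["messages"],
        max_tokens=3000,
        strategy="last",
        token_counter=llm,  # llm: your chat model instance
    )
    return {"messages": [llm.invoke(trimmed)]}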

Other Possible Causes

1. Tool output is too large

A single tool result can blow up your prompt faster than chat history. This happens with search results, PDFs, JSON blobs, or database dumps.

# Bad: returning the raw payload into state (inside your tool node;
# ToolMessage comes from langchain_core.messages)
return {"messages": [ToolMessage(content=str(huge_json), tool_call_id=tool_call_id)]}

# Better: summarize or truncate before storing
summary = summarize_json(huge_json)  # summarize_json: your own summarization helper
return {"messages": [ToolMessage(content=summary[:2000], tool_call_id=tool_call_id)]}
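
If you would rather cap tool output by tokens than by characters, a rough sketch follows; cap_tokens is an illustrative helper, and llm.get_num_tokens is LangChain's standard token-counting method on chat models:

# Rough sketch: cap tool output by an approximate token budget before it enters state
def cap_tokens(text: str, llm, max_tokens: int = 1500) -> str:
    if llm.get_num_tokens(text) <= max_tokens:
        return text
    # Start from a rough 4-characters-per-token estimate, then shrink until it fits
    capped = text[: max_tokens * 4]
    while capped and llm.get_num_tokens(capped) > max_tokens:
        capped = capped[: int(len(capped) * 0.9)]
    return capped + "\n[tool output truncated]"

You would then pass cap_tokens(str(huge_json), llm) as the ToolMessage content instead of the raw payload.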

2. You are storing full documents in state

LangGraph state is not a document warehouse. If you stuff entire chunks into state["documents"], they may get passed into prompts repeatedly.

# Bad
state["documents"] = retrieved_docs  # full text for every doc

# Better
state["documents"] = [
    {"id": d.metadata["id"], "snippet": d.page_content[:500]}
    for d in retrieved_docs
]

3. Your loop never terminates

An agent loop that keeps calling tools can grow context until the provider rejects it with something like:

  • openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is ..."}}
  • google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds...
  • Anthropic-style context overflow errors, depending on the backend
# Bad: no real stop condition
while True:
    state = app.invoke(state)

# Better: an explicit guardrail in the node (or router) that decides whether to loop
def router_node(state):
    if state["iterations"] >= 5:  # "iterations" is a counter you increment in state
        return {"next": "end"}
    return {"next": "tools"}  # otherwise route back to your tool node
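
Independent of your own counters, LangGraph also exposes a per-invocation recursion limit you can use as a hard backstop. A minimal sketch, where the limit value and the input message are just examples:

# Hard backstop: the recursion limit aborts runaway loops with GraphRecursionError
# instead of letting context grow until the provider rejects the request
from langgraph.errors import GraphRecursionError

try:
    result = app.invoke(
        {"messages": [("user", "Summarize the latest claim")]},  # example input
        config={"recursion_limit": 10},                          # example cap on graph steps
    )
except GraphRecursionError:
    result = {"messages": [("assistant", "Stopped: too many agent steps.")]}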

4. Your prompt template includes too much static text

Sometimes the issue is not dynamic history but a huge system prompt, policy blob, or concatenated reference text.

system_prompt = open("all_policies.txt").read()  # too large

# Better: load only what matters for this task
system_prompt = """
You are a claims assistant.
Use only approved policy excerpts.
"""

How to Debug It

  1. Check where tokens are growing

    • Log len(state["messages"]) at each node.
    • Inspect whether tool outputs or retrieved docs are being appended repeatedly.
  2. Print the exact payload sent to the model

    • Before llm.invoke(...), dump message roles and approximate sizes.
    • If one tool message is huge, that is your culprit.
  3. Look at the provider exception

    • OpenAI usually says BadRequestError with context length details.
    • Anthropic and Gemini variants often mention request size or maximum tokens.
    • If LangGraph is just the wrapper, the real error comes from the model API.
  4. Add a hard cap

    • Temporarily trim to last 5 messages.
    • If the error disappears, you have a growth problem in state management.
from langchain_core.messages import trim_messages

def debug_node(state):
    # How many messages has state accumulated so far?
    print("message_count =", len(state["messages"]))
    trimmed = trim_messages(
        state["messages"],
        max_tokens=2000,
        strategy="last",
        token_counter=llm,
    )
    print("trimmed_count =", len(trimmed))
    # Return a state update (a dict), as LangGraph nodes expect
    return {"messages": [llm.invoke(trimmed)]}

Prevention

  • Keep only short-term conversational context in graph state.
  • Store long artifacts outside LangGraph state in object storage, a database, or vector store references.
  • Put token trimming at every model boundary, not just once at startup.
  • Add iteration limits and tool-output size limits to every agent loop.

If you want a clean production rule: never let raw accumulated state hit an LLM call without trimming first. That one change prevents most token limit exceeded failures in LangGraph Python apps.
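
As one concrete way to apply the "store long artifacts outside state" rule from the list above, here is a minimal sketch; the in-memory dict stands in for object storage, a database, or a vector store, and retrieved_docs is assumed to come from your retriever:

import uuid

# Stand-in for external storage; only the reference (id) and a short snippet
# ever enter LangGraph state
DOC_STORE: dict[str, str] = {}

def store_document(full_text: str) -> dict:
    doc_id = str(uuid.uuid4())
    DOC_STORE[doc_id] = full_text
    return {"id": doc_id, "snippet": full_text[:500]}

def retrieval_node(state):
    # retrieved_docs is assumed to come from your retriever
    return {"documents": [store_document(d.page_content) for d in retrieved_docs]}

Nodes that actually need the full text can look it up by id from the external store instead of reading it out of graph state.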



By Cyprian Aarons, AI Consultant at Topiax.
