How to Fix 'token limit exceeded when scaling' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21

When you see "token limit exceeded when scaling" in LangGraph, it usually means your graph state is growing past your model's context window. The failure typically shows up after several agent turns, tool calls, or recursive graph steps, when you keep appending full messages to state without trimming.

In practice, this is almost always a state management problem, not a LangGraph bug. The graph is doing exactly what you told it to do: carry forward too much text until the next LLM call blows past the token budget.

The Most Common Cause

The #1 cause is storing the entire conversation history in MessagesState and never pruning it before the next node runs.

Here’s the broken pattern, and what the fix changes:

Broken                                   | Fixed
---------------------------------------- | -----------------------------------------------------
Appends every message forever            | Keeps only the last N messages or a summarized state
Reuses full history for every LLM call   | Passes a trimmed message window to the model
Token count grows on every loop          | Token count stays bounded
# BROKEN
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def assistant_node(state: MessagesState):
    # state["messages"] keeps growing forever
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)
app = graph.compile()

# FIXED
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

def trim_messages(messages, keep_last=8):
    # Keep the system message plus a small sliding window
    system = [m for m in messages if isinstance(m, SystemMessage)]
    non_system = [m for m in messages if not isinstance(m, SystemMessage)]
    return system + non_system[-keep_last:]

def assistant_node(state: MessagesState):
    trimmed = trim_messages(state["messages"], keep_last=8)
    response = llm.invoke(trimmed)
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)
app = graph.compile()

If you are using a looping agent graph, this matters even more. Each iteration re-injects prior state into the prompt unless you explicitly reduce it.

A better production pattern, sketched below, is to store:

  • recent messages in messages
  • long-term facts in a separate summary field
  • raw transcripts outside the prompt path
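
A minimal sketch of that split, assuming a custom state schema (the summary and transcript_id field names are illustrative):

from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Model-facing: recent turns, appended via the add_messages reducer
    messages: Annotated[list[BaseMessage], add_messages]
    # Model-facing: rolling summary of older turns
    summary: str
    # Internal: pointer into external storage for the raw transcript
    transcript_id: str

graph = StateGraph(AgentState)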

Other Possible Causes

1) Tool outputs are too large

If a tool returns entire documents, logs, HTML pages, or database dumps, those outputs get added to state and explode token usage.

# BAD: returning huge raw output
def search_tool(query: str):
    return open("full_report.txt").read()

Fix it by truncating or summarizing before returning:

# GOOD: return compact result
def search_tool(query: str):
    with open("full_report.txt") as f:
        text = f.read()
    return text[:2000]  # or summarize first
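
If a hard cut loses too much signal, you can compress instead. A sketch that reuses the llm defined earlier (the word budget and prompt wording are illustrative):

def summarizing_search_tool(query: str):
    with open("full_report.txt") as f:
        text = f.read()
    # Compress the document before it ever enters graph state
    summary = llm.invoke(f"Summarize the following in under 200 words:\n\n{text[:20000]}")
    return summary.content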

2) Recursive edges create repeated context accumulation

A common LangGraph pattern is an agent node that routes back to itself or another node. If each pass appends new messages without cleaning old ones, tokens compound fast.

# Example of a loop that can grow state indefinitely
graph.add_conditional_edges("agent", route_fn, {
    "tools": "tools",
    "done": END,
})
graph.add_edge("tools", "agent")

If the tools node returns verbose content on every pass, your prompt size increases on every cycle. Add a reducer, or summarize tool results before they go back into messages, as in the sketch below.
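
A sketch of that compaction step, assuming the tool node produces ToolMessage objects (run_tools and the character budget are hypothetical placeholders):

from langchain_core.messages import ToolMessage

MAX_TOOL_CHARS = 1500  # illustrative per-result budget

def tools_node(state: MessagesState):
    outputs = run_tools(state)  # hypothetical: however you execute the tool calls
    compacted = []
    for msg in outputs:
        if isinstance(msg, ToolMessage) and len(str(msg.content)) > MAX_TOOL_CHARS:
            msg = ToolMessage(
                content=str(msg.content)[:MAX_TOOL_CHARS] + " ...[truncated]",
                tool_call_id=msg.tool_call_id,
            )
        compacted.append(msg)
    return {"messages": compacted}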

3) You are passing full retrieved documents into the prompt

Retrieval-Augmented Generation can trigger this error if you stuff top-k chunks directly into the user prompt.

context = "\n\n".join([doc.page_content for doc in docs])
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"

Use smaller chunks and cap the total context size:

context = "\n\n".join([doc.page_content[:500] for doc in docs[:3]])
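
A slightly sturdier variant packs chunks until a total budget is reached instead of using fixed slices (the numbers are illustrative):

MAX_CONTEXT_CHARS = 4000  # illustrative total context budget

def build_context(docs):
    parts, used = [], 0
    for doc in docs:
        chunk = doc.page_content[:1000]  # cap each chunk
        if used + len(chunk) > MAX_CONTEXT_CHARS:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)

context = build_context(docs)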

4) Checkpointer/state includes fields that should not be sent to the model

Sometimes developers keep debug traces, tool metadata, embeddings payloads, or JSON blobs inside graph state and accidentally include them in prompts.

state = {
    "messages": [...],
    "debug_dump": huge_json_blob,
}

Keep model-facing state separate from internal runtime state.
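
One way to keep that boundary explicit, as a sketch with illustrative field names: the node reads only messages, and everything else stays behind in state.

from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage

class GraphState(TypedDict):
    messages: list[BaseMessage]  # model-facing
    debug_dump: dict             # internal only; never joined into a prompt

def assistant_node(state: GraphState):
    # Only the conversation reaches the model; debug_dump never does
    response = llm.invoke(state["messages"])
    # Without a reducer, the returned list replaces the old one
    return {"messages": state["messages"] + [response]}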

How to Debug It

  1. Print token estimates before every LLM call
    Add logging around each node so you can see which step crosses the threshold; a reusable wrapper sketch follows this list.

    from langchain_core.messages import get_buffer_string

    def log_state(state):
        text = get_buffer_string(state["messages"])
        # ~4 characters per token is a workable rough estimate for English
        print(f"chars={len(text)} (~{len(text) // 4} tokens)")
    
  2. Identify the exact node that fails
    In LangGraph traces, look for the last node executed before the error. The failing node is usually where state["messages"] becomes too large.

  3. Inspect message growth per turn
    Log message count and approximate size after each edge transition.

    # Message count, and the type of the newest message, per transition
    print(len(state["messages"]))
    print(type(state["messages"][-1]).__name__)
    
  4. Reduce one source at a time
    Temporarily disable tools, retrieval, or recursion. If the error disappears when tools are off, your tool output is too large. If it disappears when loops are removed, your state accumulation is the issue.
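
For step 1, a wrapper sketch that logs a rough token estimate on every node entry, reusing the graph and assistant_node from the fixed example (the 4-chars-per-token heuristic is an approximation):

from langchain_core.messages import get_buffer_string

def with_token_log(name, node_fn):
    def wrapped(state):
        text = get_buffer_string(state["messages"])
        print(f"[{name}] entering with ~{len(text) // 4} tokens")
        return node_fn(state)
    return wrapped

graph.add_node("assistant", with_token_log("assistant", assistant_node))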

Prevention

  • Use bounded memory from day one:
    • sliding window of recent messages
    • summaries for older turns
    • external storage for raw transcripts
  • Keep tool outputs compact:
    • return IDs, snippets, or structured summaries
    • never dump full documents into chat state
  • Put token checks in CI (see the sketch after this list):
    • run long conversation tests
    • fail builds when prompts exceed a safe threshold
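
A minimal CI check, reusing the trim_messages helper from the fixed example (the threshold and turn count are illustrative):

from langchain_core.messages import HumanMessage, get_buffer_string

MAX_PROMPT_TOKENS = 8000  # illustrative safety budget

def test_long_conversation_stays_bounded():
    messages = []
    for i in range(50):  # simulate a long session
        messages.append(HumanMessage(content=f"user turn {i}"))
        messages = trim_messages(messages, keep_last=8)
        approx_tokens = len(get_buffer_string(messages)) // 4
        assert approx_tokens < MAX_PROMPT_TOKENS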

The practical fix is simple: stop feeding unbounded history back into the model. In LangGraph Python apps, token limit errors almost always mean your graph state needs trimming, summarization, or both.

