How to Fix 'token limit exceeded during development' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

A 'token limit exceeded during development' error in LangGraph usually means your graph is sending too much conversation state to the model. In practice, this happens when you keep appending messages to state["messages"] across multiple nodes or turns without trimming, summarizing, or selecting only the relevant context.

You’ll see it most often in agent loops, multi-node workflows, and long-running chats where every node blindly forwards the full message history to ChatOpenAI, ChatAnthropic, or another chat model.

The Most Common Cause

The #1 cause is uncontrolled message accumulation in graph state.

In LangGraph, MessagesState is convenient, but it also makes it easy to keep passing the entire transcript back into the LLM on every step. If your node returns {"messages": [response]} and your reducer appends to the list, the prompt grows until you hit the model’s context window.

Broken vs fixed pattern

Broken pattern                     | Fixed pattern
Appends every turn forever         | Trims or summarizes before model call
Passes full state into every node  | Passes only needed messages
No token budget check              | Enforces a max history window

# BROKEN
from langgraph.graph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def assistant_node(state: MessagesState):
    # state["messages"] keeps growing forever
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)

# FIXED
from langgraph.graph import StateGraph, MessagesState
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def trim_messages(messages, max_messages=12):
    return messages[-max_messages:]

def assistant_node(state: MessagesState):
    messages = trim_messages(state["messages"], max_messages=12)
    response = llm.invoke(
        [SystemMessage(content="You are a concise assistant."), *messages]
    )
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("assistant", assistant_node)
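
If you'd rather not hand-roll the trimming (the helper above also happens to share a name with a library utility), langchain_core ships a built-in trim_messages that can budget by tokens or by message count. A minimal sketch, assuming a reasonably recent langchain-core; the max_tokens budget is an arbitrary example:

from langchain_core.messages import SystemMessage, trim_messages
from langchain_openai import ChatOpenAI
from langgraph.graph import MessagesState

llm = ChatOpenAI(model="gpt-4o-mini")

def assistant_node(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        max_tokens=2000,      # example budget; tune for your model
        strategy="last",      # keep the most recent messages
        token_counter=llm,    # count with the model's own tokenizer
        include_system=True,
    )
    response = llm.invoke(
        [SystemMessage(content="You are a concise assistant."), *trimmed]
    )
    return {"messages": [response]}

Passing token_counter=len instead budgets by message count rather than tokens, which is often good enough during development.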

If you’re using a looping agent, this matters even more. A tool call plus observation plus retry can add several messages per iteration, so a “small” loop becomes a large prompt very quickly.

Other Possible Causes

1) Tool outputs are too large

A common mistake is storing raw tool payloads in messages. If your tool returns a huge JSON blob, HTML page, or search dump, that content gets fed back into the next LLM call.

# BAD: returning full payload
def search_tool(query: str):
    result = expensive_search(query)
    return result  # could be thousands of tokens

# BETTER: return a compact summary
def search_tool(query: str):
    result = expensive_search(query)
    return {
        "top_hits": result[:3],
        "summary": summarize_result(result),
    }
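
Even when a tool returns a summary, it helps to enforce a hard cap before anything lands in messages. A minimal sketch; clamp_tool_output and the 4,000-character limit are illustrative, not LangGraph settings:

MAX_TOOL_CHARS = 4000  # example budget; tune per model

def clamp_tool_output(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    # Truncate oversized payloads so one tool call cannot dominate the prompt.
    if len(text) <= limit:
        return text
    return text[:limit] + "\n...[truncated]"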

2) You’re not separating working state from prompt state

LangGraph state can hold everything your app needs, but not everything should go into the model prompt. Keeping raw documents, traces, and debug data in messages is a fast path to token overflow.

from typing import TypedDict

# BAD
class State(TypedDict):
    messages: list
    raw_docs: list  # later accidentally injected into prompt

# BETTER
class State(TypedDict):
    messages: list
    raw_docs: list      # keep for app logic only
    doc_summary: str    # what the model actually sees
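
To make the split concrete, a dedicated node can condense raw_docs once and write the result into doc_summary, so downstream prompts never see the raw payload. A sketch, assuming llm is the chat model from the earlier examples and the 8,000-character cap is an arbitrary safety limit:

def summarize_docs_node(state: State):
    # Condense once; later nodes build prompts from state["doc_summary"] only.
    joined = "\n\n".join(state["raw_docs"])[:8000]
    summary = llm.invoke(
        "Summarize these documents for later reasoning steps:\n\n" + joined
    )
    return {"doc_summary": summary.content}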

3) Recursive graph loops never terminate early

If your conditional edge keeps routing back to an agent node without a stop condition, each pass adds more context. This is common with ReAct-style graphs and review loops.

# BAD: no practical stop condition
def route(state):
    return "agent"

# BETTER: stop on iteration count or confidence threshold
def route(state):
    if state["iterations"] >= 5:
        return "final"
    return "agent"

4) Memory checkpointing is storing too much per run

If you use MemorySaver or another checkpointer and restore full history each time, your development sessions can balloon. The issue may look like one request failing, but the root cause is accumulated state across runs.

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
# Fine for dev, but still trim state before model calls.
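
During development, the simplest reset is a fresh thread_id, since each thread keeps its own accumulated history. A sketch, assuming graph is the StateGraph from earlier with its edges added; the thread_id value is just a placeholder:

from langchain_core.messages import HumanMessage

app = graph.compile(checkpointer=checkpointer)

# Each thread_id has its own checkpointed history; switching to a new id
# during development starts from a clean slate.
config = {"configurable": {"thread_id": "dev-session-2"}}
app.invoke({"messages": [HumanMessage(content="hello")]}, config=config)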

How to Debug It

  1. Print message counts before every LLM call
    Check how many messages each node sends. If the number keeps increasing without bound, you found the problem.

    def assistant_node(state):
        print("message_count=", len(state["messages"]))
        return {"messages": [llm.invoke(state["messages"])]}
    
  2. Log token estimates for each prompt
    Use a tokenizer or an approximate counter on state["messages"] (see the sketch after this list). If one tool output dominates the prompt, trim that first.

  3. Inspect which node adds most context
    Add logs around each node return value. In LangGraph terms, look for nodes that repeatedly append to MessagesState instead of replacing or summarizing content.

  4. Reproduce with one turn and then two turns
    If one turn works and two turns fail with something like BadRequestError: context_length_exceeded, your issue is almost always accumulation between steps rather than a single oversized prompt.
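
For step 2, you don't need an exact tokenizer while debugging; a rough character-based estimate is enough to spot which prompt is blowing up. A sketch, assuming roughly four characters per token for English text:

def approx_tokens(messages) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    return sum(len(str(m.content)) // 4 for m in messages)

def assistant_node(state: MessagesState):
    print("approx prompt tokens:", approx_tokens(state["messages"]))
    return {"messages": [llm.invoke(state["messages"])]}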

Prevention

  • Trim message history before every model invocation.

  • Keep raw tool outputs out of messages; store them in separate state fields.

  • Add hard limits:

    • max iterations per loop
    • max messages per thread
    • max characters/tokens per tool response
  • Prefer summary memory over full transcript memory for long-lived agents (see the sketch after this list).

  • Test with worst-case inputs early:

    • long user prompts
    • large documents
    • repeated tool calls
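
For the summary-memory bullet, one common pattern is a node that folds older messages into a summary and deletes them from state. A sketch using RemoveMessage with the MessagesState reducer; llm is the chat model from earlier, and the threshold and keep-last-two choice are arbitrary:

from langchain_core.messages import HumanMessage, RemoveMessage

def summarize_history_node(state: MessagesState):
    history = state["messages"]
    if len(history) <= 6:  # arbitrary threshold before summarizing
        return {}
    summary = llm.invoke(
        [*history, HumanMessage(content="Summarize the conversation so far in one short paragraph.")]
    )
    # Append the summary, then delete everything except the last two messages.
    deletions = [RemoveMessage(id=m.id) for m in history[:-2]]
    return {"messages": [summary, *deletions]}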

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

