How to Fix 'context length exceeded during development' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21


A “context length exceeded” error during development usually means your LangGraph agent is sending more tokens to the model than its context window allows. In practice, this shows up after a few graph cycles or tool calls, or when you keep appending the full message history without trimming.

The failure often surfaces as a provider error bubbling up through LangGraph execution, something like:

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens..."}}

Or in Anthropic-style runs:

anthropic.BadRequestError: prompt is too long

The Most Common Cause

The #1 cause is unbounded message accumulation in graph state. People keep appending every HumanMessage, AIMessage, and tool result into messages, then pass the entire list back into the model on every node execution.

That works for a few turns. Then the graph loops, the state grows, and the next LLM call blows up.

Broken pattern vs fixed pattern

Broken                                       Fixed
Appends full history forever                 Trims or summarizes before each model call
Passes raw state["messages"] directly        Uses a bounded slice or reducer
Lets tool outputs stay in context forever    Stores large outputs outside the prompt
# BROKEN
from langchain_core.messages import HumanMessage

# llm is assumed to be a chat model instance defined elsewhere

def agent_node(state):
    messages = state["messages"]           # full, unbounded history
    response = llm.invoke(messages)        # every call sends everything so far
    return {"messages": messages + [response]}

def user_node(state):
    return {"messages": state["messages"] + [HumanMessage(content="Another request")]}

# Every loop keeps growing state["messages"]
# FIXED
from langchain_core.messages import HumanMessage

MAX_MESSAGES = 8

def agent_node(state):
    messages = state["messages"][-MAX_MESSAGES:]   # bounded slice sent to the model
    response = llm.invoke(messages)
    return {"messages": state["messages"] + [response]}

def user_node(state):
    return {"messages": state["messages"] + [HumanMessage(content="Another request")]}

# Better: summarize older messages or persist them outside graph state.
# Note: a plain slice can drop the system message; pin it explicitly in production.

If you are using MessagesState, the same problem applies. The graph does not magically compress history for you. You have to enforce a limit.
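One bounded-history option is langchain_core's trim_messages helper. A minimal sketch, assuming MessagesState (whose add_messages reducer appends whatever a node returns) and an llm defined elsewhere:

from langchain_core.messages import trim_messages

def agent_node(state):
    trimmed = trim_messages(
        state["messages"],
        max_tokens=8,          # with token_counter=len this is a message cap, not a token cap
        strategy="last",       # keep the most recent messages
        token_counter=len,     # swap in a real tokenizer for true token budgets
        include_system=True,   # never drop the system message
        start_on="human",      # avoid starting on a dangling AI/tool message
    )
    response = llm.invoke(trimmed)
    return {"messages": [response]}  # the add_messages reducer appends just the delta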

Other Possible Causes

1) Tool output is too large

A common LangGraph mistake is injecting huge tool payloads directly into the conversation.

# BAD: dumping a 50k-line JSON blob into chat history
tool_result = fetch_claims_data()
return {"messages": state["messages"] + [AIMessage(content=str(tool_result))]}

Fix it by storing the raw payload elsewhere and only passing a compact summary to the model.

summary = {
    "claim_count": len(tool_result),
    "status": "ok",
    "top_issues": tool_result[:5],
}
return {"messages": state["messages"] + [AIMessage(content=str(summary))]}

2) Recursive graph loops without stop conditions

If your conditional edges keep routing back into the agent node, token usage grows every iteration.

workflow.add_conditional_edges(
    "agent",
    route_fn,
    {
        "tool": "tools",
        "agent": "agent",   # can loop forever if route_fn never stops
        END: END,
    },
)

Add explicit termination logic based on turn count, token budget, or task completion flags in state.
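A minimal sketch of that termination logic, assuming hypothetical state keys turns (incremented by the agent node) and task_complete:

from langgraph.graph import END

MAX_TURNS = 10

def route_fn(state):
    if state.get("turns", 0) >= MAX_TURNS:
        return END                      # hard stop once the turn budget is spent
    if state.get("task_complete"):
        return END                      # explicit completion flag set by a node
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tool"                   # the model requested a tool call
    return "agent"                      # otherwise keep reasoning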

3) Retrieving too much from vector search

RAG setups often pull back too many chunks, each with long text. That gets pasted into the prompt and pushes you over the limit.

retriever = vectorstore.as_retriever(search_kwargs={"k": 20})  # often too high
docs = retriever.invoke(query)

Reduce k, chunk size, or compress retrieved content before sending it to the LLM.

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke(query)
context = "\n\n".join(doc.page_content[:1500] for doc in docs)

4) System prompt bloat

Teams sometimes put policy docs, runbooks, examples, and JSON schemas all into one system message. That eats context before the user even says anything.

SYSTEM_PROMPT = """
You are an insurance assistant.
[200 lines of policy text]
[30 lines of examples]
[full OpenAPI schema]
"""

Move static reference material out of the prompt and only include what is needed for that run.
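One way to keep the system prompt lean is to assemble it per run from material stored outside the prompt. A sketch, where select_relevant_sections is a hypothetical lookup (keyword match, retriever, whatever fits your stack):

BASE_PROMPT = "You are an insurance assistant. Answer using the provided policy excerpts."

def build_system_prompt(task: str) -> str:
    sections = select_relevant_sections(task, max_sections=2)  # hypothetical helper
    return BASE_PROMPT + "\n\n" + "\n\n".join(sections)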

How to Debug It

  1. Log token growth per node

    • Print message count and approximate token usage before every llm.invoke().
    • If it climbs on each loop, your state is unbounded.
  2. Inspect what each node returns

    • Check whether a node returns full history instead of just deltas.
    • In LangGraph, returning {"messages": messages + [...]} repeatedly is usually where it starts.
  3. Check tool payload size

    • Log len(str(tool_result)) or serialized JSON size.
    • If one tool call returns megabytes of text, that’s your issue.
  4. Trace routing paths

    • Confirm your conditional edges eventually hit END.
    • A graph that revisits agent -> tools -> agent -> tools without pruning will fail even with moderate prompts.

A practical trick: dump the exact final prompt right before the model call. If you can’t read it comfortably in a terminal window, it’s probably too large for production use too.
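A rough instrumentation sketch covering both step 1 and the prompt dump. The chars/4 estimate is a crude heuristic for English text; use your provider's tokenizer for real numbers:

def log_context_size(node_name: str, messages) -> None:
    text = "\n".join(str(m.content) for m in messages)
    approx_tokens = len(text) // 4   # crude heuristic, not a real token count
    print(f"[{node_name}] {len(messages)} messages, ~{approx_tokens} tokens")
    # Uncomment while debugging to dump the exact prompt:
    # print(text)

# Call it right before every model call:
#   log_context_size("agent", messages)
#   response = llm.invoke(messages)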

Prevention

  • Keep a hard cap on message history per node.
  • Summarize old turns after N interactions instead of appending forever.
  • Store raw tool outputs and documents outside chat state; pass only compact excerpts into prompts.
  • Add token-budget checks in middleware or wrapper functions before every LLM call (see the sketch after this list).
  • Test long-running graph sessions locally with realistic tool outputs, not just one-turn demos.
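For the token-budget bullet, a minimal wrapper sketch, assuming the first message is the system prompt and reusing the chars/4 heuristic from the debugging section:

TOKEN_BUDGET = 100_000  # pick a budget below your model's context window

def invoke_with_budget(llm, messages):
    approx = sum(len(str(m.content)) for m in messages) // 4
    if approx > TOKEN_BUDGET:
        # Keep the first message (assumed system prompt) plus the newest half.
        messages = [messages[0]] + messages[-max(len(messages) // 2, 1):]
    return llm.invoke(messages)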

If you’re building LangGraph agents for production systems like claims handling or underwriting support, treat context as a budgeted resource. Most “context length exceeded” failures are not model problems — they’re state management problems inside your graph.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

