How to Fix 'context length exceeded in production' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21

“Context length exceeded” in production usually means your LangGraph state is growing faster than your model’s context window can handle. In practice, it shows up after a few turns, after repeated tool calls, or when you keep appending the full message history in every node.

The failure often appears as a model error like:

BadRequestError: Error code: 400 - {'error': {'message': 'This model\'s maximum context length is 128000 tokens. However, your messages resulted in 131204 tokens.'}}

In LangGraph, the root cause is usually not the graph itself. It’s how you manage messages in state.

The Most Common Cause

The #1 cause is blindly appending the full conversation history on every node execution. If each node returns the entire messages list instead of only the delta, your state balloons quickly.

Here’s the broken pattern versus the fixed pattern.

Broken pattern                                 | Fixed pattern
-----------------------------------------------|----------------------------------------
Every node re-emits the whole message history  | Nodes return only new messages
State grows on each step                       | State stays bounded
Easy to miss in testing                        | Fails in production after enough turns
# BROKEN: keeps duplicating the full message list
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_core.messages import AIMessage

class State(TypedDict):
    messages: Annotated[list, add_messages]

def assistant_node(state: State):
    # Imagine this comes from your LLM call
    response = AIMessage(content="Here is the answer.")

    # BAD: returning all previous messages + new response
    return {
        "messages": state["messages"] + [response]
    }

graph = StateGraph(State)
graph.add_node("assistant", assistant_node)
graph.set_entry_point("assistant")
graph.add_edge("assistant", END)
app = graph.compile()

# FIXED: return only the new message; add_messages handles merge semantics
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_core.messages import AIMessage

class State(TypedDict):
    messages: Annotated[list, add_messages]

def assistant_node(state: State):
    response = AIMessage(content="Here is the answer.")

    # GOOD: only return the delta
    return {
        "messages": [response]
    }

graph = StateGraph(State)
graph.add_node("assistant", assistant_node)
graph.set_entry_point("assistant")
graph.add_edge("assistant", END)
app = graph.compile()

If you are using add_messages, let it do its job. Don’t manually concatenate state["messages"] unless you have a very specific reason and understand the growth pattern.

Other Possible Causes

1) Tool outputs are too large

A common LangGraph failure mode is passing raw tool output back into state. If a search tool returns thousands of characters or a PDF parser dumps full documents into messages, you will hit token limits fast.

# BAD: dumping raw tool output into conversation state
tool_result = get_customer_policy_document(customer_id)
return {"messages": [AIMessage(content=tool_result)]}

Fix it by summarizing or extracting only what matters.

# GOOD: store compact result in state or summarize first
summary = summarize_policy(tool_result)
return {"messages": [AIMessage(content=summary)]}

2) You’re replaying full chat history at every node

Some teams build nodes that reconstruct prompts from scratch and include every prior turn plus system instructions plus tool traces. That works for short sessions and then falls over.

# BAD: rebuilds the prompt from every prior turn
prompt = "\n".join([m.content for m in state["messages"]])

Use a bounded slice or a memory policy.

# GOOD: bound the prompt to recent turns
recent_messages = state["messages"][-8:]
prompt = "\n".join([m.content for m in recent_messages])
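
If you want a smarter cutoff than a fixed slice, recent versions of langchain-core ship a trim_messages helper. A minimal sketch, assuming that helper is available in your installed version:

from langchain_core.messages import trim_messages

# Keep the tail of the conversation; token_counter=len counts each
# message as one unit, so max_tokens acts as a message-count cap.
# Swap in a real tokenizer callable to cap by actual tokens instead.
recent_messages = trim_messages(
    state["messages"],
    strategy="last",
    token_counter=len,
    max_tokens=8,
    include_system=True,  # never drop the system prompt
)
prompt = "\n".join([m.content for m in recent_messages])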

3) Recursive loops without an exit condition

If your graph cycles between nodes and each pass adds more text, token usage climbs until the model rejects the request. This often happens with conditional edges that never resolve.

def should_continue(state):
    return "continue"  # bad if this never changes

graph.add_conditional_edges("agent", should_continue, {
    "continue": "agent",
    "stop": END,
})

Make sure loop conditions depend on actual progress markers like max iterations or task completion flags.

def should_continue(state):
    if state.get("iterations", 0) >= 3:
        return "stop"
    return "continue"

4) You are storing non-message payloads inside messages

I’ve seen teams stuff JSON blobs, SQL results, OCR text, and debugging metadata into messages. That inflates context and also confuses downstream prompts.

# BAD: using messages as a dumping ground
return {
    "messages": [
        AIMessage(content=f"debug={debug_blob}\nresult={huge_json}")
    ]
}

Keep structured data in separate state fields.

class State(TypedDict):
    messages: Annotated[list, add_messages]
    debug_info: dict
    retrieved_docs: list[str]
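
A node can then route the heavy payload into those fields and put only a digest into messages. A sketch with a hypothetical search_docs retriever:

def retrieval_node(state: State):
    docs = search_docs(state["messages"][-1].content)  # hypothetical retriever
    digest = "\n".join(d[:200] for d in docs[:3])      # short preview for the prompt
    return {
        "retrieved_docs": docs,  # full payload stays out of the conversation
        "messages": [AIMessage(content=f"Found {len(docs)} documents:\n{digest}")],
    }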

How to Debug It

  1. Print token growth per step
    • Log message count and approximate character size at each node.
    • If one node causes a big jump, that’s your culprit.
def log_state(state):
    total_chars = sum(len(getattr(m, "content", "")) for m in state["messages"])
    print(f"messages={len(state['messages'])}, chars={total_chars}")
  2. Inspect which node adds the largest payload

    • Compare input vs output state for each node.
    • Focus on tool nodes and aggregation nodes first.
  3. Check whether you are returning deltas or full history

    • With add_messages, return [new_message], not state["messages"] + [new_message].
    • This is the most common fix.
  4. Run with a smaller model context limit

    • Temporarily switch to a smaller-context model to reproduce the failure sooner.
    • If it fails earlier, your growth problem is confirmed.

Prevention

  • Keep messages small and treat the list as conversational memory, not an application dump.
  • Put large artifacts in separate state fields or external storage; pass summaries into the LLM.
  • Add hard caps:
    • max turns per session
    • max tool iterations
    • max retained messages (see the pruning sketch below)
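
For that last cap, the add_messages reducer also understands deletions: returning RemoveMessage objects for old message ids drops them from state. A minimal pruning sketch (MAX_MESSAGES is an arbitrary cap):

from langchain_core.messages import RemoveMessage

MAX_MESSAGES = 20

def prune_node(state: State):
    # Everything older than the newest MAX_MESSAGES is deleted by the reducer
    overflow = state["messages"][:-MAX_MESSAGES]
    return {"messages": [RemoveMessage(id=m.id) for m in overflow]}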

If you want a simple rule: every LangGraph node should return the smallest useful delta. Once you stop feeding full history back into every step, this error usually disappears fast.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
