How to Fix 'context length exceeded in production' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded-in-production, langgraph, typescript

What the error means

A "context length exceeded" error in production usually means your graph is sending too much conversation history or tool output to the LLM. In LangGraph, it shows up when a node keeps appending messages to state without trimming, summarizing, or selecting only the latest relevant context.

The failure often appears after a few turns in production, not in local tests. That’s because real users generate longer threads, more tool calls, and bigger message payloads than your happy-path demo.

The Most Common Cause

The #1 cause is storing the full messages array in graph state and passing it back to the model on every node execution.

In LangGraph TypeScript, this pattern is common when you use Annotation.Root, MessagesAnnotation, or a custom state shape, then keep doing state.messages.concat(newMessage) forever. Eventually you hit OpenAI’s or Anthropic’s context window limit, and the runtime throws something like:

  • BadRequestError: 400 This model's maximum context length is ...
  • context_length_exceeded
  • Request too large for model
  • invalid_request_error: prompt too long

Broken vs fixed pattern

Broken | Fixed
Keep every message forever | Trim to last N messages or summarize older turns
Send raw tool outputs back into history | Store tool results separately or compress them
Rebuild prompt from full state on every node | Build prompt from a bounded slice of state
// ❌ Broken: unbounded message growth
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { BaseMessage } from "@langchain/core/messages";

const State = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    default: () => [],
    // Concatenating reducer: state.messages only ever grows
    reducer: (left, right) => left.concat(right),
  }),
});

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

async function assistantNode(state: typeof State.State) {
  const response = await model.invoke(state.messages); // full history on every call
  return { messages: [response] };
}

// ✅ Fixed: bounded context
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { BaseMessage, trimMessages } from "@langchain/core/messages";

const State = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    default: () => [],
    reducer: (left, right) => left.concat(right),
  }),
});

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

async function assistantNode(state: typeof State.State) {
  // trimMessages is async: keep only the most recent messages within the budget
  const recentMessages = await trimMessages(state.messages, {
    maxTokens: 6000,
    strategy: "last",
    // Crude counter: ~250 tokens per message keeps the example dependency-free
    tokenCounter: (msgs) => msgs.length * 250,
  });

  const response = await model.invoke(recentMessages);
  return { messages: [response] };
}
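
Two details worth noting: trimMessages returns a promise, so it must be awaited, and the tokenCounter above is a deliberately crude heuristic (roughly 250 tokens per message). For real budgeting, count tokens with a tokenizer that matches your model.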

If you are using tool calling, this gets worse fast. Tool outputs can be huge JSON blobs, and if you append them as regular messages they count toward every future request.

Other Possible Causes

1. Tool output is too large

A single tool call returning a full database row set or document payload can blow up the prompt.

// Bad: the entire raw payload lands in the prompt on every future turn
return {
  messages: [
    {
      role: "tool",
      content: JSON.stringify(hugeResult),
      name: "search_customer_records",
    },
  ],
};

Fix it by storing the raw result outside the chat history and returning only a summary.

// Better: keep only a short summary in history; store the payload elsewhere
return {
  toolResultId,
  messages: [
    {
      role: "tool",
      content: `Found ${hugeResult.length} records. Summary stored in toolResultId=${toolResultId}`,
      name: "search_customer_records",
    },
  ],
};
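
Where the raw payload lives is up to you. Here is a minimal sketch, assuming an in-memory Map; in production this would be Redis, S3, or a database table, and the saveToolResult/loadToolResult names are illustrative:

// Hypothetical out-of-band store for large tool payloads
const toolResultStore = new Map<string, unknown>();

function saveToolResult(result: unknown): string {
  const id = crypto.randomUUID(); // built-in on Node 19+ and modern runtimes
  toolResultStore.set(id, result);
  return id;
}

function loadToolResult(id: string): unknown {
  return toolResultStore.get(id);
}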

2. You are duplicating messages across nodes

This happens when multiple nodes append the same assistant or tool message into shared state.

// Bad: the concat reducer appends the full history again, duplicating it
return { messages: [...state.messages, aiMessage] };

If your reducer already concatenates arrays, this duplicates history. Return only the delta.

// Good: return only the new message; the reducer appends it to state
return { messages: [aiMessage] };

3. System prompt + retrieved context is too large

Long policy prompts plus RAG chunks can exceed limits even with short chats.

// Policy text + every retrieved chunk + full history, all in one request
const prompt = [
  { role: "system", content: bigPolicyText },
  ...retrievedChunks.map((c) => ({ role: "system", content: c.text })),
  ...state.messages,
];

Reduce retrieved chunks and cap prompt size per turn.

const topChunks = retrievedChunks.slice(0, 3);
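
A fixed count is a blunt cap; a character budget tracks actual prompt size better. A minimal sketch, assuming each chunk exposes a text field; the 8000-character budget is illustrative:

// Keep adding chunks until a rough character budget is exhausted
function capChunks(chunks: { text: string }[], maxChars = 8000) {
  const kept: { text: string }[] = [];
  let used = 0;
  for (const chunk of chunks) {
    if (used + chunk.text.length > maxChars) break;
    kept.push(chunk);
    used += chunk.text.length;
  }
  return kept;
}

const topChunks = capChunks(retrievedChunks);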

4. Memory checkpointing rehydrates too much state

If you use a checkpointer and persist all historical messages per thread, each run reloads the entire conversation.

const app = graph.compile({ checkpointer });

That is fine only if you also trim state before model calls. Persistence is not the problem; unbounded replay is.
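
For reference, a typical checkpointed setup with in-node trimming looks like this. A minimal sketch, assuming the prebuilt MemorySaver and the trimmed assistantNode from the fixed example above:

import { MemorySaver } from "@langchain/langgraph";

// The checkpointer persists full state per thread; trimming happens inside the node
const checkpointer = new MemorySaver();
const app = graph.compile({ checkpointer });

// Each invoke with the same thread_id replays that thread's stored history
await app.invoke(
  { messages: [{ role: "user", content: "hi" }] },
  { configurable: { thread_id: "thread-123" } }
);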

How to Debug It

  1. Log token estimates before every model call
    Print message count and approximate tokens in each node.

    // Rough estimate: ~4 characters per token for English text
    const approxTokens = Math.round(JSON.stringify(state.messages).length / 4);
    console.log({
      node: "assistantNode",
      messageCount: state.messages.length,
      approxTokens,
    });
    
  2. Inspect which payload exploded
    Check whether it was user chat history, tool output, system prompt, or retrieved documents.

  3. Reproduce with one thread ID
    Run the same production thread locally with checkpoints loaded (see the replay sketch after this list). If it fails after N turns, you have a growth issue.

  4. Binary search your graph nodes
    Temporarily bypass retrieval/tool nodes and see when the error disappears.

    • If removing retrieval fixes it, your RAG context is too large.
    • If removing tools fixes it, your tool outputs are too large.
    • If removing both does nothing, your base conversation history is unbounded.
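
For step 3, you can point a local compile of the graph at the same checkpoint store your production app uses and replay a thread by ID. A minimal sketch, assuming the compiled app and checkpointer from earlier; the thread ID is a placeholder:

// Recompile against the same checkpointer production uses
const app = graph.compile({ checkpointer });

// The same thread_id loads that thread's full saved history
const threadConfig = { configurable: { thread_id: "prod-thread-abc123" } };

// Inspect how big the stored state already is
const snapshot = await app.getState(threadConfig);
console.log("stored messages:", snapshot.values.messages.length);

// Then run one more turn to see whether it tips over the limit
await app.invoke(
  { messages: [{ role: "user", content: "one more turn" }] },
  threadConfig
);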

Prevention

  • Trim aggressively before every LLM call.
    • Keep the last few user turns.
    • Summarize older conversation into a compact memory field (see the sketch after this list).
  • Never store raw large objects in chat messages.
    • Put SQL rows, PDFs, API responses, and logs in separate storage.
  • Add hard limits in graph nodes.
    • Cap retrieved chunks.
    • Cap tool output size.
    • Cap total message count per thread.
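
For the summarization bullet, one workable shape is a node that folds older turns into a compact summary before they are trimmed away. A minimal sketch, assuming you add a summary: Annotation<string> channel to State and reuse the model from earlier; the KEEP_LAST threshold and prompt wording are illustrative, and actually deleting the older messages requires a reducer that supports removal (the plain concat reducer above does not):

// Hypothetical summarization node: folds older turns into a summary channel
async function summarizeNode(state: typeof State.State) {
  const KEEP_LAST = 6; // illustrative: how many recent messages to keep verbatim
  if (state.messages.length <= KEEP_LAST) return {};

  const older = state.messages.slice(0, -KEEP_LAST);
  const summary = await model.invoke([
    {
      role: "user",
      content:
        "Summarize this conversation so far in a few sentences:\n" +
        older.map((m) => String(m.content)).join("\n"),
    },
  ]);

  return { summary: String(summary.content) };
}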

If you want one rule to remember: LangGraph state can grow forever unless you explicitly stop it. Production failures usually come from that exact mistake.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
