How to Fix 'context length exceeded during development' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded-during-development · langgraph · typescript

If you’re seeing context length exceeded during development in a LangGraph TypeScript app, the model is being fed more tokens than its context window allows. In practice, this usually shows up after a few graph turns when state keeps accumulating messages, tool outputs, or retrieved documents.

The failure often happens inside an AIMessage/HumanMessage loop, especially when you call graph.invoke() repeatedly without trimming state. In OpenAI-backed graphs, the underlying error often looks like BadRequestError: 400 This model's maximum context length is ....

The Most Common Cause

The #1 cause is unbounded message accumulation in graph state.

In LangGraph, it’s easy to keep appending to messages on every node execution. If you never trim old turns, every invocation sends the full conversation back to the model.

Broken vs fixed pattern

  • Broken: keeps every message forever. Fixed: trims or summarizes state before model calls.
  • Broken: reuses the full messages array on each turn. Fixed: sends only recent messages or a compact summary.
  • Broken: eventually triggers context length exceeded. Fixed: stays within model token limits.

import { StateGraph, MessagesAnnotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// BROKEN: messages grow forever
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const graph = new StateGraph(MessagesAnnotation)
  .addNode("chat", async (state) => {
    const response = await model.invoke(state.messages); // full history every time
    return { messages: [response] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();

// FIXED: trim history before every model call
import { StateGraph, MessagesAnnotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages } from "@langchain/core/messages";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const graph = new StateGraph(MessagesAnnotation)
  .addNode("chat", async (state) => {
    // trimMessages returns a Promise, so await it before invoking the model
    const trimmed = await trimMessages(state.messages, {
      maxTokens: 6000,
      strategy: "last",
      tokenCounter: (msgs) => msgs.length * 200, // replace with a real tokenizer if needed
    });

    const response = await model.invoke(trimmed);
    return { messages: [response] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile();

If you’re using a checkpointer and multi-turn threads, this is even more important. A checkpointed graph will faithfully restore everything you saved, including junk you meant to discard.
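
Here’s a minimal sketch of that setup: the same trimming node from earlier, compiled with an in-memory MemorySaver and invoked with a thread_id so the conversation persists across turns. The token budget and thread id are placeholders; the point is that trimming happens inside the node, so the prompt stays bounded even though the checkpointer restores the full saved thread every time.

import { StateGraph, MessagesAnnotation, MemorySaver } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages, HumanMessage } from "@langchain/core/messages";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

// Same chat node as above, compiled with a checkpointer so threads persist
const checkpointedGraph = new StateGraph(MessagesAnnotation)
  .addNode("chat", async (state) => {
    const trimmed = await trimMessages(state.messages, {
      maxTokens: 6000,
      strategy: "last",
      tokenCounter: (msgs) => msgs.length * 200, // rough stand-in for a real tokenizer
    });
    return { messages: [await model.invoke(trimmed)] };
  })
  .addEdge("__start__", "chat")
  .addEdge("chat", "__end__")
  .compile({ checkpointer: new MemorySaver() });

// The checkpointer restores the whole saved thread each turn; trimming inside
// the node is what keeps the prompt bounded
await checkpointedGraph.invoke(
  { messages: [new HumanMessage("Hi again")] },
  { configurable: { thread_id: "user-123" } }
);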

Other Possible Causes

1. Tool output is too large

A single tool result can blow up your prompt faster than chat history. This happens with search results, PDFs, HTML pages, or database dumps returned as raw text.

// BAD: dumping full tool output into state
return {
  messages: [
    ...state.messages,
    new ToolMessage(JSON.stringify(hugeResult), toolCallId),
  ],
};

Fix it by truncating or summarizing before storing it in graph state.

const compactResult = JSON.stringify(hugeResult).slice(0, 4000);

return {
  messages: [
    ...state.messages,
    new ToolMessage(compactResult, toolCallId),
  ],
};
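
If a hard character slice throws away too much, the other option is to summarize the result before storing it. Here’s a rough sketch inside the same tool node, where the summarizer model, its prompt, and the 200-word budget are all illustrative choices (hugeResult and toolCallId come from your tool handler, as above).

// Illustrative: a cheap model dedicated to compressing tool output
const summarizer = new ChatOpenAI({ model: "gpt-4o-mini" });

const summary = await summarizer.invoke([
  new HumanMessage(
    "Summarize this tool result in under 200 words, keeping IDs and key numbers:\n" +
      JSON.stringify(hugeResult)
  ),
]);

return {
  messages: [
    ...state.messages,
    new ToolMessage(String(summary.content), toolCallId),
  ],
};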

2. You are passing retrieved documents directly into the prompt

RAG pipelines often concatenate every retrieved chunk into one giant system or human message. That works for a demo and fails in production.

// BAD
const context = docs.map((d) => d.pageContent).join("\n\n");
await model.invoke([
  { role: "system", content: `Use this context:\n${context}` },
]);

Instead, cap the number of chunks and trim each chunk.

const context = docs.slice(0, 4)
  .map((d) => d.pageContent.slice(0, 1500))
  .join("\n\n");

await model.invoke([
  { role: "system", content: `Use this context:\n${context}` },
]);
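
If you’d rather budget by approximate tokens than by a fixed chunk count, a small loop can pack chunks until a budget is hit. The 3,000-token budget and the 4-characters-per-token estimate below are assumptions; swap in a real tokenizer if you need precision.

// Budget-based alternative: pack chunks until a rough token budget is hit
const MAX_CONTEXT_TOKENS = 3_000;

let used = 0;
const parts: string[] = [];

for (const doc of docs) {
  const text = doc.pageContent.slice(0, 1500);
  const cost = Math.ceil(text.length / 4); // crude 4-chars-per-token estimate
  if (used + cost > MAX_CONTEXT_TOKENS) break;
  parts.push(text);
  used += cost;
}

const context = parts.join("\n\n");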

3. Recursive graph loops are not bounded

If a node routes back to itself or cycles through multiple nodes without a stop condition, state can balloon quickly.

// BAD: no clear exit condition
graph.addConditionalEdges("router", (state) => "router");

Use explicit iteration caps.

graph.addConditionalEdges("router", (state) =>
  state.iterations > 5 ? "__end__" : "router"
);
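
That fix assumes your state actually tracks an iterations counter, which MessagesAnnotation does not give you. Here’s a minimal sketch of a custom annotation with that field, plus LangGraph’s built-in recursionLimit as a backstop; the field name, the cutoff of 5, and the limit of 25 are all illustrative.

import { StateGraph, Annotation, messagesStateReducer } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";

// State with an explicit iteration counter (the field name is arbitrary)
const LoopState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: messagesStateReducer,
    default: () => [],
  }),
  iterations: Annotation<number>({
    reducer: (_prev, next) => next,
    default: () => 0,
  }),
});

const loopGraph = new StateGraph(LoopState)
  .addNode("router", async (state) => {
    // ...do the actual routing work here, then bump the counter
    return { iterations: state.iterations + 1 };
  })
  .addEdge("__start__", "router")
  .addConditionalEdges("router", (state) =>
    state.iterations > 5 ? "__end__" : "router"
  )
  .compile();

// Backstop: cap total supersteps for the run; LangGraph errors out if it's hit
await loopGraph.invoke({ messages: [] }, { recursionLimit: 25 });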

4. Your model context window is smaller than you think

Sometimes the code is fine but the selected model has a smaller token budget than your prompt assumes. This happens when switching between providers or using a cheaper deployment config.

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxTokens: 2000,
});

Lowering maxTokens only caps the completion; it frees at most a little room in a shared context window and won’t rescue an oversized prompt. Check both the context window and the output limit for the exact model you have deployed.
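
One cheap safeguard is a pre-flight size check before the request ever leaves your node. The sketch below uses a rough 4-characters-per-token heuristic and assumed window and output numbers; verify both against your provider’s documentation for the model you deploy.

import { BaseMessage } from "@langchain/core/messages";

// Assumed numbers: check your provider's docs for the exact deployed model
const CONTEXT_WINDOW = 128_000;    // total context window
const RESERVED_FOR_OUTPUT = 2_000; // room left for the completion

// Rough estimate: ~4 characters per token; use a real tokenizer for accuracy
function estimateTokens(messages: BaseMessage[]): number {
  const chars = messages.reduce(
    (sum, m) =>
      sum + (typeof m.content === "string" ? m.content : JSON.stringify(m.content)).length,
    0
  );
  return Math.ceil(chars / 4);
}

function assertPromptFits(messages: BaseMessage[]): void {
  if (estimateTokens(messages) > CONTEXT_WINDOW - RESERVED_FOR_OUTPUT) {
    throw new Error("Prompt likely exceeds the model's context window; trim state first");
  }
}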

How to Debug It

  1. Log message count before every LLM call
    If state.messages.length keeps increasing across turns, you found your problem.

  2. Print token estimates for each node
    Measure prompt size right before model.invoke(). If one node jumps from normal to huge, inspect that node first.

  3. Inspect tool outputs and retrieved docs
    Search for raw JSON blobs, HTML pages, long PDFs, or concatenated search results being added to messages.

  4. Check your loop conditions
    Look for conditional edges that keep routing back into the same path without an iteration limit or termination rule.

A practical pattern is to add debug logging around each node:

.addNode("chat", async (state) => {
  console.log("messages:", state.messages.length);
  console.log(
    "last roles:",
    state.messages.slice(-3).map((m) => m.constructor.name)
      .join(", ")
  );
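
  // Rough prompt size: ~4 chars per token is only an estimate; swap in a real
  // tokenizer if you need exact numbers
  const approxChars = state.messages.reduce(
    (sum, m) =>
      sum + (typeof m.content === "string" ? m.content : JSON.stringify(m.content)).length,
    0
  );
  console.log("approx prompt tokens:", Math.ceil(approxChars / 4));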

  return { messages: [await model.invoke(state.messages)] };
})

If you see HumanMessage, AIMessage, ToolMessage, and then another giant ToolMessage, that’s usually your smoking gun.

Prevention

  • Trim state before every LLM call using a token-aware strategy.
  • Keep tool outputs out of conversational memory unless they are compact and necessary.
  • Add hard limits on recursion depth, retrieved chunks, and stored thread history.
  • Use checkpointing intentionally; don’t persist full raw conversation unless you need it.

If you want a simple rule: never let LangGraph treat raw application data as chat history. That’s how you end up with BadRequestError from the provider and a graph that works for three turns before falling over on the fourth.

