How to Fix 'context length exceeded during development' in LangChain (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When LangChain throws context length exceeded during development, it means the model request is carrying more tokens than the model can accept. In practice, this usually happens after you’ve added chat history, long retrieved documents, or verbose tool outputs and kept appending everything into one prompt.

In TypeScript projects, this shows up most often in ChatOpenAI, RunnableSequence, agent loops, or memory-backed chains. The fix is usually not “pick a bigger model” — it’s to stop blindly stuffing the prompt.

The Most Common Cause

The #1 cause is unbounded message accumulation. You keep passing the full conversation history, plus retrieved context, plus tool output, and LangChain eventually sends a request that exceeds the model’s token limit.

Before the code, here’s the fix in a nutshell:

  Broken                                   Fixed
  Keep appending all messages forever      Trim history before each call
  Pass entire documents into the prompt    Retrieve top-k chunks only
  No token budget check                    Enforce a max context window

Here’s the broken pattern:
// Broken: unbounded chat history
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const history = [
  new HumanMessage("Hi"),
  new AIMessage("Hello"),
  // ...keeps growing across requests
];

const result = await llm.invoke([
  ...history,
  new HumanMessage(userInput),
]);

console.log(result.content);

And here’s the fixed version, which trims history before each call:

// Fixed: trim history before invoking the model
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

function trimHistory(messages: Array<HumanMessage | AIMessage>, maxTurns = 8) {
  return messages.slice(-maxTurns * 2);
}

const trimmedHistory = trimHistory(history);

const result = await llm.invoke([
  ...trimmedHistory,
  new HumanMessage(userInput),
]);

console.log(result.content);
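
Newer versions of @langchain/core also ship a trimMessages helper, so you don’t have to hand-roll this. A minimal sketch, assuming a recent @langchain/core and counting each message as one “token” for simplicity:

// Alternative: trim with the built-in trimMessages helper
import { trimMessages } from "@langchain/core/messages";

const trimmed = await trimMessages(history, {
  maxTokens: 16,                       // with the counter below, this means 16 messages
  strategy: "last",                    // keep the most recent messages
  tokenCounter: (msgs) => msgs.length, // simplest counter: one per message
});

const result = await llm.invoke([...trimmed, new HumanMessage(userInput)]);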

If you’re using memory, the same issue applies. BufferMemory-style patterns can quietly grow until you hit:

  • Error: This model's maximum context length is ... tokens
  • BadRequestError: 400 Request too large
  • context_length_exceeded

Other Possible Causes

1) Retrieved documents are too large

If you’re doing RAG and stuffing every chunk into context, your prompt explodes fast.

// Bad: too many docs
const docs = await retriever.getRelevantDocuments(question);

const context = docs.map((d) => d.pageContent).join("\n\n");

Fix it by limiting results and truncating content:

// Better: cap retrieval and content size
const docs = await retriever.getRelevantDocuments(question);

const topDocs = docs.slice(0, 3);
const context = topDocs
  .map((d) => d.pageContent.slice(0, 1500))
  .join("\n\n");

2) Tool output is being fed back verbatim

Agent loops often collect huge JSON payloads from tools like CRM lookups or policy systems.

// Bad: raw tool output goes straight into prompt
const toolResult = await customerLookupTool.invoke({ id: customerId });

messages.push(new AIMessage(JSON.stringify(toolResult)));

Instead, summarize or extract only the fields you need:

// Better: keep only relevant fields
messages.push(
  new AIMessage(
    JSON.stringify({
      name: toolResult.name,
      status: toolResult.status,
      riskTier: toolResult.riskTier,
    })
  )
);
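
If you can’t predict the payload shape, a hard character cap still beats an unbounded prompt. This sketch uses capToolOutput, a hypothetical helper rather than a LangChain API:

// Hypothetical helper: hard-cap any tool payload before it enters the prompt
function capToolOutput(payload: unknown, maxChars = 2000): string {
  const text = typeof payload === "string" ? payload : JSON.stringify(payload);
  return text.length > maxChars ? `${text.slice(0, maxChars)} [truncated]` : text;
}

messages.push(new AIMessage(capToolOutput(toolResult)));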

3) Your system prompt is bloated

I see this a lot in enterprise codebases: one giant system message with policies, examples, schemas, and edge cases.

const systemPrompt = `
You are an assistant.
[200 lines of policy text]
[20 examples]
[full API schema]
`;

Split it up and remove anything not needed for the current task. Keep stable instructions in code and inject only task-specific context at runtime.
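
One way to structure that split, assuming the task-specific values (policyExcerpt, userInput) are produced at runtime:

// Stable instructions live in code; only task context is injected per request
import { ChatPromptTemplate } from "@langchain/core/prompts";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a support assistant. Follow the policy excerpt provided."],
  ["system", "Relevant policy for this task:\n{policyExcerpt}"],
  ["human", "{input}"],
]);

const messages = await prompt.formatMessages({
  policyExcerpt, // retrieved per task, not the full manual
  input: userInput,
});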

4) You’re using a small-context model

Sometimes the bug is simple: your prompts fit the small data you test with in development, but not production-sized inputs.

const llm = new ChatOpenAI({
  model: "gpt-3.5-turbo", // small window for modern agent workloads
});

Use a model with a larger context window when your architecture requires it:

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
});

That said, bigger context is not a substitute for prompt control.

How to Debug It

  1. Log token estimates before every LLM call
    If you’re using LangChain message arrays, inspect how much you’re sending. The problem is usually visible before the request leaves your process.

  2. Remove components one by one
    Start with just the user message. Then add history. Then add retrieved docs. Then add tool output. The component that pushes you over is your culprit.

  3. Print raw prompts in development
    If you use RunnableSequence or templates, dump the final rendered prompt. Look for repeated instructions, duplicated history, or giant JSON blobs.

  4. Check for recursive agent loops
    If an agent keeps calling tools and re-injecting outputs into memory, you may be growing context across iterations instead of resetting state per turn.

A practical debug pattern looks like this:

console.log("history messages:", history.length);
console.log("retrieved docs:", docs.length);
console.log("tool output chars:", JSON.stringify(toolResult).length);

If one of those numbers jumps unexpectedly between requests, that’s where to focus.
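
For step 1, ChatOpenAI inherits a getNumTokens method (backed by js-tiktoken) that works as a rough pre-flight check. A sketch, treating the result as an estimate rather than an exact count:

// Rough token estimate before the request leaves your process
const promptText = [...trimmedHistory, new HumanMessage(userInput)]
  .map((m) => String(m.content))
  .join("\n");

console.log("estimated prompt tokens:", await llm.getNumTokens(promptText));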

Prevention

  • Set hard limits on memory and retrieval

    • Cap chat turns.
    • Cap document count.
    • Truncate long fields before adding them to prompts.
  • Use summarization for long-running conversations (see the sketch after this list)

    • Replace old turns with a compact summary.
    • Keep only recent user intent and unresolved state.
  • Treat token budget as part of your design

    • Don’t wait for runtime failures.
    • Budget tokens per component: system prompt, history, retrieval, tools, response.
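
Here’s that summarization idea as a sketch; the prompt wording and the keepLast cutoff are assumptions to tune for your app:

// Sketch: fold old turns into one summary message, keep recent turns verbatim
import { ChatOpenAI } from "@langchain/openai";
import { BaseMessage, SystemMessage } from "@langchain/core/messages";

const summarizer = new ChatOpenAI({ model: "gpt-4o-mini" });

async function compactHistory(messages: BaseMessage[], keepLast = 6): Promise<BaseMessage[]> {
  if (messages.length <= keepLast) return messages;

  // Summarize everything except the most recent turns
  const summary = await summarizer.invoke([
    new SystemMessage("Summarize this conversation in under 150 words. Keep open questions and decisions."),
    ...messages.slice(0, -keepLast),
  ]);

  return [
    new SystemMessage(`Conversation so far: ${summary.content}`),
    ...messages.slice(-keepLast),
  ];
}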

The rule is simple: if LangChain is sending everything it knows into one request, it will eventually break. Build your TypeScript chains so every input has a limit, every loop has an exit condition, and every prompt has room left for the actual answer.


By Cyprian Aarons, AI Consultant at Topiax.