# How to Fix 'context length exceeded' in LangChain (TypeScript)
If you’re seeing context length exceeded in LangChain TypeScript, the model is receiving more tokens than its context window allows. In practice, this usually shows up when you keep appending chat history, tool output, or retrieved documents without trimming anything.
The error often appears with OpenAI-style messages like:

- `400 This model's maximum context length is 128000 tokens. However, your messages resulted in 131245 tokens.`
- `Error: Request failed with status code 400`
- `BadRequestError: 400 Context length exceeded`
## The Most Common Cause
The #1 cause is unbounded chat history being passed into a chain or agent on every turn.
In LangChain TypeScript, this usually happens when you keep pushing every message into memory and then send the full transcript back to the model. The problem gets worse fast if you also include tool outputs or retrieved documents.
### Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Passes full message history forever | Trims history to a bounded window |
| Grows token count on every request | Keeps only recent messages or summarized state |
```ts
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const messages = [
  new HumanMessage("Hi"),
  new AIMessage("Hello"),
  // ...keeps growing forever
  new HumanMessage("Here is another long user message"),
];

// Broken: sends everything every time
const response = await llm.invoke(messages);
console.log(response.content);
```
```ts
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

// Keep only the last `maxItems` messages
function trimMessages<T>(items: T[], maxItems: number): T[] {
  return items.slice(-maxItems);
}

const messages = trimMessages(
  [
    new HumanMessage("Hi"),
    new AIMessage("Hello"),
    new HumanMessage("Here is another long user message"),
  ],
  6
);

// Fixed: only send recent context
const response = await llm.invoke(messages);
console.log(response.content);
```
If you are using BufferMemory, RunnableWithMessageHistory, or your own session store, the same rule applies: never let the transcript grow without a cap.
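The hand-rolled `trimMessages` above caps the message *count*; recent versions of `@langchain/core` also export their own `trimMessages` helper that caps the token count instead, which maps more directly onto the context window. A minimal sketch, assuming a recent `@langchain/core` version; `history` stands in for whatever message array your session store returns, and the `maxTokens` budget is an illustrative number:

```ts
import { trimMessages } from "@langchain/core/messages";

// Token-aware trimming: keep the newest messages that fit the budget
const trimmed = await trimMessages(history, {
  maxTokens: 4000,     // illustrative budget; size it to your model
  strategy: "last",    // drop the oldest messages first
  includeSystem: true, // never drop the system prompt
  tokenCounter: llm,   // let the chat model estimate token counts
});

const response = await llm.invoke(trimmed);
```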
## Other Possible Causes
### 1. Huge retrieved documents from RAG
If your retriever returns too many chunks, the prompt explodes.
```ts
const docs = await retriever.getRelevantDocuments(query);

// Broken: injects all docs into the prompt
const context = docs.map((d) => d.pageContent).join("\n\n");
```
Fix by limiting results and chunk size.
```ts
const retriever = vectorStore.asRetriever(3); // fewer docs
const docs = await retriever.getRelevantDocuments(query);
const context = docs.map((d) => d.pageContent).join("\n\n");
```
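Chunk size is set when you split documents at indexing time, so if individual chunks are huge, capping the retriever count alone won't save you. A sketch using `RecursiveCharacterTextSplitter` from `@langchain/textsplitters`; the sizes are illustrative, and `rawDocuments` is a placeholder for whatever documents you load:

```ts
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Smaller chunks at indexing time keep each retrieved doc bounded
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // illustrative; tune for your content
  chunkOverlap: 100,
});

const chunks = await splitter.splitDocuments(rawDocuments);
```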
### 2. Tool output is too large
Agents often stuff raw API responses into the next LLM call.
```ts
// Broken: huge JSON payload goes straight back to the model
const toolResult = await fetchCustomerPolicyData();
messages.push(new AIMessage(JSON.stringify(toolResult)));
```
Fix by summarizing or extracting only relevant fields.
```ts
const toolResult = await fetchCustomerPolicyData();
messages.push(
  new AIMessage(
    JSON.stringify({
      policyId: toolResult.policyId,
      status: toolResult.status,
      premiumDue: toolResult.premiumDue,
    })
  )
);
```
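When you can't predict the shape of a tool response, a blunt but effective fallback is a hard character cap before the payload reaches the model. A minimal sketch; the helper name and the 4,000-character limit are arbitrary examples:

```ts
// Hard cap on serialized tool output as a last line of defense
function capToolOutput(payload: unknown, maxChars = 4000): string {
  const text = JSON.stringify(payload);
  return text.length > maxChars
    ? text.slice(0, maxChars) + "...[truncated]"
    : text;
}

messages.push(new AIMessage(capToolOutput(toolResult)));
```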
### 3. Wrong model for the input size
Not all models have the same context window. A prompt that works on one model may fail on another.
```ts
const llm = new ChatOpenAI({
  model: "gpt-3.5-turbo", // smaller context than newer models
});
```
Switch to a larger-context model if your workload needs it.
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o",
});
```
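If you support several models, it helps to make each window explicit in code instead of remembering it. A sketch with a hand-maintained lookup; the numbers reflect commonly documented limits at the time of writing, so verify them against your provider's current docs:

```ts
// Context windows in tokens; verify against current provider docs
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-3.5-turbo": 16_385,
  "gpt-4o-mini": 128_000,
  "gpt-4o": 128_000,
};

function assertFits(model: string, estimatedTokens: number): void {
  const limit = CONTEXT_WINDOWS[model];
  if (limit !== undefined && estimatedTokens > limit) {
    throw new Error(
      `Estimated ${estimatedTokens} tokens exceeds the ${model} window of ${limit}`
    );
  }
}
```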
### 4. Prompt templates accidentally duplicate content
Sometimes the same text gets inserted multiple times through template variables.
```ts
import { ChatPromptTemplate } from "@langchain/core/prompts";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Use this policy text:\n{policyText}"],
  ["human", "Summarize this policy:\n{policyText}"], // duplicated
]);
```
Fix by passing each large input once.
```ts
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Use this policy text:\n{policyText}"],
  ["human", "Summarize it for a customer."],
]);
```
## How to Debug It
- Log token estimates before calling the model.
  - Count message sizes, retrieved docs, and tool outputs.
  - If one request suddenly spikes, that’s your source.
- Print the final prompt payload.
  - In LangChain TypeScript, inspect what actually goes into `llm.invoke()`.
  - Don’t debug your chain definition; debug the final array of messages or formatted prompt string.
- Remove components one at a time.
  - Start with just system + user message.
  - Add memory, then retrieval, then tools.
  - The component that pushes you over the limit is the culprit.
- Check model limits.
  - Confirm the target model’s context window.
  - A request that fits in `gpt-4o` may fail in a smaller deployment or older model variant.
A practical way to isolate it:
```ts
console.log({
  messageCount: messages.length,
  totalChars: messages
    .map((m) => m.content?.toString().length ?? 0)
    .reduce((a, b) => a + b, 0),
});
```
That’s not token counting, but it quickly tells you whether one part of the pipeline is obviously out of control.
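For an actual token estimate rather than a character count, you can run the serialized messages through a tokenizer. A sketch assuming the `js-tiktoken` package; `o200k_base` is the encoding used by the GPT-4o family, so pick the one that matches your model:

```ts
import { getEncoding } from "js-tiktoken";

const enc = getEncoding("o200k_base");

// Rough preflight estimate; real chat formatting adds some overhead
const estimatedTokens = messages.reduce(
  (sum, m) => sum + enc.encode(m.content?.toString() ?? "").length,
  0
);
console.log({ estimatedTokens });
```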
## Prevention
- Cap chat history with sliding windows or summaries.
- Limit retrieval results and chunk sizes before stuffing them into prompts.
- Keep tool outputs small; extract fields instead of forwarding raw JSON.
- Add preflight logging for message count and estimated token usage before every LLM call (see the sketch below).
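A minimal preflight wrapper that ties these habits together. This is a sketch, not a fixed recipe: `estimateTokens` stands in for whatever counter you use (character heuristic, `js-tiktoken`, or the model's own counter), and the default budget is an illustrative number:

```ts
import { ChatOpenAI } from "@langchain/openai";
import { BaseMessage } from "@langchain/core/messages";

// Stand-in token counter; swap in js-tiktoken or your model's counter
declare function estimateTokens(messages: BaseMessage[]): number;

// Hypothetical guard: run before every llm.invoke() call
async function invokeWithBudget(
  llm: ChatOpenAI,
  messages: BaseMessage[],
  maxTokens = 100_000
) {
  const estimated = estimateTokens(messages);
  console.log({ messageCount: messages.length, estimated });
  if (estimated > maxTokens) {
    throw new Error(`Preflight check failed: ~${estimated} tokens > ${maxTokens}`);
  }
  return llm.invoke(messages);
}
```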
If you build agents for real users, assume prompts will grow until they break. Put hard limits in place early, because fixing context length exceeded after shipping usually means finding three separate places where context was allowed to balloon.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.