How to Fix 'context length exceeded when scaling' in CrewAI (TypeScript)
When CrewAI throws context length exceeded when scaling, it means your agent is trying to send more tokens to the model than the model’s context window allows. In practice, this usually shows up when you scale from a single task to multiple tasks, add long tool outputs, or keep appending conversation history without trimming it.
The TypeScript version hits this fast because people often reuse the same messages array, pass huge tool results straight into the next step, or let a crew run with no token budget controls.
The Most Common Cause
The #1 cause is unbounded message accumulation. You keep passing the full transcript, plus tool output, plus prior task results into every new Agent call, so each iteration grows until the LLM rejects it.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Reuses full history every time | Sends only the minimal state needed |
| Appends raw tool output directly | Summarizes or truncates tool output |
| No token cap | Explicit max context / output limits |
```typescript
// Broken: message history grows on every step
import { Agent } from "crewai";

const agent = new Agent({
  role: "Analyst",
  goal: "Review customer cases",
  backstory: "You analyze support tickets.",
  llm: "gpt-4o",
});

const messages: Array<{ role: string; content: string }> = [];

async function runCase(caseText: string, toolOutput: string) {
  messages.push({ role: "user", content: caseText });
  messages.push({ role: "assistant", content: toolOutput });
  return agent.execute({
    messages, // keeps growing
  });
}
```
```typescript
// Fixed: keep only the current input and compact tool output
import { Agent } from "crewai";

const agent = new Agent({
  role: "Analyst",
  goal: "Review customer cases",
  backstory: "You analyze support tickets.",
  llm: "gpt-4o",
});

function trimToolOutput(text: string, maxChars = 2000) {
  return text.length > maxChars
    ? text.slice(0, maxChars) + "\n...[truncated]"
    : text;
}

async function runCase(caseText: string, toolOutput: string) {
  const compactContext = [
    `Case:\n${caseText}`,
    `Tool summary:\n${trimToolOutput(toolOutput)}`,
  ].join("\n\n");

  return agent.execute({
    input: compactContext,
  });
}
```
If you are using a Crew with multiple tasks, the same rule applies. Don’t feed full prior outputs into every downstream task unless you really need them.
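If you also want an explicit token cap, as in the table above, one option is a small guard that estimates tokens before each call and trims anything over budget. This is a sketch with made-up names, not a CrewAI feature; the four-characters-per-token estimate is rough, but good enough to catch runaway growth.

```typescript
// Hypothetical budget guard: estimate tokens (~4 chars per token) and trim
// any payload that would exceed the budget you set for your model.
const MAX_INPUT_TOKENS = 8_000; // pick a budget comfortably below your model's window

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function enforceBudget(input: string): string {
  if (approxTokens(input) <= MAX_INPUT_TOKENS) return input;
  // Keep the start of the payload; adjust if the tail matters more in your case.
  return input.slice(0, MAX_INPUT_TOKENS * 4) + "\n...[trimmed to fit budget]";
}

// Usage: wrap every payload before it reaches the agent, e.g.
// await agent.execute({ input: enforceBudget(compactContext) });
```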
Other Possible Causes
1. Tool output is too large
A database dump, long PDF extraction, or raw HTML response can blow up context immediately.
```typescript
const result = await mySearchTool.query("all claims since 2020");

// Bad: pass raw result directly
await agent.execute({ input: result });
```
Fix it by summarizing before handing it to the agent.
```typescript
const result = await mySearchTool.query("all claims since 2020");
const summary = summarize(result); // your own summarizer or a smaller model

await agent.execute({ input: summary });
```
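If you don't already have a summarizer, here is a minimal sketch using the official openai npm package, which is an assumption on my part; swap in whatever client you already use. Note that this version is async, so you would await it. The hard truncation at the top is just a guard so the summarizer call itself can't overflow.

```typescript
// Minimal summarizer sketch (assumes the official "openai" package and an
// OPENAI_API_KEY in the environment). Any small model works; the point is
// to compress the payload before the agent ever sees it.
import OpenAI from "openai";

const client = new OpenAI();

async function summarize(text: string, maxChars = 8000): Promise<string> {
  // Hard-truncate first so the summarizer call itself cannot overflow.
  const clipped = text.length > maxChars ? text.slice(0, maxChars) : text;

  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: `Summarize the following tool output in at most 10 bullet points:\n\n${clipped}`,
      },
    ],
  });

  return response.choices[0].message.content ?? clipped;
}
```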
2. Task chaining duplicates context
If each Task includes previous task output in its prompt template, you can accidentally repeat the same data multiple times.
```typescript
const task1 = new Task({
  description: `Analyze this case:\n${caseText}`,
});

const task2 = new Task({
  description: `Use task1 output:\n${task1Output}\n\nNow draft response.`,
});
```
Better:
```typescript
const task2 = new Task({
  description: `Draft response using these bullet points only:
- issue type
- severity
- recommended action`,
});
```
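The missing piece is how those bullet points get produced. One way, sketched below with hypothetical names (CrewAI doesn't ship a distill helper), is to reduce task1's output to a few short fields and interpolate only those into task2's description:

```typescript
// Hypothetical handoff: shrink task1's full output to three short fields so
// task2's prompt never carries the raw transcript. `distill` is your own
// logic: a regex, a template parse, or a call to a small model.
import { Task } from "crewai"; // same constructor used in the examples above

interface CaseSummary {
  issueType: string;
  severity: string;
  recommendedAction: string;
}

function distill(task1Output: string): CaseSummary {
  // Placeholder values; replace with real extraction logic.
  return {
    issueType: "billing dispute",
    severity: "medium",
    recommendedAction: "issue partial refund",
  };
}

const summary = distill(task1Output);

const task2 = new Task({
  description: `Draft response using these bullet points only:
- issue type: ${summary.issueType}
- severity: ${summary.severity}
- recommended action: ${summary.recommendedAction}`,
});
```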
3. Model context window is too small
Some models have tighter limits than others. If you scaled from a larger model to a cheaper one, the same payload may now fail with something like:
- Error: context length exceeded
- BadRequestError: This model's maximum context length is ...
- OpenAI API error - context_length_exceeded
Check your model config:
```typescript
const agent = new Agent({
  role: "Support Engineer",
  goal: "Resolve incidents",
  llm: {
    provider: "openai",
    model: "gpt-4o-mini", // verify this model's context window before switching to it
    temperature: 0.2,
  },
});
```
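If you're not sure whether a payload still fits, a rough pre-flight check against the documented window helps. The window sizes below are placeholders; confirm the real numbers in your provider's documentation.

```typescript
// Rough pre-flight check against a model's documented context window.
// The sizes here are placeholders; verify them for your provider and model.
const CONTEXT_WINDOW_TOKENS: Record<string, number> = {
  "gpt-4o": 128_000,
  "gpt-4o-mini": 128_000,
};

function fitsInWindow(model: string, payload: string, reservedForOutput = 2_000): boolean {
  const window = CONTEXT_WINDOW_TOKENS[model];
  if (!window) return true; // unknown model: skip the check
  const estimatedTokens = Math.ceil(payload.length / 4);
  return estimatedTokens + reservedForOutput <= window;
}

// Usage, with compactContext from the fixed example above:
// if (!fitsInWindow("gpt-4o-mini", compactContext)) {
//   throw new Error("Payload likely exceeds the model's context window; trim it first.");
// }
```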
4. Memory is enabled without pruning
If you use memory-backed agents and never prune old turns, long-running crews will eventually hit the wall.
```typescript
const crew = new Crew({
  agents,
  tasks,
  memory: true,
});
```
If your implementation supports it, add retention limits or reset memory between independent cases.
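A low-effort version of "reset memory between independent cases" is to build a fresh Crew per case instead of reusing one long-lived instance. The sketch below assumes a kickoff-style run method and a per-case task builder; adjust both to whatever your CrewAI version actually exposes.

```typescript
// Sketch: a fresh Crew per independent case, so memory from one case never
// leaks into the next. `kickoff` and `buildTasksFor` are placeholders for
// your actual run method and task setup.
async function handleCase(caseText: string) {
  const crew = new Crew({
    agents,
    tasks: buildTasksFor(caseText), // hypothetical per-case task builder
    memory: true, // memory still helps within the case, but is discarded with the crew
  });

  return crew.kickoff();
}
```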
How to Debug It
- Log token growth per step
  - Print message count and approximate character count before each execute() call.
  - If size climbs every iteration, you found the problem.
- Inspect raw tool payloads
  - Log tool responses before they enter the prompt.
  - Look for HTML dumps, JSON arrays with thousands of rows, or base64 blobs.
- Check which task fails
  - In multi-task crews, isolate each Task.
  - Run them one by one and see where the error starts:
    - first task succeeds
    - second fails after appending first output
    - third fails after compounding both
- Verify model limits
  - Confirm the exact LLM and its context window.
  - If moving from one provider/model to another, don't assume previous prompt sizes still fit.
A quick debugging helper:
```typescript
function approxTokens(text: string) {
  return Math.ceil(text.length / 4);
}

function logPayload(name: string, payload: string) {
  console.log(`${name}: chars=${payload.length}, approxTokens=${approxTokens(payload)}`);
}
```
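Call it right before each execute() so you can watch the numbers per step; with compactContext from the fixed example above, that looks like:

```typescript
// If the logged numbers climb on every iteration, you have confirmed the
// accumulation problem.
logPayload("runCase input", compactContext);
const result = await agent.execute({ input: compactContext });
```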
Prevention
- Keep prompts small and structured. Use bullet points and summaries instead of raw transcripts.
- Cap everything that can grow. Tool output length, memory retention, and chain depth should all have explicit limits (see the sketch below).
- Treat crew handoffs as contracts. Pass only fields the next task needs, not entire objects or full histories.
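One way to keep those caps honest is to hold them in a single config object instead of scattering magic numbers through the code. The names and values below are illustrative:

```typescript
// Illustrative set of explicit caps, kept in one place so nothing grows unbounded.
const LIMITS = {
  maxToolOutputChars: 2_000, // matches trimToolOutput in the fixed example
  maxMemoryTurns: 20,        // prune beyond this if your memory backend allows it
  maxChainDepth: 3,          // stop re-feeding prior outputs past this many tasks
} as const;
```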
If you want one rule to remember here, it’s this: don’t let CrewAI carry forward unbounded state. The moment you stop passing full transcripts and raw artifacts between tasks, this error usually disappears.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.