How to Fix 'token limit exceeded' in CrewAI (TypeScript)
When CrewAI throws 'token limit exceeded', it means the model input exceeded the context window of your chosen LLM. In TypeScript projects, this usually happens after an agent accumulates too much conversation history, tool output, or verbose task context.
You’ll see it most often when a CrewAI agent is looping over long documents, reusing the same memory across tasks, or passing raw API responses straight into prompts.
The Most Common Cause
The #1 cause is stuffing too much text into the agent prompt or task description. In CrewAI TypeScript, this usually happens when developers concatenate large documents, logs, or tool output into Task.description or Agent.goal.
Here’s the broken pattern next to its fix:
| Broken | Fixed |
|---|---|
| Passes full document into the prompt | Summarizes or chunks before passing |
| Reuses huge tool output directly | Stores only relevant excerpts |
| Lets context grow unbounded | Keeps prompts small and task-specific |
// ❌ Broken: huge prompt payload
import { Agent, Task } from "crewai";
// policyText and claimJson are large raw strings loaded earlier
const agent = new Agent({
  name: "Claims Analyst",
  goal: `Review this entire policy document and extract risks:\n\n${policyText}`,
});
const task = new Task({
  description: `Analyze the following claim and policy details:\n\n${claimJson}\n\n${policyText}`,
  agent,
});
// ✅ Fixed: keep prompt small, pass only what matters
import { Agent, Task } from "crewai";
// policySummary is a short, pre-extracted excerpt, not the full document
const agent = new Agent({
  name: "Claims Analyst",
  goal: "Review claim details and identify policy risk factors.",
});
const task = new Task({
  description: `
Analyze this claim summary:
- Claim type: water damage
- Amount: $18,400
- Policy section: exclusions
Use only the extracted summary below:
${policySummary}
`,
  agent,
});
If you need the full document, chunk it first and process each chunk separately. Don’t dump raw PDFs, OCR text, or entire JSON responses into one task.
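If you haven’t chunked a document before, here’s a minimal sketch that reuses policyText and agent from the example above. The chunkText helper and the 4,000-character size are illustrative assumptions, not CrewAI APIs; tune the size to your model’s window.
// Hypothetical helper: split a long document into fixed-size pieces.
const chunkText = (text: string, chunkSize = 4000): string[] => {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
};
// One small task per chunk instead of one giant prompt.
const chunkTasks = chunkText(policyText).map((chunk, i) => new Task({
  description: `Extract risk factors from policy excerpt ${i + 1}:\n\n${chunk}`,
  agent,
}));
// Run the chunk tasks, then merge their outputs in a final summary step.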
Other Possible Causes
1. Tool output is too large
A common mistake is returning full API payloads from tools. If your tool returns a 500-line JSON object, CrewAI will often feed that back into the next reasoning step.
// ❌ Broken
const getCustomerData = async () => {
  // Returns the entire payload, every field included
  return await fetch("https://api.example.com/customer/123").then(r => r.json());
};
// ✅ Fixed
const getCustomerData = async () => {
  const data = await fetch("https://api.example.com/customer/123").then(r => r.json());
  // Return only the fields the agent actually needs
  return {
    id: data.id,
    status: data.status,
    riskFlags: (data.riskFlags ?? []).slice(0, 5), // guard against a missing field
  };
};
2. Memory is enabled without limits
If you use memory across multiple tasks or turns, context can balloon quickly. This is especially painful in multi-agent workflows where every agent sees prior messages.
// ❌ Broken
import { Crew } from "crewai";
const crew = new Crew({
  agents: [agent],
  tasks: [task],
  memory: true, // every task sees all prior context
});
// ✅ Fixed
const crew = new Crew({
  agents: [agent],
  tasks: [task],
  memory: false, // each task starts from a clean slate
});
If you need memory, persist summaries externally and inject only the latest relevant summary back into the next task.
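Here’s a sketch of that pattern with a simple in-memory store. saveSummary and loadSummary are placeholders for whatever persistence you actually use (Redis, a database, a file); they are not CrewAI APIs.
// Placeholder store: swap in real persistence in production.
const summaryStore = new Map<string, string>();
const saveSummary = (key: string, summary: string) => summaryStore.set(key, summary);
const loadSummary = (key: string) => summaryStore.get(key) ?? "";
// After one task finishes, persist a short summary of its output...
saveSummary("claim-123", "Water damage claim, $18,400; exclusions section flagged.");
// ...and inject only that summary into the next task.
const followUpTask = new Task({
  description: `Context (summary only):\n${loadSummary("claim-123")}\n\nDraft the adjuster's next steps.`,
  agent,
});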
3. Model context window is too small
Some teams run CrewAI on smaller models with tight token limits. A prompt that works on GPT-4o may fail on a smaller local model or budget model.
// Example config snippet
const llmConfig = {
  model: "gpt-3.5-turbo", // 16K context window; too small for long inputs
};
Fix it by either reducing input size or switching to a model with a larger context window.
const llmConfig = {
  model: "gpt-4o", // 128K context window
};
4. Nested agent delegation creates repeated context
If one agent delegates to another and each step includes full previous outputs, you can duplicate the same content several times. That multiplies token usage fast.
// ❌ Broken idea:
// Agent A sends full report to Agent B,
// then Agent B sends full report back to Agent A.
// ✅ Fixed idea:
// Agent A sends a short structured summary.
// Agent B returns only deltas or findings.
Use structured handoffs (see the sketch below):
- summary
- key findings
- open questions
- recommended next step
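In practice the handoff can be a small typed payload. The Handoff interface below is an assumed shape for illustration, not a CrewAI type:
// Hypothetical handoff payload: Agent A fills this in instead of
// forwarding its full report; Agent B replies with findings only.
interface Handoff {
  summary: string; // two or three sentences, never the whole document
  keyFindings: string[]; // bullet-level facts
  openQuestions: string[]; // what the next agent must resolve
  recommendedNextStep: string;
}
const handoff: Handoff = {
  summary: "Claim CL-123 is a water damage claim close to the exclusions threshold.",
  keyFindings: ["Amount: $18,400", "Policy section: exclusions"],
  openQuestions: ["Does the mold rider apply?"],
  recommendedNextStep: "Verify rider coverage before approval.",
};
// Pass the compact JSON between agents, not the original report.
const handoffText = JSON.stringify(handoff, null, 2);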
How to Debug It
- Inspect the exact failing call
  - Find whether the error happens during Task.execute, tool execution, or after an agent response.
  - The actual message often looks like:
    - Error: token limit exceeded
    - BadRequestError: This model's maximum context length is ...
    - 400 Context length exceeded
- Log prompt size before sending (see the sketch after this list)
  - Print task.description.length, tool output length, and any memory payloads.
  - If you’re building strings manually, count characters and estimate token growth early.
- Disable features one at a time
  - Turn off memory first.
  - Remove tools.
  - Replace long documents with a tiny stub string.
  - If the error disappears after one change, you found the source.
- Check model limits
  - Verify which LLM your CrewAI TypeScript runtime is actually using.
  - Don’t assume your config matches production.
  - Confirm max input tokens for that exact deployment.
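Here’s one way to log sizes before a call. The characters-divided-by-four rule is only a rough heuristic for English text, and taskDescription and lastToolOutput are placeholders for whatever you’re about to send; use a real tokenizer for your model if you need exact counts.
// Rough heuristic: English text averages about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
// Placeholders: substitute the strings you are actually sending.
const parts: Record<string, string> = {
  taskDescription,
  toolOutput: JSON.stringify(lastToolOutput),
};
for (const [name, text] of Object.entries(parts)) {
  console.log(`${name}: ${text.length} chars (~${estimateTokens(text)} tokens)`);
}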
Prevention
- Keep task descriptions short and specific.
- Summarize tool outputs before returning them to agents.
- Chunk long documents and process them in stages instead of one giant prompt.
- Use memory sparingly; store durable state outside the conversation when possible.
- Pick models based on context window size, not just cost.
If you’re building production agents for banking or insurance workflows, treat token budget like any other system limit. Measure it, constrain it, and fail early before it becomes a runtime surprise.
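A minimal fail-early guard in that spirit (the 100,000-token budget is an example; set it from your model’s documented input limit, and swap the heuristic for a real tokenizer in production):
// Same chars/4 heuristic as the debugging sketch above.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const MAX_INPUT_TOKENS = 100_000; // example budget; match your deployment
const assertWithinBudget = (prompt: string): void => {
  const estimated = estimateTokens(prompt);
  if (estimated > MAX_INPUT_TOKENS) {
    // Fail before the LLM call, with an error you control.
    throw new Error(`Prompt is ~${estimated} tokens; budget is ${MAX_INPUT_TOKENS}.`);
  }
};
Call assertWithinBudget on every task description before kickoff, the same way you’d validate request size at any other API boundary.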
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.