How to Fix 'rate limit exceeded in production' in CrewAI (TypeScript)
If you’re seeing `rate limit exceeded` errors in production with CrewAI in TypeScript, it usually means your app is sending more model requests than your provider allows in a short window. In practice, this shows up when multiple agents run at once, retries pile up, or you accidentally create a new client per request and lose any chance of controlling throughput.
The error is not usually a CrewAI bug. It’s almost always a usage pattern problem around OpenAI, Anthropic, or whichever LLM provider sits behind your Agent, Task, or Crew setup.
The Most Common Cause
The #1 cause is uncontrolled concurrency: too many agents/tasks firing at the same time, often inside Promise.all() or a request handler that fans out work without backpressure.
Here’s how the broken and fixed patterns compare, followed by both versions:
| Broken | Fixed |
|---|---|
| Fires all tasks at once | Limits concurrency |
| Creates burst traffic | Smooths request rate |
| Triggers provider 429s | Stays under rate limits |
```ts
// Broken: bursts requests into the model provider
import { Agent, Task, Crew } from "crewai";

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({ agents, tasks });

// Three kickoffs start simultaneously; nothing limits how fast the
// underlying LLM calls reach the provider.
const results = await Promise.all([
  crew.kickoff(),
  crew.kickoff(),
  crew.kickoff(),
]);
```
```ts
// Fixed: serialize or limit concurrency
import pLimit from "p-limit";
import { Agent, Task, Crew } from "crewai";

const limit = pLimit(1); // start with 1; increase carefully

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({ agents, tasks });

// Each kickoff now waits for the previous one to finish.
const results = await Promise.all([
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
]);
```
If you are running this inside an API route or queue worker, the real fix is to bound concurrency at the system boundary. A single user request should not be able to fan out into ten LLM calls unless you’ve explicitly designed for it.
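As a minimal sketch of that boundary, assuming an Express-style route and `p-limit` (the `runCrewForRequest` helper is hypothetical):

```ts
import express from "express";
import pLimit from "p-limit";

// Hypothetical helper that builds and kicks off a Crew for one request.
declare function runCrewForRequest(input: unknown): Promise<unknown>;

const app = express();

// Module scope: every request in this process shares one limiter, so
// total in-flight LLM work stays bounded regardless of traffic.
const llmLimit = pLimit(2);

app.post("/analyze", express.json(), async (req, res) => {
  try {
    // Excess requests queue here instead of fanning out to the provider.
    const result = await llmLimit(() => runCrewForRequest(req.body));
    res.json({ result });
  } catch (err) {
    res.status(503).json({ error: "LLM capacity exhausted; try again later" });
  }
});
```

Note that this limiter is per process; if you run multiple replicas, the queue approach under cause 4 below handles the cross-pod case.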
Other Possible Causes
1. Per-request client construction
If you create a new LLM client for every request, you can’t reuse connection settings cleanly and you often hide retry storms behind each request path.
```ts
// Bad: builds a fresh client (and its connection pool) on every request
import { ChatOpenAI } from "@langchain/openai"; // assuming LangChain's OpenAI client
import { Agent } from "crewai";

export async function handler() {
  const llm = new ChatOpenAI({
    apiKey: process.env.OPENAI_API_KEY!,
    modelName: "gpt-4o-mini",
  });
  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
  // ...
}
```
```ts
// Better: one shared client, created once at module load
import { ChatOpenAI } from "@langchain/openai";
import { Agent } from "crewai";

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  modelName: "gpt-4o-mini",
});

export async function handler() {
  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
  // ...
}
```
2. Retry policy that amplifies traffic
A naive retry loop can turn one failed call into three more calls immediately.
```ts
// Bad: retries instantly, turning one failure into a burst
async function runWithNaiveRetries() {
  for (let i = 0; i < 3; i++) {
    try {
      return await crew.kickoff();
    } catch (err) {
      continue; // no delay: hammers an already-throttled provider
    }
  }
}
```
Use exponential backoff and respect provider headers if available.
```ts
// Better: exponential backoff between attempts
import { setTimeout as sleep } from "node:timers/promises";

async function runWithBackoff() {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await crew.kickoff();
    } catch (err) {
      if (attempt === 2) throw err; // out of attempts: surface the error
      const delayMs = Math.pow(2, attempt) * 1000; // 1s, then 2s
      await sleep(delayMs);
    }
  }
}
```
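Where the provider sends a `Retry-After` header and your SDK exposes it on the thrown error, prefer that hint over a computed delay. The `err.headers` shape below is an assumption; check what your SDK's error type actually carries:

```ts
// Sketch: use the provider's Retry-After hint when available, otherwise
// fall back to exponential backoff. The headers lookup is an assumption
// about the error shape, not a guaranteed SDK field.
function retryDelayMs(err: unknown, attempt: number): number {
  const headers = (err as { headers?: Record<string, string> })?.headers;
  const retryAfter = Number(headers?.["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000; // header value is in seconds
  }
  return Math.pow(2, attempt) * 1000;
}
```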
3. Too many tokens per request
Large prompts and huge outputs increase latency, and because most providers also enforce token-per-minute limits, oversized requests can push you into throttling even at a low request rate.
```ts
// Too broad: giant prompt, unbounded output
new Task({
  description: `
    Analyze this entire transcript and produce a full legal memo with citations,
    risk scoring, summary, exceptions, edge cases, and a detailed appendix...
  `,
});
```
Trim input aggressively and cap output size:
```ts
new Task({
  description: `Summarize the transcript into bullet points for underwriting review.`,
});
```
And cap output size where your SDK supports it:

```ts
const llm = new ChatOpenAI({
  modelName: "gpt-4o-mini",
  temperature: 0.2,
  maxTokens: 512, // hard cap on completion length
});
```
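On the input side, a small guard before building the Task keeps oversized transcripts out of the prompt. This is a rough character-based sketch (`transcript` and `agent` are placeholders); a tokenizer-based count would be more precise:

```ts
import { Task } from "crewai";
import type { Agent } from "crewai";

// Placeholders for the sketch
declare const transcript: string;
declare const agent: Agent;

const MAX_INPUT_CHARS = 12_000; // assumption: tune for your model's limits

// Crude cap on prompt size; characters only approximate tokens.
function truncateForPrompt(text: string): string {
  if (text.length <= MAX_INPUT_CHARS) return text;
  return text.slice(0, MAX_INPUT_CHARS) + "\n[transcript truncated]";
}

const task = new Task({
  description: `Summarize into bullet points:\n${truncateForPrompt(transcript)}`,
  agent,
});
```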
4. Multiple workers hitting the same key
This is common in production when several pods share one API key and all start at once after deploys or autoscaling events.
```yaml
# Example symptom source: six pods, one shared key
replicas: 6
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-secrets
        key: openai-key
```
That setup is fine only if your combined throughput stays within the account limits. If not, reduce replicas or add a queue in front of the workers.
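One sketch of the queue option uses BullMQ's worker-level rate limiter, which is enforced via Redis so all pods share the same budget (the connection details and `runCrewForJob` helper are placeholders):

```ts
import { Worker } from "bullmq";

// Hypothetical helper that builds and kicks off a Crew for one job.
declare function runCrewForJob(data: unknown): Promise<unknown>;

// The limiter is tracked in Redis, so six replicas running this worker
// still process at most 10 jobs per minute combined.
const worker = new Worker(
  "crew-jobs",
  async (job) => runCrewForJob(job.data),
  {
    connection: { host: "localhost", port: 6379 },
    limiter: { max: 10, duration: 60_000 }, // 10 jobs per minute, queue-wide
    concurrency: 2, // and at most 2 in flight per worker process
  }
);
```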
How to Debug It
- Check the exact upstream error
  - Look for provider-level messages like `429 Too Many Requests`, `RateLimitError`, or `You exceeded your current quota`.
  - CrewAI is usually surfacing the underlying SDK/provider failure.
- Count concurrent model calls
  - Log every `crew.kickoff()` invocation (a counting wrapper sketch follows this list).
  - If multiple requests happen within the same second from one user action, you found the burst source.
- Inspect retries
  - Search for custom retry loops.
  - Check whether your HTTP client or LLM SDK already retries automatically on `429`.
- Measure token volume
  - Log prompt size and completion size.
  - If requests are huge or slow, reduce context and split work into smaller tasks.
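A minimal version of that kickoff logging, with illustrative names, might look like this:

```ts
import type { Crew } from "crewai";

let inFlight = 0;

// Route every kickoff through this wrapper so bursts show up in logs.
export async function trackedKickoff(crew: Crew) {
  inFlight++;
  console.log(`[crew] kickoff start, in-flight=${inFlight}`);
  try {
    return await crew.kickoff();
  } finally {
    inFlight--;
    console.log(`[crew] kickoff end, in-flight=${inFlight}`);
  }
}
```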
Prevention
- Put a concurrency limit in front of every CrewAI execution path.
- Centralize your LLM client config and reuse it across requests.
- Add observability for:
  - request count per minute
  - retry count
  - token usage per task
If you want one rule to remember, it’s this: don’t let unbounded parallelism hit a rate-limited model provider. In CrewAI TypeScript, that usually means fixing your orchestration layer before touching the agent prompts.
Keep learning
- The complete AI Agents Roadmap - my full 8-step breakdown
- Free: The AI Agent Starter Kit - PDF checklist + starter code
- Work with me - I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.