# How to Fix 'rate limit exceeded' in CrewAI (TypeScript)
## What the Error Means
`rate limit exceeded` in CrewAI means the model provider rejected your request because you sent too many requests in a short window, or because you burned through your quota. In TypeScript projects, this usually shows up when agents run in parallel, a loop retries too aggressively, or you accidentally create multiple LLM calls per task.
The stack trace usually points at OpenAI, Anthropic, or another provider underneath CrewAI, not CrewAI itself. The important part is that CrewAI is orchestrating the calls; the provider is enforcing the limit.
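Before you touch any code, it helps to confirm at the call site that the failure really is a provider rate limit. The helper below is illustrative, not a CrewAI API; it assumes the underlying SDK error exposes an HTTP `status` field, which the OpenAI and Anthropic Node SDK errors do.

```typescript
// Hypothetical helper (not part of CrewAI): classify a caught error
// as a provider rate limit. Assumes the SDK error carries an HTTP
// `status` field, as OpenAI and Anthropic SDK errors do.
function isRateLimitError(err: unknown): boolean {
  if (typeof err === "object" && err !== null) {
    const { status, message } = err as { status?: number; message?: string };
    return status === 429 || /rate limit/i.test(message ?? "");
  }
  return false;
}
```

Wrap your kickoff calls in a try/catch and branch on this check before deciding whether to retry or fail fast.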
## The Most Common Cause
The #1 cause is uncontrolled concurrency. In TypeScript, developers often map over tasks with `Promise.all()` and accidentally fire 10, 20, or 50 LLM calls at once. That looks fine in code review, but it trips provider limits fast.
| Broken pattern | Fixed pattern |
|---|---|
| Fires all tasks at once | Limits concurrency |
| Easy to write | Safer under rate limits |
```typescript
// Broken: all 20 tasks hit the model at once
import { Crew, Agent, Task } from "crewai";

const agent = new Agent({
  role: "Researcher",
  goal: "Summarize customer tickets",
  backstory: "You analyze support data",
});

const crew = new Crew({ agents: [agent] });

const tasks = Array.from({ length: 20 }, (_, i) =>
  new Task({
    description: `Summarize ticket ${i + 1}`,
    agent,
  })
);

// Promise.all() starts every kickoff immediately; nothing throttles the calls.
const results = await Promise.all(
  tasks.map((task) => crew.kickoff([task]))
);
```
```typescript
// Fixed: process with limited concurrency
import pLimit from "p-limit";
import { Crew, Agent, Task } from "crewai";

// At most 2 kickoffs run at the same time.
const limit = pLimit(2);

const agent = new Agent({
  role: "Researcher",
  goal: "Summarize customer tickets",
  backstory: "You analyze support data",
});

const crew = new Crew({ agents: [agent] });

const tasks = Array.from({ length: 20 }, (_, i) =>
  new Task({
    description: `Summarize ticket ${i + 1}`,
    agent,
  })
);

// Each call is queued by the limiter instead of firing immediately.
const results = await Promise.all(
  tasks.map((task) =>
    limit(() => crew.kickoff([task]))
  )
);
```
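If you'd rather avoid a dependency, fixed-size batching gets you most of the way there. This is a minimal sketch reusing `crew` and `tasks` from the example above.

```typescript
// Dependency-free alternative: run tasks in fixed-size batches.
// Within a batch the calls run in parallel; batches run sequentially,
// so no more than BATCH_SIZE requests are in flight at once.
const BATCH_SIZE = 2;
const allResults: unknown[] = [];

for (let i = 0; i < tasks.length; i += BATCH_SIZE) {
  const batch = tasks.slice(i, i + BATCH_SIZE);
  const batchResults = await Promise.all(
    batch.map((task) => crew.kickoff([task]))
  );
  allResults.push(...batchResults);
}
```

The trade-off: each batch waits for its slowest member, so a real limiter like `p-limit` keeps throughput a bit higher under the same cap.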
If you’re seeing errors like:

- `429 Too Many Requests`
- `Error: rate limit exceeded`
- `OpenAIError: Rate limit reached for requests`
- `AnthropicRateLimitError`

then this is the first thing to fix.
## Other Possible Causes
### 1. You are retrying too aggressively
A retry loop without backoff just turns one failed call into three failed calls.
```typescript
// Bad: retries instantly, so one failed call becomes three rapid-fire calls
async function runNaively(task: Task) {
  for (let i = 0; i < 3; i++) {
    try {
      return await crew.kickoff([task]);
    } catch (err) {
      continue; // no delay before the next attempt
    }
  }
}
```
Use exponential backoff with jitter.
```typescript
// Better: wait between attempts, doubling the delay each time
async function runWithBackoff(task: Task) {
  let delay = 500;
  for (let i = 0; i < 3; i++) {
    try {
      return await crew.kickoff([task]);
    } catch (err) {
      if (i === 2) throw err; // out of retries
      // Random jitter keeps parallel workers from retrying in lockstep.
      await new Promise((r) => setTimeout(r, delay + Math.random() * 200));
      delay *= 2;
    }
  }
}
```
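You can tighten this further by retrying only on rate-limit errors and honoring the provider's `Retry-After` hint when one is sent. The `status` and `headers` fields below are assumptions about the underlying SDK's error shape, not CrewAI APIs.

```typescript
// Variant: retry only on 429s, fail fast on everything else.
// Reuses `crew` from the examples above.
async function runWithSelectiveBackoff(task: Task) {
  let delay = 500;
  for (let i = 0; i < 3; i++) {
    try {
      return await crew.kickoff([task]);
    } catch (err) {
      const { status, headers } = err as {
        status?: number;
        headers?: Record<string, string>;
      };
      if (status !== 429 || i === 2) throw err;
      // Prefer the provider's hint; this sketch only handles the
      // seconds form of Retry-After, not the HTTP-date form.
      const retryAfter = Number(headers?.["retry-after"]);
      const waitMs = Number.isFinite(retryAfter)
        ? retryAfter * 1000
        : delay + Math.random() * 200;
      await new Promise((r) => setTimeout(r, waitMs));
      delay *= 2;
    }
  }
}
```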
### 2. Your agent workflow causes duplicate LLM calls
Some task graphs call the same agent more than once per user request. That happens when you chain tasks unnecessarily or re-run the whole crew inside a handler.
```typescript
// Bad: kickoff called twice for one request
await crew.kickoff([taskA]);
await crew.kickoff([taskB]);
```
Prefer a single kickoff with explicit task ordering if your workflow supports it.
```typescript
// Better: one orchestrated run
await crew.kickoff([taskA, taskB]);
```
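If the duplicates come from repeated handler invocations instead (double clicks, re-delivered webhooks), a small in-flight cache prevents a second run for the same request. The key scheme here is illustrative; use whatever uniquely identifies one user action.

```typescript
// Illustrative deduplication: a second call with the same key reuses
// the pending promise instead of starting another crew run.
const inFlight = new Map<string, Promise<unknown>>();

function kickoffOnce(key: string, tasks: Task[]): Promise<unknown> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  const run = crew
    .kickoff(tasks)
    .finally(() => inFlight.delete(key)); // allow future runs once settled
  inFlight.set(key, run);
  return run;
}
```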
### 3. Your model settings are too expensive for the quota
Long prompts and large outputs increase token usage and can hit token-based limits faster than request-based limits.
```typescript
const agent = new Agent({
  role: "Analyst",
  goal: "Review claims documents",
  backstory: "You audit insurance claims",
});

// Keep outputs bounded: ask for a fixed-size answer
const task = new Task({
  description:
    "Review this claims document and return exactly five bullet points.",
  agent,
});
```
Also check model choice. A high-throughput app on a low-tier API key will fail even if your code is correct.
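Before paying for a higher tier, estimate what each run actually consumes. The one-token-per-four-characters rule below is a rough approximation for English text; use your provider's tokenizer for exact counts.

```typescript
// Rough token estimate (~4 characters per token for English text).
// Good enough to spot a prompt that is 10x bigger than expected.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const prompt =
  "Review this claims document and return exactly five bullet points.";
console.log(`~${estimateTokens(prompt)} tokens`); // ~17 tokens
```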
### 4. You are sharing one API key across too many environments
This shows up when dev, staging, and production all use the same provider key.
```bash
# Bad
OPENAI_API_KEY=sk-live-shared-key
```
Split keys by environment and enforce separate quotas where possible.
```bash
# Better
OPENAI_API_KEY=sk-dev-key
OPENAI_API_KEY_STAGING=sk-staging-key
OPENAI_API_KEY_PROD=sk-prod-key
```
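At startup, pick the key for the current environment rather than reading whichever variable happens to be set. This loader is a sketch keyed to the variable names above, not a CrewAI feature.

```typescript
// Sketch: select the provider key for the current environment.
// Variable names follow the .env example above; adjust to taste.
function providerKey(): string {
  const env = process.env.NODE_ENV ?? "development";
  const key =
    env === "production"
      ? process.env.OPENAI_API_KEY_PROD
      : env === "staging"
        ? process.env.OPENAI_API_KEY_STAGING
        : process.env.OPENAI_API_KEY;

  if (!key) throw new Error(`No OpenAI key configured for "${env}"`);
  return key;
}
```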
## How to Debug It
- Check whether the error is provider-level or CrewAI-level
  - If you see `429` or messages like `Rate limit reached for requests`, it’s usually upstream.
  - If the stack trace includes `OpenAI`, `Anthropic`, or `GoogleGenerativeAI`, focus on provider quotas first.
- Log how many calls you trigger per user action (see the wrapper sketch after this list)
  - Count every `crew.kickoff()` invocation.
  - Count parallel jobs from `Promise.all()`.
  - If one button click triggers five runs, that’s your bug.
- Temporarily force serial execution
  - Replace parallel mapping with a simple `for...of`.
  - If the error disappears, concurrency was the issue.

  ```typescript
  // Serial baseline: if this runs clean, concurrency was the problem.
  for (const task of tasks) {
    const result = await crew.kickoff([task]);
  }
  ```

- Inspect retries and timeouts
  - Search for custom retry wrappers.
  - Check whether your HTTP client retries automatically.
  - Make sure you are not retrying immediately after a `429`.
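To make the call-counting step concrete, you can wrap kickoff with a counter so one user action that fans out into five runs becomes visible in your logs. The wrapper is an illustration, not a CrewAI API.

```typescript
// Illustrative instrumentation: count kickoff calls per user action.
// Reset the counter at the start of each handler invocation.
let kickoffCount = 0;

async function countedKickoff(tasks: Task[]) {
  kickoffCount += 1;
  console.log(`kickoff #${kickoffCount} (${tasks.length} task(s))`);
  return crew.kickoff(tasks);
}
```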
## Prevention
- Cap concurrency by default with something like `p-limit`, especially in batch jobs and background workers.
- Add exponential backoff with jitter for provider errors like `429 Too Many Requests`.
- Track request counts per route so one API endpoint cannot accidentally fan out into dozens of LLM calls (a minimal sketch follows below).
If you want a simple rule: when using CrewAI in TypeScript, never assume parallelism is free. It isn’t free on OpenAI, Anthropic, Gemini, or any other hosted model API.
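Here is a minimal version of that per-route tracking. The cap, route naming, and reset policy are all illustrative; wire this into your framework's middleware and reset the counts per time window.

```typescript
// Minimal per-route LLM call budget: refuse to fan out past a cap.
// In production, reset counts on a schedule (e.g., every minute).
const callsByRoute = new Map<string, number>();
const MAX_CALLS_PER_ROUTE = 25;

function recordLlmCall(route: string): void {
  const count = (callsByRoute.get(route) ?? 0) + 1;
  callsByRoute.set(route, count);
  if (count > MAX_CALLS_PER_ROUTE) {
    throw new Error(
      `Route ${route} exceeded its LLM call budget (${MAX_CALLS_PER_ROUTE})`
    );
  }
}
```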
## Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit