How to Fix 'rate limit exceeded in production' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: rate-limit-exceeded-in-production · crewai · typescript

If you’re seeing rate limit exceeded in production with CrewAI in TypeScript, it usually means your app is sending more model requests than your provider allows in a short window. In practice, this shows up when multiple agents run at once, retries pile up, or you accidentally create a new client per request and lose any chance of controlling throughput.

The error is not usually a CrewAI bug. It’s almost always a usage pattern problem around OpenAI, Anthropic, or whichever LLM provider sits behind your Agent, Task, or Crew setup.

The Most Common Cause

The #1 cause is uncontrolled concurrency: too many agents/tasks firing at the same time, often inside Promise.all() or a request handler that fans out work without backpressure.

Here’s the broken pattern next to the fix:

Broken                   | Fixed
------------------------ | ------------------------
Fires all tasks at once  | Limits concurrency
Creates burst traffic    | Smooths request rate
Triggers provider 429s   | Stays under rate limits
// Broken: bursts requests into the model provider
import { Agent, Task, Crew } from "crewai";

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({
  agents,
  tasks,
});

const results = await Promise.all([
  crew.kickoff(),
  crew.kickoff(),
  crew.kickoff(),
]);

// Fixed: serialize or limit concurrency
import pLimit from "p-limit";
import { Agent, Task, Crew } from "crewai";

const limit = pLimit(1); // start with 1; increase carefully

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({ agents, tasks });

const results = await Promise.all([
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
]);

If you are running this inside an API route or queue worker, the real fix is to bound concurrency at the system boundary. A single user request should not be able to fan out into ten LLM calls unless you’ve explicitly designed for it.
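
As a sketch of that boundary, here is a module-level limiter wrapped around an Express-style route. The route shape and the buildCrew helper are illustrative assumptions, not CrewAI APIs:

// Sketch: one shared limiter for every request that reaches this process
import express from "express";
import pLimit from "p-limit";
import { Agent, Task, Crew } from "crewai";

const app = express();
app.use(express.json());

const llmLimit = pLimit(2); // module-level: all requests share this budget

// Hypothetical helper: build a single-task crew for one incoming claim
function buildCrew(claim: string): Crew {
  const agent = new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." });
  const task = new Task({ description: `Process claim: ${claim}`, agent });
  return new Crew({ agents: [agent], tasks: [task] });
}

app.post("/analyze", async (req, res) => {
  // Bursts queue at the limiter instead of hitting the provider all at once
  const result = await llmLimit(() => buildCrew(String(req.body.claim)).kickoff());
  res.json({ result });
});

app.listen(3000);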

Other Possible Causes

1. Per-request client construction

If you create a new LLM client for every request, you lose any shared configuration (timeouts, retry behavior, connection reuse), and each request path can hide its own retry storm.

// Bad: a fresh client per request
// (import paths assume the LangChain OpenAI client package)
import { ChatOpenAI } from "@langchain/openai";
import { Agent } from "crewai";

export async function handler() {
  const llm = new ChatOpenAI({
    apiKey: process.env.OPENAI_API_KEY!,
    modelName: "gpt-4o-mini",
  });

  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
}

// Better: one client, created once at module load and reused
import { ChatOpenAI } from "@langchain/openai";
import { Agent } from "crewai";

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  modelName: "gpt-4o-mini",
});

export async function handler() {
  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
}

2. Retry policy that amplifies traffic

A naive retry loop can turn one failed call into three more calls immediately.

// Bad
for (let i = 0; i < 3; i++) {
  try {
    return await crew.kickoff();
  } catch (err) {
    // retries too aggressively
    continue;
  }
}

Use exponential backoff, and honor the provider’s Retry-After header when one is returned (a sketch of that follows the backoff example below).

// Better
import { setTimeout as sleep } from "node:timers/promises";

const maxAttempts = 3;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
  try {
    return await crew.kickoff();
  } catch (err) {
    if (attempt === maxAttempts - 1) throw err; // out of retries: surface the error
    const delayMs = Math.pow(2, attempt) * 1000; // 1s, then 2s
    await sleep(delayMs);
  }
}
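
When the provider does return a Retry-After header on a 429, prefer it over the computed delay. How the header surfaces on the thrown error varies by SDK, so the err.headers lookup below is an assumption; check what your client actually exposes:

// Sketch: prefer the provider's Retry-After hint when present
// (the headers property on the error is an assumption; SDKs differ)
function retryAfterMs(err: unknown, fallbackMs: number): number {
  const headers = (err as { headers?: Record<string, string> })?.headers;
  const hinted = headers?.["retry-after"];
  // Retry-After can also be an HTTP date; plain seconds is the common case
  const seconds = hinted ? Number(hinted) : NaN;
  return Number.isFinite(seconds) ? seconds * 1000 : fallbackMs;
}

// Usage inside the catch block above:
//   await sleep(retryAfterMs(err, Math.pow(2, attempt) * 1000));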

3. Too many tokens per request

Large prompts and huge outputs increase latency, and because each request holds capacity for longer, they can push you into token-per-minute throttling even at a modest request rate.

new Task({
  description: `
    Analyze this entire transcript and produce a full legal memo with citations,
    risk scoring, summary, exceptions, edge cases, and a detailed appendix...
  `,
});

Trim input aggressively and cap output size:

new Task({
  description: `Summarize the transcript into bullet points for underwriting review.`,
});

And cap output length where your SDK supports it (maxTokens here assumes the LangChain ChatOpenAI client):

const llm = new ChatOpenAI({
  modelName: "gpt-4o-mini",
  temperature: 0.2,
  maxTokens: 512, // cap completion size so each request releases capacity sooner
});

4. Multiple workers hitting the same key

This is common in production when several pods share one API key and all start at once after deploys or autoscaling events.

# Example symptom source
replicas: 6
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-secrets
        key: openai-key

That setup is fine only if the combined throughput of all replicas stays within your account’s limits. If not, reduce replicas, or put a queue in front of the workers so the rate limit is enforced in one place, as sketched below.
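
One way to do that is a Redis-backed job queue with a global rate limiter, so the budget lives in Redis rather than in each pod. This sketch uses BullMQ as one option; the queue name, limits, and Redis host are illustrative assumptions:

// Sketch: all pods share one rate budget through a queue (BullMQ shown as one option)
import { Queue, Worker } from "bullmq";
import { Agent, Task, Crew } from "crewai";

const connection = { host: "redis", port: 6379 };

// API pods enqueue work instead of calling the provider directly
const llmQueue = new Queue("llm-jobs", { connection });
await llmQueue.add("process-claim", { claimId: "abc-123" });

// Worker pods process jobs; the limiter is shared across every worker on this queue
new Worker(
  "llm-jobs",
  async (job) => {
    const agent = new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." });
    const task = new Task({ description: `Process claim ${job.data.claimId}`, agent });
    return new Crew({ agents: [agent], tasks: [task] }).kickoff();
  },
  {
    connection,
    limiter: { max: 10, duration: 1000 }, // at most 10 jobs per second, cluster-wide
  }
);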

How to Debug It

  1. Check the exact upstream error

    • Look for provider-level messages like:
      • 429 Too Many Requests
      • RateLimitError
      • You exceeded your current quota
    • CrewAI is usually surfacing the underlying SDK/provider failure.
  2. Count concurrent model calls

    • Log every crew.kickoff() invocation (a minimal in-flight counter sketch follows this list).
    • If multiple requests happen within the same second from one user action, you found the burst source.
  3. Inspect retries

    • Search for custom retry loops.
    • Check whether your HTTP client or LLM SDK already retries automatically on 429.
  4. Measure token volume

    • Log prompt size and completion size.
    • If requests are huge or slow, reduce context and split work into smaller tasks.
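
For step 2, a minimal in-flight counter around kickoff is often enough to spot the burst source; the wrapper name and log format here are illustrative:

// Sketch: count in-flight kickoffs to find burst sources (wrapper is illustrative)
import { Crew } from "crewai";

let inFlight = 0;

export async function trackedKickoff(crew: Crew): Promise<unknown> {
  inFlight++;
  console.log(`[llm] kickoff start, in-flight=${inFlight}, at=${new Date().toISOString()}`);
  try {
    return await crew.kickoff();
  } finally {
    inFlight--;
    console.log(`[llm] kickoff end, in-flight=${inFlight}`);
  }
}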

Prevention

  • Put a concurrency limit in front of every CrewAI execution path.
  • Centralize your LLM client config and reuse it across requests.
  • Add observability (a minimal counter sketch follows this list) for:
    • request count per minute
    • retry count
    • token usage per task
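
Even in-process counters flushed once a minute will show you the shape of your traffic. The names below are illustrative, and a real deployment would report to your metrics stack instead:

// Sketch: minimal per-minute counters (illustrative; swap in real metrics in production)
const metrics = { requests: 0, retries: 0, tokens: 0 };

export function recordCall(tokens: number, wasRetry = false): void {
  metrics.requests++;
  metrics.tokens += tokens;
  if (wasRetry) metrics.retries++;
}

// Flush and reset once a minute
setInterval(() => {
  console.log(`[llm/min] requests=${metrics.requests} retries=${metrics.retries} tokens=${metrics.tokens}`);
  metrics.requests = metrics.retries = metrics.tokens = 0;
}, 60_000);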

If you want one rule to remember, it’s this: don’t let unbounded parallelism hit a rate-limited model provider. In CrewAI TypeScript, that usually means fixing your orchestration layer before touching the agent prompts.


By Cyprian Aarons, AI Consultant at Topiax.
