How to Fix 'rate limit exceeded in production' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: rate-limit-exceeded-in-production · crewai · typescript

If you’re seeing rate limit exceeded in production with CrewAI in TypeScript, it usually means your app is sending more model requests than your provider allows in a short window. In practice, this shows up when multiple agents run at once, retries pile up, or you accidentally create a new client per request and lose any chance of controlling throughput.

The error is not usually a CrewAI bug. It’s almost always a usage pattern problem around OpenAI, Anthropic, or whichever LLM provider sits behind your Agent, Task, or Crew setup.

The Most Common Cause

The #1 cause is uncontrolled concurrency: too many agents/tasks firing at the same time, often inside Promise.all() or a request handler that fans out work without backpressure.

Here’s the broken pattern next to the fix:

Broken                   | Fixed
------------------------ | ------------------------
Fires all tasks at once  | Limits concurrency
Creates burst traffic    | Smooths request rate
Triggers provider 429s   | Stays under rate limits
// Broken: bursts requests into the model provider
import { Agent, Task, Crew } from "crewai";

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({
  agents,
  tasks,
});

const results = await Promise.all([
  crew.kickoff(),
  crew.kickoff(),
  crew.kickoff(),
]);

// Fixed: serialize or limit concurrency
import pLimit from "p-limit";
import { Agent, Task, Crew } from "crewai";

const limit = pLimit(1); // start with 1; increase carefully

const agents = [
  new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." }),
  new Agent({ role: "Reviewer", goal: "Review claims", backstory: "..." }),
  new Agent({ role: "Summarizer", goal: "Summarize findings", backstory: "..." }),
];

const tasks = agents.map((agent, i) =>
  new Task({
    description: `Process claim batch ${i}`,
    agent,
  })
);

const crew = new Crew({ agents, tasks });

const results = await Promise.all([
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
  limit(() => crew.kickoff()),
]);

If you are running this inside an API route or queue worker, the real fix is to bound concurrency at the system boundary. A single user request should not be able to fan out into ten LLM calls unless you’ve explicitly designed for it.
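
As a sketch of that boundary, here is a module-level limiter wrapped around an Express-style route. The route shape and the buildCrew helper are illustrative assumptions, not CrewAI APIs:

// Sketch: one shared limiter for every request that reaches this process
import express from "express";
import pLimit from "p-limit";
import { Agent, Task, Crew } from "crewai";

const app = express();
app.use(express.json());

const llmLimit = pLimit(2); // module-level: all requests share this budget

// Hypothetical helper: build a single-task crew for one incoming claim
function buildCrew(claim: string): Crew {
  const agent = new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." });
  const task = new Task({ description: `Process claim: ${claim}`, agent });
  return new Crew({ agents: [agent], tasks: [task] });
}

app.post("/analyze", async (req, res) => {
  // Bursts queue at the limiter instead of hitting the provider all at once
  const result = await llmLimit(() => buildCrew(String(req.body.claim)).kickoff());
  res.json({ result });
});

app.listen(3000);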

Other Possible Causes

1. Per-request client construction

If you create a new LLM client for every request, you lose any shared configuration (timeouts, retry behavior, connection reuse), and each request path can hide its own retry storm.

// Bad: a fresh client per request
// (import paths assume the LangChain OpenAI client package)
import { ChatOpenAI } from "@langchain/openai";
import { Agent } from "crewai";

export async function handler() {
  const llm = new ChatOpenAI({
    apiKey: process.env.OPENAI_API_KEY!,
    modelName: "gpt-4o-mini",
  });

  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
}

// Better: one client, created once at module load and reused
import { ChatOpenAI } from "@langchain/openai";
import { Agent } from "crewai";

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  modelName: "gpt-4o-mini",
});

export async function handler() {
  const agent = new Agent({ role: "Support", goal: "...", backstory: "...", llm });
}

2. Retry policy that amplifies traffic

A naive retry loop can turn one failed call into three more calls immediately.

// Bad
for (let i = 0; i < 3; i++) {
  try {
    return await crew.kickoff();
  } catch (err) {
    // retries too aggressively
    continue;
  }
}

Use exponential backoff, and honor the provider’s Retry-After header when one is returned (a sketch of that follows the backoff example below).

// Better
import { setTimeout as sleep } from "node:timers/promises";

const maxAttempts = 3;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
  try {
    return await crew.kickoff();
  } catch (err) {
    if (attempt === maxAttempts - 1) throw err; // out of retries: surface the error
    const delayMs = Math.pow(2, attempt) * 1000; // 1s, then 2s
    await sleep(delayMs);
  }
}
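
When the provider does return a Retry-After header on a 429, prefer it over the computed delay. How the header surfaces on the thrown error varies by SDK, so the err.headers lookup below is an assumption; check what your client actually exposes:

// Sketch: prefer the provider's Retry-After hint when present
// (the headers property on the error is an assumption; SDKs differ)
function retryAfterMs(err: unknown, fallbackMs: number): number {
  const headers = (err as { headers?: Record<string, string> })?.headers;
  const hinted = headers?.["retry-after"];
  // Retry-After can also be an HTTP date; plain seconds is the common case
  const seconds = hinted ? Number(hinted) : NaN;
  return Number.isFinite(seconds) ? seconds * 1000 : fallbackMs;
}

// Usage inside the catch block above:
//   await sleep(retryAfterMs(err, Math.pow(2, attempt) * 1000));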

3. Too many tokens per request

Large prompts and huge outputs increase latency, and because each request holds capacity for longer, they can push you into token-per-minute throttling even at a modest request rate.

new Task({
  description: `
    Analyze this entire transcript and produce a full legal memo with citations,
    risk scoring, summary, exceptions, edge cases, and a detailed appendix...
  `,
});

Trim input aggressively and cap output size:

new Task({
  description: `Summarize the transcript into bullet points for underwriting review.`,
});

And cap output length where your SDK supports it (maxTokens here assumes the LangChain ChatOpenAI client):

const llm = new ChatOpenAI({
  modelName: "gpt-4o-mini",
  temperature: 0.2,
  maxTokens: 512, // cap completion size so each request releases capacity sooner
});

4. Multiple workers hitting the same key

This is common in production when several pods share one API key and all start at once after deploys or autoscaling events.

# Example symptom source
replicas: 6
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-secrets
        key: openai-key

That setup is fine only if the combined throughput of all replicas stays within your account’s limits. If not, reduce replicas, or put a queue in front of the workers so the rate limit is enforced in one place, as sketched below.
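
One way to do that is a Redis-backed job queue with a global rate limiter, so the budget lives in Redis rather than in each pod. This sketch uses BullMQ as one option; the queue name, limits, and Redis host are illustrative assumptions:

// Sketch: all pods share one rate budget through a queue (BullMQ shown as one option)
import { Queue, Worker } from "bullmq";
import { Agent, Task, Crew } from "crewai";

const connection = { host: "redis", port: 6379 };

// API pods enqueue work instead of calling the provider directly
const llmQueue = new Queue("llm-jobs", { connection });
await llmQueue.add("process-claim", { claimId: "abc-123" });

// Worker pods process jobs; the limiter is shared across every worker on this queue
new Worker(
  "llm-jobs",
  async (job) => {
    const agent = new Agent({ role: "Analyst", goal: "Analyze claims", backstory: "..." });
    const task = new Task({ description: `Process claim ${job.data.claimId}`, agent });
    return new Crew({ agents: [agent], tasks: [task] }).kickoff();
  },
  {
    connection,
    limiter: { max: 10, duration: 1000 }, // at most 10 jobs per second, cluster-wide
  }
);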

How to Debug It

  1. Check the exact upstream error

    • Look for provider-level messages like:
      • 429 Too Many Requests
      • RateLimitError
      • You exceeded your current quota
    • CrewAI is usually surfacing the underlying SDK/provider failure.
  2. Count concurrent model calls

    • Log every crew.kickoff() invocation (a minimal in-flight counter sketch follows this list).
    • If multiple requests happen within the same second from one user action, you found the burst source.
  3. Inspect retries

    • Search for custom retry loops.
    • Check whether your HTTP client or LLM SDK already retries automatically on 429.
  4. Measure token volume

    • Log prompt size and completion size.
    • If requests are huge or slow, reduce context and split work into smaller tasks.
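
For step 2, a minimal in-flight counter around kickoff is often enough to spot the burst source; the wrapper name and log format here are illustrative:

// Sketch: count in-flight kickoffs to find burst sources (wrapper is illustrative)
import { Crew } from "crewai";

let inFlight = 0;

export async function trackedKickoff(crew: Crew): Promise<unknown> {
  inFlight++;
  console.log(`[llm] kickoff start, in-flight=${inFlight}, at=${new Date().toISOString()}`);
  try {
    return await crew.kickoff();
  } finally {
    inFlight--;
    console.log(`[llm] kickoff end, in-flight=${inFlight}`);
  }
}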

Prevention

  • Put a concurrency limit in front of every CrewAI execution path.
  • Centralize your LLM client config and reuse it across requests.
  • Add observability (a minimal counter sketch follows this list) for:
    • request count per minute
    • retry count
    • token usage per task
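
Even in-process counters flushed once a minute will show you the shape of your traffic. The names below are illustrative, and a real deployment would report to your metrics stack instead:

// Sketch: minimal per-minute counters (illustrative; swap in real metrics in production)
const metrics = { requests: 0, retries: 0, tokens: 0 };

export function recordCall(tokens: number, wasRetry = false): void {
  metrics.requests++;
  metrics.tokens += tokens;
  if (wasRetry) metrics.retries++;
}

// Flush and reset once a minute
setInterval(() => {
  console.log(`[llm/min] requests=${metrics.requests} retries=${metrics.retries} tokens=${metrics.tokens}`);
  metrics.requests = metrics.retries = metrics.tokens = 0;
}, 60_000);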

If you want one rule to remember, it’s this: don’t let unbounded parallelism hit a rate-limited model provider. In CrewAI TypeScript, that usually means fixing your orchestration layer before touching the agent prompts.


By Cyprian Aarons, AI Consultant at Topiax.
